Audio Models
Collection
11 items
•
Updated
OpenAI Whisper on Axera
目前支持 C++ 和 Python 两种语言
预编译模型下载
如需自行转换请参考模型转换
apt install, pip install 等指令推荐在板上安装Miniconda管理虚拟环境,安装方法如下:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
conda init --all
安装Whisper依赖
cd python
conda create -n whisper python=3.12
conda activate whisper
pip3 install -r requirements.txt
参考 https://github.com/AXERA-TECH/pyaxengine 安装 NPU Python API
在0.1.3rc2上测试通过,可通过
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
安装,或把版本号更改为你想使用的版本
登陆开发板后
输入命令
cd python
conda activate whisper
python3 main.py --model_type small --model_path ../models-ax650 --wav ../demo.wav --language zh
输出结果
root@ax650:/mnt/qtang/whisper.axera/python# python3 main.py --wav ../demo.wav --model_type small --model_path ../models/ --language zh
[INFO] Available providers: ['AxEngineExecutionProvider']
wav: ../demo.wav
model_type: small
model_path: ../models/
language: zh
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.10.1s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
Load models take 2322.563409805298ms
Preprocess wav take 6971.68493270874ms
Run encoder take 211.52877807617188ms
Run decoder_main take 79.00094985961914ms
First token: 17556
Run decoder_loop take 101.91774368286133ms
Iter 0 Token: 20844
Run decoder_loop take 60.30416488647461ms
Iter 1 Token: 7781
Run decoder_loop take 60.22000312805176ms
Iter 2 Token: 20204
Run decoder_loop take 60.23716926574707ms
Iter 3 Token: 28455
Run decoder_loop take 60.214996337890625ms
Iter 4 Token: 31962
Run decoder_loop take 60.17565727233887ms
Iter 5 Token: 6336
Run decoder_loop take 60.94002723693848ms
Iter 6 Token: 254
Run decoder_loop take 60.71639060974121ms
Iter 7 Token: 2930
Run decoder_loop take 60.225725173950195ms
Iter 8 Token: 236
Run decoder_loop take 60.167789459228516ms
Iter 9 Token: 36135
Run decoder_loop take 60.29987335205078ms
Iter 10 Token: 15868
Run decoder_loop take 61.163902282714844ms
Iter 11 Token: 252
Run decoder_loop take 60.273170471191406ms
Iter 12 Token: 1546
Run decoder_loop take 60.23144721984863ms
Iter 13 Token: 46514
Run decoder_loop take 60.31966209411621ms
Iter 14 Token: 50257
Result: 甚至出现交易几乎停滞的情况
运行参数说明:
| 参数名称 | 说明 | 默认值 |
|---|---|---|
| --wav | 输入音频文件 | |
| --model_type/-t | 模型类型, tiny/base/small | |
| --model_path/-p | 模型所在目录 | ../models |
| --language/-l | 识别语言 | zh |
在 AX650N 设备上执行
cd cpp
./whisper -w ../demo.wav
或
cd cpp
./whisper --model_type small --model_path ../models -w ../demo.wav
输出结果
root@ax650:/mnt/qtang/whisper.axera/cpp# ./install/whisper --wav ../demo.wav --model_type small --model_path ../models/ --language zh
wav_file: ../demo.wav
model_path: ../models/
model_type: small
language: zh
Encoder run take 188.30 ms
First token: 17556 take 81.88ms
Next Token: 20844 take 29.64ms
Next Token: 7781 take 29.70ms
Next Token: 20204 take 29.64ms
Next Token: 28455 take 29.65ms
Next Token: 31962 take 29.61ms
Next Token: 6336 take 29.67ms
Next Token: 254 take 29.63ms
Next Token: 2930 take 29.61ms
Next Token: 236 take 29.56ms
Next Token: 36135 take 29.64ms
Next Token: 15868 take 29.71ms
Next Token: 252 take 29.51ms
Next Token: 1546 take 29.63ms
Next Token: 46514 take 29.51ms
Next Token: 50257 take 29.69ms
All take 801.13 ms
Result: 甚至出现交易几乎停滞的情况
cd cpp
./whisper_srv --model_type tiny --model_path ../models-ax650 --language zh --port 8080
curl命令行测试(请自行替换IP和端口):
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @-
RTF: Real-Time Factor
CPP:
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.08 | |
| Whisper-Base | 0.11 | 0.35 |
| Whisper-Small | 0.24 | |
| Whisper-Turbo | 0.48 |
Python:
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.12 | |
| Whisper-Base | 0.16 | 0.35 |
| Whisper-Small | 0.50 | |
| Whisper-Turbo | 0.60 |
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.24 | |
| Whisper-Base | 0.18 | |
| Whisper-Small | 0.11 | |
| Whisper-Turbo | 0.06 |
若要复现测试结果,请按照以下步骤:
解压数据集:
unzip datasets.zip
运行测试脚本:
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../datasets/ground_truth.txt --model_type tiny
Python:
| Models | CMM(MB) | OS(MB) |
|---|---|---|
| Whisper-Tiny | 332 | 512 |
| Whisper-Base | 533 | 644 |
| Whisper-Small | 1106 | 906 |
| Whisper-Turbo | 2065 | 2084 |
C++:
| Models | CMM(MB) | OS(MB) |
|---|---|---|
| Whisper-Tiny | 332 | 31 |
| Whisper-Base | 533 | 54 |
| Whisper-Small | 1106 | 146 |
| Whisper-Turbo | 2065 | 86 |