NbAiLab Norwegian Qwen3-ASR-0.6B (Beta Test Model)
⚠️ CONFIDENTIALITY NOTICE: This model is provided only to approved beta testers. Please do not redistribute the model weights or this documentation. Please delete the model from your systems once the beta-testing period has ended.
Overview
This is an early beta test release of a Norwegian ASR model from NbAiLab, based on Qwen3-ASR-0.6B.
The purpose of this release is primarily to verify:
- that testers are able to load and run the model successfully,
- that the inference stack works in real environments,
- and that deployment, batching, and serving pipelines behave as expected.
This is not a final production release, and transcription quality may still show clear weaknesses or odd formatting behavior.
Important Beta Notes
Because this is an early beta build, you should expect issues such as:
- formatting irregularities,
- unstable behavior on out-of-domain audio,
- recognition errors on difficult accents, noisy speech, or unusual content,
- and possible runtime or environment issues depending on your setup.
At this stage, the main goal is to confirm that the model can be run successfully and that the surrounding inference pipeline works reliably.
Recommended Usage
This model is intended to be used with the qwen-asr package, following the official Qwen3-ASR interface.
The qwen-asr package provides:
- a transformers backend
- a vLLM backend
Installation
We recommend using a fresh Python environment.
Minimal install (transformers backend)
pip install -U qwen-asr
vLLM backend
pip install -U "qwen-asr[vllm]"
Optional: FlashAttention 2
For lower memory use and better speed on supported GPUs:
pip install -U flash-attn --no-build-isolation
If your machine has limited RAM:
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
Quick Start: Transformers Backend
The recommended way to run this model with the transformers backend is through qwen-asr.
import torch
from qwen_asr import Qwen3ASRModel
model = Qwen3ASRModel.from_pretrained(
    "NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
    max_inference_batch_size=32,
    max_new_tokens=256,
)

results = model.transcribe(
    audio="path/to/your_audio.wav",
    language=None,  # or set explicitly, e.g. "Norwegian"
)
print(results[0].language)
print(results[0].text)
Notes
- audio can typically be a local path, URL, base64 input, or waveform tuple, depending on backend support.
- Setting language=None enables automatic language detection.
- To force a specific language, set language="Norwegian" if that works well in your environment.
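To run the same transformers-backend setup over many recordings, a simple per-file loop is the safest approach in a beta build, since list inputs may or may not be supported in your qwen-asr version. The helper below is a sketch using only the standard library; list_wav_files is our own name, not part of qwen-asr:

```python
from pathlib import Path

def list_wav_files(root):
    """Collect .wav files under root, sorted for a reproducible order."""
    return sorted(str(p) for p in Path(root).rglob("*.wav"))

# Usage with a model loaded as in the quick start above:
# for path in list_wav_files("recordings/"):
#     results = model.transcribe(audio=path, language=None)
#     print(path, "->", results[0].text)
```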
Quick Start: vLLM Backend
For faster inference and serving, use the vLLM backend:
from qwen_asr import Qwen3ASRModel
if __name__ == "__main__":
    model = Qwen3ASRModel.LLM(
        model="NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
        gpu_memory_utilization=0.7,
        max_inference_batch_size=128,
        max_new_tokens=1024,
    )
    results = model.transcribe(
        audio="path/to/your_audio.wav",
        language=None,
    )
    print(results[0].language)
    print(results[0].text)
Serving
You can launch an OpenAI-compatible serving endpoint with:
qwen-asr-serve NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000
Depending on your installed stack and version, the standard vllm serve command may also work.
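Once the server is running, you can query it from any OpenAI-compatible client. The sketch below uses only the Python standard library; note that the /v1/audio/transcriptions route, the multipart field names (file, model), and the "text" field in the JSON response mirror the OpenAI audio API and are assumptions until verified against your qwen-asr version.

```python
import io
import json
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Hand-roll a multipart/form-data body using only the stdlib."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    for name, value in fields.items():
        body.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    body.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode()
    )
    body.write(file_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, body.getvalue()

def transcribe_via_server(base_url, audio_path, model_id):
    """POST a WAV file to the (assumed) /v1/audio/transcriptions route."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    boundary, body = build_multipart(
        {"model": model_id}, "file", audio_path.rsplit("/", 1)[-1], audio
    )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]

# Usage (with the server started as shown above):
# print(transcribe_via_server(
#     "http://localhost:8000",
#     "path/to/your_audio.wav",
#     "NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
# ))
```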
Web Demo
If you want to test the model in a local web UI:
qwen-asr-demo \
    --asr-checkpoint NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised \
    --backend transformers \
    --cuda-visible-devices 0 \
    --ip 0.0.0.0 \
    --port 8000
Then open:
http://<your-ip>:8000
Feedback Requested
During this beta phase, feedback is especially useful on:
- whether the model loads successfully,
- environment and installation problems,
- CUDA / OOM issues,
- inference crashes,
- batching or serving problems,
- and general compatibility with your pipeline.
If possible, please report:
- GPU type
- Python version
- package versions
- whether you used transformers backend or vLLM backend
- approximate audio duration
- error logs or stack traces
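To make these reports easy to assemble, a small script can collect most of the requested details in one place. This is a generic sketch, not part of qwen-asr; GPU type is not covered here and should be checked separately (e.g. with nvidia-smi):

```python
import platform
from importlib import metadata

def collect_env_report(packages=("qwen-asr", "torch", "transformers", "vllm")):
    """Gather version details commonly requested in beta bug reports."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```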
Intended Scope
This beta release is meant for technical evaluation only.
Please do not treat current results as final quality estimates for the model. Recognition behavior may change substantially in later versions.
Acknowledgements
This model is based on the open Qwen3-ASR framework and adapted by NbAiLab for Norwegian ASR experimentation and beta testing.