
NbAiLab Norwegian Qwen3-ASR-0.6B (Beta Test Model)

⚠️ CONFIDENTIALITY NOTICE: This model is provided only to approved beta testers. Do not redistribute the model weights or this documentation, and delete the model from your systems once the beta-testing period has ended.


Overview

This is an early beta test release of a Norwegian ASR model from NbAiLab, based on Qwen3-ASR-0.6B.

The purpose of this release is primarily to verify:

  • that testers are able to load and run the model successfully,
  • that the inference stack works in real environments,
  • and that deployment, batching, and serving pipelines behave as expected.

This is not a final production release, and transcription quality may still show clear weaknesses or odd formatting behavior.


Important Beta Notes

Because this is an early beta build, you should expect issues such as:

  • formatting irregularities,
  • unstable behavior on out-of-domain audio,
  • recognition errors on difficult accents, noisy speech, or unusual content,
  • and possible runtime or environment issues depending on your setup.

At this stage, the main goal is to confirm that the model can be run successfully and that the surrounding inference pipeline works reliably.


Recommended Usage

This model is intended to be used with the qwen-asr package, following the official Qwen3-ASR interface.

The qwen-asr package provides:

  • a transformers backend
  • a vLLM backend

Installation

We recommend using a fresh Python environment.

Minimal install (transformers backend)

pip install -U qwen-asr

vLLM backend

pip install -U "qwen-asr[vllm]"

Optional: FlashAttention 2

For lower memory use and better speed on supported GPUs:

pip install -U flash-attn --no-build-isolation

If your machine has limited RAM, cap the number of parallel compilation jobs:

MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
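After installing, you can check whether FlashAttention 2 is importable before enabling it in the model configuration. A minimal sketch (stdlib plus an optional flash_attn import):

```python
# Check whether FlashAttention 2 can be imported before passing
# attn_implementation="flash_attention_2" to the model loader.
try:
    import flash_attn  # noqa: F401
    flash_attn_available = True
except ImportError:
    flash_attn_available = False

print("FlashAttention 2 available:", flash_attn_available)
```

If this prints False, leave the attn_implementation line commented out in the loading example below.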

Quick Start: Transformers Backend

The recommended way to run this model with the transformers backend is through qwen-asr.

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
    max_inference_batch_size=32,
    max_new_tokens=256,
)

results = model.transcribe(
    audio="path/to/your_audio.wav",
    language=None,  # or set explicitly, e.g. "Norwegian"
)

print(results[0].language)
print(results[0].text)

Notes

  • audio can typically be a local path, URL, base64 input, or waveform tuple depending on backend support.
  • Setting language=None enables automatic language detection.
  • To force decoding in a specific language, set it explicitly, e.g. language="Norwegian"; verify that this behaves well in your environment.
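If your backend accepts waveform input, you can load the audio yourself and pass a (samples, sample_rate) tuple instead of a file path. The exact tuple shape expected by qwen-asr is an assumption here; check the package documentation for your version. A stdlib-only sketch that synthesizes and reloads a short 16 kHz mono clip as a stand-in for real audio:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumption: the model expects 16 kHz mono input

# Write a 1-second 440 Hz sine tone as 16-bit PCM WAV (stand-in for real audio).
with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)))
        for t in range(SAMPLE_RATE)
    )
    wf.writeframes(frames)

# Read it back as floats in [-1, 1], the usual waveform representation.
with wave.open("tone.wav", "rb") as wf:
    raw = wf.readframes(wf.getnframes())
    sample_rate = wf.getframerate()
samples = [s / 32768.0 for s in struct.unpack("<%dh" % (len(raw) // 2), raw)]

# Hypothetical tuple shape for model.transcribe(audio=waveform):
waveform = (samples, sample_rate)
print(len(samples), sample_rate)
```

For real recordings, resample to mono 16 kHz first if your source material uses a different rate.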

Quick Start: vLLM Backend

For faster inference and serving, use the vLLM backend:

from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    model = Qwen3ASRModel.LLM(
        model="NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
        gpu_memory_utilization=0.7,
        max_inference_batch_size=128,
        max_new_tokens=1024,
    )

    results = model.transcribe(
        audio="path/to/your_audio.wav",
        language=None,
    )

    print(results[0].language)
    print(results[0].text)

Serving

You can launch an OpenAI-compatible serving endpoint with:

qwen-asr-serve NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000

Depending on your installed stack and version, standard vllm serve may also work.
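Once the endpoint is up, any OpenAI-compatible client should be able to query it. The route and field names below (/v1/audio/transcriptions, model, file) follow the OpenAI audio API convention and are assumptions; adjust them to whatever your qwen-asr-serve version actually exposes. A stdlib sketch that builds the multipart request:

```python
import io
import urllib.request
import uuid

def build_transcription_request(url, model_name, wav_bytes, filename="audio.wav"):
    """Build a multipart/form-data POST for an OpenAI-style transcription route."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # "model" form field
    body.write(
        f"--{boundary}\r\nContent-Disposition: form-data; "
        f"name=\"model\"\r\n\r\n{model_name}\r\n".encode()
    )
    # "file" form field carrying the WAV payload
    body.write(
        f"--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; "
        f"filename=\"{filename}\"\r\nContent-Type: audio/wav\r\n\r\n".encode()
    )
    body.write(wav_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

req = build_transcription_request(
    "http://localhost:8000/v1/audio/transcriptions",
    "NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised",
    b"RIFF....WAVE",  # replace with real WAV file contents
)
print(req.get_method(), req.full_url)
# Send with urllib.request.urlopen(req) once the server is running.
```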


Web Demo

If you want to test the model in a local web UI:

qwen-asr-demo \
  --asr-checkpoint NbAiLab/nb-asr-beta1-Qwen06B-reading-optimised \
  --backend transformers \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 \
  --port 8000

Then open:

http://<your-ip>:8000


Feedback Requested

During this beta phase, feedback is especially useful on:

  • whether the model loads successfully,
  • environment and installation problems,
  • CUDA / OOM issues,
  • inference crashes,
  • batching or serving problems,
  • and general compatibility with your pipeline.

If possible, please report:

  • GPU type
  • Python version
  • package versions
  • whether you used transformers backend or vLLM backend
  • approximate audio duration
  • error logs or stack traces
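Most of this information can be collected automatically. A stdlib sketch (the package names are the ones mentioned above; GPU details are skipped when torch or CUDA is unavailable):

```python
import platform
import sys
from importlib import metadata

def environment_report(packages=("qwen-asr", "torch", "transformers", "vllm", "flash-attn")):
    """Collect Python, package, and (if available) GPU info for a bug report."""
    lines = [f"python: {sys.version.split()[0]} ({platform.platform()})"]
    for pkg in packages:
        try:
            lines.append(f"{pkg}: {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg}: not installed")
    try:
        import torch
        if torch.cuda.is_available():
            lines.append(f"gpu: {torch.cuda.get_device_name(0)}")
    except ImportError:
        pass
    return "\n".join(lines)

print(environment_report())
```

Paste the output of this snippet alongside your error logs when reporting an issue.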

Intended Scope

This beta release is meant for technical evaluation only.

Please do not treat current results as final quality estimates for the model. Recognition behavior may change substantially in later versions.


Acknowledgements

This model is based on the open Qwen3-ASR framework and adapted by NbAiLab for Norwegian ASR experimentation and beta testing.
