Russian Turn Detection Model (Qwen/Qwen3-0.6B Fine-Tuned)

This model is a specialized End-of-Utterance (EOU) / Turn Detection model designed for Russian spoken dialogue systems. It is fine-tuned from Qwen/Qwen3-0.6B to classify whether a user has finished speaking or is only pausing mid-sentence.

It is optimized for real-time voice agents (like those using LiveKit) to minimize interruptions and reduce latency in conversational flows.

🎯 Model Capabilities

  • Task: Classifies text input as either COMPLETE (user finished) or CONTINUE (user is thinking/pausing).
  • Language: Russian (primary), handles mixed English/Russian technical terms.
  • Latency: Low-latency inference (the base model has 0.6B parameters), suitable for edge or cloud deployment.
  • Nuance: Correctly handles hesitation markers (e.g., "ну...", "эээ...", "как бы") as CONTINUE.

💻 Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "RAS1981/qwen3-turn-detector-merged"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict_turn(text):
    messages = [
        {"role": "system", "content": "Ты голосовой ассистент. Определяй, закончил ли пользователь говорить."},
        {"role": "user", "content": text}
    ]
    
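    # return_dict=True yields both input_ids and attention_mask.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt"
    ).to(device)

    # Greedy decoding keeps the predicted label deterministic.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=2,
            do_sample=False,
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    decoded = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    return decoded.strip()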

# Test Cases
print(predict_turn("Привет, я хочу заказать пиццу"))  # Output: COMPLETE
print(predict_turn("Ну я думаю что может быть..."))   # Output: CONTINUE
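
In a voice agent, the returned label usually drives how long the pipeline waits in silence before replying. The following is a minimal sketch of that step, not part of this model or of any LiveKit API: `TurnLabel`, `parse_label`, `endpoint_delay_s`, and both delay values are hypothetical names and numbers chosen for illustration. It assumes that unrecognized or ambiguous output should be treated as CONTINUE, so the agent errs on the side of not interrupting.

```python
from enum import Enum


class TurnLabel(Enum):
    COMPLETE = "COMPLETE"
    CONTINUE = "CONTINUE"


def parse_label(raw: str) -> TurnLabel:
    """Map raw generated text to a label.

    Accepts the full label or an unambiguous prefix of it (generation is
    capped at a couple of tokens, so the text may be cut short). Anything
    else falls back to CONTINUE (an assumption, to avoid interrupting).
    """
    cleaned = raw.strip().upper()
    if len(cleaned) >= 3 and (
        cleaned.startswith("COMPLETE") or "COMPLETE".startswith(cleaned)
    ):
        return TurnLabel.COMPLETE
    return TurnLabel.CONTINUE


def endpoint_delay_s(
    label: TurnLabel,
    short_delay_s: float = 0.3,  # hypothetical value, tune per application
    long_delay_s: float = 1.5,   # hypothetical value, tune per application
) -> float:
    """Return how long to wait in silence before the agent replies."""
    return short_delay_s if label is TurnLabel.COMPLETE else long_delay_s
```

A caller would pass the output of `predict_turn` through `parse_label` and hand the resulting delay to whatever VAD or endpointing component the pipeline uses.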

📊 Training Details

Dataset

  • Source: Custom dataset generated via Gemini 2.5 Flash Lite based on IlyaGusev/ru_turbo_alpaca and ss-corpus-ru.
  • Preprocessing:
    • Converted formal text to spoken Russian (added hesitations, fillers, self-corrections).
    • Normalized using NFKC and lowercased specific punctuation.
    • Balanced 50/50 split between COMPLETE and CONTINUE labels to prevent bias.
  • Size: ~400 high-quality curated examples (incremental training).
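
The normalization and balancing steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: `normalize_text` and `balance` are hypothetical helpers, examples are assumed to be `(text, label)` pairs, and balancing is assumed to be done by downsampling the larger class (the original data may instead have been generated already balanced).

```python
import random
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", text).strip()


def balance(examples, seed=0):
    """Return a 50/50 mix of COMPLETE and CONTINUE examples.

    Downsamples the larger class (an assumption about how the balance
    was reached) and shuffles the result.
    """
    rng = random.Random(seed)
    by_label = {"COMPLETE": [], "CONTINUE": []}
    for text, label in examples:
        by_label[label].append((normalize_text(text), label))

    # Keep as many items per class as the smaller class contains.
    per_class = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, per_class))
    rng.shuffle(balanced)
    return balanced
```

NFKC folds compatibility characters into their canonical forms, so a single-character ellipsis ("…") becomes three periods and a non-breaking space becomes a regular space, which keeps hesitation markers such as "ну..." consistent across examples.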

Hyperparameters

  • Framework: Unsloth + TRL (SFTTrainer)
  • Quantization: 4-bit (QLoRA)
  • Learning Rate: 2e-4
  • Epochs: 62 (Early stopping based on loss convergence)
  • Final Loss: ~0.086
  • Optimizer: AdamW 8-bit
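
For readers who want to reproduce a similar run, the reported values map onto TRL's `SFTConfig` roughly as follows. This is a sketch only: the output directory is a hypothetical path, and settings the card does not report (LoRA rank, batch size, sequence length, early-stopping criteria) are deliberately left at library defaults rather than guessed.

```python
from trl import SFTConfig

# Only the values reported above are set; everything else is a library default.
training_args = SFTConfig(
    output_dir="outputs",   # hypothetical path, not from the original run
    learning_rate=2e-4,     # Learning Rate: 2e-4
    num_train_epochs=62,    # Epochs: 62, as reported
    optim="adamw_8bit",     # AdamW 8-bit (requires bitsandbytes at training time)
)
```

The resulting object would be passed as `args` to `SFTTrainer` together with the 4-bit quantized model and the dataset.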

⚠️ Limitations

  • Context: The model classifies only the text of the current utterance and has no access to audio. Long silent pauses therefore still need VAD (Voice Activity Detection) support.
  • Domain: Fine-tuned on general conversation and real-estate inquiries; may need adaptation for highly specific medical or legal jargon.

🛠 Intended Use

  • LiveKit Agents: Use as a semantic turn detector in the EOU plugin.
  • Customer Support Bots: Prevent the bot from interrupting users while they think.
  • Voice Assistants: Improve natural flow in Russian dialogue.