DJ-AI ASR Grammar Corrector (T5-Small) STREAMING
A lightweight grammar correction model fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) and designed specifically for streaming contexts. It corrects common errors in automatic speech recognition (ASR) output under low-latency, sliding-window streaming conditions, including homophones, verb tense issues, contractions, duplicated words, and more. Optimized for fast inference in (near) real-time ASR pipelines.
NOTE: This model is fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) with an additional 500,000 noisy/clean pairs. These pairs consist of typical streaming artifacts such as misaligned tokens in sliding-window streaming, cut-off tokens, and duplicated tokens.

Example: a stream of audio chunks in which "This is a test to see how well this works" is spoken:

- Chunk 1: "this is a t"
- Chunk 2: "is is a test"
- Chunk 3: "t to see how"
- Chunk 4: "well this work"
- Chunk 5: "works"

can be corrected in near real time, with output like the following (depending on your streaming and sliding-window setup):

- "This is a t"
- "This is a test"
- "This is a test to see how"
- "This is a test to see how well this work"
- "This is a test to see how well this works"
In terms of general grammar correction, this model does not perform as well as the small and base models, but it performs better on streaming correction. All models support low-latency streaming inference; this one is designed to be more accurate on overlapping tokens in sliding windows.
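To make the sliding-window problem concrete, here is a hypothetical illustration (not part of the model): a naive merge that joins chunk hypotheses on their longest whole-token overlap. It works only when chunk boundaries fall on clean token overlaps; cut-off or duplicated tokens like the "this is a t" / "is is a test" example above defeat it, which is exactly the kind of artifact the streaming fine-tuning data teaches the model to handle.

```python
# Naive sliding-window merge: join token lists on their longest
# suffix/prefix overlap. Fails on cut-off or duplicated tokens,
# motivating a learned streaming corrector.

def merge_chunks(running, chunk):
    """Join two token lists on their longest suffix/prefix overlap."""
    for k in range(min(len(running), len(chunk)), 0, -1):
        if running[-k:] == chunk[:k]:
            return running + chunk[k:]
    return running + chunk  # no overlap found: plain concatenation

# Clean overlaps merge fine; the corrupted chunks above would not.
chunks = ["this is a test", "a test to see", "to see how well"]
merged = []
for c in chunks:
    merged = merge_chunks(merged, c.split())
print(" ".join(merged))  # this is a test to see how well
```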
Model Details
- Base model: DJ-AI ASR Grammar Corrector (T5-Small)
- Fine-tuned on: 90 million synthetic (noisy → clean) sentence pairs + 500,000 real streaming pairs (noisy → clean)
- Training objective: Correct ASR-style transcription errors into clean, grammatical English
- Token count: ~60 million tokens per epoch
- Framework: Hugging Face Transformers + PyTorch
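A minimal loading sketch with Hugging Face Transformers. The repo id is taken from this card's model tree; whether the model expects a task prefix is not stated here, so this sketch passes raw noisy text and should be adjusted to your setup.

```python
# Sketch: load the corrector and run one noisy sentence through it.
# Assumes raw-text input (no task prefix) -- verify against the demo.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "dayyanj/dj-ai-asr-grammar-corrector-small-streaming"

def load_corrector():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    return tokenizer, model

def correct(text, tokenizer, model, max_new_tokens=64):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    tok, mdl = load_corrector()
    print(correct("this is a t is is a test", tok, mdl))
```

For streaming use, call `correct` on each sliding-window hypothesis as it arrives, as in the chunk example above.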
Benchmark Results
| Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
|---|---|---|---|---|---|---|---|---|---|
| dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
| dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
| dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.9 | 88.54 | 1620.65 |
- ¹ Accuracy measures performance across the full sentence: a prediction counts as "correct" only if the entire corrected sentence exactly matches the reference. If the model fixes one of two errors but the final output does not exactly match the expected sentence, it counts as a failure.
- ² Token Accuracy measures performance at the token level, so partial corrections still receive credit.
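The two metrics can be sketched as follows. This is my reading of the definitions above, not the authors' evaluation code: sentence accuracy is exact match over the whole corrected sentence; token accuracy compares aligned token positions.

```python
# Illustrative metric sketch (not the card's evaluation script).

def sentence_accuracy(preds, refs):
    """Fraction of predictions that exactly match their reference."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

def token_accuracy(preds, refs):
    """Fraction of aligned token positions that match the reference."""
    hits = total = 0
    for p, r in zip(preds, refs):
        p_toks, r_toks = p.split(), r.split()
        total += max(len(p_toks), len(r_toks))
        hits += sum(a == b for a, b in zip(p_toks, r_toks))
    return hits / total

preds = ["she isn't here", "he go home"]
refs  = ["she isn't here", "he goes home"]
print(sentence_accuracy(preds, refs))  # 0.5 -- one exact match of two
print(token_accuracy(preds, refs))     # 5 of 6 tokens match
```

This shows why Token Accuracy sits well above Accuracy in the table: a single wrong token fails the whole sentence but costs only one token position.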
Intended Use
| Use Case | ✅ Supported | 🚫 Not Recommended |
|---|---|---|
| Post-ASR correction | ✅ Yes | |
| Real-time ASR pipelines | ✅ Yes | |
| Batch transcript cleanup | ✅ Yes | |
| Grammar education tools | ✅ Yes | |
| Formal document editing | 🚫 | Model may be too informal |
| Multilingual input | 🚫 | English-only fine-tuning |
Corrects Common Streaming ASR Errors:
- Homophone mistakes (their → they're)
- Subject-verb disagreement (he go → he goes)
- Verb tense corruption (i seen → i saw)
- Missing auxiliaries (you going → are you going)
- Contraction normalization (she is not → she isn't)
- Repeated words (i i want → i want)
- Misused articles/prepositions/pronouns
- Misaligned sliding window tokens
- Overlapping tokens
Example
DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo

Git Repository: https://github.com/dayyanj/DJ-AI-ASR-GRAMMAR-CORRECTOR
Model tree for dayyanj/dj-ai-asr-grammar-corrector-small-streaming
Base model: google-t5/t5-small