
DJ-AI ASR Grammar Corrector (T5-Small) STREAMING

A lightweight grammar correction model fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) and designed specifically for streaming contexts. It corrects common errors in automatic speech recognition (ASR) output under low-latency, sliding-window streaming conditions, including homophones, verb tense issues, contractions, duplicated words, and more, and is optimized for fast inference in (near) real-time ASR pipelines.

NOTE: This model is fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) with an additional 500,000 noisy/clean pairs. These pairs contain typical streaming artifacts such as misaligned tokens from sliding-window streaming, cut-off tokens, and duplicated tokens.

Example: a stream of audio chunks in which "This is a test to see how well this works" is spoken:

  • Chunk 1: "this is a t"
  • Chunk 2: "is is a test"
  • Chunk 3: "t to see how"
  • Chunk 4: "well this work"
  • Chunk 5: "works"

can be corrected in near real time, with output like the following (depending on your streaming and sliding-window setup):

  • "This is a t"
  • "This is a test"
  • "This is a test to see how"
  • "This is a test to see how well this work"
  • "This is a test to see how well this works"
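Below is a minimal, illustrative sketch of how the running transcript from such a chunk stream might be corrected in near real time with this model. It is not an official API: the overlap handling shown is a naive accumulation, and whether the model expects a task prefix is not documented here, so check the repository before relying on it.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "dayyanj/dj-ai-asr-grammar-corrector-small-streaming"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()

def correct(partial_transcript: str) -> str:
    """Correct one partial transcript from the sliding window.

    Feeds the raw text to the model; a task prefix may be required
    (not specified in this card).
    """
    inputs = tokenizer(partial_transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=2)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Overlapping chunks as they might arrive from a streaming ASR front end.
chunks = ["this is a t", "is is a test", "t to see how", "well this work", "works"]

running = ""
for chunk in chunks:
    running = (running + " " + chunk).strip()  # naive accumulation; real pipelines
    print(correct(running))                    # would deduplicate the overlap upstream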

For general grammar correction this model does not perform as well as the small and base models, but it performs better on streaming correction. All models are designed to support low-latency streaming inference; this one is tuned to be more accurate on overlapping tokens in sliding windows.


Model Details

  • Base model: DJ-AI ASR Grammar Corrector (T5-Small)
  • Fine-tuned on: 90 million synthetic (noisy → clean) sentence pairs + 500,000 real streaming pairs (noisy → clean)
  • Training objective: Correct ASR-style transcription errors into clean, grammatical English
  • Token count: ~60 million tokens per epoch
  • Parameters: ~60.5M (F32 safetensors)
  • Framework: Hugging Face Transformers + PyTorch

Benchmark Results

| Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
|---|---|---|---|---|---|---|---|---|---|
| dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
| dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
| dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.9 | 88.54 | 1620.65 |
  1. Accuracy measures performance across the full sentence: a prediction is counted as correct only if the entire corrected sentence exactly matches the reference. If the model fixes 1 of 2 errors but the final output still does not match the expected sentence exactly, it is counted as a failure.
  2. Token Accuracy measures performance at the token level:

     $$\text{Token Accuracy (\%)} = \frac{\text{Number of Matched Tokens}}{\text{Total Reference Tokens}} \times 100$$
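For illustration, here is a minimal sketch of how these two metrics could be computed. It assumes whitespace tokenization and position-wise matching for token accuracy; the exact evaluation tokenization used for the numbers above is not stated in this card.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Sentence-level accuracy (footnote 1): a prediction counts only if it
    matches the reference exactly."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / max(len(references), 1)

def token_accuracy(prediction: str, reference: str) -> float:
    """Token Accuracy (%) = matched tokens / total reference tokens * 100.

    Assumes whitespace tokenization and position-wise matching; the exact
    evaluation tokenizer is not specified in this card."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    matched = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return 100.0 * matched / max(len(ref_tokens), 1)

print(token_accuracy("this is test", "this is a test"))  # 50.0: only "this" and "is" line up
```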

Intended Use

| Use Case | ✅ Supported | 🚫 Not Recommended |
|---|---|---|
| Post-ASR correction | ✅ Yes | |
| Real-time ASR pipelines | ✅ Yes | |
| Batch transcript cleanup | ✅ Yes | |
| Grammar education tools | ✅ Yes | |
| Formal document editing | | 🚫 Model may be too informal |
| Multilingual input | | 🚫 English-only fine-tuning |

Corrects Common Streaming ASR Errors:

  • Homophone mistakes (their → they're)
  • Subject-verb disagreement (he go → he goes)
  • Verb tense corruption (i seen → i saw)
  • Missing auxiliaries (you going → are you going)
  • Contraction normalization (she is not → she isn't)
  • Repeated words (i i want → i want)
  • Misused articles/prepositions/pronouns
  • Misaligned sliding window tokens
  • Overlapping tokens

Example

DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo

Git Repository: https://github.com/dayyanj/DJ-AI-ASR-GRAMMAR-CORRECTOR
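As a usage illustration, here is a sketch of correcting a single noisy ASR hypothesis with the standard Transformers text2text-generation pipeline. The input sentence is invented to combine error types listed above (repeated words, verb tense, subject-verb agreement); the repository may document a different invocation.

```python
from transformers import pipeline

# Assumed invocation: plain text in, corrected text out, no task prefix.
corrector = pipeline(
    "text2text-generation",
    model="dayyanj/dj-ai-asr-grammar-corrector-small-streaming",
)

noisy = "i i seen him yesterday and he go to the store"
print(corrector(noisy, max_new_tokens=64)[0]["generated_text"])
```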
