
DJ-AI ASR Grammar Corrector (T5-Small) STREAMING

A lightweight grammar correction model fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) and designed specifically for streaming contexts. It corrects common errors in automatic speech recognition (ASR) output under low-latency, sliding-window streaming conditions, including homophones, verb tense issues, contractions, duplicated words, and more, and is optimized for fast inference in (near) real-time ASR pipelines.

NOTE: This model is fine-tuned from DJ-AI ASR Grammar Corrector (T5-Small) with an additional 500,000 noisy/clean pairs. These pairs contain typical streaming artifacts such as misaligned tokens from sliding-window streaming, cut-off tokens, and duplicated tokens.

Example: a stream of audio chunks in which "This is a test to see how well this works" is spoken:

  • Chunk 1: "this is a t"
  • Chunk 2: "is is a test"
  • Chunk 3: "t to see how"
  • Chunk 4: "well this work"
  • Chunk 5: "works"

can be corrected in near real time, with output like the following (depending on your streaming and sliding-window setup):

  • "This is a t"
  • "This is a test"
  • "This is a test to see how"
  • "This is a test to see how well this work"
  • "This is a test to see how well this works"
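Below is a minimal, illustrative sketch of how the running transcript from such a chunk stream might be corrected in near real time with this model. It is not an official API: the overlap handling shown is a naive accumulation, and whether the model expects a task prefix is not documented here, so check the repository before relying on it.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "dayyanj/dj-ai-asr-grammar-corrector-small-streaming"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()

def correct(partial_transcript: str) -> str:
    """Correct one partial transcript from the sliding window.

    Feeds the raw text to the model; a task prefix may be required
    (not specified in this card).
    """
    inputs = tokenizer(partial_transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=2)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Overlapping chunks as they might arrive from a streaming ASR front end.
chunks = ["this is a t", "is is a test", "t to see how", "well this work", "works"]

running = ""
for chunk in chunks:
    running = (running + " " + chunk).strip()  # naive accumulation; real pipelines
    print(correct(running))                    # would deduplicate the overlap upstream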

For general grammar correction this model does not perform as well as the small and base models, but it performs better on streaming correction. All models are designed to support low-latency streaming inference; this one is tuned to be more accurate on overlapping tokens in sliding windows.


Model Details

  • Base model: DJ-AI ASR Grammar Corrector (T5-Small)
  • Fine-tuned on: 90 million synthetic (noisy → clean) sentence pairs + 500,000 real streaming pairs (noisy → clean)
  • Training objective: Correct ASR-style transcription errors into clean, grammatical English
  • Token count: ~60 million tokens per epoch
  • Parameters: ~60.5M (F32 safetensors)
  • Framework: Hugging Face Transformers + PyTorch

Benchmark Results

| Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
|---|---|---|---|---|---|---|---|---|---|
| dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
| dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
| dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.9 | 88.54 | 1620.65 |
  1. Accuracy measures performance across the full sentence: a prediction is counted as correct only if the entire corrected sentence exactly matches the reference. If the model fixes 1 of 2 errors but the final output still does not match the expected sentence exactly, it is counted as a failure.
  2. Token Accuracy measures performance at the token level:

     $$\text{Token Accuracy (\%)} = \frac{\text{Number of Matched Tokens}}{\text{Total Reference Tokens}} \times 100$$
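For illustration, here is a minimal sketch of how these two metrics could be computed. It assumes whitespace tokenization and position-wise matching for token accuracy; the exact evaluation tokenization used for the numbers above is not stated in this card.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Sentence-level accuracy (footnote 1): a prediction counts only if it
    matches the reference exactly."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / max(len(references), 1)

def token_accuracy(prediction: str, reference: str) -> float:
    """Token Accuracy (%) = matched tokens / total reference tokens * 100.

    Assumes whitespace tokenization and position-wise matching; the exact
    evaluation tokenizer is not specified in this card."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    matched = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return 100.0 * matched / max(len(ref_tokens), 1)

print(token_accuracy("this is test", "this is a test"))  # 50.0: only "this" and "is" line up
```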

Intended Use

| Use Case | ✅ Supported | 🚫 Not Recommended |
|---|---|---|
| Post-ASR correction | ✅ Yes | |
| Real-time ASR pipelines | ✅ Yes | |
| Batch transcript cleanup | ✅ Yes | |
| Grammar education tools | ✅ Yes | |
| Formal document editing | | 🚫 Model may be too informal |
| Multilingual input | | 🚫 English-only fine-tuning |

Corrects Common Streaming ASR Errors:

  • Homophone mistakes (their → they're)
  • Subject-verb disagreement (he go → he goes)
  • Verb tense corruption (i seen → i saw)
  • Missing auxiliaries (you going → are you going)
  • Contraction normalization (she is not → she isn't)
  • Repeated words (i i want → i want)
  • Misused articles/prepositions/pronouns
  • Misaligned sliding window tokens
  • Overlapping tokens

Example

DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo

Git Repository: https://github.com/dayyanj/DJ-AI-ASR-GRAMMAR-CORRECTOR
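As a usage illustration, here is a sketch of correcting a single noisy ASR hypothesis with the standard Transformers text2text-generation pipeline. The input sentence is invented to combine error types listed above (repeated words, verb tense, subject-verb agreement); the repository may document a different invocation.

```python
from transformers import pipeline

# Assumed invocation: plain text in, corrected text out, no task prefix.
corrector = pipeline(
    "text2text-generation",
    model="dayyanj/dj-ai-asr-grammar-corrector-small-streaming",
)

noisy = "i i seen him yesterday and he go to the store"
print(corrector(noisy, max_new_tokens=64)[0]["generated_text"])
```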
