# T5-Base News Summarizer (Multi-Style)
This model is a fine-tuned version of google/flan-t5-base trained on samples from CNN/DailyMail and XSum.
It generates news summaries in three styles: Harsh (Concise), Standard, and Detailed.
## Model Description

- Model Type: Sequence-to-Sequence Transformer (T5)
- Language: English
- Base Model: google/flan-t5-base
- Training Data: ~9k mixed samples from CNN/DailyMail & XSum
## Key Features

This model supports a style prompt prefix that determines summary length and density:

### Harsh
- Very concise
- Headline-like
- Trained mostly on XSum

### Standard
- Balanced, general-purpose summarization

### Detailed
- Longer, more contextual summaries
- Trained with CNN/DailyMail
## Usage

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="Hiratax/t5-news-summarizer")

text = """
The James Webb Space Telescope (JWST) has captured a lush landscape of stellar birth.
The new image shows the Cosmic Cliffs, which are the edge of a giant gaseous cavity within the star-forming region NGC 3324.
"""

# 1. Standard
print(summarizer("summarize standard: " + text))

# 2. Harsh (Headline)
print(summarizer("summarize harsh: " + text))

# 3. Detailed
print(summarizer("summarize detailed: " + text))
```
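Each call returns a list of dicts, so the generated string itself is at `result[0]["summary_text"]`.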
## Recommended Inference Parameters

| Style | Min Length | Max Length | Length Penalty | Repetition Penalty | No-Repeat N-Gram Size |
|---|---|---|---|---|---|
| Harsh | 10 | 35% of input | 1.0 | 2.0 | 3 |
| Standard | 60 | 150 | 2.0 | 1.5 | 3 |
| Detailed | 50% of input | 150% of input | 1.5 | 1.2 | 4 |

**Tip:** The "Detailed" style benefits from `no_repeat_ngram_size=4` to avoid repeated openings.
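As a concrete illustration, here is a minimal sketch applying the "Harsh" row through the pipeline. Computing the percentage-based lengths from the tokenized input is an assumption about how the table is meant to be read; adjust to taste.

```python
from transformers import AutoTokenizer, pipeline

model_id = "Hiratax/t5-news-summarizer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
summarizer = pipeline("summarization", model=model_id, tokenizer=tokenizer)

prompt = "summarize harsh: " + text  # `text` is the article string from the Usage example

# "35% of input" is computed from the token count here (an assumption
# about how the percentage in the table is meant to be applied).
input_len = len(tokenizer(prompt).input_ids)

print(summarizer(
    prompt,
    min_length=10,
    max_length=max(11, int(0.35 * input_len)),
    length_penalty=1.0,
    repetition_penalty=2.0,
    no_repeat_ngram_size=3,
))
```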
## Training Procedure

### Hyperparameters
- Epochs: 8
- Learning Rate: 1e-4
- Batch Size: 4
- Gradient Accumulation: 2
- Weight Decay: 0.01
- Optimizer: AdamW
- Precision: FP16
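The training script itself is not published here; as a rough sketch, these settings map onto Hugging Face's `Seq2SeqTrainingArguments` as follows (the `output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: the hyperparameters above expressed as Trainer arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-news-summarizer",   # placeholder path
    num_train_epochs=8,
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    weight_decay=0.01,
    optim="adamw_torch",               # AdamW
    fp16=True,                         # FP16 precision
)
```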
### Data Strategy
- Harsh → XSum (abstractive, short)
- Detailed → CNN/DailyMail (longer, higher detail)
- Safety: removed pairs where the summary was longer than the article, to reduce hallucinations
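A minimal sketch of that length-based safety filter, assuming the CNN/DailyMail schema (`article`/`highlights` columns) and character length as the criterion; the actual preprocessing script is not published:

```python
from datasets import load_dataset

# Illustrative: drop pairs whose reference summary is at least as long as
# the article itself. Character length is a simple proxy here; the
# original filter's exact criterion is unstated.
ds = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")
ds = ds.filter(lambda ex: len(ex["highlights"]) < len(ex["article"]))
```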
## Limitations

- May occasionally output the typo "occupys" (training noise).
- Max input length: 512 tokens; longer text is silently truncated (see the snippet below).
- Performance degrades on very long or highly technical articles.
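A small sketch of how to check the 512-token limit explicitly before calling the model, so truncation is not silent (`text` again refers to the article string from the Usage example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Hiratax/t5-news-summarizer")
prompt = "summarize standard: " + text

# Token count includes the style prefix; anything past 512 is cut off.
n_tokens = len(tokenizer(prompt).input_ids)
if n_tokens > 512:
    print(f"Warning: input is {n_tokens} tokens; only the first 512 are used.")
```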
## License

Apache 2.0