flan-t5-base-mixed-1-1-catastrophic
Model trained as part of: "Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training"
This model is part of a study of catastrophic forgetting when finetuning language models for specialized tasks. We show that math-only training causes severe NLI degradation (81% → 16.5%), whereas mixed training eliminates the forgetting while maintaining equivalent mathematical performance.
Quick Links
- 📄 Paper: arXiv (to be updated after submission)
- 💻 Code: [GitHub Repository](https://github.com/johngrahamreynolds/mathematical_catastrophe_mitigation)
- 🤗 Model Collection: All experiment checkpoints
Model Description
This is the final checkpoint after 3 epochs of training, corresponding to the mixed-1-1-final configuration from our systematic study of catastrophic forgetting mitigation strategies.
Training Configuration
- Base Model: google/flan-t5-base (250M parameters)
- Training Type: Mixed (1:1 Math:NLI)
- Math Dataset: DeepMind Mathematics dataset (algebra__linear_1d subset), 392,702 training examples
- NLI Dataset: MultiNLI (matched + mismatched splits), 392,702 training examples
- Training Details (sketched below):
  - Learning rate: 3e-4 with cosine decay
  - Warmup: 6% of total steps
  - Epochs: 3
  - Effective batch size: 256 examples
  - Precision: bfloat16
  - Optimizer: FusedAdam
  - Hardware: single NVIDIA A100 (40 GB)
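For reference, the hyperparameters above map roughly onto Hugging Face `Seq2SeqTrainingArguments` as follows. This is a hedged sketch, not the project's training script: the output path, per-device batch size, accumulation steps, and the `adamw_torch_fused` optimizer (standing in for FusedAdam) are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only; per-device batch size and accumulation steps are assumptions
# chosen so that 32 * 8 = 256 matches the effective batch size listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-mixed-1-1",   # placeholder path
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,                      # 6% of total steps
    num_train_epochs=3,
    per_device_train_batch_size=32,         # assumption
    gradient_accumulation_steps=8,          # assumption; 32 * 8 = 256 effective
    bf16=True,
    optim="adamw_torch_fused",              # closest built-in stand-in for FusedAdam
    predict_with_generate=True,
)
```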
This model was trained with a 1:1 mixing ratio, meaning 50.0% math examples and 50.0% NLI examples in each batch.
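One straightforward way to realize such a mixture is `datasets.interleave_datasets`. The sketch below is illustrative only: the dataset identifiers, Flan-style label verbalization, and preprocessing are assumptions rather than the exact pipeline used for this checkpoint, and the same call covers other ratios (e.g., 15:1) by changing the sampling probabilities.

```python
from datasets import load_dataset, interleave_datasets

# Assumed Flan-style verbalization: entailment / neutral / contradiction
NLI_LABELS = ["yes", "it is not possible to tell", "no"]

# Hypothetical preprocessing: both sources are mapped to a shared
# {"input_text", "target_text"} schema before mixing.
def prep_math(ex):
    return {"input_text": ex["question"], "target_text": ex["answer"]}

def prep_nli(ex):
    prompt = f"mnli premise: {ex['premise']} hypothesis: {ex['hypothesis']}"
    return {"input_text": prompt, "target_text": NLI_LABELS[ex["label"]]}

# Dataset IDs are illustrative; recent `datasets` versions may require
# trust_remote_code or a different source for the DeepMind Mathematics data.
math_ds = load_dataset("math_dataset", "algebra__linear_1d", split="train")
nli_ds = load_dataset("multi_nli", split="train")

math_ds = math_ds.map(prep_math, remove_columns=math_ds.column_names)
nli_ds = nli_ds.map(prep_nli, remove_columns=nli_ds.column_names)

# 1:1 mixture: each training example is drawn from either source with equal
# probability. For a 15:1 math:NLI ratio, use probabilities=[15/16, 1/16].
mixed = interleave_datasets(
    [math_ds, nli_ds],
    probabilities=[0.5, 0.5],
    seed=42,
    stopping_strategy="all_exhausted",
)
```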
Performance
Evaluation Protocol: Final evaluation on complete validation sets
- Math: 10,000 examples (DeepMind Mathematics linear algebra 1D)
- NLI: 9,815 examples (MultiNLI matched split)
| Task | Accuracy | Baseline | Δ from Baseline |
|---|---|---|---|
| Mathematical Reasoning | 12.0% | 3.1% | +8.9pp |
| Natural Language Inference | 86.2% | 81.0% | +5.2pp |
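The accuracies above are presumably exact match of the generated string against the reference answer or label. A minimal sketch of how such an evaluation could be computed is shown below; the `pairs` list of (prompt, reference) tuples and the batching details are assumptions, not the project's evaluation script.

```python
import torch

def exact_match_accuracy(model, tokenizer, pairs, batch_size=32, max_new_tokens=8):
    """Sketch: greedy-decode each prompt and compare the string to its reference.

    `pairs` is a hypothetical list of (prompt, reference_answer) tuples.
    """
    correct = 0
    for i in range(0, len(pairs), batch_size):
        prompts, answers = zip(*pairs[i : i + batch_size])
        inputs = tokenizer(list(prompts), return_tensors="pt", padding=True).to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        correct += sum(p.strip() == a.strip() for p, a in zip(preds, answers))
    return correct / len(pairs)
```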
Key Findings from Our Study
- Catastrophic Forgetting is Severe: Math-only training drops NLI accuracy from 81% to 16.5% (−64.5pp)
- Mixed Training Eliminates Forgetting: Balanced 1:1 ratio maintains 86.2% NLI while achieving 12.0% math
- No Performance Trade-off: Mixed training matches math-only performance (12.0% vs 12.0%)
- Minimal Exposure Suffices: Even 6.25% NLI exposure (15:1 ratio) prevents catastrophic collapse
Usage
Basic Inference
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("MarioBarbeque/flan-t5-base-mixed-1-1-catastrophic")
tokenizer = T5Tokenizer.from_pretrained("MarioBarbeque/flan-t5-base-mixed-1-1-catastrophic")

# Mathematical reasoning example
math_input = "Solve 24 = 1601*c - 1605*c for c."
inputs = tokenizer(math_input, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "-6"

# NLI example
nli_input = "mnli premise: The cat sat on the mat. hypothesis: An animal was on the mat."
inputs = tokenizer(nli_input, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "yes" (entailment)
```
Batch Processing
```python
import torch

# Batch of linear algebra problems
math_problems = [
    "Solve 24 = 1601*c - 1605*c for c.",
    "Solve 657 = -220*t + 1086*t + 22307 for t.",
    "Solve -11*y - 263*y + 3162 = -88*y for y.",
]

inputs = tokenizer(math_problems, return_tensors="pt", padding=True)
with torch.no_grad():  # no gradients needed for inference
    outputs = model.generate(**inputs, max_new_tokens=8)
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(results)
# Reference solutions: c = -6, t = -25, y = 17
```
Training Code
The complete training code, evaluation scripts, and experiment configurations are available in our GitHub repository.
Related Models
- Larger Scale: CyberSolve-LinAlg-1.2 - Flan-T5-Large (780M) achieving 90.8% on math (roughly 7.6× the math accuracy of this 250M model)
- Other Experiments: See all checkpoints from this study at MarioBarbeque's models
Citation
If you use this model in your research, please cite:
```bibtex
@article{reynolds2024catastrophic,
  title={Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training},
  author={Reynolds, John Graham},
  journal={arXiv preprint},
  year={2024},
  url={https://github.com/johngrahamreynolds/mathematical_catastrophe_mitigation}
}
```
For the CyberSolve-LinAlg model (Flan-T5-Large baseline):
```bibtex
@misc{cybersolve2024,
  author={Reynolds, John Graham},
  title={CyberSolve-LinAlg: Flan-T5-Large Finetuned for Linear Algebra Problem Solving},
  year={2024},
  howpublished={\url{https://huggingface.co/MarioBarbeque/CyberSolve-LinAlg-1.2}}
}
```
License
This model is released under the Apache 2.0 license, following the base model (google/flan-t5-base).
Model Card Authors
John Graham Reynolds (@MarioBarbeque)
Contact
- Email: [email protected]
- GitHub: @johngrahamreynolds
Acknowledgments
This research would not have been possible without the wonderful instruction of Greg Durrett. The author would also like to thank John Jumper for motivating this research during his visit to Vanderbilt University.