flan-t5-base-mixed-15-1-catastrophic

Model trained as part of: "Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training"

This model is part of a study of catastrophic forgetting when finetuning language models for specialized tasks. We demonstrate that math-only training causes severe NLI degradation (81% → 16.5%), whereas mixed training eliminates this forgetting while maintaining equivalent mathematical performance.

Model Description

This is the final checkpoint after 3 epochs of training, corresponding to the mixed-15-1-final configuration from our systematic study of catastrophic forgetting mitigation strategies.

Training Configuration

  • Base Model: google/flan-t5-base (250M parameters)
  • Training Type: Mixed (15:1 Math:NLI)
  • Math Dataset: DeepMind Mathematics dataset (algebra__linear_1d subset), 392,702 training examples
  • NLI Dataset: MultiNLI (matched + mismatched splits), 392,702 training examples
  • Training Details (see the configuration sketch after this list):
    • Learning rate: 3e-4 with cosine decay
    • Warmup: 6% of total steps
    • Epochs: 3
    • Effective batch size: 256 examples
    • Precision: bfloat16
    • Optimizer: FusedAdam
    • Hardware: Single NVIDIA A100 (40GB)
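
The hyperparameters above map onto the Hugging Face Trainer API roughly as follows. This is a minimal sketch, not the exact training script (which lives in the GitHub repository linked below); the per-device batch size / gradient-accumulation split and the adamw_torch_fused optimizer flag are assumptions chosen to match the stated 256-example effective batch size and FusedAdam optimizer.

from transformers import Seq2SeqTrainingArguments

# Sketch of the stated hyperparameters; per-device batch size and gradient
# accumulation steps are assumed values that multiply to the reported
# effective batch size of 256.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-mixed-15-1",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,               # 6% of total steps
    num_train_epochs=3,
    per_device_train_batch_size=64,  # assumed
    gradient_accumulation_steps=4,   # 64 * 4 = 256 examples per optimizer step
    bf16=True,
    optim="adamw_torch_fused",       # stand-in for the FusedAdam optimizer used
    predict_with_generate=True,
)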

This model was trained with a 15:1 mixing ratio, meaning 93.75% math examples and 6.25% NLI examples in each batch.
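
As a rough illustration of how such a mixture can be built, the Hugging Face datasets library can interleave the two corpora by sampling probability. This is a sketch under stated assumptions: the dataset identifiers, column names, and NLI label verbalizations below are illustrative and not taken from the actual training pipeline in the GitHub repository.

from datasets import load_dataset, interleave_datasets

# Assumed dataset identifiers; the math_dataset loader is script-based and
# may require trust_remote_code or an older datasets release.
math_ds = load_dataset("math_dataset", "algebra__linear_1d", split="train")
nli_ds = load_dataset("multi_nli", split="train")

# Map both corpora onto a shared text-to-text schema (the label
# verbalizations below are assumptions for illustration only).
math_ds = math_ds.rename_columns({"question": "input_text", "answer": "target_text"})
nli_labels = ["yes", "maybe", "no"]  # entailment / neutral / contradiction
nli_ds = nli_ds.map(
    lambda ex: {
        "input_text": f"mnli premise: {ex['premise']} hypothesis: {ex['hypothesis']}",
        "target_text": nli_labels[ex["label"]],
    },
    remove_columns=nli_ds.column_names,
)

# 15:1 math:NLI mixture -> sampling probabilities of 15/16 and 1/16
mixed_train = interleave_datasets(
    [math_ds, nli_ds],
    probabilities=[15 / 16, 1 / 16],
    seed=42,
)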

Performance

Evaluation Protocol: Final evaluation on complete validation sets

  • Math: 10,000 examples (DeepMind Mathematics linear algebra 1D)
  • NLI: 9,815 examples (MultiNLI matched split)

Task                         Accuracy   Baseline   Δ from Baseline
Mathematical Reasoning       11.7%      3.1%       +8.6pp
Natural Language Inference   83.8%      81.0%      +2.8pp
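
Accuracy here is treated as exact match between the generated answer string and the reference, which is an assumption about the evaluation protocol; the actual evaluation scripts are in the GitHub repository linked below. Given a loaded model and tokenizer (see Usage), a minimal scoring loop might look like:

import torch

def exact_match_accuracy(model, tokenizer, examples, batch_size=64, max_new_tokens=8):
    """examples: list of (input_text, target_text) pairs from a validation set."""
    correct = 0
    for i in range(0, len(examples), batch_size):
        batch = examples[i : i + batch_size]
        inputs = tokenizer([x for x, _ in batch], return_tensors="pt", padding=True)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        correct += sum(p.strip() == t.strip() for p, (_, t) in zip(preds, batch))
    return correct / len(examples)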

Key Findings from Our Study

  1. Catastrophic Forgetting is Severe: Math-only training drops NLI accuracy from 81% to 16.5% (−64.5pp)
  2. Mixed Training Eliminates Forgetting: Balanced 1:1 ratio maintains 86.2% NLI while achieving 12.0% math
  3. No Performance Trade-off: Mixed training matches math-only performance (12.0% vs 12.0%)
  4. Minimal Exposure Suffices: Even 6.25% NLI exposure (15:1 ratio) prevents catastrophic collapse

Usage

Basic Inference

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("MarioBarbeque/flan-t5-base-mixed-15-1-catastrophic")
tokenizer = T5Tokenizer.from_pretrained("MarioBarbeque/flan-t5-base-mixed-15-1-catastrophic")

# Mathematical reasoning example
math_input = "Solve 24 = 1601*c - 1605*c for c."
inputs = tokenizer(math_input, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "-6"

# NLI example  
nli_input = "mnli premise: The cat sat on the mat. hypothesis: An animal was on the mat."
inputs = tokenizer(nli_input, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "yes" (entailment)

Batch Processing

import torch

# Reuse the model and tokenizer loaded above; move the model to a GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Batch of linear algebra problems
math_problems = [
    "Solve 24 = 1601*c - 1605*c for c.",
    "Solve 657 = -220*t + 1086*t + 22307 for t.",
    "Solve -11*y - 263*y + 3162 = -88*y for y."
]

inputs = tokenizer(math_problems, return_tensors="pt", padding=True).to(device)
outputs = model.generate(**inputs, max_new_tokens=8)
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(results)

Training Code

The complete training code, evaluation scripts, and experiment configurations are available in our GitHub repository.

Related Models

Citation

If you use this model in your research, please cite:

@article{reynolds2024catastrophic,
  title={Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training},
  author={Reynolds, John Graham},
  journal={arXiv preprint},
  year={2024},
  url={https://github.com/johngrahamreynolds/mathematical_catastrophe_mitigation}
}

For the CyberSolve-LinAlg model (Flan-T5-Large baseline):

@misc{cybersolve2024,
  author={Reynolds, John Graham},
  title={CyberSolve-LinAlg: Flan-T5-Large Finetuned for Linear Algebra Problem Solving},
  year={2024},
  howpublished={\url{https://huggingface.co/MarioBarbeque/CyberSolve-LinAlg-1.2}}
}

License

This model is released under the Apache 2.0 license, following the base model (google/flan-t5-base).

Model Card Authors

John Graham Reynolds (@MarioBarbeque)

Contact

Acknowledgments

This research would not have been possible without the wonderful instruction of Greg Durrett. The author would also like to thank John Jumper for motivating this research during his visit to Vanderbilt University.
