mrinaalarora/mrinaal-124m-base
Checkpoints from my first 124M LLM pre-training project, covering scratch training, continued pre-training, and SFT experiments.
Note 123.6M-parameter decoder-only LM trained from scratch on 2B FineWeb-Edu tokens. Architecture: RoPE, RMSNorm, SwiGLU, tied embeddings.
Note Continued pre-training checkpoint: 1B additional tokens on a mixed recipe atop mrinaal-124m-base.
Note Instruction-tuned checkpoint: SFT of mrinaal-124m-base-v2 on the first 50k valid Smol-SmolTalk examples.
Note Continued pre-training checkpoint: 1.5B additional math-heavy mixed-recipe tokens atop mrinaal-124m-base-v2; best val loss 2.6333.
Note Instruction-tuned checkpoint: SFT of mrinaal-124m-base-v3-mathmix on 150k valid Smol-SmolTalk examples; best val loss 1.6581.
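The figures in the notes above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is illustrative only. The notes do not state the model configuration, so `ASSUMED_CONFIG` (GPT-2 vocabulary size, 12 layers, width 768, SwiGLU hidden size 2048, no biases) is a hypothetical setup chosen because it lands near the stated 123.6M parameters; the real checkpoints may differ. Converting validation loss to perplexity further assumes the loss is mean cross-entropy in nats per token.

```python
import math


def count_params(vocab_size, d_model, n_layers, ffn_hidden):
    """Weights in a bias-free decoder-only transformer with tied embeddings."""
    embedding = vocab_size * d_model      # shared with the output head (tied)
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    swiglu = 3 * d_model * ffn_hidden     # gate, up and down projections
    norms = 2 * d_model                   # two RMSNorm gain vectors per block
    final_norm = d_model                  # RMSNorm before the output head
    # RoPE rotates queries and keys in place and adds no learned weights.
    return embedding + n_layers * (attention + swiglu + norms) + final_norm


def perplexity(val_loss_nats):
    """Perplexity, ASSUMING the loss is mean cross-entropy in nats per token."""
    return math.exp(val_loss_nats)


# HYPOTHETICAL configuration, not taken from the source.
ASSUMED_CONFIG = dict(vocab_size=50257, d_model=768, n_layers=12, ffn_hidden=2048)

total_params = count_params(**ASSUMED_CONFIG)

# Cumulative pre-training tokens from the notes: base, then v2, then v3-mathmix.
pretrain_tokens = 2.0e9 + 1.0e9 + 1.5e9
tokens_per_param = pretrain_tokens / 123.6e6  # uses the stated parameter count

print(f"assumed-config parameters: {total_params / 1e6:.2f}M")
print(f"cumulative pre-training tokens: {pretrain_tokens / 1e9:.1f}B")
print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"v3-mathmix perplexity (val loss 2.6333): {perplexity(2.6333):.2f}")
print(f"SFT perplexity (val loss 1.6581): {perplexity(1.6581):.2f}")
```

The two validation losses come from different validation sets (pre-training data versus SFT data), so the resulting perplexities are not directly comparable with each other.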