Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts Paper • 2404.05019 • Published Apr 7, 2024 • 2
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9, 2025 • 27
Arc Virtual Cell Challenge: A Primer Article • FL33TW00D-HF, abhinadduri • Jul 18, 2025 • 66
The Transformers Library: standardizing model definitions Article • lysandre, ArthurZ, pcuenq, julien-c +2 • May 15, 2025 • 121
You could have designed state of the art positional encoding Article • FL33TW00D-HF • Nov 25, 2024 • 478
Welcome Llama 4 Maverick & Scout on Hugging Face Article • burtenshaw, reach-vb, pcuenq, clem, rajatarya, jsulz, lysandre +5 • Apr 5, 2025 • 149
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 207
LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Article • medmekk, marcsun13 • Mar 7, 2025 • 98
Open-source DeepResearch – Freeing our search agents Article • m-ric, albertvillanova, merve, thomwolf, clefourrier +3 • Feb 4, 2025 • 1.32k
Open-R1: a fully open reproduction of DeepSeek-R1 Article • eliebak, lvwerra, lewtun +1 • Jan 28, 2025 • 889
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 5, 2025 • 26
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 260
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 22
How NuminaMath Won the 1st AIMO Progress Prize Article • yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif +6 • Jul 11, 2024 • 128
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
A failed experiment: Infini-Attention, and why we should keep trying? Article • neuralink, lvwerra, thomwolf +1 • Aug 14, 2024 • 76