Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Paper • 2601.20833 • Published Jan 28 • 183
view article Article Supercharge your OCR Pipelines with Open Models +5 merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 313
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 43 items • Updated Mar 2 • 720
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark Paper • 2311.09122 • Published Nov 15, 2023 • 8
view article Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model EuroBERT • Mar 10, 2025 • 147
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5, 2025 • 308
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5, 2025 • 251
view article Article Docmatix - a huge dataset for Document Visual Question Answering andito, HugoLaurencon • Jul 18, 2024 • 78
view article Article Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality +8 evijit, frimelle, yjernite, meg, irenesolaiman, dvilasuero, fdaudens, BrigitteTousi, giadap, sasha • Jun 24, 2024 • 34
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 88 items • Updated Mar 2 • 119