Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23 • 62
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10 • 146
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 151
Article ArabicWeb24: Creating a High Quality Arabic Web-only Pre-training Dataset Aug 8, 2024 • 11