MURAD: A Large-Scale Multi-Domain Unified Reverse Arabic Dictionary Dataset Paper • 2601.21512 • Published 15 days ago • 1
SARD: A Large-Scale Synthetic Arabic OCR Dataset for Book-Style Text Recognition Paper • 2505.24600 • Published May 30, 2025 • 2
ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging Paper • 2601.02209 • Published Jan 5 • 3
ArabianLLM Series | Native Arabic Large Language Models Collection This collection is related to native Arabic Large Language Models. It represents different sizes of GPT models trained for text generation • 8 items • Updated Aug 26, 2024 • 5
Aranizer | Arabic Tokenization with SentencePiece & PBE Collection Collection of Arabic Tokenizers with different sizes based on SentencePiece & PBE Encodings suitable for training LLMs • 6 items • Updated Aug 25, 2024 • 3