Article KV Caching Explained: Optimizing Transformer Inference Efficiency Jan 30, 2025 • 285
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 Feb 20 • 498
HuggingFaceTB/SmolLM2-135M-Instruct Text Generation • 0.1B • Updated Sep 22, 2025 • 917k • 302
Article Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model Feb 4 • 28
Article I built a spot market for bare metal GPUs (and how to get A100s for $0.38/hr) Dec 16, 2025 • 2