Gemma 4 WebGPU 🚀 Run Gemma 4 locally in-browser on WebGPU w/ Transformers.js • Running • Featured • 192
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 11 days ago • 47
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 16
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16, 2025 • 70
arcee-ai/Trinity-Large-Thinking Text Generation • 399B • Updated 11 days ago • 20.1k • 159
Article KV Caching Explained: Optimizing Transformer Inference Efficiency Jan 30, 2025 • 299