- Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation (arXiv:2604.10098)
- Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE (arXiv:2502.06282, published Feb 10, 2025)