Mixture of Experts
• BlackMamba: Mixture of Experts for State-Space Models (arXiv: 2402.01771)
• OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv: 2402.01739)
• MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv: 2401.15947)
• DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv: 2401.06066)
• MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv: 2401.04081)
• Mixtral of Experts (arXiv: 2401.04088)
• Scaling Laws for Fine-Grained Mixture of Experts (arXiv: 2402.07871)
• Mixtures of Experts Unlock Parameter Scaling for Deep RL (arXiv: 2402.08609)
• Multi-Head Mixture-of-Experts (arXiv: 2404.15045)
• Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv: 2404.02258)
• DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (arXiv: 2405.04434)