How is the 2.6B model better than this one in literally every use case I have?
The benchmarks show this MoE variant is better, and it should be, but that's not the case. Hell, even the Q4_K_M version of 2.6B performs better somehow.
Yes, this model is stronger overall (especially in code), but maybe not for your particular use cases. Could you tell us more about them?
There is a theory that MoE models aren't that good at small scale: at small parameter counts they tend to be worse in some ways than their dense counterparts, but they still come out ahead at large scale.
I could be wrong, but it seems like that's what's happening here.
Do you have a reference for this theory? I believe the trade-offs between dense and MoE models are well understood overall. In this specific case, LFM2-2.6B is a very deep model, unlike this MoE. It means that reasoning-heavy tasks might work better with the 2.6B, but that is very use-case-dependent. Overall, LFM2-8B-A1B is a stronger (and faster) model.
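To make the dense-vs-MoE trade-off above concrete: in a top-k MoE, each token is routed through only k of the E experts, so per-token compute scales with the *active* parameters while capacity scales with the *total*. Here is a minimal, illustrative sketch of top-k routing (a generic toy, not LFM2's actual architecture; all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

d, E, k = 16, 8, 2                      # hidden size, number of experts, experts per token
experts = rng.normal(size=(E, d, d))    # one weight matrix per expert
gate = rng.normal(size=(d, E))          # router that scores experts for each token

def moe_forward(x):
    """Route a token vector x through its top-k experts only."""
    scores = x @ gate                        # (E,) router logits
    topk = np.argsort(scores)[-k:]           # indices of the k highest-scoring experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()                 # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

x = rng.normal(size=d)
y = moe_forward(x)

total_params = experts.size              # capacity: all experts (8 * 16 * 16 = 2048)
active_params = k * d * d                # compute per token: only k experts (2 * 256 = 512)
print(total_params, active_params)
```

In the toy numbers above, each token touches only a quarter of the expert weights, which is why an "8B-A1B"-style model can run much faster than a dense model of the same total size, while the dense model applies all of its parameters to every token.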