FINAL_Bench

Recent Activity

- SeaWolf-AI updated a model 11 minutes ago: FINAL-Bench/Darwin-28B-REASON
- SeaWolf-AI published a model 11 minutes ago: FINAL-Bench/Darwin-28B-REASON
- SeaWolf-AI updated a dataset about 9 hours ago: FINAL-Bench/service-urls

Articles

SeaWolf-AI published an article 2 days ago:

Training-Free Reasoning at 88.89% on GPQA Diamond: How Darwin Family Hit Frontier Scores Without a Single Gradient Step

SeaWolf-AI posted an update 2 days ago:

🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's
currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework.
By recombining the weight spaces of existing LLM checkpoints — with zero
gradient-based training — it reaches frontier-level reasoning.

- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps — not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution
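
To make "zero gradient steps" concrete, here is a minimal sketch of what a gradient-free evolutionary merge loop can look like. This is not the Darwin implementation: the two-parent linear interpolation, the scalar merge coefficient, and the `evaluate_reasoning` fitness function (e.g., accuracy on a held-out GPQA-style set) are all simplifying assumptions.

```python
# Minimal sketch of gradient-free evolutionary merging (illustrative,
# not the Darwin codebase). Assumes two parents with identical
# architectures and a forward-pass-only fitness function.
import random

import torch


def merge_state_dicts(parent_a, parent_b, alpha):
    """Linear interpolation in weight space: child = alpha*a + (1-alpha)*b."""
    return {name: alpha * parent_a[name] + (1.0 - alpha) * parent_b[name]
            for name in parent_a}


def evolve(parent_a, parent_b, evaluate_reasoning, generations=10, population=8):
    """Search the merge coefficient with a simple mutate-and-select loop.

    No gradient step is ever taken: candidates are scored purely by
    running the merged model forward (evaluate_reasoning is assumed to
    return, e.g., benchmark accuracy).
    """
    best_alpha, best_score = 0.5, float("-inf")
    for _ in range(generations):
        # Mutate the incumbent coefficient into a candidate population.
        candidates = [min(1.0, max(0.0, best_alpha + random.gauss(0.0, 0.1)))
                      for _ in range(population)]
        for alpha in candidates:
            with torch.no_grad():
                child = merge_state_dicts(parent_a, parent_b, alpha)
                score = evaluate_reasoning(child)
            if score > best_score:
                best_alpha, best_score = alpha, score
    return merge_state_dicts(parent_a, parent_b, best_alpha), best_score
```

The real system searches a much richer parameterization than this single scalar, which is where the merge genome below comes in.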

Three Core Mechanisms

① 14-dim Adaptive Merge Genome — fine-grained recombination at both
component level (Attention / FFN / MLP / LayerNorm / Embedding) and block
level, expanding the prior evolutionary-merge search space.
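
The post doesn't enumerate the 14 genome dimensions, so the layout below is an assumption: five component-level coefficients (matching the component list above) plus a few hypothetical block-level depth slopes, just to show how one genome can steer both granularities.

```python
# Assumed genome layout: the exact 14 dimensions aren't specified in the
# post, so this uses five component-level coefficients plus three
# hypothetical block-level depth slopes to illustrate the two granularities.
import random
from dataclasses import dataclass, field


@dataclass
class MergeGenome:
    # Component-level merge coefficients, one per parameter family.
    attention: float = 0.5
    ffn: float = 0.5
    mlp: float = 0.5
    layernorm: float = 0.5
    embedding: float = 0.5
    # Block-level modulation: linear schedules over relative layer depth.
    depth_slopes: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    COMPONENTS = ("attention", "ffn", "mlp", "layernorm", "embedding")

    def coefficient(self, component, layer_frac, group=0):
        """Effective merge weight for one tensor at relative depth layer_frac."""
        base = getattr(self, component)
        return min(1.0, max(0.0, base + self.depth_slopes[group] * layer_frac))

    def mutate(self, sigma=0.05):
        """Gaussian mutation, the variation operator for evolutionary search."""
        child = MergeGenome(**{
            k: min(1.0, max(0.0, getattr(self, k) + random.gauss(0.0, sigma)))
            for k in self.COMPONENTS})
        child.depth_slopes = [s + random.gauss(0.0, sigma)
                              for s in self.depth_slopes]
        return child
```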

② MRI-Trust Fusion — we diagnose each layer's reasoning contribution
via an **MRI (Model Reasoning Importance)** signal and fuse it with
evolutionary search through a **learnable trust parameter**. Trust the
diagnostic too much and search collapses; ignore it and search becomes
inefficient — Darwin learns the balance from data.
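
One plausible reading of this mechanism, sketched below with hypothetical names (`mri_scores`, `trust_logit`): per-layer merge weights are a convex combination of what the MRI diagnostic suggests and what the evolutionary search proposes, with a single trust scalar setting the blend. How Darwin actually fits that scalar isn't spelled out here; it could, for instance, be tuned on held-out data by the same gradient-free search, leaving the LLM weights untouched.

```python
# Sketch of trust-weighted fusion (a reading of the post, not the
# paper's code). mri_scores is a hypothetical per-layer importance
# signal in [0, 1]; trust_logit stands in for the learnable trust
# parameter as a single scalar.
import torch


def fuse_coefficients(search_coeffs, mri_scores, trust_logit):
    """Per-layer merge weights: trust pulls toward the MRI diagnostic,
    (1 - trust) keeps the evolutionary search proposal."""
    trust = torch.sigmoid(trust_logit)  # squash to (0, 1)
    return trust * mri_scores + (1.0 - trust) * search_coeffs


search_coeffs = torch.full((32,), 0.5)  # one proposal per layer
mri_scores = torch.rand(32)             # stand-in diagnostic signal
trust_logit = torch.tensor(0.0)         # trust = 0.5 at initialization
layer_weights = fuse_coefficients(search_coeffs, mri_scores, trust_logit)
```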

③ Architecture Mapper — weight-space breeding across heterogeneous
families. Attention × SSM crossover actually works.
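
Details of the mapper aren't given in the post, so the sketch below is a deliberately simplified assumption: align tensors across families by functional role using a hypothetical name table, then crossover only where shapes agree.

```python
# Deliberately simplified cross-family crossover; the real Architecture
# Mapper is not described here. Name patterns are hypothetical and assume
# the two checkpoints use parallel layer prefixes (e.g. "layers.0.").
import torch

ROLE_MAP = {
    # transformer suffix         ->  mamba-style suffix (assumed names)
    "self_attn.o_proj.weight": "mixer.out_proj.weight",
    "mlp.down_proj.weight": "mixer.in_proj.weight",
    "input_layernorm.weight": "norm.weight",
}


def crossover(transformer_sd, mamba_sd, alpha=0.5):
    """Blend role-aligned tensors; leave unmatched or mismatched ones as-is."""
    child = dict(mamba_sd)
    for t_suffix, m_suffix in ROLE_MAP.items():
        for m_name, m_tensor in mamba_sd.items():
            if not m_name.endswith(m_suffix):
                continue
            t_name = m_name[: -len(m_suffix)] + t_suffix
            t_tensor = transformer_sd.get(t_name)
            # Only crossover when the role-matched tensors share a shape.
            if t_tensor is not None and t_tensor.shape == m_tensor.shape:
                child[m_name] = alpha * t_tensor + (1.0 - alpha) * m_tensor
    return child
```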

Why It Matters

> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them — no gradients required.

Replies and critiques welcome 🙌