XLOverflow/qwen3-eagle3-accrate

@@ -18,6 +18,8 @@ Part of a course project evaluating per-step weighted loss functions for trainin
 EAGLE3 draft models. Full pipeline and source:
 **https://github.com/XLOverflow/anlp_course_project**
 ## Training
 - **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
@@ -29,15 +31,23 @@ EAGLE3 draft models. Full pipeline and source:
 - Additional epochs: 1
 - β_s profiled offline via `scripts/train/profile_beta.py`
 ## Files
 - `model.safetensors` — draft model weights (~763 MB)
 - `config.json` — model config
-- Checkpoint corresponds to: `outputs/eagle3-accrate/epoch_0_step_17026` in the original training output
-Optimizer state (`training_state.pt`, ~3 GB) is not uploaded — use the project
-repo's training scripts to resume from scratch if needed.
 ## Usage

 EAGLE3 draft models. Full pipeline and source:
 **https://github.com/XLOverflow/anlp_course_project**
+Collection: [Qwen3 EAGLE3 — Weighted Loss Variants](https://huggingface.co/collections/XLOverflow/qwen3-eagle3-weighted-loss-variants)
 ## Training
 - **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
 - Additional epochs: 1
 - β_s profiled offline via `scripts/train/profile_beta.py`
+## Evaluation (Qwen3-8B target)
+| Dataset | τ (accept. length) | Speedup | Accuracy |
+|---|---|---|---|
+| GSM8K | 7.359 | 4.588× | 95.15% |
+| MATH500 | 7.326 | 4.606× | 95.20% |
+Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.
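As a rough reading of the table, the reported speedups can be expressed relative to the EAGLE-orig baseline rather than to vanilla decoding. The sketch below assumes the approximate 2× baseline is exactly 2.0, which the card does not claim, so the ratios are indicative only. The names `REPORTED_SPEEDUP`, `EAGLE_ORIG_SPEEDUP`, and `relative_gain` are illustrative and do not come from the project repo.

```python
# Reported speedups over vanilla decoding, copied from the evaluation table.
REPORTED_SPEEDUP = {"GSM8K": 4.588, "MATH500": 4.606}

# Assumption: treat the approximate EAGLE-orig baseline (about 2x) as exactly 2.0.
EAGLE_ORIG_SPEEDUP = 2.0


def relative_gain(speedup: float, baseline: float = EAGLE_ORIG_SPEEDUP) -> float:
    """Return how many times faster `speedup` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return speedup / baseline


if __name__ == "__main__":
    for dataset, speedup in REPORTED_SPEEDUP.items():
        print(f"{dataset}: {relative_gain(speedup):.2f}x over EAGLE-orig")
```

Under that assumption, both datasets come out at roughly 2.3× over EAGLE-orig.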
 ## Files
 - `model.safetensors` — draft model weights (~763 MB)
 - `config.json` — model config
+- Corresponds to: `outputs/eagle3-accrate/epoch_0_step_17026` in the original training output
+Optimizer state (~3 GB) is not uploaded — use the project repo's training scripts to resume from scratch if needed.
 ## Usage