---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- Nemotron-Cascade
- reasoning
- general-purpose
- SFT
- RL
- pytorch
---

# Nemotron-Cascade-8B Intermediate Checkpoints
21
+
22
+ <p align="center">
23
+
24
+ [![Technical Report](https://img.shields.io/badge/2512.13607-Technical_Report-blue)](https://arxiv.org/abs/2512.13607)
25
+ [![SFT Dataset](https://img.shields.io/badge/🤗-SFT_Datset-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
26
+ [![RL Dataset](https://img.shields.io/badge/🤗-RL_Datset-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
27
+ [![Models](https://img.shields.io/badge/🤗-Models-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
28
+ </p>
29
+


## Introduction

This repository releases the intermediate checkpoints produced during the development of [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Nemotron-Cascade-8B is a general-purpose model trained using a sequential, domain-wise reinforcement learning pipeline, illustrated in the figure below.

<img src="fig/pipeline.png" alt="train_pipeline_fig" style="width: 1000px; max-width: 100%;" />

We release checkpoints corresponding to each major stage of training:

- **Nemotron-Cascade-8B-SFT** (completed multi-stage SFT)
- **Nemotron-Cascade-8B-RLHF** (completed RLHF)
- **Nemotron-Cascade-8B-IFRL** (completed instruction-following RL)
- **Nemotron-Cascade-8B-MathRL** (completed math RL)
- **Nemotron-Cascade-8B-CodeRL** (completed code RL)

The final model, [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), is obtained after the concluding SWE RL stage.


## Usage Recommendations

We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support contexts longer than 32K tokens. This can be enabled by updating the model's `config.json` as shown below:
```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```


## Results

As with [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), we use a maximum output length of 64K tokens for evaluation, with the temperature set to 0.6 and top-p to 0.95. We also apply RoPE scaling using the YaRN method with a scaling factor of 2.0.

| **Benchmark<br>Metric: Pass@1** | **Nemotron-<br>Cascade-8B-SFT** | **Nemotron-<br>Cascade-8B-RLHF** | **Nemotron-<br>Cascade-8B-IFRL** | **Nemotron-<br>Cascade-8B-MathRL** | **Nemotron-<br>Cascade-8B-CodeRL** | **Nemotron-<br>Cascade-8B** |
| :---- | :---: | :---: | :---: | :---: | :---: | :---: |
| ***Knowledge Reasoning*** | | | | | | |
| MMLU | 83.0 | 83.1 | 83.4 | 83.4 | 83.7 | 83.7 |
| MMLU Pro | 74.4 | 77.8 | 74.5 | 75.0 | 75.3 | 75.7 |
| GPQA-Diamond | 63.5 | 66.8 | 66.1 | 65.7 | 67.4 | 66.5 |
| ***Alignment*** | | | | | | |
| ArenaHard | 70.0 | 90.1 | 88.0 | 87.0 | 87.8 | 87.9 |
| IFEval (Strict Prompt) | 70.8 | 50.1 | 90.4 | 92.1 | 90.7 | 90.2 |
| IFBench | 21.2 | 24.5 | 40.5 | 40.4 | 38.1 | 40.8 |
| ***Math*** | | | | | | |
| AIME 2024 | 83.6 | 86.1 | 86.2 | 90.2 | 89.1 | 89.5 |
| AIME 2025 | 72.8 | 75.0 | 75.2 | 81.9 | 80.5 | 80.1 |
| ***Code*** | | | | | | |
| LCB v5 (08/24-02/25) | 59.2 | 70.2 | 70.2 | 70.6 | 75.3 | 74.3 |
| LCB v6 (08/24-05/25) | 56.7 | 67.2 | 66.7 | 67.4 | 71.5 | 71.1 |
| SWE Verified (Agentless) | 26.1 | 28.2 | 28.3 | 30.6 | 31.6 | 37.2 |



## Chat Template

All intermediate checkpoints use the same chat template as [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Each is a unified model supporting both ***thinking*** and ***instruct*** (non-reasoning) modes. To switch between the two modes, simply append the `" /think"` (for ***thinking***) or the `" /no_think"` (for ***instruct***) tag to the end of the user input. See [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B) for additional details.


## Release Date
Dec 19, 2025


## License
Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).


## Citation
```bibtex
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2512.13607},
  year={2025}
}
```