YuchenLi01/GSM8K_Llama-3.2-1B-Instruct_Score_DPO_Qwen2.5MathRM72B_hard0soft16_all_soft_random_unfiltered • Updated Sep 5, 2025 • 120k • 4
YuchenLi01/GSM8K_Llama-3.2-1B-Instruct_Score_DPO_Qwen2.5MathRM72B_hard0soft8_all_soft_random_unfiltered • Updated Sep 5, 2025 • 59.8k • 2
YuchenLi01/GSM8K_Llama-3.2-1B-Instruct_Score_DPO_Qwen2.5MathRM72B_hard0soft4_all_soft_random_unfiltered • Updated Sep 5, 2025 • 29.9k • 2
YuchenLi01/GSM8K_Llama-3.2-1B-Instruct_Score_DPO_Qwen2.5MathRM72B_hard0soft2_all_soft_random_unfiltered • Updated Sep 5, 2025 • 14.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_hard0soft4_v4_all_soft_random_unfiltered • Updated Aug 25, 2025 • 29.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_hard0soft16_v4_all_soft_random_unfiltered • Updated Aug 22, 2025 • 120k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_hard2soft2_v7_hardsoftrand • Updated Aug 21, 2025 • 14.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_gt-2th2_hard2soft2_v6_hardchosenrejectedrand • Updated Aug 21, 2025 • 14.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_gt-2th2_hard2soft2_v5_hardchosenhigh_rejectedrand • Updated Aug 21, 2025 • 14.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_hard0soft2_v4_all_soft_random_unfiltered • Updated Aug 21, 2025 • 14.9k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_hard0soft8_v4_all_soft_random_unfiltered • Updated Aug 21, 2025 • 59.8k • 2
YuchenLi01/GSM8K_1.5Bsft_Score_DPO_Qwen2.5MathRM72B_gt-2th2_hard2soft2_v3_hardchosenhighreward • Updated Aug 20, 2025 • 14.3k • 2