# genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch8.0_42
This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the [YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1](https://huggingface.co/datasets/YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1) dataset. It achieves the following results on the evaluation set:
- Loss: 0.5376
- Rewards/chosen: 0.7846
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.875
- Rewards/margins: 0.7846
- Logps/rejected: -41.1047
- Logps/chosen: -22.2238
- Logits/rejected: -3.5338
- Logits/chosen: -3.4095
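
As a fine-tune of a Qwen2.5 chat model, the checkpoint should load with the standard `transformers` API. A minimal usage sketch; the repo id below is an assumption (the card does not state where the checkpoint is hosted), so adjust it to the actual path:

```python
# Minimal inference sketch (not from the original card).
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- replace with the actual location of this checkpoint.
model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch8.0_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The base model is instruction-tuned, so use the chat template.
messages = [{"role": "user", "content": "What is $2^{10}$?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```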
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

As noted above, the model was trained on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 preference dataset; no further details about data collection or the evaluation split are provided.
## Training procedure
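
The run name (`cdpo`, `beta0.1`) suggests the training objective was conservative DPO with $\beta = 0.1$. For reference, a sketch of that loss under this assumption, where $\varepsilon$ is the label-smoothing rate (its value is not stated here) and $\sigma$ is the logistic function:

$$
\mathcal{L}_{\text{cDPO}}(\theta) = -(1-\varepsilon)\log\sigma\big(\beta\, h_\theta\big) - \varepsilon\log\sigma\big(-\beta\, h_\theta\big),
\qquad
h_\theta = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}.
$$

Under this reading, the Rewards/chosen and Rewards/rejected columns are presumably the implicit rewards $\beta\log\frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$ averaged over the evaluation set, and Rewards/margins is their difference (e.g. $0.7846 - 0.0 = 0.7846$ for the final checkpoint).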
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
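
Given the hyperparameters above, a hypothetical reproduction sketch with TRL's `DPOTrainer` follows. This is not the author's script: the `label_smoothing` value is a placeholder (the card does not state it), the dataset split names are assumptions, and the multi-GPU setup (8 devices, giving the total train batch size of 32 = 4 × 8) would be handled by `accelerate launch` or similar:

```python
# Hypothetical reproduction sketch, assuming TRL's DPOTrainer. "cdpo" in the
# run name suggests conservative DPO, which TRL implements as the sigmoid
# loss with label smoothing > 0.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Split names are assumptions; the card only names the dataset.
dataset = load_dataset("YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1")

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo",
    beta=0.1,                 # from the run name (beta0.1)
    loss_type="sigmoid",
    label_smoothing=0.1,      # placeholder epsilon; the actual value is not stated
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)
trainer = DPOTrainer(
    model=model,              # a frozen reference model is created automatically when omitted
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,      # newer TRL versions take processing_class instead
)
trainer.train()
```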
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6894 | 0.1117 | 20 | 0.6905 | 0.0099 | 0.0 | 0.5 | 0.0099 | -41.5868 | -29.9713 | -2.2354 | -2.3818 |
| 0.64 | 0.2235 | 40 | 0.6394 | 0.1094 | 0.0 | 0.9750 | 0.1094 | -40.4877 | -28.9760 | -2.3022 | -2.4396 |
| 0.5438 | 0.3352 | 60 | 0.5322 | 0.3576 | 0.0 | 1.0 | 0.3576 | -37.7918 | -26.4941 | -2.5104 | -2.6190 |
| 0.4281 | 0.4469 | 80 | 0.4239 | 0.6785 | 0.0 | 0.9750 | 0.6785 | -34.2644 | -23.2855 | -2.8776 | -2.9361 |
| 0.3182 | 0.5587 | 100 | 0.3871 | 0.8410 | 0.0 | 0.9750 | 0.8410 | -33.0697 | -21.6598 | -3.0829 | -3.1073 |
| 0.3615 | 0.6704 | 120 | 0.3660 | 0.9449 | 0.0 | 0.9750 | 0.9449 | -32.1668 | -20.6208 | -3.2298 | -3.2239 |
| 0.3887 | 0.7821 | 140 | 0.3573 | 0.9953 | 0.0 | 0.9750 | 0.9953 | -31.6459 | -20.1170 | -3.2462 | -3.2391 |
| 0.3456 | 0.8939 | 160 | 0.3513 | 1.0109 | 0.0 | 0.9750 | 1.0109 | -31.5973 | -19.9609 | -3.2623 | -3.2451 |
| 0.3038 | 1.0056 | 180 | 0.3480 | 1.0406 | 0.0 | 0.9750 | 1.0406 | -31.4452 | -19.6639 | -3.2867 | -3.2693 |
| 0.2996 | 1.1173 | 200 | 0.3482 | 1.0503 | 0.0 | 0.9750 | 1.0503 | -31.2129 | -19.5670 | -3.3491 | -3.3129 |
| 0.2947 | 1.2291 | 220 | 0.3455 | 1.0613 | 0.0 | 0.9750 | 1.0613 | -31.1605 | -19.4574 | -3.3765 | -3.3360 |
| 0.2807 | 1.3408 | 240 | 0.3446 | 1.0742 | 0.0 | 0.9750 | 1.0742 | -31.1609 | -19.3284 | -3.3844 | -3.3389 |
| 0.2892 | 1.4525 | 260 | 0.3436 | 1.0974 | 0.0 | 0.9750 | 1.0974 | -31.2525 | -19.0959 | -3.4174 | -3.3675 |
| 0.2755 | 1.5642 | 280 | 0.3415 | 1.0894 | 0.0 | 0.9750 | 1.0894 | -31.1768 | -19.1764 | -3.3915 | -3.3449 |
| 0.3311 | 1.6760 | 300 | 0.3440 | 1.0970 | 0.0 | 0.9750 | 1.0970 | -31.1934 | -19.1000 | -3.4187 | -3.3663 |
| 0.308 | 1.7877 | 320 | 0.3380 | 1.0940 | 0.0 | 0.9750 | 1.0940 | -31.0993 | -19.1305 | -3.3899 | -3.3421 |
| 0.3006 | 1.8994 | 340 | 0.3376 | 1.1069 | 0.0 | 0.9750 | 1.1069 | -31.1959 | -19.0009 | -3.4089 | -3.3610 |
| 0.235 | 2.0112 | 360 | 0.3396 | 1.1140 | 0.0 | 0.9750 | 1.1140 | -31.1403 | -18.9303 | -3.4337 | -3.3788 |
| 0.2266 | 2.1229 | 380 | 0.3574 | 1.1053 | 0.0 | 0.9000 | 1.1053 | -32.1879 | -19.0168 | -3.4916 | -3.4137 |
| 0.201 | 2.2346 | 400 | 0.3570 | 1.1130 | 0.0 | 0.9250 | 1.1130 | -31.9785 | -18.9396 | -3.4926 | -3.4119 |
| 0.2431 | 2.3464 | 420 | 0.3618 | 1.1126 | 0.0 | 0.9250 | 1.1126 | -32.5464 | -18.9440 | -3.5039 | -3.4203 |
| 0.1932 | 2.4581 | 440 | 0.3558 | 1.1157 | 0.0 | 0.9250 | 1.1157 | -32.1669 | -18.9132 | -3.4950 | -3.4113 |
| 0.2296 | 2.5698 | 460 | 0.3598 | 1.1063 | 0.0 | 0.9250 | 1.1063 | -32.2979 | -19.0069 | -3.5134 | -3.4279 |
| 0.212 | 2.6816 | 480 | 0.3591 | 1.1226 | 0.0 | 0.9250 | 1.1226 | -32.1543 | -18.8444 | -3.5250 | -3.4411 |
| 0.1894 | 2.7933 | 500 | 0.3594 | 1.1202 | 0.0 | 0.9250 | 1.1202 | -32.3537 | -18.8678 | -3.5194 | -3.4356 |
| 0.2067 | 2.9050 | 520 | 0.3609 | 1.1099 | 0.0 | 0.9500 | 1.1099 | -32.5710 | -18.9707 | -3.5225 | -3.4337 |
| 0.1881 | 3.0168 | 540 | 0.3589 | 1.1291 | 0.0 | 0.9250 | 1.1291 | -32.5229 | -18.7786 | -3.5296 | -3.4387 |
| 0.17 | 3.1285 | 560 | 0.3970 | 1.0512 | 0.0 | 0.9000 | 1.0512 | -34.5200 | -19.5579 | -3.5542 | -3.4475 |
| 0.1873 | 3.2402 | 580 | 0.3993 | 1.0404 | 0.0 | 0.9000 | 1.0404 | -34.5039 | -19.6665 | -3.5730 | -3.4683 |
| 0.1644 | 3.3520 | 600 | 0.3881 | 1.0645 | 0.0 | 0.9000 | 1.0645 | -33.7163 | -19.4250 | -3.5589 | -3.4537 |
| 0.1464 | 3.4637 | 620 | 0.4029 | 1.0201 | 0.0 | 0.9000 | 1.0201 | -34.4937 | -19.8686 | -3.5646 | -3.4565 |
| 0.1617 | 3.5754 | 640 | 0.4082 | 0.9922 | 0.0 | 0.9000 | 0.9922 | -34.7634 | -20.1483 | -3.5579 | -3.4499 |
| 0.18 | 3.6872 | 660 | 0.4077 | 0.9961 | 0.0 | 0.9000 | 0.9961 | -34.9782 | -20.1087 | -3.5564 | -3.4476 |
| 0.2045 | 3.7989 | 680 | 0.4052 | 1.0121 | 0.0 | 0.9000 | 1.0121 | -34.9717 | -19.9491 | -3.5722 | -3.4640 |
| 0.142 | 3.9106 | 700 | 0.4050 | 1.0185 | 0.0 | 0.9000 | 1.0185 | -35.1007 | -19.8855 | -3.5645 | -3.4541 |
| 0.1421 | 4.0223 | 720 | 0.4067 | 1.0457 | 0.0 | 0.9000 | 1.0457 | -35.2827 | -19.6135 | -3.5662 | -3.4560 |
| 0.1318 | 4.1341 | 740 | 0.4443 | 0.9731 | 0.0 | 0.9000 | 0.9731 | -36.9992 | -20.3389 | -3.5594 | -3.4419 |
| 0.1356 | 4.2458 | 760 | 0.4473 | 0.9657 | 0.0 | 0.9000 | 0.9657 | -37.2294 | -20.4132 | -3.5715 | -3.4548 |
| 0.1253 | 4.3575 | 780 | 0.4582 | 0.9353 | 0.0 | 0.9000 | 0.9353 | -37.5374 | -20.7167 | -3.5529 | -3.4340 |
| 0.1686 | 4.4693 | 800 | 0.4477 | 0.9578 | 0.0 | 0.9000 | 0.9578 | -37.2330 | -20.4917 | -3.5556 | -3.4378 |
| 0.1635 | 4.5810 | 820 | 0.4505 | 0.9534 | 0.0 | 0.9000 | 0.9534 | -37.2043 | -20.5356 | -3.5566 | -3.4383 |
| 0.1932 | 4.6927 | 840 | 0.4553 | 0.9423 | 0.0 | 0.9000 | 0.9423 | -37.5706 | -20.6470 | -3.5602 | -3.4423 |
| 0.1662 | 4.8045 | 860 | 0.4572 | 0.9450 | 0.0 | 0.9000 | 0.9450 | -37.4698 | -20.6204 | -3.5555 | -3.4374 |
| 0.1418 | 4.9162 | 880 | 0.4498 | 0.9483 | 0.0 | 0.9000 | 0.9483 | -36.9635 | -20.5873 | -3.5514 | -3.4320 |
| 0.1116 | 5.0279 | 900 | 0.4688 | 0.9130 | 0.0 | 0.9000 | 0.9130 | -37.8485 | -20.9396 | -3.5580 | -3.4391 |
| 0.1144 | 5.1397 | 920 | 0.4856 | 0.8822 | 0.0 | 0.9000 | 0.8822 | -38.4848 | -21.2482 | -3.5577 | -3.4373 |
| 0.1212 | 5.2514 | 940 | 0.4955 | 0.8696 | 0.0 | 0.9000 | 0.8696 | -39.3209 | -21.3736 | -3.5563 | -3.4348 |
| 0.1241 | 5.3631 | 960 | 0.4987 | 0.8617 | 0.0 | 0.9000 | 0.8617 | -39.4275 | -21.4527 | -3.5473 | -3.4247 |
| 0.121 | 5.4749 | 980 | 0.4950 | 0.8748 | 0.0 | 0.9000 | 0.8748 | -39.2344 | -21.3220 | -3.5495 | -3.4283 |
| 0.1258 | 5.5866 | 1000 | 0.4909 | 0.8913 | 0.0 | 0.9000 | 0.8913 | -39.3551 | -21.1572 | -3.5464 | -3.4257 |
| 0.1172 | 5.6983 | 1020 | 0.4984 | 0.8604 | 0.0 | 0.9000 | 0.8604 | -39.5203 | -21.4664 | -3.5465 | -3.4245 |
| 0.1481 | 5.8101 | 1040 | 0.5025 | 0.8535 | 0.0 | 0.9000 | 0.8535 | -39.4845 | -21.5354 | -3.5547 | -3.4339 |
| 0.1425 | 5.9218 | 1060 | 0.4979 | 0.8644 | 0.0 | 0.9000 | 0.8644 | -39.4574 | -21.4261 | -3.5476 | -3.4238 |
| 0.0962 | 6.0335 | 1080 | 0.5070 | 0.8577 | 0.0 | 0.9000 | 0.8577 | -39.7203 | -21.4932 | -3.5502 | -3.4286 |
| 0.0939 | 6.1453 | 1100 | 0.5194 | 0.8028 | 0.0 | 0.9000 | 0.8028 | -40.2542 | -22.0425 | -3.5441 | -3.4206 |
| 0.1183 | 6.2570 | 1120 | 0.5260 | 0.8047 | 0.0 | 0.9000 | 0.8047 | -40.7338 | -22.0233 | -3.5434 | -3.4186 |
| 0.134 | 6.3687 | 1140 | 0.5284 | 0.7928 | 0.0 | 0.9000 | 0.7928 | -40.4894 | -22.1415 | -3.5363 | -3.4121 |
| 0.1158 | 6.4804 | 1160 | 0.5297 | 0.8030 | 0.0 | 0.875 | 0.8030 | -40.8462 | -22.0401 | -3.5343 | -3.4097 |
| 0.1146 | 6.5922 | 1180 | 0.5282 | 0.7974 | 0.0 | 0.875 | 0.7974 | -40.7946 | -22.0961 | -3.5395 | -3.4171 |
| 0.1021 | 6.7039 | 1200 | 0.5285 | 0.8007 | 0.0 | 0.9000 | 0.8007 | -40.5579 | -22.0631 | -3.5420 | -3.4198 |
| 0.1042 | 6.8156 | 1220 | 0.5341 | 0.7938 | 0.0 | 0.875 | 0.7938 | -40.9048 | -22.1316 | -3.5360 | -3.4120 |
| 0.1573 | 6.9274 | 1240 | 0.5307 | 0.8025 | 0.0 | 0.875 | 0.8025 | -40.9753 | -22.0449 | -3.5377 | -3.4141 |
| 0.1073 | 7.0391 | 1260 | 0.5314 | 0.8157 | 0.0 | 0.875 | 0.8157 | -40.6665 | -21.9129 | -3.5336 | -3.4091 |
| 0.1295 | 7.1508 | 1280 | 0.5355 | 0.7990 | 0.0 | 0.875 | 0.7990 | -40.7752 | -22.0800 | -3.5337 | -3.4090 |
| 0.0971 | 7.2626 | 1300 | 0.5361 | 0.7889 | 0.0 | 0.875 | 0.7889 | -41.1770 | -22.1811 | -3.5350 | -3.4111 |
| 0.103 | 7.3743 | 1320 | 0.5375 | 0.7940 | 0.0 | 0.9000 | 0.7940 | -41.0920 | -22.1303 | -3.5350 | -3.4107 |
| 0.1412 | 7.4860 | 1340 | 0.5357 | 0.8072 | 0.0 | 0.9000 | 0.8072 | -41.1868 | -21.9983 | -3.5389 | -3.4164 |
| 0.1338 | 7.5978 | 1360 | 0.5380 | 0.7942 | 0.0 | 0.9000 | 0.7942 | -41.1676 | -22.1284 | -3.5333 | -3.4091 |
| 0.0997 | 7.7095 | 1380 | 0.5406 | 0.7809 | 0.0 | 0.875 | 0.7809 | -41.1453 | -22.2606 | -3.5312 | -3.4058 |
| 0.1268 | 7.8212 | 1400 | 0.5390 | 0.8027 | 0.0 | 0.9000 | 0.8027 | -41.0950 | -22.0435 | -3.5375 | -3.4145 |
| 0.1248 | 7.9330 | 1420 | 0.5388 | 0.7945 | 0.0 | 0.9000 | 0.7945 | -41.1839 | -22.1253 | -3.5337 | -3.4094 |
### Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3