genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch8.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5376
  • Rewards/chosen: 0.7846
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.875
  • Rewards/margins: 0.7846
  • Logps/rejected: -41.1047
  • Logps/chosen: -22.2238
  • Logits/rejected: -3.5338
  • Logits/chosen: -3.4095
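
The card does not include a usage example; below is a minimal inference sketch (not from the original authors), assuming the repository ships the Qwen2.5 chat template with its tokenizer, as the base model does. The prompt is a placeholder.

```python
# Minimal inference sketch (assumption: standard transformers usage;
# this snippet is not part of the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch8.0_42"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example MATH-style prompt (placeholder).
messages = [{"role": "user", "content": "What is the remainder when 2^10 is divided by 7?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```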

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 8.0
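
The training script itself is not published with the card. The hyperparameters above map directly onto TRL's DPOTrainer; "cdpo" in the run name suggests conservative DPO, which TRL exposes via the label_smoothing option, and beta=0.1 comes from the run name. A hedged reconstruction (split names, output path, and the exact label-smoothing value are assumptions):

```python
# Hedged reconstruction of the training setup (the actual script is not
# published with this card). Split names, the output path, and the exact
# label-smoothing value are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
dataset = load_dataset(
    "YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1"
)

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch8.0_42",
    beta=0.1,                       # "beta0.1" in the run name
    label_smoothing=0.1,            # assumption: "cdpo" = conservative DPO
    learning_rate=1e-6,
    per_device_train_batch_size=4,  # x 8 GPUs = total train batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],   # assumption: a "test" split exists
    processing_class=tokenizer,     # `tokenizer=` in older TRL releases
)
trainer.train()
```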

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6894 | 0.1117 | 20 | 0.6905 | 0.0099 | 0.0 | 0.5 | 0.0099 | -41.5868 | -29.9713 | -2.2354 | -2.3818 |
| 0.64 | 0.2235 | 40 | 0.6394 | 0.1094 | 0.0 | 0.9750 | 0.1094 | -40.4877 | -28.9760 | -2.3022 | -2.4396 |
| 0.5438 | 0.3352 | 60 | 0.5322 | 0.3576 | 0.0 | 1.0 | 0.3576 | -37.7918 | -26.4941 | -2.5104 | -2.6190 |
| 0.4281 | 0.4469 | 80 | 0.4239 | 0.6785 | 0.0 | 0.9750 | 0.6785 | -34.2644 | -23.2855 | -2.8776 | -2.9361 |
| 0.3182 | 0.5587 | 100 | 0.3871 | 0.8410 | 0.0 | 0.9750 | 0.8410 | -33.0697 | -21.6598 | -3.0829 | -3.1073 |
| 0.3615 | 0.6704 | 120 | 0.3660 | 0.9449 | 0.0 | 0.9750 | 0.9449 | -32.1668 | -20.6208 | -3.2298 | -3.2239 |
| 0.3887 | 0.7821 | 140 | 0.3573 | 0.9953 | 0.0 | 0.9750 | 0.9953 | -31.6459 | -20.1170 | -3.2462 | -3.2391 |
| 0.3456 | 0.8939 | 160 | 0.3513 | 1.0109 | 0.0 | 0.9750 | 1.0109 | -31.5973 | -19.9609 | -3.2623 | -3.2451 |
| 0.3038 | 1.0056 | 180 | 0.3480 | 1.0406 | 0.0 | 0.9750 | 1.0406 | -31.4452 | -19.6639 | -3.2867 | -3.2693 |
| 0.2996 | 1.1173 | 200 | 0.3482 | 1.0503 | 0.0 | 0.9750 | 1.0503 | -31.2129 | -19.5670 | -3.3491 | -3.3129 |
| 0.2947 | 1.2291 | 220 | 0.3455 | 1.0613 | 0.0 | 0.9750 | 1.0613 | -31.1605 | -19.4574 | -3.3765 | -3.3360 |
| 0.2807 | 1.3408 | 240 | 0.3446 | 1.0742 | 0.0 | 0.9750 | 1.0742 | -31.1609 | -19.3284 | -3.3844 | -3.3389 |
| 0.2892 | 1.4525 | 260 | 0.3436 | 1.0974 | 0.0 | 0.9750 | 1.0974 | -31.2525 | -19.0959 | -3.4174 | -3.3675 |
| 0.2755 | 1.5642 | 280 | 0.3415 | 1.0894 | 0.0 | 0.9750 | 1.0894 | -31.1768 | -19.1764 | -3.3915 | -3.3449 |
| 0.3311 | 1.6760 | 300 | 0.3440 | 1.0970 | 0.0 | 0.9750 | 1.0970 | -31.1934 | -19.1000 | -3.4187 | -3.3663 |
| 0.308 | 1.7877 | 320 | 0.3380 | 1.0940 | 0.0 | 0.9750 | 1.0940 | -31.0993 | -19.1305 | -3.3899 | -3.3421 |
| 0.3006 | 1.8994 | 340 | 0.3376 | 1.1069 | 0.0 | 0.9750 | 1.1069 | -31.1959 | -19.0009 | -3.4089 | -3.3610 |
| 0.235 | 2.0112 | 360 | 0.3396 | 1.1140 | 0.0 | 0.9750 | 1.1140 | -31.1403 | -18.9303 | -3.4337 | -3.3788 |
| 0.2266 | 2.1229 | 380 | 0.3574 | 1.1053 | 0.0 | 0.9000 | 1.1053 | -32.1879 | -19.0168 | -3.4916 | -3.4137 |
| 0.201 | 2.2346 | 400 | 0.3570 | 1.1130 | 0.0 | 0.9250 | 1.1130 | -31.9785 | -18.9396 | -3.4926 | -3.4119 |
| 0.2431 | 2.3464 | 420 | 0.3618 | 1.1126 | 0.0 | 0.9250 | 1.1126 | -32.5464 | -18.9440 | -3.5039 | -3.4203 |
| 0.1932 | 2.4581 | 440 | 0.3558 | 1.1157 | 0.0 | 0.9250 | 1.1157 | -32.1669 | -18.9132 | -3.4950 | -3.4113 |
| 0.2296 | 2.5698 | 460 | 0.3598 | 1.1063 | 0.0 | 0.9250 | 1.1063 | -32.2979 | -19.0069 | -3.5134 | -3.4279 |
| 0.212 | 2.6816 | 480 | 0.3591 | 1.1226 | 0.0 | 0.9250 | 1.1226 | -32.1543 | -18.8444 | -3.5250 | -3.4411 |
| 0.1894 | 2.7933 | 500 | 0.3594 | 1.1202 | 0.0 | 0.9250 | 1.1202 | -32.3537 | -18.8678 | -3.5194 | -3.4356 |
| 0.2067 | 2.9050 | 520 | 0.3609 | 1.1099 | 0.0 | 0.9500 | 1.1099 | -32.5710 | -18.9707 | -3.5225 | -3.4337 |
| 0.1881 | 3.0168 | 540 | 0.3589 | 1.1291 | 0.0 | 0.9250 | 1.1291 | -32.5229 | -18.7786 | -3.5296 | -3.4387 |
| 0.17 | 3.1285 | 560 | 0.3970 | 1.0512 | 0.0 | 0.9000 | 1.0512 | -34.5200 | -19.5579 | -3.5542 | -3.4475 |
| 0.1873 | 3.2402 | 580 | 0.3993 | 1.0404 | 0.0 | 0.9000 | 1.0404 | -34.5039 | -19.6665 | -3.5730 | -3.4683 |
| 0.1644 | 3.3520 | 600 | 0.3881 | 1.0645 | 0.0 | 0.9000 | 1.0645 | -33.7163 | -19.4250 | -3.5589 | -3.4537 |
| 0.1464 | 3.4637 | 620 | 0.4029 | 1.0201 | 0.0 | 0.9000 | 1.0201 | -34.4937 | -19.8686 | -3.5646 | -3.4565 |
| 0.1617 | 3.5754 | 640 | 0.4082 | 0.9922 | 0.0 | 0.9000 | 0.9922 | -34.7634 | -20.1483 | -3.5579 | -3.4499 |
| 0.18 | 3.6872 | 660 | 0.4077 | 0.9961 | 0.0 | 0.9000 | 0.9961 | -34.9782 | -20.1087 | -3.5564 | -3.4476 |
| 0.2045 | 3.7989 | 680 | 0.4052 | 1.0121 | 0.0 | 0.9000 | 1.0121 | -34.9717 | -19.9491 | -3.5722 | -3.4640 |
| 0.142 | 3.9106 | 700 | 0.4050 | 1.0185 | 0.0 | 0.9000 | 1.0185 | -35.1007 | -19.8855 | -3.5645 | -3.4541 |
| 0.1421 | 4.0223 | 720 | 0.4067 | 1.0457 | 0.0 | 0.9000 | 1.0457 | -35.2827 | -19.6135 | -3.5662 | -3.4560 |
| 0.1318 | 4.1341 | 740 | 0.4443 | 0.9731 | 0.0 | 0.9000 | 0.9731 | -36.9992 | -20.3389 | -3.5594 | -3.4419 |
| 0.1356 | 4.2458 | 760 | 0.4473 | 0.9657 | 0.0 | 0.9000 | 0.9657 | -37.2294 | -20.4132 | -3.5715 | -3.4548 |
| 0.1253 | 4.3575 | 780 | 0.4582 | 0.9353 | 0.0 | 0.9000 | 0.9353 | -37.5374 | -20.7167 | -3.5529 | -3.4340 |
| 0.1686 | 4.4693 | 800 | 0.4477 | 0.9578 | 0.0 | 0.9000 | 0.9578 | -37.2330 | -20.4917 | -3.5556 | -3.4378 |
| 0.1635 | 4.5810 | 820 | 0.4505 | 0.9534 | 0.0 | 0.9000 | 0.9534 | -37.2043 | -20.5356 | -3.5566 | -3.4383 |
| 0.1932 | 4.6927 | 840 | 0.4553 | 0.9423 | 0.0 | 0.9000 | 0.9423 | -37.5706 | -20.6470 | -3.5602 | -3.4423 |
| 0.1662 | 4.8045 | 860 | 0.4572 | 0.9450 | 0.0 | 0.9000 | 0.9450 | -37.4698 | -20.6204 | -3.5555 | -3.4374 |
| 0.1418 | 4.9162 | 880 | 0.4498 | 0.9483 | 0.0 | 0.9000 | 0.9483 | -36.9635 | -20.5873 | -3.5514 | -3.4320 |
| 0.1116 | 5.0279 | 900 | 0.4688 | 0.9130 | 0.0 | 0.9000 | 0.9130 | -37.8485 | -20.9396 | -3.5580 | -3.4391 |
| 0.1144 | 5.1397 | 920 | 0.4856 | 0.8822 | 0.0 | 0.9000 | 0.8822 | -38.4848 | -21.2482 | -3.5577 | -3.4373 |
| 0.1212 | 5.2514 | 940 | 0.4955 | 0.8696 | 0.0 | 0.9000 | 0.8696 | -39.3209 | -21.3736 | -3.5563 | -3.4348 |
| 0.1241 | 5.3631 | 960 | 0.4987 | 0.8617 | 0.0 | 0.9000 | 0.8617 | -39.4275 | -21.4527 | -3.5473 | -3.4247 |
| 0.121 | 5.4749 | 980 | 0.4950 | 0.8748 | 0.0 | 0.9000 | 0.8748 | -39.2344 | -21.3220 | -3.5495 | -3.4283 |
| 0.1258 | 5.5866 | 1000 | 0.4909 | 0.8913 | 0.0 | 0.9000 | 0.8913 | -39.3551 | -21.1572 | -3.5464 | -3.4257 |
| 0.1172 | 5.6983 | 1020 | 0.4984 | 0.8604 | 0.0 | 0.9000 | 0.8604 | -39.5203 | -21.4664 | -3.5465 | -3.4245 |
| 0.1481 | 5.8101 | 1040 | 0.5025 | 0.8535 | 0.0 | 0.9000 | 0.8535 | -39.4845 | -21.5354 | -3.5547 | -3.4339 |
| 0.1425 | 5.9218 | 1060 | 0.4979 | 0.8644 | 0.0 | 0.9000 | 0.8644 | -39.4574 | -21.4261 | -3.5476 | -3.4238 |
| 0.0962 | 6.0335 | 1080 | 0.5070 | 0.8577 | 0.0 | 0.9000 | 0.8577 | -39.7203 | -21.4932 | -3.5502 | -3.4286 |
| 0.0939 | 6.1453 | 1100 | 0.5194 | 0.8028 | 0.0 | 0.9000 | 0.8028 | -40.2542 | -22.0425 | -3.5441 | -3.4206 |
| 0.1183 | 6.2570 | 1120 | 0.5260 | 0.8047 | 0.0 | 0.9000 | 0.8047 | -40.7338 | -22.0233 | -3.5434 | -3.4186 |
| 0.134 | 6.3687 | 1140 | 0.5284 | 0.7928 | 0.0 | 0.9000 | 0.7928 | -40.4894 | -22.1415 | -3.5363 | -3.4121 |
| 0.1158 | 6.4804 | 1160 | 0.5297 | 0.8030 | 0.0 | 0.875 | 0.8030 | -40.8462 | -22.0401 | -3.5343 | -3.4097 |
| 0.1146 | 6.5922 | 1180 | 0.5282 | 0.7974 | 0.0 | 0.875 | 0.7974 | -40.7946 | -22.0961 | -3.5395 | -3.4171 |
| 0.1021 | 6.7039 | 1200 | 0.5285 | 0.8007 | 0.0 | 0.9000 | 0.8007 | -40.5579 | -22.0631 | -3.5420 | -3.4198 |
| 0.1042 | 6.8156 | 1220 | 0.5341 | 0.7938 | 0.0 | 0.875 | 0.7938 | -40.9048 | -22.1316 | -3.5360 | -3.4120 |
| 0.1573 | 6.9274 | 1240 | 0.5307 | 0.8025 | 0.0 | 0.875 | 0.8025 | -40.9753 | -22.0449 | -3.5377 | -3.4141 |
| 0.1073 | 7.0391 | 1260 | 0.5314 | 0.8157 | 0.0 | 0.875 | 0.8157 | -40.6665 | -21.9129 | -3.5336 | -3.4091 |
| 0.1295 | 7.1508 | 1280 | 0.5355 | 0.7990 | 0.0 | 0.875 | 0.7990 | -40.7752 | -22.0800 | -3.5337 | -3.4090 |
| 0.0971 | 7.2626 | 1300 | 0.5361 | 0.7889 | 0.0 | 0.875 | 0.7889 | -41.1770 | -22.1811 | -3.5350 | -3.4111 |
| 0.103 | 7.3743 | 1320 | 0.5375 | 0.7940 | 0.0 | 0.9000 | 0.7940 | -41.0920 | -22.1303 | -3.5350 | -3.4107 |
| 0.1412 | 7.4860 | 1340 | 0.5357 | 0.8072 | 0.0 | 0.9000 | 0.8072 | -41.1868 | -21.9983 | -3.5389 | -3.4164 |
| 0.1338 | 7.5978 | 1360 | 0.5380 | 0.7942 | 0.0 | 0.9000 | 0.7942 | -41.1676 | -22.1284 | -3.5333 | -3.4091 |
| 0.0997 | 7.7095 | 1380 | 0.5406 | 0.7809 | 0.0 | 0.875 | 0.7809 | -41.1453 | -22.2606 | -3.5312 | -3.4058 |
| 0.1268 | 7.8212 | 1400 | 0.5390 | 0.8027 | 0.0 | 0.9000 | 0.8027 | -41.0950 | -22.0435 | -3.5375 | -3.4145 |
| 0.1248 | 7.9330 | 1420 | 0.5388 | 0.7945 | 0.0 | 0.9000 | 0.7945 | -41.1839 | -22.1253 | -3.5337 | -3.4094 |
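
For reference, the reward columns follow the usual DPO bookkeeping: each reward is beta times the policy-vs-reference log-probability ratio on a response, and the margin is chosen minus rejected. A sketch of the loss and metrics under that convention (this mirrors TRL's formulation, not code released with this model; label_smoothing > 0 corresponds to the conservative DPO the run name suggests):

```python
# Sketch of the DPO/cDPO loss and the reward bookkeeping behind the
# columns above; mirrors TRL's formulation, not this model's actual code.
import torch
import torch.nn.functional as F

def cdpo_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, label_smoothing=0.0):
    """label_smoothing == 0 recovers standard DPO; > 0 gives conservative DPO."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = chosen_ratio - rejected_ratio
    # cDPO treats preference labels as noisy with probability label_smoothing.
    loss = (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
            - F.logsigmoid(-beta * logits) * label_smoothing)
    # Rewards/chosen and Rewards/rejected are the beta-scaled log-prob ratios;
    # Rewards/margins is their difference, and Rewards/accuracies is the
    # fraction of pairs with a positive margin.
    rewards_chosen = beta * chosen_ratio.detach()
    rewards_rejected = beta * rejected_ratio.detach()
    return loss.mean(), rewards_chosen, rewards_rejected
```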

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3