smollm-1.7b-instruct-simpo-v2

This model is a fine-tuned version of HuggingFaceTB/SmolLM-1.7B-Instruct on the BAAI/Infinity-Preference dataset. It achieves the following results on the evaluation set (a sketch relating these metrics follows the list):

  • Loss: 3.0877
  • Rewards/chosen: -22.8949
  • Rewards/rejected: -24.4444
  • Rewards/accuracies: 0.6300
  • Rewards/margins: 1.5495
  • Logps/rejected: -2.4444
  • Logps/chosen: -2.2895
  • Logits/rejected: -2.4913
  • Logits/chosen: -2.3131
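
The "simpo" in the model name and the metric layout suggest SimPO (Simple Preference Optimization), whose implicit reward is the length-normalized sequence log-probability scaled by a factor beta. The numbers above are consistent with beta = 10: Rewards/chosen (-22.8949) is ten times Logps/chosen (-2.2895), and likewise on the rejected side. Below is a minimal PyTorch sketch of that objective; the exact trainer used and the target-margin value gamma are not stated in this card, so gamma is a placeholder.

```python
import torch
import torch.nn.functional as F

def simpo_loss(avg_logp_chosen: torch.Tensor,
               avg_logp_rejected: torch.Tensor,
               beta: float = 10.0,   # implied by Rewards being 10x Logps above
               gamma: float = 1.0):  # target reward margin; placeholder value
    """SimPO objective on length-normalized log-probabilities.

    avg_logp_* are the mean per-token log-probs of each response under the
    policy, i.e. log pi(y|x) / |y| -- the Logps/* columns reported here.
    """
    reward_chosen = beta * avg_logp_chosen      # Rewards/chosen
    reward_rejected = beta * avg_logp_rejected  # Rewards/rejected
    margins = reward_chosen - reward_rejected   # Rewards/margins
    loss = -F.logsigmoid(margins - gamma).mean()
    return loss, margins

# Sanity check against the final evaluation row:
_, m = simpo_loss(torch.tensor([-2.2895]), torch.tensor([-2.4444]))
print(m.item())  # ~1.549, matching Rewards/margins up to rounding
```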

Model description

Per the model name and the training metrics below, this is HuggingFaceTB/SmolLM-1.7B-Instruct further aligned with SimPO-style preference optimization on BAAI/Infinity-Preference. The checkpoint has about 1.71B parameters stored as F32 safetensors. No further details were provided by the author.

Intended uses & limitations

More information needed
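
The card leaves this section unfilled. For completeness, here is a minimal chat-style inference sketch; it assumes the standard chat-template API in Transformers 4.45.1 (listed under Framework versions) and the repo id focuzz8/smollm-1.7b-instruct-simpo-v2. The sampling settings are illustrative, not recommendations from the author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "focuzz8/smollm-1.7b-instruct-simpo-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint is stored in F32, so load it in full precision.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

messages = [{"role": "user", "content": "Summarize what preference optimization does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```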

Training and evaluation data

The model was trained and evaluated on preference pairs from the BAAI/Infinity-Preference dataset named above; preprocessing and split details are not provided (a loading sketch follows).
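
A minimal loading sketch with the datasets library (v3.0.1, per Framework versions) is below; the split name and column layout are typical of preference datasets but are assumptions not confirmed by this card.

```python
from datasets import load_dataset

# Assumes a "train" split; adjust if the dataset's hub page lists others.
ds = load_dataset("BAAI/Infinity-Preference", split="train")
print(ds)            # features and number of preference pairs
print(ds[0].keys())  # expect prompt/chosen/rejected-style columns
```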

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
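
These settings map directly onto transformers.TrainingArguments. A hedged sketch of the equivalent configuration is below; the actual trainer class (e.g., a SimPO/CPO-style preference trainer) is not identified by the card, so only the arguments themselves are shown.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; the Adam betas/epsilon are the defaults.
training_args = TrainingArguments(
    output_dir="smollm-1.7b-instruct-simpo-v2",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```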

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.2871 | 0.0135 | 400 | 3.4379 | -16.5537 | -16.5135 | 0.4700 | -0.0402 | -1.6513 | -1.6554 | -0.7019 | -0.7007 |
| 3.4746 | 0.0270 | 800 | 3.4370 | -16.5561 | -16.5146 | 0.4700 | -0.0415 | -1.6515 | -1.6556 | -0.7002 | -0.6988 |
| 2.8856 | 0.0404 | 1200 | 3.4399 | -16.5623 | -16.5160 | 0.4700 | -0.0464 | -1.6516 | -1.6562 | -0.6997 | -0.6984 |
| 3.8819 | 0.0539 | 1600 | 3.4374 | -16.5639 | -16.5248 | 0.4700 | -0.0391 | -1.6525 | -1.6564 | -0.7012 | -0.6998 |
| 3.622 | 0.0674 | 2000 | 3.4319 | -16.5838 | -16.5551 | 0.4700 | -0.0288 | -1.6555 | -1.6584 | -0.7089 | -0.7069 |
| 3.6924 | 0.0809 | 2400 | 3.4273 | -16.6109 | -16.5901 | 0.4700 | -0.0208 | -1.6590 | -1.6611 | -0.7032 | -0.7007 |
| 3.0591 | 0.0944 | 2800 | 3.4161 | -16.6863 | -16.6979 | 0.4600 | 0.0117 | -1.6698 | -1.6686 | -0.7295 | -0.7253 |
| 3.4937 | 0.1079 | 3200 | 3.4013 | -16.7982 | -16.8590 | 0.4700 | 0.0608 | -1.6859 | -1.6798 | -0.7483 | -0.7412 |
| 3.1565 | 0.1213 | 3600 | 3.3852 | -16.8542 | -16.9385 | 0.4700 | 0.0843 | -1.6939 | -1.6854 | -0.7618 | -0.7526 |
| 2.7504 | 0.1348 | 4000 | 3.3711 | -16.9128 | -17.0175 | 0.4800 | 0.1047 | -1.7018 | -1.6913 | -0.7684 | -0.7574 |
| 3.0312 | 0.1483 | 4400 | 3.3606 | -16.9720 | -17.0910 | 0.4900 | 0.1190 | -1.7091 | -1.6972 | -0.7754 | -0.7629 |
| 4.145 | 0.1618 | 4800 | 3.3407 | -17.0816 | -17.2375 | 0.5100 | 0.1559 | -1.7238 | -1.7082 | -0.7902 | -0.7746 |
| 3.9514 | 0.1753 | 5200 | 3.3126 | -17.1952 | -17.3924 | 0.5100 | 0.1972 | -1.7392 | -1.7195 | -0.8201 | -0.8001 |
| 2.4942 | 0.1887 | 5600 | 3.2864 | -17.2731 | -17.4955 | 0.5100 | 0.2223 | -1.7495 | -1.7273 | -0.8187 | -0.7960 |
| 2.6757 | 0.2022 | 6000 | 3.2615 | -17.3603 | -17.6063 | 0.5200 | 0.2460 | -1.7606 | -1.7360 | -0.7977 | -0.7735 |
| 2.8576 | 0.2157 | 6400 | 3.2382 | -17.5060 | -17.8132 | 0.5500 | 0.3072 | -1.7813 | -1.7506 | -0.8562 | -0.8260 |
| 3.7483 | 0.2292 | 6800 | 3.2140 | -17.5965 | -17.9376 | 0.5700 | 0.3411 | -1.7938 | -1.7596 | -0.8751 | -0.8407 |
| 3.5349 | 0.2427 | 7200 | 3.2035 | -17.6663 | -18.0193 | 0.5800 | 0.3530 | -1.8019 | -1.7666 | -0.8780 | -0.8417 |
| 2.0604 | 0.2562 | 7600 | 3.1925 | -17.7393 | -18.1045 | 0.6100 | 0.3652 | -1.8104 | -1.7739 | -0.9017 | -0.8602 |
| 5.7031 | 0.2696 | 8000 | 3.1672 | -18.0175 | -18.4936 | 0.6100 | 0.4760 | -1.8494 | -1.8018 | -0.9982 | -0.9467 |
| 2.6005 | 0.2831 | 8400 | 3.1475 | -18.1162 | -18.6283 | 0.6100 | 0.5121 | -1.8628 | -1.8116 | -1.0732 | -1.0161 |
| 1.9787 | 0.2966 | 8800 | 3.1226 | -18.3260 | -18.9198 | 0.6100 | 0.5938 | -1.8920 | -1.8326 | -1.1691 | -1.1062 |
| 2.8347 | 0.3101 | 9200 | 3.1156 | -18.4632 | -19.0934 | 0.6100 | 0.6301 | -1.9093 | -1.8463 | -1.2592 | -1.1910 |
| 2.701 | 0.3236 | 9600 | 3.1022 | -18.5083 | -19.1346 | 0.6100 | 0.6264 | -1.9135 | -1.8508 | -1.2785 | -1.2073 |
| 3.772 | 0.3371 | 10000 | 3.0772 | -18.5843 | -19.2491 | 0.6100 | 0.6649 | -1.9249 | -1.8584 | -1.3345 | -1.2587 |
| 2.7414 | 0.3505 | 10400 | 3.0551 | -18.8305 | -19.5946 | 0.6100 | 0.7641 | -1.9595 | -1.8830 | -1.3824 | -1.3004 |
| 2.0287 | 0.3640 | 10800 | 3.0534 | -18.9934 | -19.7985 | 0.6200 | 0.8051 | -1.9798 | -1.8993 | -1.4355 | -1.3467 |
| 1.0473 | 0.3775 | 11200 | 3.0528 | -19.1581 | -19.9858 | 0.6100 | 0.8277 | -1.9986 | -1.9158 | -1.5109 | -1.4173 |
| 2.8106 | 0.3910 | 11600 | 3.0436 | -19.1763 | -19.9989 | 0.6100 | 0.8226 | -1.9999 | -1.9176 | -1.5138 | -1.4206 |
| 3.0344 | 0.4045 | 12000 | 3.0333 | -19.2526 | -20.1079 | 0.6100 | 0.8553 | -2.0108 | -1.9253 | -1.5628 | -1.4657 |
| 2.1886 | 0.4179 | 12400 | 3.0187 | -19.4500 | -20.3818 | 0.6300 | 0.9318 | -2.0382 | -1.9450 | -1.6246 | -1.5217 |
| 4.1181 | 0.4314 | 12800 | 3.0086 | -19.6204 | -20.6104 | 0.6300 | 0.9900 | -2.0610 | -1.9620 | -1.6886 | -1.5818 |
| 1.6647 | 0.4449 | 13200 | 3.0126 | -19.7773 | -20.7949 | 0.6300 | 1.0176 | -2.0795 | -1.9777 | -1.7307 | -1.6181 |
| 4.8533 | 0.4584 | 13600 | 3.0012 | -19.9001 | -20.9633 | 0.6300 | 1.0632 | -2.0963 | -1.9900 | -1.7437 | -1.6288 |
| 2.9945 | 0.4719 | 14000 | 3.0071 | -19.9831 | -21.0361 | 0.6300 | 1.0529 | -2.1036 | -1.9983 | -1.7839 | -1.6667 |
| 2.9377 | 0.4854 | 14400 | 2.9946 | -20.1165 | -21.2172 | 0.6400 | 1.1007 | -2.1217 | -2.0117 | -1.8386 | -1.7178 |
| 2.7856 | 0.4988 | 14800 | 2.9908 | -20.2830 | -21.4151 | 0.6300 | 1.1322 | -2.1415 | -2.0283 | -1.8720 | -1.7468 |
| 4.9446 | 0.5123 | 15200 | 2.9905 | -20.4144 | -21.5669 | 0.6300 | 1.1525 | -2.1567 | -2.0414 | -1.9057 | -1.7760 |
| 3.2834 | 0.5258 | 15600 | 2.9858 | -20.4428 | -21.5993 | 0.6300 | 1.1565 | -2.1599 | -2.0443 | -1.8928 | -1.7633 |
| 1.8705 | 0.5393 | 16000 | 2.9888 | -20.5922 | -21.7774 | 0.6300 | 1.1853 | -2.1777 | -2.0592 | -1.9340 | -1.8009 |
| 4.0587 | 0.5528 | 16400 | 2.9925 | -20.8812 | -22.1359 | 0.6300 | 1.2547 | -2.2136 | -2.0881 | -2.0019 | -1.8627 |
| 3.0706 | 0.5662 | 16800 | 2.9946 | -21.1005 | -22.4176 | 0.6300 | 1.3171 | -2.2418 | -2.1101 | -2.0533 | -1.9104 |
| 3.152 | 0.5797 | 17200 | 2.9916 | -21.2937 | -22.6723 | 0.6200 | 1.3786 | -2.2672 | -2.1294 | -2.1094 | -1.9627 |
| 1.8856 | 0.5932 | 17600 | 2.9847 | -21.2727 | -22.6463 | 0.6200 | 1.3736 | -2.2646 | -2.1273 | -2.1108 | -1.9637 |
| 1.1291 | 0.6067 | 18000 | 2.9981 | -21.5313 | -22.9507 | 0.6200 | 1.4194 | -2.2951 | -2.1531 | -2.1736 | -2.0212 |
| 2.9894 | 0.6202 | 18400 | 3.0033 | -21.6191 | -23.0276 | 0.6200 | 1.4085 | -2.3028 | -2.1619 | -2.2089 | -2.0543 |
| 3.497 | 0.6337 | 18800 | 3.0252 | -21.8198 | -23.2426 | 0.6200 | 1.4228 | -2.3243 | -2.1820 | -2.2285 | -2.0714 |
| 3.18 | 0.6471 | 19200 | 3.0307 | -21.8887 | -23.3005 | 0.6200 | 1.4117 | -2.3300 | -2.1889 | -2.2462 | -2.0862 |
| 1.9522 | 0.6606 | 19600 | 3.0391 | -21.9179 | -23.3214 | 0.6300 | 1.4035 | -2.3321 | -2.1918 | -2.2476 | -2.0875 |
| 2.4878 | 0.6741 | 20000 | 3.0431 | -22.1021 | -23.5543 | 0.6300 | 1.4522 | -2.3554 | -2.2102 | -2.2969 | -2.1333 |
| 2.3506 | 0.6876 | 20400 | 3.0453 | -22.2379 | -23.7220 | 0.6300 | 1.4841 | -2.3722 | -2.2238 | -2.3258 | -2.1603 |
| 3.9719 | 0.7011 | 20800 | 3.0591 | -22.2718 | -23.7317 | 0.6300 | 1.4599 | -2.3732 | -2.2272 | -2.3263 | -2.1600 |
| 1.4942 | 0.7146 | 21200 | 3.0574 | -22.3226 | -23.8044 | 0.6300 | 1.4819 | -2.3804 | -2.2323 | -2.3352 | -2.1680 |
| 0.8797 | 0.7280 | 21600 | 3.0616 | -22.3419 | -23.8235 | 0.6300 | 1.4816 | -2.3823 | -2.2342 | -2.3394 | -2.1721 |
| 2.8176 | 0.7415 | 22000 | 3.0751 | -22.4788 | -23.9643 | 0.6300 | 1.4855 | -2.3964 | -2.2479 | -2.3767 | -2.2073 |
| 3.3744 | 0.7550 | 22400 | 3.0775 | -22.6028 | -24.1137 | 0.6300 | 1.5109 | -2.4114 | -2.2603 | -2.4146 | -2.2423 |
| 1.9708 | 0.7685 | 22800 | 3.0768 | -22.6249 | -24.1479 | 0.6300 | 1.5231 | -2.4148 | -2.2625 | -2.4216 | -2.2482 |
| 2.1589 | 0.7820 | 23200 | 3.0697 | -22.6570 | -24.1936 | 0.6300 | 1.5367 | -2.4194 | -2.2657 | -2.4323 | -2.2591 |
| 3.0872 | 0.7954 | 23600 | 3.0813 | -22.7174 | -24.2489 | 0.6300 | 1.5315 | -2.4249 | -2.2717 | -2.4430 | -2.2683 |
| 3.9705 | 0.8089 | 24000 | 3.0806 | -22.7644 | -24.3076 | 0.6300 | 1.5432 | -2.4308 | -2.2764 | -2.4598 | -2.2840 |
| 3.5691 | 0.8224 | 24400 | 3.0807 | -22.7627 | -24.2931 | 0.6300 | 1.5304 | -2.4293 | -2.2763 | -2.4621 | -2.2857 |
| 1.4467 | 0.8359 | 24800 | 3.0854 | -22.8132 | -24.3525 | 0.6300 | 1.5393 | -2.4353 | -2.2813 | -2.4742 | -2.2963 |
| 2.7241 | 0.8494 | 25200 | 3.0862 | -22.8300 | -24.3745 | 0.6300 | 1.5445 | -2.4375 | -2.2830 | -2.4770 | -2.2988 |
| 2.7441 | 0.8629 | 25600 | 3.0866 | -22.8450 | -24.3876 | 0.6300 | 1.5427 | -2.4388 | -2.2845 | -2.4823 | -2.3048 |
| 1.4801 | 0.8763 | 26000 | 3.0839 | -22.8522 | -24.4010 | 0.6300 | 1.5488 | -2.4401 | -2.2852 | -2.4827 | -2.3057 |
| 2.5965 | 0.8898 | 26400 | 3.0841 | -22.8629 | -24.4169 | 0.6300 | 1.5540 | -2.4417 | -2.2863 | -2.4877 | -2.3095 |
| 3.6415 | 0.9033 | 26800 | 3.0893 | -22.8830 | -24.4340 | 0.6300 | 1.5510 | -2.4434 | -2.2883 | -2.4894 | -2.3114 |
| 2.0584 | 0.9168 | 27200 | 3.0894 | -22.8879 | -24.4268 | 0.6300 | 1.5389 | -2.4427 | -2.2888 | -2.4917 | -2.3134 |
| 2.5068 | 0.9303 | 27600 | 3.0896 | -22.8936 | -24.4408 | 0.6300 | 1.5472 | -2.4441 | -2.2894 | -2.4922 | -2.3134 |
| 0.677 | 0.9437 | 28000 | 3.0835 | -22.8876 | -24.4472 | 0.6300 | 1.5596 | -2.4447 | -2.2888 | -2.4919 | -2.3134 |
| 2.5931 | 0.9572 | 28400 | 3.0875 | -22.8938 | -24.4419 | 0.6300 | 1.5481 | -2.4442 | -2.2894 | -2.4907 | -2.3117 |
| 4.4413 | 0.9707 | 28800 | 3.0893 | -22.8952 | -24.4383 | 0.6300 | 1.5431 | -2.4438 | -2.2895 | -2.4914 | -2.3131 |
| 2.7584 | 0.9842 | 29200 | 3.0874 | -22.8946 | -24.4410 | 0.6300 | 1.5464 | -2.4441 | -2.2895 | -2.4894 | -2.3112 |
| 4.4406 | 0.9977 | 29600 | 3.0877 | -22.8949 | -24.4444 | 0.6300 | 1.5495 | -2.4444 | -2.2895 | -2.4913 | -2.3131 |

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.2.2
  • Datasets 3.0.1
  • Tokenizers 0.20.0