---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_fUNL_entropy_0_01
    results: []
---

qwen_fUNL_entropy_0_01

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft, trained with SimPO on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0504
  • Sft Loss: 4.0281
  • Rewards/chosen: -4.4231
  • Rewards/rejected: -5.1418
  • Rewards/accuracies: 0.6862
  • Rewards/margins: 0.7187
  • Logps/rejected: -5.1418
  • Logps/chosen: -4.4231
  • Logits/rejected: -0.2955
  • Logits/chosen: -0.3687
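
For reference, below is a minimal loading sketch using the transformers API. The repo id yakazimir/qwen_fUNL_entropy_0_01 is inferred from this card's metadata and is an assumption; substitute the actual Hub path if it differs.

```python
# Minimal sketch of loading this checkpoint for chat-style generation.
# The repo id below is inferred from the card metadata (assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_fUNL_entropy_0_01"  # hypothetical Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Qwen1.5 SFT checkpoints ship a chat template, so format the prompt with it.
messages = [{"role": "user", "content": "Explain preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```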

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
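
As a rough guide to reproducing this setup, the sketch below maps the hyperparameters listed above onto TRL's CPOConfig, which exposes SimPO as a CPO variant (loss_type="simpo", cpo_alpha=0.0). This is an assumption about the training stack: the run used alignment-handbook recipes, and SimPO-specific terms (beta, gamma) and precision settings are not recorded on this card.

```python
# Hedged sketch: the listed hyperparameters expressed as a TRL CPOConfig.
# Whether this exact code path was used for this run is an assumption.
from trl import CPOConfig

training_args = CPOConfig(
    output_dir="qwen_fUNL_entropy_0_01",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    gradient_accumulation_steps=16,  # 2 * 16 = total_train_batch_size of 32
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    loss_type="simpo",
    cpo_alpha=0.0,                   # pure SimPO, no behavior-cloning term
)
```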

Training results

| Training Loss | Epoch  | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0548        | 0.2141 | 400  | 0.0557          | 4.8295   | -5.3467        | -5.4723          | 0.5326             | 0.1256          | -5.4723        | -5.3467      | 0.1095          | -0.0277       |
| 0.0537        | 0.4282 | 800  | 0.0529          | 4.1330   | -4.6614        | -4.9903          | 0.6024             | 0.3289          | -4.9903        | -4.6614      | 0.2188          | 0.0763        |
| 0.0545        | 0.6422 | 1200 | 0.0523          | 4.2856   | -4.6580        | -5.0486          | 0.6350             | 0.3906          | -5.0486        | -4.6580      | 0.0914          | -0.0257       |
| 0.0518        | 0.8563 | 1600 | 0.0519          | 4.0636   | -4.5007        | -4.9176          | 0.6313             | 0.4169          | -4.9176        | -4.5007      | 0.0782          | -0.0290       |
| 0.0537        | 1.0704 | 2000 | 0.0517          | 3.9662   | -4.4270        | -4.8924          | 0.6469             | 0.4654          | -4.8924        | -4.4270      | -0.1550         | -0.2400       |
| 0.0533        | 1.2845 | 2400 | 0.0514          | 4.4069   | -4.8229        | -5.4257          | 0.6632             | 0.6028          | -5.4257        | -4.8229      | -0.1556         | -0.2460       |
| 0.0522        | 1.4986 | 2800 | 0.0511          | 4.2244   | -4.5446        | -5.1374          | 0.6803             | 0.5928          | -5.1374        | -4.5446      | -0.2984         | -0.3849       |
| 0.053         | 1.7127 | 3200 | 0.0508          | 4.1193   | -4.4960        | -5.1073          | 0.6691             | 0.6113          | -5.1073        | -4.4960      | -0.2032         | -0.2947       |
| 0.0538        | 1.9267 | 3600 | 0.0505          | 4.0434   | -4.4193        | -5.0638          | 0.6847             | 0.6445          | -5.0638        | -4.4193      | -0.2476         | -0.3292       |
| 0.0504        | 2.1408 | 4000 | 0.0505          | 4.0585   | -4.4646        | -5.1658          | 0.6840             | 0.7011          | -5.1658        | -4.4646      | -0.2103         | -0.2919       |
| 0.053         | 2.3549 | 4400 | 0.0505          | 4.0905   | -4.4767        | -5.1722          | 0.6840             | 0.6956          | -5.1722        | -4.4767      | -0.2850         | -0.3632       |
| 0.0525        | 2.5690 | 4800 | 0.0504          | 4.0700   | -4.4483        | -5.1426          | 0.6832             | 0.6943          | -5.1426        | -4.4483      | -0.1890         | -0.2741       |
| 0.0509        | 2.7831 | 5200 | 0.0504          | 4.0135   | -4.3932        | -5.0993          | 0.6855             | 0.7061          | -5.0993        | -4.3932      | -0.1516         | -0.2376       |
| 0.0504        | 2.9972 | 5600 | 0.0504          | 4.0281   | -4.4231        | -5.1418          | 0.6862             | 0.7187          | -5.1418        | -4.4231      | -0.2955         | -0.3687       |
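
Two consistency notes on these columns: the Rewards/* values coincide with the Logps/* values, consistent with a SimPO-style implicit reward defined directly on (length-normalized) log-probabilities, and each margin is simply the chosen reward minus the rejected reward. A quick sanity check on the final row:

```python
# Sanity check on the final evaluation row: margins are chosen minus
# rejected rewards, matching the Rewards/margins column on this card.
rewards_chosen, rewards_rejected = -4.4231, -5.1418
margin = rewards_chosen - rewards_rejected
assert round(margin, 4) == 0.7187
print(f"rewards/margins = {margin:.4f}")
```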

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1