---
library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V4
  results: []
---

# Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V4

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set:

- Loss: 0.6849
- Rewards/chosen: -1.3255
- Rewards/rejected: -1.6674
- Rewards/accuracies: 0.5000
- Rewards/margins: 0.3419
- Logps/rejected: -134.9012
- Logps/chosen: -95.5139
- Logits/rejected: 0.0033
- Logits/chosen: 0.1072
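
These are the standard metrics logged by TRL's DPO trainer. As a minimal reminder of the definitions (the KL coefficient $\beta$ used for this run is not recorded in the card), they derive from the DPO implicit reward:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

Rewards/chosen and Rewards/rejected are the mean implicit rewards $r_\theta$ of the preferred and rejected completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of evaluation pairs in which the chosen reward exceeds the rejected one.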

## Model description

More information needed

## Intended uses & limitations

More information needed
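
No usage guidance was recorded. As a starting point, here is a minimal inference sketch that loads the LoRA adapter on top of the base model with PEFT. The Hub repo id below is an assumption (account name plus model name); adjust it to wherever the adapter actually lives.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumed Hub id (<account>/<model name>); not confirmed by this card.
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V4"

# Loads meta-llama/Llama-2-7b-hf (gated; requires accepting the Llama 2
# license) and applies the LoRA adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Write a short explanation of direct preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this adapter sits on the base (non-chat) Llama 2 checkpoint, so the usual base-model limitations apply.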

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the sketch after this list shows how they map onto a TRL `DPOTrainer` setup):

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
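
A minimal sketch of how these hyperparameters could be plugged into TRL's `DPOTrainer` with a PEFT LoRA adapter. Only the hyperparameters listed above come from the card; the dataset and the LoRA settings are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns;
# the card does not record which data was used.
dataset = load_dataset("your/preference-dataset")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed LoRA settings

training_args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V4",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
)

trainer = DPOTrainer(
    model="meta-llama/Llama-2-7b-hf",  # TRL loads the base model from a string id
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,               # renamed processing_class in newer TRL releases
    peft_config=peft_config,           # with a PEFT config, TRL manages the reference model
)
trainer.train()
```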

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7204        | 0.3043 | 63   | 0.6808          | 0.0801         | 0.0519           | 0.7000             | 0.0281          | -117.7079      | -81.4587     | 0.4903          | 0.5915        |
| 0.6989        | 0.6087 | 126  | 0.6930          | 0.0550         | 0.0726           | 0.6000             | -0.0176         | -117.5013      | -81.7093     | 0.4748          | 0.5762        |
| 0.6896        | 0.9130 | 189  | 0.6579          | 0.1170         | 0.0536           | 0.5000             | 0.0633          | -117.6909      | -81.0896     | 0.4569          | 0.5574        |
| 0.3332        | 1.2174 | 252  | 0.6831          | -0.2141        | -0.2394          | 0.5000             | 0.0253          | -120.6211      | -84.4000     | 0.3842          | 0.4834        |
| 0.3687        | 1.5217 | 315  | 0.7069          | -0.6436        | -0.7406          | 0.5000             | 0.0970          | -125.6332      | -88.6952     | 0.2816          | 0.3799        |
| 0.2083        | 1.8261 | 378  | 0.6389          | -0.4156        | -0.5567          | 0.5000             | 0.1411          | -123.7943      | -86.4158     | 0.2329          | 0.3317        |
| 0.1191        | 2.1304 | 441  | 0.6451          | -0.8600        | -1.1248          | 0.5000             | 0.2648          | -129.4748      | -90.8590     | 0.1067          | 0.2079        |
| 0.1435        | 2.4348 | 504  | 0.6878          | -1.2620        | -1.5788          | 0.5000             | 0.3168          | -134.0153      | -94.8793     | 0.0284          | 0.1320        |
| 0.0848        | 2.7391 | 567  | 0.6849          | -1.3255        | -1.6674          | 0.5000             | 0.3419          | -134.9012      | -95.5139     | 0.0033          | 0.1072        |

### Framework versions

- PEFT 0.12.0
- Transformers 4.45.2
- PyTorch 2.4.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3