library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V2
    results: []

Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf. The training dataset is unspecified (the trainer recorded its name as None). At the final evaluation (step 693), it achieves the following results on the evaluation set:

  • Loss: 0.9352
  • Rewards/chosen: -1.7573
  • Rewards/rejected: -1.5576
  • Rewards/accuracies: 0.5
  • Rewards/margins: -0.1997
  • Logps/rejected: -110.7931
  • Logps/chosen: -134.9680
  • Logits/rejected: 0.0983
  • Logits/chosen: 0.0721
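As a quick consistency check on the metrics above, the reported margin can be recomputed from the two reward values. This is a minimal sketch that assumes the usual TRL convention, where Rewards/margins is the mean of (chosen reward minus rejected reward); the variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.7573
rewards_rejected = -1.5576

# Assumption: the margin is the mean of (chosen - rejected) rewards,
# which equals the difference of the two reported means.
margin = rewards_chosen - rewards_rejected

print(f"Rewards/margins: {margin:.4f}")  # Rewards/margins: -0.1997
```

A negative margin means that, on average, the model assigns a higher implicit reward to the rejected responses than to the chosen ones, which is consistent with the reported accuracy of 0.5.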

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
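The relationship between the batch-size settings and the shape of the learning-rate schedule can be sketched in plain Python. This is an illustration, not the training script: the single-device count is inferred from total_train_batch_size, the total step count is estimated from the results table (step 77 is reported at epoch 0.3026), and the schedule follows the common linear-warmup plus half-cosine form.

```python
import math

LEARNING_RATE = 5e-5
WARMUP_STEPS = 10
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: implied by total_train_batch_size = 4

# Assumption: total optimizer steps over 3 epochs, estimated from the
# results table (step 77 corresponds to epoch 0.3026).
TOTAL_STEPS = round(77 / 0.3026 * 3)


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to one optimizer step."""
    return per_device * accum_steps * devices


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 4
for step in (0, 5, 10, 231, 693):
    print(step, cosine_lr(step))
```

With these assumptions the learning rate peaks at 5e-05 once warmup ends at step 10 and decays toward zero by the estimated final step.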

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.7029        | 0.3026 | 77   | 0.6933          | -0.0162        | -0.0163          | 0.3333             | 0.0001          | -95.3805       | -117.5575    | 0.5079          | 0.4933        |
| 0.6605        | 0.6051 | 154  | 0.6804          | 0.0594         | 0.0318           | 0.6667             | 0.0276          | -94.8997       | -116.8017    | 0.4988          | 0.4837        |
| 0.6291        | 0.9077 | 231  | 0.6684          | 0.2040         | 0.1302           | 0.75               | 0.0738          | -93.9156       | -115.3556    | 0.4931          | 0.4757        |
| 0.3149        | 1.2102 | 308  | 0.6806          | -0.2081        | -0.3152          | 0.5833             | 0.1071          | -98.3691       | -119.4764    | 0.4810          | 0.4619        |
| 0.3251        | 1.5128 | 385  | 0.7502          | -0.4333        | -0.4100          | 0.5833             | -0.0233         | -99.3170       | -121.7279    | 0.4258          | 0.4057        |
| 0.2002        | 1.8153 | 462  | 0.8816          | -1.2398        | -1.0499          | 0.5                | -0.1899         | -105.7162      | -129.7932    | 0.3036          | 0.2813        |
| 0.0182        | 2.1179 | 539  | 0.9166          | -1.4380        | -1.2371          | 0.5                | -0.2010         | -107.5881      | -131.7757    | 0.1946          | 0.1703        |
| 0.2002        | 2.4204 | 616  | 0.9190          | -1.5677        | -1.4004          | 0.5                | -0.1673         | -109.2209      | -133.0719    | 0.1338          | 0.1085        |
| 0.1982        | 2.7230 | 693  | 0.9352          | -1.7573        | -1.5576          | 0.5                | -0.1997         | -110.7931      | -134.9680    | 0.0983          | 0.0721        |
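To read the trend in the results above, the following sketch picks out the evaluation step with the lowest validation loss and the step with the highest reward margin. The values are copied from the table; the tuple layout and variable names are illustrative choices, not part of the training code.

```python
# (step, validation_loss, rewards_margin) copied from the results table.
rows = [
    (77, 0.6933, 0.0001),
    (154, 0.6804, 0.0276),
    (231, 0.6684, 0.0738),
    (308, 0.6806, 0.1071),
    (385, 0.7502, -0.0233),
    (462, 0.8816, -0.1899),
    (539, 0.9166, -0.2010),
    (616, 0.9190, -0.1673),
    (693, 0.9352, -0.1997),
]

# Checkpoint with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[1])
# Checkpoint with the largest chosen-minus-rejected reward margin.
best_margin = max(rows, key=lambda r: r[2])

print("lowest validation loss:", best_loss)    # step 231
print("highest reward margin:", best_margin)   # step 308
```

Validation loss is lowest at step 231 and rises afterwards while training loss keeps falling, which may indicate overfitting after roughly the first epoch. An earlier checkpoint might therefore generalize better than the final one, although the card does not say whether intermediate checkpoints were saved.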

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3