---
library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7
    results: []
---

Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified preference dataset. It achieves the following results on the evaluation set (a usage sketch follows the metrics):

  • Loss: 0.4540
  • Rewards/chosen: -1.9538
  • Rewards/rejected: -2.8279
  • Rewards/accuracies: 0.9000
  • Rewards/margins: 0.8741
  • Logps/rejected: -178.4166
  • Logps/chosen: -140.8054
  • Logits/rejected: -0.0659
  • Logits/chosen: -0.0598
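
For quick experimentation, a minimal inference sketch is below. It assumes the adapter is published under the repo id LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7 (inferred from the model name above; adjust if the actual repo id differs) and that you have access to the gated Llama-2 base weights.

```python
# Minimal sketch: load the Llama-2 base model and attach this DPO-trained
# LoRA adapter with PEFT. The adapter repo id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```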

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching training sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
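
Below is a minimal, hedged sketch of how these settings map onto a trl DPO run. The dataset id and LoRA settings are placeholders (they are not recorded in this card), and the exact DPOTrainer signature varies slightly across TRL releases; the form below matches the TRL versions contemporary with Transformers 4.45.

```python
# Sketch of a DPO run mirroring the hyperparameters above. Dataset id and
# LoRA config are placeholders, not taken from this model card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: a dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a peft_config, TRL uses the adapter-disabled base as reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # placeholder values
)
trainer.train()
```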

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6759        | 0.3023 | 60   | 0.6970          | 0.0373         | 0.0397           | 0.6000             | -0.0024         | -149.7405      | -120.8940    | 0.4429          | 0.4532        |
| 0.6811        | 0.6045 | 120  | 0.6723          | -0.0412        | -0.0677          | 0.5000             | 0.0265          | -150.8149      | -121.6795    | 0.4688          | 0.4793        |
| 0.5824        | 0.9068 | 180  | 0.6747          | 0.0390         | -0.0060          | 0.8000             | 0.0450          | -150.1981      | -120.8773    | 0.4537          | 0.4631        |
| 0.3049        | 1.2091 | 240  | 0.5606          | -0.3769        | -0.6960          | 0.7000             | 0.3191          | -157.0981      | -125.0365    | 0.3873          | 0.3966        |
| 0.3915        | 1.5113 | 300  | 0.5289          | -0.4550        | -0.8493          | 0.9000             | 0.3943          | -158.6304      | -125.8171    | 0.3314          | 0.3395        |
| 0.4760        | 1.8136 | 360  | 0.5109          | -0.7144        | -1.1970          | 0.9000             | 0.4826          | -162.1081      | -128.4113    | 0.2160          | 0.2235        |
| 0.1137        | 2.1159 | 420  | 0.5121          | -1.1098        | -1.6334          | 0.8000             | 0.5236          | -166.4716      | -132.3654    | 0.0934          | 0.1001        |
| 0.3063        | 2.4181 | 480  | 0.4482          | -1.9206        | -2.8102          | 0.9000             | 0.8895          | -178.2394      | -140.4735    | -0.0433         | -0.0370       |
| 0.2409        | 2.7204 | 540  | 0.4540          | -1.9538        | -2.8279          | 0.9000             | 0.8741          | -178.4166      | -140.8054    | -0.0659         | -0.0598       |
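
As a consistency check, Rewards/margins is simply Rewards/chosen minus Rewards/rejected: at the final checkpoint, -1.9538 - (-2.8279) = 0.8741, matching the evaluation results above. In DPO these rewards are beta-scaled log-probability ratios between the policy and the reference model, so the steadily growing margin indicates the model increasingly prefers the chosen responses over the rejected ones.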

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3
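
To reproduce these results, it is worth confirming that your installed packages match the versions above. A quick check (PyPI distribution names assumed for the listed packages):

```python
# Print installed versions of the training stack for comparison with
# the versions listed above.
from importlib.metadata import version

for pkg in ["peft", "transformers", "torch", "datasets", "tokenizers"]:
    print(f"{pkg}=={version(pkg)}")
```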