---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - trl
  - dpo
  - generated_from_trainer
library_name: peft
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1
    results: []
---

# Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified preference dataset. It achieves the following results on the evaluation set:

- Loss: 1.2865
- Rewards/chosen: -2.3072
- Rewards/rejected: -1.8542
- Rewards/accuracies: 0.5
- Rewards/margins: -0.4531
- Logps/rejected: -121.6291
- Logps/chosen: -184.9154
- Logits/rejected: 0.2000
- Logits/chosen: 0.1668
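
As a reading guide (these are the standard DPO definitions used by TRL, not anything specific to this run): the reward columns report the implicit reward $r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ for the chosen and rejected completions, `Rewards/margins` is chosen minus rejected, and `Rewards/accuracies` is the fraction of pairs where the chosen reward is higher. The training objective is the DPO loss (the $\beta$ used here is not recorded in this card):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Under this convention, the final margin of -0.4531 means the model on average assigns a higher implicit reward to the rejected completion on the evaluation set.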

## Model description

More information needed

## Intended uses & limitations

More information needed
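
In the absence of author-provided usage notes, here is a minimal loading sketch. It assumes the adapter is hosted at `LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1` (a hypothetical repo id inferred from the model name) and that you have access to the gated Llama 2 base weights:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1"  # assumed repo id

# Load the frozen base model, then attach the DPO-trained LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```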

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
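
For reference, a minimal TRL sketch that wires up the hyperparameters listed above. The preference dataset, the DPO beta, and the LoRA settings are not recorded in this card and appear as labeled placeholders; the exact `DPOTrainer` signature also varies across TRL releases (this follows the TRL 0.9-era API that pairs with Transformers 4.44):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

# Hyperparameters taken from the list above; anything marked "assumed" is not in the card.
args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    eval_strategy="steps",
    eval_steps=70,                   # matches the evaluation cadence in the results table
    beta=0.1,                        # assumed: TRL default; the beta actually used is unrecorded
)

peft_config = LoraConfig(            # assumed LoRA settings; the card does not record them
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)

dataset = load_dataset("...")        # the preference dataset is not named in the card

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,             # renamed to processing_class in newer TRL releases
    peft_config=peft_config,         # passing a peft_config makes TRL wrap the model with LoRA
)
trainer.train()
```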

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7102        | 0.3017 | 70   | 0.6900          | 0.0340         | 0.0258           | 0.7000             | 0.0082          | -102.8295      | -161.5030    | 0.6083          | 0.5744        |
| 0.7024        | 0.6034 | 140  | 0.7276          | 0.0806         | 0.1382           | 0.3000             | -0.0575         | -101.7058      | -161.0370    | 0.6015          | 0.5681        |
| 0.6653        | 0.9052 | 210  | 0.7362          | 0.0303         | 0.0739           | 0.4000             | -0.0435         | -102.3490      | -161.5399    | 0.6196          | 0.5858        |
| 0.488         | 1.2069 | 280  | 0.8450          | -0.5919        | -0.4398          | 0.4000             | -0.1521         | -107.4859      | -167.7624    | 0.5482          | 0.5148        |
| 0.5839        | 1.5086 | 350  | 0.8971          | -0.8481        | -0.6497          | 0.4000             | -0.1984         | -109.5843      | -170.3242    | 0.5183          | 0.4846        |
| 0.503         | 1.8103 | 420  | 1.0273          | -1.1487        | -0.8225          | 0.4000             | -0.3262         | -111.3127      | -173.3304    | 0.4207          | 0.3883        |
| 0.2083        | 2.1121 | 490  | 1.1693          | -1.6401        | -1.2436          | 0.4000             | -0.3965         | -115.5236      | -178.2447    | 0.2902          | 0.2576        |
| 0.1395        | 2.4138 | 560  | 1.2310          | -2.1881        | -1.7991          | 0.6000             | -0.3890         | -121.0787      | -183.7240    | 0.2345          | 0.2015        |
| 0.1618        | 2.7155 | 630  | 1.2865          | -2.3072        | -1.8542          | 0.5000             | -0.4531         | -121.6291      | -184.9154    | 0.2000          | 0.1668        |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1