Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf; the training dataset is not specified in this card. It achieves the following results on the evaluation set (a sketch of how the DPO reward metrics are defined follows the list):

  • Loss: 0.7581
  • Rewards/chosen: -1.7006
  • Rewards/rejected: -1.8759
  • Rewards/accuracies: 0.6000
  • Rewards/margins: 0.1753
  • Logps/rejected: -131.1391
  • Logps/chosen: -96.3973
  • Logits/rejected: -0.0046
  • Logits/chosen: 0.0503
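
The "Rewards/*" metrics above follow the standard DPO convention, where the implicit reward is the scaled log-probability ratio between the fine-tuned policy and the reference model. The sketch below illustrates that definition only; beta and the per-example log-probabilities are placeholder assumptions, since this card reports neither beta nor the reference-model log-probs.

```python
# Minimal sketch of how DPO-style reward metrics are typically defined.
# beta and the log-probabilities below are illustrative assumptions.
import torch

def dpo_reward(policy_logp: torch.Tensor, ref_logp: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

# Dummy per-sequence log-probabilities for one chosen/rejected pair.
chosen_reward = dpo_reward(torch.tensor(-95.0), torch.tensor(-90.0))
rejected_reward = dpo_reward(torch.tensor(-130.0), torch.tensor(-115.0))

margin = chosen_reward - rejected_reward               # -> "Rewards/margins"
accuracy = (chosen_reward > rejected_reward).float()   # averaged -> "Rewards/accuracies"
print(margin.item(), accuracy.item())
```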

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
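
As a rough reconstruction, the listed values map directly onto a transformers.TrainingArguments object as shown below. The original run used a DPO trainer (e.g. trl's DPOTrainer) and a specific output directory, neither of which is stated in this card, so those parts are assumptions.

```python
# Hedged reconstruction of the hyperparameters above; output_dir is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=2,   # effective total train batch size: 2 * 2 = 4
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
)
```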

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7096 | 0.3004 | 67 | 0.6970 | -0.0217 | -0.0183 | 0.5 | -0.0034 | -112.5630 | -79.6082 | 0.6024 | 0.6487 |
| 0.6684 | 0.6009 | 134 | 0.6829 | -0.0429 | -0.0704 | 0.8000 | 0.0275 | -113.0842 | -79.8203 | 0.5780 | 0.6246 |
| 0.7283 | 0.9013 | 201 | 0.6982 | 0.0550 | 0.0616 | 0.6000 | -0.0067 | -111.7634 | -78.8413 | 0.5848 | 0.6319 |
| 0.2339 | 1.2018 | 268 | 0.6630 | -0.1631 | -0.2504 | 0.7000 | 0.0873 | -114.8840 | -81.0225 | 0.4681 | 0.5163 |
| 0.3526 | 1.5022 | 335 | 0.6523 | -0.5545 | -0.6837 | 0.6000 | 0.1292 | -119.2165 | -84.9362 | 0.3518 | 0.4006 |
| 0.2787 | 1.8027 | 402 | 0.6181 | -0.4772 | -0.6749 | 0.6000 | 0.1977 | -119.1291 | -84.1633 | 0.3107 | 0.3615 |
| 0.2577 | 2.1031 | 469 | 0.6856 | -1.0419 | -1.1941 | 0.5 | 0.1522 | -124.3209 | -89.8106 | 0.1666 | 0.2190 |
| 0.0942 | 2.4036 | 536 | 0.7344 | -1.5330 | -1.7182 | 0.6000 | 0.1852 | -129.5615 | -94.7212 | 0.0278 | 0.0822 |
| 0.0952 | 2.7040 | 603 | 0.7581 | -1.7006 | -1.8759 | 0.6000 | 0.1753 | -131.1391 | -96.3973 | -0.0046 | 0.0503 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3
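
Under the versions listed above, a minimal inference sketch looks roughly as follows, assuming this repository hosts a PEFT (LoRA) adapter on top of meta-llama/Llama-2-7b-hf. The repository id is taken from this card's title and the prompt is illustrative; access to the gated base model is required.

```python
# Usage sketch, assuming this repo is a PEFT adapter for Llama-2-7b-hf.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```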