LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7

---
library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7

This model is a PEFT adapter fine-tuned from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with DPO (Direct Preference Optimization) using TRL. The training dataset was not recorded; the auto-generated card listed it as "None".

It achieves the following results on the evaluation set at the final logged step (step 660 in the results table below):

- Loss: 0.4437
- Rewards/chosen: -1.5898
- Rewards/rejected: -2.7509
- Rewards/accuracies: 0.7000
- Rewards/margins: 1.1611
- Logps/rejected: -114.1047
- Logps/chosen: -92.5540
- Logits/rejected: -0.0729
- Logits/chosen: -0.0526
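
The reward metrics above follow the DPO objective as logged by TRL's `DPOTrainer`: a completion's implicit reward is β times the gap between the policy and reference log-probabilities, `Rewards/margins` is the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of pairs with a positive margin. The minimal sketch below computes the per-pair sigmoid DPO loss. This card does not record β, so `beta=0.1` (TRL's default) is an assumption, and `softplus` and `dpo_pair_loss` are hypothetical helper names, not library functions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_pair_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                  ref_chosen_logp: float, ref_rejected_logp: float,
                  beta: float = 0.1):
    """Sigmoid DPO loss for a single preference pair (hypothetical helper).

    beta=0.1 is an assumption (TRL's default); this card does not record beta.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin); the pair counts as "accurate" if margin > 0.
    return softplus(-margin), chosen_reward, rejected_reward, margin


# Final evaluation means reported in this card.
mean_margin = -1.5898 - (-2.7509)  # Rewards/chosen - Rewards/rejected
loss_of_mean_margin = softplus(-mean_margin)
print(f"mean margin: {mean_margin:.4f}")
print(f"-log(sigmoid(mean margin)): {loss_of_mean_margin:.4f} vs reported loss 0.4437")
```

The second printed value is roughly 0.27, well below the reported 0.4437. That is expected: the evaluation loss is the mean of per-pair losses, and because −log σ is convex, that mean is at least the loss of the mean margin (Jensen's inequality).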

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
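
The effective batch size and the warmup-plus-cosine learning-rate schedule implied by these settings can be sketched as follows. The function mirrors the shape of Transformers' `get_cosine_schedule_with_warmup` (linear warmup, then half a cosine period down to zero), but it is a standalone re-implementation rather than the library call. Two values are assumptions: a single device (consistent with `total_train_batch_size: 4` = 2 × 2) and a total of 660 optimizer steps, taken from the last logged step in the results table.

```python
import math

LEARNING_RATE = 5e-05
WARMUP_STEPS = 10
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 2
NUM_DEVICES = 1  # assumption: consistent with total_train_batch_size = 4
TOTAL_STEPS = 660  # assumption: last logged step in the results table


def effective_batch_size(per_device: int = PER_DEVICE_BATCH,
                         grad_accum: int = GRAD_ACCUM,
                         devices: int = NUM_DEVICES) -> int:
    """Examples contributing to each optimizer step."""
    return per_device * grad_accum * devices


def cosine_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then cosine decay to zero over the remaining steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size())
for s in (0, 5, 10, 335, TOTAL_STEPS):
    print(s, cosine_lr(s, TOTAL_STEPS))
```

The learning rate reaches its 5e-05 peak at step 10, falls to half the peak midway through the decay phase (step 335 under the assumed total), and reaches zero at the final step.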

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6702        | 0.2993 | 66   | 0.6613          | 0.0837         | -0.0035          | 0.7000             | 0.0872          | -86.6308       | -75.8190     | 0.3314          | 0.3469        |
| 0.686         | 0.5986 | 132  | 0.5646          | 0.0172         | -0.3322          | 0.8000             | 0.3494          | -89.9173       | -76.4838     | 0.3494          | 0.3651        |
| 0.7758        | 0.8980 | 198  | 0.5747          | 0.0543         | -0.2153          | 0.9000             | 0.2696          | -88.7488       | -76.1133     | 0.3694          | 0.3845        |
| 0.6695        | 1.1973 | 264  | 0.5693          | -0.2661        | -0.6699          | 0.7000             | 0.4038          | -93.2946       | -79.3173     | 0.3321          | 0.3466        |
| 0.5453        | 1.4966 | 330  | 0.5472          | -0.6038        | -1.1332          | 0.6000             | 0.5294          | -97.9278       | -82.6945     | 0.2266          | 0.2424        |
| 0.5922        | 1.7959 | 396  | 0.5142          | -0.9005        | -1.6462          | 0.6000             | 0.7457          | -103.0579      | -85.6614     | 0.1303          | 0.1477        |
| 0.2128        | 2.0952 | 462  | 0.4825          | -1.1082        | -1.9752          | 0.8000             | 0.8670          | -106.3474      | -87.7384     | 0.0713          | 0.0898        |
| 0.1372        | 2.3946 | 528  | 0.4425          | -1.4160        | -2.5347          | 0.8000             | 1.1187          | -111.9428      | -90.8164     | -0.0224         | -0.0028       |
| 0.3622        | 2.6939 | 594  | 0.4437          | -1.5113        | -2.6570          | 0.8000             | 1.1457          | -113.1660      | -91.7698     | -0.0636         | -0.0435       |
| 0.1555        | 2.9932 | 660  | 0.4437          | -1.5898        | -2.7509          | 0.7000             | 1.1611          | -114.1047      | -92.5540     | -0.0729         | -0.0526       |
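
Two consistency checks can be run on this table. First, `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected` in every row. Second, if each reward is β times the policy-minus-reference log-probability (as in TRL's `DPOTrainer`), then `Logps − Rewards/β` estimates the frozen reference model's mean log-probability on the evaluation set. Because that relation is linear, it also holds for the averaged metrics, so the estimate should stay constant across evaluations. The sketch below assumes β = 0.1, which the card does not state, and `check_rows` is a hypothetical helper. Under that assumption the implied reference values stay within about 0.001 of each other (chosen near −76.66, rejected near −86.60), whereas β = 0.2 gives a spread of several nats. The sketch also finds the evaluation with the lowest validation loss.

```python
# (step, val_loss, rewards_chosen, rewards_rejected, rewards_margins, logps_rejected, logps_chosen)
ROWS = [
    (66, 0.6613, 0.0837, -0.0035, 0.0872, -86.6308, -75.8190),
    (132, 0.5646, 0.0172, -0.3322, 0.3494, -89.9173, -76.4838),
    (198, 0.5747, 0.0543, -0.2153, 0.2696, -88.7488, -76.1133),
    (264, 0.5693, -0.2661, -0.6699, 0.4038, -93.2946, -79.3173),
    (330, 0.5472, -0.6038, -1.1332, 0.5294, -97.9278, -82.6945),
    (396, 0.5142, -0.9005, -1.6462, 0.7457, -103.0579, -85.6614),
    (462, 0.4825, -1.1082, -1.9752, 0.8670, -106.3474, -87.7384),
    (528, 0.4425, -1.4160, -2.5347, 1.1187, -111.9428, -90.8164),
    (594, 0.4437, -1.5113, -2.6570, 1.1457, -113.1660, -91.7698),
    (660, 0.4437, -1.5898, -2.7509, 1.1611, -114.1047, -92.5540),
]

BETA = 0.1  # assumption: beta is not recorded in this card


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def check_rows(rows, beta=BETA, tol=1e-3):
    """Verify margins and return the spread of implied reference log-probs (chosen, rejected)."""
    implied_ref_chosen, implied_ref_rejected = [], []
    for step, _, rc, rr, margin, lp_r, lp_c in rows:
        # Margin should equal chosen reward minus rejected reward, up to 4-decimal rounding.
        assert abs((rc - rr) - margin) < tol, f"margin mismatch at step {step}"
        # reward = beta * (policy logp - reference logp)  =>  reference logp = logp - reward / beta
        implied_ref_chosen.append(lp_c - rc / beta)
        implied_ref_rejected.append(lp_r - rr / beta)
    return spread(implied_ref_chosen), spread(implied_ref_rejected)


best_step, best_loss = min(((r[0], r[1]) for r in ROWS), key=lambda t: t[1])
chosen_spread, rejected_spread = check_rows(ROWS)
print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"implied reference log-prob spread: chosen {chosen_spread:.4f}, rejected {rejected_spread:.4f}")
```

The lowest validation loss (0.4425) occurs at step 528, slightly below the final value of 0.4437.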

### Framework versions

- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3
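
To compare a local environment against the versions listed above, a small standard-library check can help. `version_report` is a hypothetical helper, not part of the card's tooling. It assumes PyTorch is installed under the pip distribution name `torch`, and because it compares exact version strings, a CPU build of 2.4.0 would be reported as differing from `2.4.0+cu121`.

```python
from importlib.metadata import PackageNotFoundError, version

# Framework versions from this card, keyed by pip distribution name ("torch" for PyTorch).
EXPECTED_VERSIONS = {
    "peft": "0.12.0",
    "transformers": "4.45.2",
    "torch": "2.4.0+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.20.3",
}


def version_report(expected=EXPECTED_VERSIONS):
    """Map each distribution to (expected, installed or None, exact match?)."""
    report = {}
    for dist, wanted in expected.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[dist] = (wanted, installed, installed == wanted)
    return report


for dist, (wanted, installed, ok) in version_report().items():
    status = "ok" if ok else "differs"
    print(f"{dist:12} expected {wanted:12} installed {installed or '-':12} {status}")
```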