LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4

+---
+base_model: meta-llama/Llama-2-7b-hf
+library_name: peft
+license: llama2
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4
+This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified dataset (the Trainer did not record a dataset name).
+It achieves the following results on the evaluation set (the final evaluation, at step 711):
+- Loss: 1.2125
+- Rewards/chosen: -3.3104
+- Rewards/rejected: -2.9319
+- Rewards/accuracies: 0.4167
+- Rewards/margins: -0.3786
+- Logps/rejected: -192.9225
+- Logps/chosen: -170.2794
+- Logits/rejected: 0.1199
+- Logits/chosen: 0.1595
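The reward margin reported above can be checked against the two reward values. The sketch below assumes the usual DPO logging convention, in which the margin is the mean of (chosen reward minus rejected reward) and the accuracy is the fraction of evaluation pairs where the chosen reward is higher. The variable names are illustrative and not taken from the training script.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -3.3104
rewards_rejected = -2.9319
reported_margin = -0.3786
reported_accuracy = 0.4167

# Assumed convention: margin = mean(chosen reward - rejected reward),
# which equals the difference of the two reported means.
margin = rewards_chosen - rewards_rejected

# The recomputed value agrees with the reported one up to rounding of the inputs.
margin_matches = abs(margin - reported_margin) < 1e-3

# A negative margin together with an accuracy below 0.5 means that, on this
# evaluation set, rejected responses receive the higher reward on average.
prefers_rejected = margin < 0 and reported_accuracy < 0.5

print(f"recomputed margin: {margin:.4f}")
print(f"matches reported margin: {margin_matches}")
print(f"rejected preferred on average: {prefers_rejected}")
```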
+## Model description
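+A PEFT adapter for meta-llama/Llama-2-7b-hf, trained with DPO (Direct Preference Optimization) using TRL, per the card metadata. More information needed.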
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 10
+- num_epochs: 3
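The batch-size and scheduler settings above can be made concrete with a small sketch. It assumes a single training device (consistent with 2 x 2 = 4) and roughly 783 optimizer steps in total, which is an estimate inferred from the training results table (step 79 at epoch 0.3027 gives about 261 steps per epoch). The schedule is a plain reimplementation of linear warmup followed by half-cosine decay, not the training script itself.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 10

# Assumptions, not stated in the card:
NUM_DEVICES = 1     # inferred from total_train_batch_size = 4
TOTAL_STEPS = 783   # estimate: 3 epochs x ~261 steps per epoch


def effective_batch_size(per_device, accum_steps, num_devices):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(f"effective batch size: {batch}")
for step in (0, 5, 10, 400, 783):
    print(f"step {step:>3}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 5e-05 at step 10 and decays to zero by the last step.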
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6179        | 0.3027 | 79   | 0.7115          | -0.1031        | -0.0593          | 0.25               | -0.0438         | -164.1966      | -138.2057    | 0.5429          | 0.5748        |
+| 0.6065        | 0.6054 | 158  | 0.7348          | -0.0751        | 0.0129           | 0.25               | -0.0879         | -163.4753      | -137.9259    | 0.5242          | 0.5565        |
+| 0.6210        | 0.9080 | 237  | 0.7932          | -0.0433        | 0.1366           | 0.5                | -0.1800         | -162.2375      | -137.6083    | 0.4932          | 0.5259        |
+| 0.4714        | 1.2107 | 316  | 0.7928          | -0.6963        | -0.5927          | 0.5                | -0.1037         | -169.5308      | -144.1387    | 0.4698          | 0.5037        |
+| 0.3829        | 1.5134 | 395  | 0.8637          | -1.6604        | -1.5528          | 0.3333             | -0.1075         | -179.1323      | -153.7787    | 0.3664          | 0.4026        |
+| 0.3589        | 1.8161 | 474  | 0.9222          | -1.4397        | -1.1360          | 0.25               | -0.3037         | -174.9637      | -151.5720    | 0.3400          | 0.3770        |
+| 0.2138        | 2.1188 | 553  | 0.9860          | -1.9991        | -1.6486          | 0.3333             | -0.3505         | -180.0903      | -157.1666    | 0.2605          | 0.2992        |
+| 0.0437        | 2.4215 | 632  | 1.1781          | -3.1628        | -2.7961          | 0.4167             | -0.3666         | -191.5652      | -168.8030    | 0.1441          | 0.1838        |
+| 0.1667        | 2.7241 | 711  | 1.2125          | -3.3104        | -2.9319          | 0.4167             | -0.3786         | -192.9225      | -170.2794    | 0.1199          | 0.1595        |
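In the table above, training loss falls from 0.6179 to 0.1667 while validation loss rises from 0.7115 to 1.2125, a pattern consistent with overfitting. The sketch below uses only the step, training loss, and validation loss columns to find the evaluation step with the lowest validation loss and to measure the gap between the two losses. The helper names are hypothetical, and whether a checkpoint was saved at each evaluation step is not stated in the card.

```python
# (step, training_loss, validation_loss) copied from the results table above.
ROWS = [
    (79, 0.6179, 0.7115),
    (158, 0.6065, 0.7348),
    (237, 0.6210, 0.7932),
    (316, 0.4714, 0.7928),
    (395, 0.3829, 0.8637),
    (474, 0.3589, 0.9222),
    (553, 0.2138, 0.9860),
    (632, 0.0437, 1.1781),
    (711, 0.1667, 1.2125),
]


def best_evaluation(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


best = best_evaluation(ROWS)
final = ROWS[-1]

print(f"lowest validation loss: {best[2]} at step {best[0]}")
print(f"gap at first evaluation: {generalization_gap(ROWS[0]):.4f}")
print(f"gap at final evaluation: {generalization_gap(final):.4f}")
```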
+### Framework versions
+- PEFT 0.12.0
+- Transformers 4.44.0
+- Pytorch 2.4.0+cu121
+- Datasets 3.1.0
+- Tokenizers 0.19.1
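A minimal loading sketch, assuming the adapter is published under the repository ID shown in the page header and that the environment has access to the gated Llama 2 base weights. It uses `AutoModelForCausalLM.from_pretrained`, `AutoTokenizer.from_pretrained`, and `PeftModel.from_pretrained`. The function name `load_adapter` is hypothetical, and nothing here has been verified against the actual repository contents.

```python
# Base model ID from the card metadata; adapter ID assumed from the page header.
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"
ADAPTER_ID = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4"


def load_adapter(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter on top of it.

    The heavy libraries are imported inside the function so that this
    module can be imported without downloading anything.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wraps the base model with the adapter weights stored in the adapter repo.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads the base weights and the adapter. Given the evaluation metrics reported in this card, it is worth comparing outputs against the base model before relying on the adapter.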