---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: llama2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO; the training dataset was not recorded.
It achieves the following results on the evaluation set at the final evaluation step (711):
- Loss: 1.2125
- Rewards/chosen: -3.3104
- Rewards/rejected: -2.9319
- Rewards/accuracies: 0.4167
- Rewards/margins: -0.3786
- Logps/rejected: -192.9225
- Logps/chosen: -170.2794
- Logits/rejected: 0.1199
- Logits/chosen: 0.1595
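
The reward metrics above are related by simple arithmetic: in TRL's DPO logging, the margin is the mean chosen reward minus the mean rejected reward. The sketch below recomputes it from the rounded values in the list; the variable names are illustrative and not part of any library.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -3.3104
rewards_rejected = -2.9319
reported_margin = -0.3786

# Margin = chosen reward - rejected reward (averaged over the evaluation set).
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")

# The inputs are already rounded, so the last digit may differ from the reported value.
assert abs(margin - reported_margin) < 1e-3
```

A negative margin means the rejected responses receive a higher implicit reward than the chosen ones on average, which is consistent with the reported accuracy of 0.4167 being below 0.5.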

## Model description

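A PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with Direct Preference Optimization (DPO) using TRL. More information needed.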

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
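
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and evaluates a cosine schedule with linear warmup. Two inputs are assumptions rather than recorded values: `num_devices = 1` (implied by a total batch size of 4) and `total_steps = 783` (estimated from the results table, where 79 steps correspond to about 0.3027 epochs, i.e. roughly 261 steps per epoch over 3 epochs). The `cosine_lr` helper is hypothetical and only mirrors the usual shape of such a schedule.

```python
import math

train_batch_size = 2
gradient_accumulation_steps = 2
num_devices = 1  # assumption: not recorded, implied by total_train_batch_size == 4

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step, base_lr=5e-05, warmup_steps=10, total_steps=783):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    total_steps=783 is an estimate, not a logged value.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)
for step in (0, 10, 395, 783):
    print(step, cosine_lr(step))
```

Under these assumptions, the learning rate peaks at 5e-05 once warmup ends at step 10 and decays towards zero by the end of the third epoch.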

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6179        | 0.3027 | 79   | 0.7115          | -0.1031        | -0.0593          | 0.25               | -0.0438         | -164.1966      | -138.2057    | 0.5429          | 0.5748        |
| 0.6065        | 0.6054 | 158  | 0.7348          | -0.0751        | 0.0129           | 0.25               | -0.0879         | -163.4753      | -137.9259    | 0.5242          | 0.5565        |
| 0.621         | 0.9080 | 237  | 0.7932          | -0.0433        | 0.1366           | 0.5                | -0.1800         | -162.2375      | -137.6083    | 0.4932          | 0.5259        |
| 0.4714        | 1.2107 | 316  | 0.7928          | -0.6963        | -0.5927          | 0.5                | -0.1037         | -169.5308      | -144.1387    | 0.4698          | 0.5037        |
| 0.3829        | 1.5134 | 395  | 0.8637          | -1.6604        | -1.5528          | 0.3333             | -0.1075         | -179.1323      | -153.7787    | 0.3664          | 0.4026        |
| 0.3589        | 1.8161 | 474  | 0.9222          | -1.4397        | -1.1360          | 0.25               | -0.3037         | -174.9637      | -151.5720    | 0.3400          | 0.3770        |
| 0.2138        | 2.1188 | 553  | 0.9860          | -1.9991        | -1.6486          | 0.3333             | -0.3505         | -180.0903      | -157.1666    | 0.2605          | 0.2992        |
| 0.0437        | 2.4215 | 632  | 1.1781          | -3.1628        | -2.7961          | 0.4167             | -0.3666         | -191.5652      | -168.8030    | 0.1441          | 0.1838        |
| 0.1667        | 2.7241 | 711  | 1.2125          | -3.3104        | -2.9319          | 0.4167             | -0.3786         | -192.9225      | -170.2794    | 0.1199          | 0.1595        |
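
The table shows training loss falling while validation loss rises. The following sketch, using only the step, training loss, and validation loss columns copied from the table, locates the evaluation step with the lowest validation loss; it is an illustration of how to read the table, not part of the training run.

```python
# (step, training_loss, validation_loss) copied from the table above.
history = [
    (79, 0.6179, 0.7115),
    (158, 0.6065, 0.7348),
    (237, 0.6210, 0.7932),
    (316, 0.4714, 0.7928),
    (395, 0.3829, 0.8637),
    (474, 0.3589, 0.9222),
    (553, 0.2138, 0.9860),
    (632, 0.0437, 1.1781),
    (711, 0.1667, 1.2125),
]

# Evaluation step with the lowest validation loss.
best_step, best_train, best_val = min(history, key=lambda row: row[2])

# Final evaluation step, which is the one reported at the top of this card.
final_step, final_train, final_val = history[-1]

print(f"lowest validation loss {best_val} at step {best_step}")
print(f"final validation loss {final_val} at step {final_step}")
```

The lowest validation loss occurs at the first evaluation (step 79), so the final checkpoint reported above is not the best one by this metric.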


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.19.1
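
Because this card declares `library_name: peft`, the weights are presumably an adapter that is loaded on top of the base model. The sketch below shows one way this might be done with the versions listed above. `load_adapter` is a hypothetical helper, and `adapter_id` is a placeholder for this adapter's repository id or local path, which this card does not state. Access to the base model may also require accepting the Llama 2 license on the Hub.

```python
def load_adapter(adapter_id, base_id="meta-llama/Llama-2-7b-hf"):
    """Load the base model and attach the PEFT adapter weights.

    adapter_id is a placeholder: pass the repository id or local path
    of this adapter, which is not recorded in the card.
    """
    # Imported lazily so defining this helper does not need the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `load_adapter("path/to/adapter")` downloads or reads the base weights first, so it needs enough memory for a 7B-parameter model.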