metadata

library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7
    results: []

Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7

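This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO (via TRL and PEFT) on a dataset that the training run did not record. It achieves the following results on the evaluation set, matching the final evaluation at step 660: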

Loss: 0.4437
Rewards/chosen: -1.5898
Rewards/rejected: -2.7509
Rewards/accuracies: 0.7000
Rewards/margins: 1.1611
Logps/rejected: -114.1047
Logps/chosen: -92.5540
Logits/rejected: -0.0729
Logits/chosen: -0.0526
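
For readers unfamiliar with these metric names, the following is a minimal sketch of how DPO-style reward metrics are commonly derived from sequence log-probabilities. It is not the training code for this model: the `beta` value and all log-probabilities below are hypothetical, and `dpo_metrics` is an illustrative helper, not a TRL function. The last lines check the one relationship that can be verified from the reported numbers: the margin equals the chosen reward minus the rejected reward.

```python
import math


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Summarize implicit DPO rewards for a batch of preference pairs.

    Each argument except `beta` is a list of summed sequence log-probabilities.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) = log(1 + exp(-margin)).
    losses = [math.log1p(math.exp(-m)) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


# Hypothetical log-probabilities and beta, for illustration only.
toy = dpo_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-15.0, -11.0],
    ref_chosen=[-11.0, -11.0],
    ref_rejected=[-12.0, -12.0],
    beta=0.1,
)

# Consistency check on the values reported in this card.
reported_margin = -1.5898 - (-2.7509)
margin_matches = abs(reported_margin - 1.1611) < 1e-4
```

Note that the reported loss is an average of per-pair losses, so it cannot be recovered from the average margin alone.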

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 3
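
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a cosine schedule with linear warmup. It rests on assumptions not stated in this card: a single training device, 660 total optimizer steps (the last step logged in the results table), and the usual half-cosine decay to zero after warmup. `effective_batch_size` and `cosine_lr` are hypothetical helper names.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def cosine_lr(step, base_lr=5e-05, warmup_steps=10, total_steps=660):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay.

    total_steps=660 is assumed from the last logged step, not stated in the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr at progress 0 to 0 at progress 1.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(per_device=2, grad_accum=2)
schedule = {step: cosine_lr(step) for step in (0, 5, 10, 335, 660)}
```

Under these assumptions the learning rate peaks at 5e-05 at step 10 and decays toward zero by the end of the third epoch.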

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6702	0.2993	66	0.6613	0.0837	-0.0035	0.7000	0.0872	-86.6308	-75.8190	0.3314	0.3469
0.686	0.5986	132	0.5646	0.0172	-0.3322	0.8000	0.3494	-89.9173	-76.4838	0.3494	0.3651
0.7758	0.8980	198	0.5747	0.0543	-0.2153	0.9000	0.2696	-88.7488	-76.1133	0.3694	0.3845
0.6695	1.1973	264	0.5693	-0.2661	-0.6699	0.7000	0.4038	-93.2946	-79.3173	0.3321	0.3466
0.5453	1.4966	330	0.5472	-0.6038	-1.1332	0.6000	0.5294	-97.9278	-82.6945	0.2266	0.2424
0.5922	1.7959	396	0.5142	-0.9005	-1.6462	0.6000	0.7457	-103.0579	-85.6614	0.1303	0.1477
0.2128	2.0952	462	0.4825	-1.1082	-1.9752	0.8000	0.8670	-106.3474	-87.7384	0.0713	0.0898
0.1372	2.3946	528	0.4425	-1.4160	-2.5347	0.8000	1.1187	-111.9428	-90.8164	-0.0224	-0.0028
0.3622	2.6939	594	0.4437	-1.5113	-2.6570	0.8000	1.1457	-113.1660	-91.7698	-0.0636	-0.0435
0.1555	2.9932	660	0.4437	-1.5898	-2.7509	0.7000	1.1611	-114.1047	-92.5540	-0.0729	-0.0526
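
The table can also be inspected programmatically. The sketch below copies the epoch, step, and validation-loss columns from the rows above, picks the evaluation with the lowest validation loss, and estimates the number of optimizer steps per epoch. The estimate of examples per epoch assumes that every step consumed the full total_train_batch_size of 4; the variable names are illustrative.

```python
# (epoch, step, validation_loss) copied from the results table.
rows = [
    (0.2993, 66, 0.6613),
    (0.5986, 132, 0.5646),
    (0.8980, 198, 0.5747),
    (1.1973, 264, 0.5693),
    (1.4966, 330, 0.5472),
    (1.7959, 396, 0.5142),
    (2.0952, 462, 0.4825),
    (2.3946, 528, 0.4425),
    (2.6939, 594, 0.4437),
    (2.9932, 660, 0.4437),
]

# Evaluation with the lowest validation loss, and the final evaluation.
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])
final_epoch, final_step, final_loss = rows[-1]

# Rough estimate of optimizer steps per epoch from the final row.
steps_per_epoch = final_step / final_epoch

# Approximate training examples per epoch, assuming 4 examples per step.
approx_examples_per_epoch = steps_per_epoch * 4

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"final validation loss {final_loss} at step {final_step}")
print(f"about {steps_per_epoch:.1f} steps per epoch")
```

The lowest validation loss appears at step 528, slightly below the final value, while the reward margin keeps growing through step 660.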

Framework versions

PEFT 0.12.0
Transformers 4.45.2
Pytorch 2.4.0+cu121
Datasets 3.2.0
Tokenizers 0.20.3
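
With the library versions listed above, loading the adapter on top of the base model might look like the sketch below. This is an assumption-laden example rather than documented usage for this model: the card does not state where the adapter is hosted, so `adapter_id` must be supplied by the caller, and `load_dpo_adapter` is a hypothetical helper name.

```python
BASE_MODEL = "meta-llama/Llama-2-7b-hf"


def load_dpo_adapter(adapter_id, base_model=BASE_MODEL):
    """Load the base model and attach a PEFT adapter to it.

    `adapter_id` is a Hub repo id or local path to the adapter weights;
    the card does not specify it, so it is left to the caller.
    """
    # Local imports so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_dpo_adapter` with a path to the adapter downloads the base weights, which requires access to the gated Llama 2 repository on the Hub.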