metadata

library_name: peft
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7
    results: []

Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7

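This model is a DPO fine-tune of meta-llama/Llama-2-7b-hf, trained with TRL as a PEFT adapter; the training dataset is not recorded in this card. At the final checkpoint (step 630, epoch 3.0) it achieves the following results on the evaluation set: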

Loss: 1.1519
Rewards/chosen: -2.8728
Rewards/rejected: -2.9359
Rewards/accuracies: 0.4000
Rewards/margins: 0.0631
Logps/rejected: -141.0865
Logps/chosen: -142.4955
Logits/rejected: 0.0425
Logits/chosen: -0.0160
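
The reward metrics above can be cross-checked against each other. The sketch below assumes the standard TRL DPO logging definitions (margin as the mean of chosen minus rejected reward, accuracy as the fraction of pairs where the chosen reward is higher); the model name suggests a customized setup, so whether those definitions apply unchanged to this run is not confirmed. Variable names are illustrative.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -2.8728
rewards_rejected = -2.9359
reported_margin = 0.0631
reported_accuracy = 0.4000

# Assumption: margin = mean(chosen reward - rejected reward), as in standard TRL DPO logging.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported margin: {abs(computed_margin - reported_margin) < 1e-4}")
print(f"accuracy below 0.5 chance level: {reported_accuracy < 0.5}")
```

Under those definitions, a small positive mean margin together with an accuracy of 0.4 would mean the chosen response is preferred in fewer than half of the evaluation pairs, with a few pairs carrying large positive margins.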

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 3
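
The batch-size and scheduler settings above can be made concrete with a little arithmetic. This is a minimal sketch, not the training script: it assumes the total of 630 optimizer steps shown in the training results table, and it mirrors the usual linear-warmup plus half-cosine decay formula rather than calling the Transformers scheduler itself. The helper `cosine_lr` is a hypothetical name.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
TOTAL_TRAIN_BATCH_SIZE = 4
WARMUP_STEPS = 10
NUM_EPOCHS = 3
# Assumption: total optimizer steps, taken from the last row of the training results table.
TOTAL_STEPS = 630

# Effective batch = per-device batch * accumulation steps * number of devices,
# so the listed total implies the number of devices.
num_devices = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS)
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("implied devices:", num_devices)
print("optimizer steps per epoch:", steps_per_epoch)
for step in (0, 10, 320, 630):
    print(step, f"{cosine_lr(step):.2e}")
```

With 210 optimizer steps per epoch and 4 examples per step, the training set would hold roughly 840 preference pairs, though the card does not state the dataset size.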

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7133	0.3	63	0.6946	0.0868	0.0595	0.6000	0.0274	-111.1324	-112.8988	0.5159	0.4902
0.5044	0.6	126	0.6814	0.2402	0.0924	0.6000	0.1478	-110.8034	-111.3656	0.5007	0.4738
0.6555	0.9	189	0.6392	-0.0496	-0.2815	0.7000	0.2319	-114.5420	-114.2632	0.5375	0.5056
0.2983	1.2	252	0.6671	-0.8670	-1.3823	0.5000	0.5153	-125.5504	-122.4372	0.4453	0.4053
0.2870	1.5	315	0.6743	-1.0040	-1.5229	0.4000	0.5189	-126.9560	-123.8071	0.3434	0.2980
0.3130	1.8	378	0.7727	-1.1663	-1.4516	0.4000	0.2853	-126.2434	-125.4304	0.3244	0.2767
0.1026	2.1	441	0.8556	-1.5616	-1.8026	0.4000	0.2410	-129.7528	-129.3835	0.2187	0.1675
0.1738	2.4	504	1.1593	-2.7915	-2.8593	0.4000	0.0677	-140.3199	-141.6827	0.0630	0.0046
0.2095	2.7	567	1.1725	-2.9060	-2.9579	0.4000	0.0519	-141.3057	-142.8270	0.0427	-0.0158
0.0235	3.0	630	1.1519	-2.8728	-2.9359	0.4000	0.0631	-141.0865	-142.4955	0.0425	-0.0160
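
The table shows training loss falling while validation loss rises after the first epoch, so it may help to locate the evaluation step with the lowest validation loss. The sketch below only re-reads four columns of the table; it makes no claim about which checkpoints were actually saved, which the card does not state.

```python
# (step, training_loss, validation_loss, rewards_accuracy) copied from the table above.
rows = [
    (63, 0.7133, 0.6946, 0.6),
    (126, 0.5044, 0.6814, 0.6),
    (189, 0.6555, 0.6392, 0.7),
    (252, 0.2983, 0.6671, 0.5),
    (315, 0.2870, 0.6743, 0.4),
    (378, 0.3130, 0.7727, 0.4),
    (441, 0.1026, 0.8556, 0.4),
    (504, 0.1738, 1.1593, 0.4),
    (567, 0.2095, 1.1725, 0.4),
    (630, 0.0235, 1.1519, 0.4),
]

# Evaluation step with the lowest validation loss, compared with the final row.
best = min(rows, key=lambda row: row[2])
final = rows[-1]

print(f"lowest validation loss: step {best[0]}, loss {best[2]}, accuracy {best[3]}")
print(f"final row: step {final[0]}, loss {final[2]}, accuracy {final[3]}")
print(f"validation loss increase since best step: {final[2] - best[2]:.4f}")
```

By this measure the step-189 evaluation (epoch 0.9) is the strongest point in the run, and the reported final metrics come from a later, apparently overfitted state.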

Framework versions

PEFT 0.12.0
Transformers 4.45.2
Pytorch 2.4.0+cu121
Datasets 3.2.0
Tokenizers 0.20.3
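
Because the card lists peft as its library, the weights are presumably an adapter to be attached to the base model. The following is a sketch of how loading might look with the PEFT and Transformers versions listed above; `ADAPTER_ID` is a hypothetical repository path (the card does not give the namespace), and `load_adapter` is an illustrative helper, not part of either library.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"  # base_model from the card metadata
# Hypothetical repository path: replace "your-namespace" with the real owner.
ADAPTER_ID = "your-namespace/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V7"


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    )
    # Wrap the base model with the adapter weights and switch to inference mode.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` requires access to the gated Llama 2 base weights on the Hugging Face Hub.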