---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- trl
- dpo
- generated_from_trainer
library_name: peft
model-index:
- name: Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1
  results: []
---
# Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified dataset.
It achieves the following results on the evaluation set (a short note on how the reward metrics relate follows the list):
- Loss: 1.2865
- Rewards/chosen: -2.3072
- Rewards/rejected: -1.8542
- Rewards/accuracies: 0.5000
- Rewards/margins: -0.4531
- Logps/rejected: -121.6291
- Logps/chosen: -184.9154
- Logits/rejected: 0.2000
- Logits/chosen: 0.1668
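
In TRL's DPO training, `Rewards/chosen` and `Rewards/rejected` are the implicit DPO rewards (beta times the policy-versus-reference log-probability ratio) averaged over the evaluation pairs, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected one. A minimal arithmetic check on the reported values:

```python
# Reported eval metrics (rounded values from the list above).
rewards_chosen = -2.3072
rewards_rejected = -1.8542

# rewards/margins = rewards/chosen - rewards/rejected
margin = rewards_chosen - rewards_rejected
print(f"{margin:.4f}")  # -0.4530; the card logs -0.4531 (rounding of the raw values)
```

The negative margin means that, by the end of training, rejected completions receive a higher implicit reward than chosen ones on this evaluation set.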

## Model description

More information needed

## Intended uses & limitations

More information needed
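
No usage guidance is recorded, but since this is a PEFT (LoRA) adapter on Llama-2-7b, inference should look roughly like the sketch below. The adapter repository id is an assumption (substitute the actual namespace), and the base model is gated on the Hub, so access must be granted first.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
# Assumed repo id -- replace <namespace> with the adapter's actual owner.
adapter_id = "<namespace>/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"  # device_map requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Tell me about reinforcement learning.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```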

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL reconstruction follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
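
These settings map onto TRL's `DPOConfig` (a `TrainingArguments` subclass) roughly as sketched below. This is a reconstruction, not the original training script: the dataset, the LoRA settings, and the DPO `beta` are not recorded in this card and are placeholders, and the `tokenizer=` keyword matches TRL releases contemporary with Transformers 4.44 (newer TRL renames it `processing_class`).

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 ships without a pad token

# Placeholder preference pairs -- the card does not record the real dataset.
train_dataset = Dataset.from_dict({
    "prompt": ["What is direct preference optimization?"],
    "chosen": ["DPO fine-tunes a policy directly on preference pairs ..."],
    "rejected": ["I don't know."],
})

# Mirrors the hyperparameters listed above; the Adam betas/epsilon shown
# there are the TrainingArguments defaults, so no extra arguments are needed.
args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V1",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)

# Assumed LoRA hyperparameters; beta stays at TRL's default (0.1).
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, TRL derives the reference by disabling it
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```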

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7102 | 0.3017 | 70 | 0.6900 | 0.0340 | 0.0258 | 0.7000 | 0.0082 | -102.8295 | -161.5030 | 0.6083 | 0.5744 |
| 0.7024 | 0.6034 | 140 | 0.7276 | 0.0806 | 0.1382 | 0.3000 | -0.0575 | -101.7058 | -161.0370 | 0.6015 | 0.5681 |
| 0.6653 | 0.9052 | 210 | 0.7362 | 0.0303 | 0.0739 | 0.4000 | -0.0435 | -102.3490 | -161.5399 | 0.6196 | 0.5858 |
| 0.4880 | 1.2069 | 280 | 0.8450 | -0.5919 | -0.4398 | 0.4000 | -0.1521 | -107.4859 | -167.7624 | 0.5482 | 0.5148 |
| 0.5839 | 1.5086 | 350 | 0.8971 | -0.8481 | -0.6497 | 0.4000 | -0.1984 | -109.5843 | -170.3242 | 0.5183 | 0.4846 |
| 0.5030 | 1.8103 | 420 | 1.0273 | -1.1487 | -0.8225 | 0.4000 | -0.3262 | -111.3127 | -173.3304 | 0.4207 | 0.3883 |
| 0.2083 | 2.1121 | 490 | 1.1693 | -1.6401 | -1.2436 | 0.4000 | -0.3965 | -115.5236 | -178.2447 | 0.2902 | 0.2576 |
| 0.1395 | 2.4138 | 560 | 1.2310 | -2.1881 | -1.7991 | 0.6000 | -0.3890 | -121.0787 | -183.7240 | 0.2345 | 0.2015 |
| 0.1618 | 2.7155 | 630 | 1.2865 | -2.3072 | -1.8542 | 0.5000 | -0.4531 | -121.6291 | -184.9154 | 0.2000 | 0.1668 |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1