|
---
license: apache-2.0
base_model: h2oai/h2o-danube3-500m-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: clite7-500m-test-ckpts
  results: []
---
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.1` |
|
```yaml
# Weights and Biases logging config
wandb_project: clite
wandb_entity:
wandb_watch:
wandb_name: v7
wandb_log_model:

# Model architecture config
base_model: h2oai/h2o-danube3-500m-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: anthropic

# Hugging Face saving config
hub_model_id:
hub_strategy:
push_dataset_to_hub:
hf_use_auth_token:

# Model checkpointing config
output_dir: ./lora-out
resume_from_checkpoint:
save_steps:
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 2

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 8192
s2_attention: false
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: true
group_by_length: false

# Dataset config
datasets:
  - path: kalomaze/Opus_Instruct_3k
    type: chat_template
val_set_size: 0.1
evaluation_strategy:
eval_steps:
evals_per_epoch: 10
test_datasets:
dataset_prepared_path: ./last-preped-dataset
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 3
gradient_accumulation_steps: 2
micro_batch_size: 8
eval_batch_size: 8
warmup_steps: 10
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00004
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 1

# Model optimization
gradient_checkpointing: unsloth
xformers_attention: false
flash_attention: true
sdp_attention: false
unsloth_cross_entropy_loss: false
unsloth_lora_mlp: false
unsloth_lora_qkv: false
unsloth_lora_o: false

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: true
seed: 02496

# DeepSpeed and FSDP config
deepspeed:
fsdp:
fsdp_config:

# Token config
special_tokens:
tokens: # these are delimiters
  - "<EOT>"

# Checkpoint backing up
hub_model_id: Fizzarolli/clite7-500m-test-ckpts
hub_strategy: all_checkpoints
```
|
|
|
</details><br> |
|
|
|
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/ruthenic/clite/runs/diil6zl9) |
|
# clite7-500m-test-ckpts |
|
|
|
This model is a fine-tuned version of [h2oai/h2o-danube3-500m-base](https://huggingface.co/h2oai/h2o-danube3-500m-base) on the [kalomaze/Opus_Instruct_3k](https://huggingface.co/datasets/kalomaze/Opus_Instruct_3k) dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.3765 |
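
Assuming this value is the usual mean token-level cross-entropy reported by the Trainer, it corresponds to an evaluation perplexity of roughly exp(1.3765) ≈ 3.96; a quick check:

```python
# Convert the reported eval loss to perplexity, assuming it is the
# mean token-level cross-entropy (the Trainer's default).
import math

eval_loss = 1.3765
print(math.exp(eval_loss))  # ≈ 3.96
```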
|
|
|
## Model description |
|
|
|
clite7-500m-test-ckpts is a fine-tune of [h2oai/h2o-danube3-500m-base](https://huggingface.co/h2oai/h2o-danube3-500m-base), a roughly 500M-parameter causal language model, trained with Axolotl 0.4.1 using an Anthropic-style chat template and an added `<EOT>` delimiter token. The repository also stores intermediate training checkpoints (`hub_strategy: all_checkpoints`).
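
As a minimal, unofficial loading sketch: the checkpoints should load through the same `AutoModelForCausalLM` / `AutoTokenizer` classes named in the config, with the repository id taken from `hub_model_id`. The prompt format shown below is an assumption based on the Anthropic-style template, not a verified example.

```python
# Minimal loading sketch; the prompt format is an assumption based on the
# Anthropic-style chat template named in the config, not a verified example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Fizzarolli/clite7-500m-test-ckpts"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "\n\nHuman: Write a haiku about autumn.\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```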
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
The model was trained on [kalomaze/Opus_Instruct_3k](https://huggingface.co/datasets/kalomaze/Opus_Instruct_3k), loaded with Axolotl's `chat_template` dataset type. 10% of the data (`val_set_size: 0.1`) was held out as the evaluation set, and samples were packed into 8192-token sequences (`sample_packing: true`).
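
For illustration only, here is a rough sketch of how a chat example might be rendered under an Anthropic-style Human/Assistant template with `<EOT>` as a turn delimiter. The exact string produced by Axolotl's `chat_template: anthropic` may differ, so treat the format as an assumption.

```python
# Hypothetical formatter approximating an Anthropic-style Human/Assistant
# template, appending "<EOT>" after completed assistant turns as a delimiter.
# The exact rendering used by Axolotl's `chat_template: anthropic` may differ.
def format_anthropic(messages, delimiter="<EOT>"):
    text = ""
    for msg in messages:
        if msg["role"] == "user":
            text += f"\n\nHuman: {msg['content']}"
        else:
            text += f"\n\nAssistant: {msg['content']}{delimiter}"
    return text + "\n\nAssistant:"  # trailing cue for the model's reply


example = [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}]
print(format_anthropic(example))
```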
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 2496
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam (paged 8-bit AdamW in the Axolotl config) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
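
As a rough, unofficial re-creation of these settings in plain PyTorch/Transformers (the actual run used bitsandbytes' paged 8-bit AdamW through Axolotl, and the stock cosine scheduler below does not reproduce the config's `cosine_min_lr_ratio: 0.1` floor):

```python
# Sketch of the optimizer and schedule listed above. Plain AdamW stands in for
# the paged 8-bit AdamW used in the actual run; total_steps is a placeholder
# that in practice depends on dataset size, packing, and the effective batch
# size of 16 (micro_batch_size 8 x gradient_accumulation_steps 2).
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder module standing in for the LM

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=4e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.1,
)
total_steps = 100  # placeholder
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=total_steps
)
```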
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.9517 | 0.0952 | 1 | 3.7616 |
| 2.9796 | 0.1905 | 2 | 3.6462 |
| 2.9632 | 0.2857 | 3 | 3.3357 |
| 2.6639 | 0.3810 | 4 | 3.0408 |
| 2.5048 | 0.4762 | 5 | 2.7322 |
| 2.4911 | 0.5714 | 6 | 2.5094 |
| 2.1291 | 0.6667 | 7 | 2.3554 |
| 4.8452 | 0.7619 | 8 | 1.6418 |
| 1.6902 | 0.8571 | 9 | 1.6067 |
| 1.6166 | 0.9524 | 10 | 1.5581 |
| 1.5985 | 1.0476 | 11 | 1.5162 |
| 1.5001 | 1.0476 | 12 | 1.4847 |
| 1.4679 | 1.1429 | 13 | 1.4601 |
| 1.4981 | 1.2381 | 14 | 1.4440 |
| 1.4864 | 1.3333 | 15 | 1.4293 |
| 1.4895 | 1.4286 | 16 | 1.4174 |
| 1.4653 | 1.5238 | 17 | 1.4061 |
| 1.4447 | 1.6190 | 18 | 1.3988 |
| 1.4492 | 1.7143 | 19 | 1.3937 |
| 1.4244 | 1.8095 | 20 | 1.3896 |
| 1.4319 | 1.9048 | 21 | 1.3858 |
| 1.4238 | 2.0 | 22 | 1.3830 |
| 1.4725 | 2.0952 | 23 | 1.3810 |
| 1.3862 | 2.0952 | 24 | 1.3794 |
| 1.3526 | 2.1905 | 25 | 1.3783 |
| 1.4134 | 2.2857 | 26 | 1.3776 |
| 1.3909 | 2.3810 | 27 | 1.3771 |
| 1.4016 | 2.4762 | 28 | 1.3769 |
| 1.3494 | 2.5714 | 29 | 1.3766 |
| 1.3783 | 2.6667 | 30 | 1.3765 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.42.4
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
|
|