dnoever committed on
Commit d952d0f · 1 Parent(s): 991aff3

Update README.md

Files changed (1)
  1. README.md +3 -96
README.md CHANGED
@@ -7,100 +7,7 @@ dtype: bfloat16
  ---


- # Results:
- T: 🟦
- Model: CultriX/MistralTrix-v1 📑
- Average: 73.39
- ARC: 72.27
- HellaSwag: 88.33
- MMLU: 65.24
- TruthfulQA: 70.73
- Winogrande: 80.98
- GSM8K: 62.77
-
- # Edit/Disclaimer:
- Currently the #1 ranked 7B LLM on the LLM Leaderboards, woah!
- I did not expect that result at all and am in no way a professional when it comes to LLMs or computer science in general,
- just a guy who likes to nerd about and tinker around.
-
- For those wondering how I achieved this, the answer is that I simply attempted to apply the techniques outlined in this amazing article myself: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
- Therefore, all credit basically goes to the guy who wrote that.
- He offers the exact Colab notebook I used to train this model for free, as well as a really nice GitHub page that I hope he doesn't mind me sharing: https://github.com/mlabonne/llm-course/
- So a huge thank you to him for sharing his knowledge and teaching me a thing or two in the process!
-
- # GGUF
- I attempted to quantize the model myself, which again I pretty much have no clue about, but the quantized files seem to run fine when I test them:
- https://huggingface.co/CultriX/MistralTrix-v1-GGUF
-
- I'll say it one more time though:
- "I am a complete beginner to all of this, so if these do end up sucking, don't be surprised."
-
- You have been warned :)
-
- # Description:
- (trained on a single Colab GPU in just a few hours)
-
- MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
- It surpasses the original model on several benchmarks (see results).
-
- It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance.
- I used the same dataset and reformatted it to apply the ChatML template.
-
- The code to train this model is available on Google Colab and GitHub.
- Fine-tuning took about an hour on a Google Colab A100 GPU with 40GB of VRAM.
-
- # TRAINING SPECIFICATIONS
- > LoRA configuration
- peft_config = LoraConfig(
-     r=16,
-     lora_alpha=16,
-     lora_dropout=0.05,
-     bias="none",
-     task_type="CAUSAL_LM",
-     target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
- )
-
- > Model to fine-tune
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype=torch.float16,
-     load_in_4bit=True
- )
- model.config.use_cache = False
-
- > Reference model
- ref_model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype=torch.float16,
-     load_in_4bit=True
- )
-
- > Training arguments
- training_args = TrainingArguments(
-     per_device_train_batch_size=4,
-     gradient_accumulation_steps=4,
-     gradient_checkpointing=True,
-     learning_rate=5e-5,
-     lr_scheduler_type="cosine",
-     max_steps=200,
-     save_strategy="no",
-     logging_steps=1,
-     output_dir=new_model,
-     optim="paged_adamw_32bit",
-     warmup_steps=100,
-     bf16=True,
-     report_to="wandb",
- )
-
- > Create DPO trainer
- dpo_trainer = DPOTrainer(
-     model,
-     ref_model,
-     args=training_args,
-     train_dataset=dataset,
-     tokenizer=tokenizer,
-     peft_config=peft_config,
-     beta=0.1,
-     max_prompt_length=1024,
-     max_length=1536,
- )
+ Currently the #2 ranked 7B LLM on the LLM Leaderboards, woah!
+ https://huggingface.co/v1olet/v1olet_merged_dpo_7B
+
+ converted to exl2 5.0 bpw
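
For anyone who wants to reuse the training recipe that this commit removes from the README, below is a minimal sketch of the full script those snippets appear to assume. The `LoraConfig`, `TrainingArguments`, and `DPOTrainer` arguments are copied from the removed text; everything else (the imports, the `model_name` and `new_model` values, the tokenizer setup, the use of `Intel/orca_dpo_pairs`, the ChatML reformatting, and the final train/save calls) is an assumption filled in from the linked fine-tuning article rather than something stated in this commit.

```python
# Minimal sketch reconstructing the removed README's DPO recipe end to end.
# ASSUMPTIONS (not stated in the commit): model_name/new_model values, tokenizer
# setup, the Intel/orca_dpo_pairs dataset, the ChatML mapping, and the final calls.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "zyh3826/GML-Mistral-merged-v1"  # assumed: base model named in the description
new_model = "MistralTrix-v1"                  # assumed: output directory name

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token     # DPOTrainer needs a pad token for batching

def to_chatml(example):
    """Map an Intel/orca_dpo_pairs row (assumed columns: system, question,
    chosen, rejected) into the prompt/chosen/rejected fields DPOTrainer expects,
    using plain ChatML markup."""
    system = f"<|im_start|>system\n{example['system']}<|im_end|>\n" if example["system"] else ""
    prompt = f"{system}<|im_start|>user\n{example['question']}<|im_end|>\n<|im_start|>assistant\n"
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_chatml, remove_columns=dataset.column_names)

# LoRA configuration (copied from the removed README)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj", "q_proj", "o_proj", "down_proj"],
)

# Model to fine-tune and frozen reference model (copied from the removed README)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, load_in_4bit=True)
model.config.use_cache = False
ref_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, load_in_4bit=True)

# Training arguments (copied from the removed README)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# DPO trainer (copied from the removed README; beta is the strength of the KL penalty
# that keeps the fine-tuned policy close to the reference model)
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)

# The README stops at constructing the trainer; the remaining steps would be:
dpo_trainer.train()
dpo_trainer.model.save_pretrained(new_model)  # assumed: save the LoRA adapter afterwards
```

Note that the positional `model, ref_model` call and the `beta`/`max_prompt_length`/`max_length` keyword arguments match the older `trl` releases the removed snippet targets; recent `trl` versions move these hyperparameters into a `DPOConfig`, so pin your `trl` version accordingly if you try to reproduce this.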