dnoever committed · Commit 8fa651b · 1 Parent(s): 573fd8a

Update README.md

Files changed (1): README.md +3 -19
README.md CHANGED
@@ -19,35 +19,19 @@ Winogrande: 80.98
  GSM8K: 62.77

  # Edit/Disclaimer:
- Currently the #1 ranked 7B LLM on the LLM Leaderboards, woah!
- I did not expect that result at all and am in no way a professional when it comes to LLMs or computer science in general,
- just a guy that likes to nerd about and tinker around.

- For those wondering how I achieved this, the answer is that I simply attempted to apply the techniques outlined in this amazing article myself: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
- Therefore, all credit basically goes to the guy who wrote that.
- He offers the exact Colab notebook I used to train this model for free, as well as a really nice GitHub page I hope he doesn't mind me sharing: https://github.com/mlabonne/llm-course/
- So a huge thank you to him for sharing his knowledge and teaching me a thing or two in the process!

- # GGUF
- I attempted to quantize the model myself, which again I pretty much have no clue about, but it seems to run fine for me when I test them:
- https://huggingface.co/CultriX/MistralTrix-v1-GGUF
-
- I'll say it one more time though:
- "I am a complete beginner to all of this, so if these do end up sucking, don't be surprised."
-
- You have been warned :)

  # Description:
- (trained on a single Colab GPU in less than a few hours)

  MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
  It surpasses the original model on several benchmarks (see results).

  It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance.
- I used the same dataset and reformatted it to apply the ChatML template.

- The code to train this model is available on Google Colab and GitHub.
- Fine-tuning took about an hour on a Google Colab A100 GPU with 40GB VRAM.

  # TRAINING SPECIFICATIONS
  > LoRA configuration
 
  GSM8K: 62.77

  # Edit/Disclaimer:
+ Currently the #1 ranked 7B LLM on the LLM Leaderboards, converted with exl2 quantization

  # Description:
+
+ Model: CultriX/MistralTrix-v1

  MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
  It surpasses the original model on several benchmarks (see results).

  It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance.

  # TRAINING SPECIFICATIONS
  > LoRA configuration
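The README's description mentions reformatting the preference dataset to apply the ChatML template before DPO training. The sketch below is a minimal, hypothetical illustration of that step: the `to_chatml` and `format_pair` helpers are not from the original notebook, and the `system`/`question`/`chosen`/`rejected` field names are an assumption about the raw dataset layout.

```python
# Hypothetical sketch of reformatting one preference sample into the ChatML
# template for DPO training. Helper names and field names are illustrative,
# not taken from the original training notebook.

def to_chatml(system: str, question: str) -> str:
    """Wrap a system prompt and a user question in ChatML markers,
    ending with an open assistant turn for the model to complete."""
    prompt = ""
    if system:
        prompt += f"<|im_start|>system\n{system}<|im_end|>\n"
    prompt += f"<|im_start|>user\n{question}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

def format_pair(sample: dict) -> dict:
    """Turn one raw preference sample into the prompt/chosen/rejected
    layout that DPO trainers typically expect."""
    return {
        "prompt": to_chatml(sample.get("system", ""), sample["question"]),
        "chosen": sample["chosen"] + "<|im_end|>\n",
        "rejected": sample["rejected"] + "<|im_end|>\n",
    }
```

In a real pipeline this function would be mapped over the whole dataset (e.g. with `datasets.Dataset.map`) before handing the result to the DPO trainer.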