dnoever committed · Commit 8fa651b · 1 Parent(s): 573fd8a

Update README.md

Files changed (1): README.md +3 -19
README.md CHANGED
@@ -19,35 +19,19 @@ Winogrande: 80.98
  GSM8K: 62.77

  # Edit/Disclaimer:
- Currently the #1 ranked 7B LLM on the LLM Leaderboards, woah!
- I did not expect that result at all and am in no way a professional when it comes to LLMs or computer science in general,
- just a guy that likes to nerd about and tinker around.

- For those wondering how I achieved this, the answer is that I simply attempted to apply the techniques outlined in this amazing article myself: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
- Therefore, all credit basically goes to the guy who wrote that.
- He offers the exact Colab notebook I used to train this model for free, as well as a really nice GitHub page I hope he doesn't mind me sharing: https://github.com/mlabonne/llm-course/
- So a huge thank you to him for sharing his knowledge and teaching me a thing or two in the process!

- # GGUF
- I attempted to quantize the model myself, which again I pretty much have no clue about, but it seems to run fine for me when I test them:
- https://huggingface.co/CultriX/MistralTrix-v1-GGUF
-
- I'll say it one more time though:
- "I am a complete beginner to all of this, so if these do end up sucking, don't be surprised."
-
- You have been warned :)

  # Description:
- (trained on a single Colab GPU in less than a few hours)

  MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
  It surpasses the original model on several benchmarks (see results).

  It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance.
- I used the same dataset and reformatted it to apply the ChatML template.

- The code to train this model is available on Google Colab and GitHub.
- Fine-tuning took about an hour on a Google Colab A100 GPU with 40GB VRAM.

  # TRAINING SPECIFICATIONS
  > LoRA configuration
 
  GSM8K: 62.77

  # Edit/Disclaimer:
+ Currently the #1 ranked 7B LLM on the LLM Leaderboards, converted with exl2 quantization

  # Description:
+
+ Model: CultriX/MistralTrix-v1

  MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
  It surpasses the original model on several benchmarks (see results).

  It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance.

  # TRAINING SPECIFICATIONS
  > LoRA configuration
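The README's description mentions reformatting the preference dataset to apply the ChatML template before DPO training. The sketch below is a minimal, hypothetical illustration of that step: the `to_chatml` and `format_pair` helpers are not from the original notebook, and the `system`/`question`/`chosen`/`rejected` field names are an assumption about the raw dataset layout.

```python
# Hypothetical sketch of reformatting one preference sample into the ChatML
# template for DPO training. Helper names and field names are illustrative,
# not taken from the original training notebook.

def to_chatml(system: str, question: str) -> str:
    """Wrap a system prompt and a user question in ChatML markers,
    ending with an open assistant turn for the model to complete."""
    prompt = ""
    if system:
        prompt += f"<|im_start|>system\n{system}<|im_end|>\n"
    prompt += f"<|im_start|>user\n{question}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

def format_pair(sample: dict) -> dict:
    """Turn one raw preference sample into the prompt/chosen/rejected
    layout that DPO trainers typically expect."""
    return {
        "prompt": to_chatml(sample.get("system", ""), sample["question"]),
        "chosen": sample["chosen"] + "<|im_end|>\n",
        "rejected": sample["rejected"] + "<|im_end|>\n",
    }
```

In a real pipeline this function would be mapped over the whole dataset (e.g. with `datasets.Dataset.map`) before handing the result to the DPO trainer.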