---
inference: false
license: other
---

<!-- header start -->
<div style="width: 100%;">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-start;">
<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
</div>
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<!-- header end -->

# NousResearch's Redmond Hermes Coder GGML

These files are GGML format model files for [NousResearch's Redmond Hermes Coder](https://huggingface.co/NousResearch/Redmond-Hermes-Coder).

Note that, because this model is StarCoder-based, these GGML files are **not** compatible with llama.cpp itself (see the Compatibility section below). They are for CPU + GPU inference using libraries and UIs which support GGML StarCoder models, such as:
* [KoboldCpp](https://github.com/LostRuins/koboldcpp)
* [ctransformers](https://github.com/marella/ctransformers)
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), using the `c_transformers` backend
* [rustformers' llm](https://github.com/rustformers/llm)

## Repositories available

* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GPTQ)
* [4, 5 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GGML)
* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/NousResearch/Redmond-Hermes-Coder)

## Prompt template: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: PROMPT

### Response:

```
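
For programmatic use, the template can be filled in with a small helper. This is a minimal sketch; the `build_prompt` name is illustrative, not part of any library:

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template shown above."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction: {instruction}\n\n"
        "### Response:\n"
    )


print(build_prompt("Write a Python function that reverses a string."))
```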

<!-- compatibility_ggml start -->
## Compatibility

These files are **not** compatible with llama.cpp.

Currently they can be used with:
* KoboldCpp, a powerful inference engine based on llama.cpp, with a good UI: [KoboldCpp](https://github.com/LostRuins/koboldcpp)
* The ctransformers Python library, which includes LangChain support: [ctransformers](https://github.com/marella/ctransformers)
* The GPT4All-UI, which uses ctransformers: [GPT4All-UI](https://github.com/ParisNeo/gpt4all-ui)
* [rustformers' llm](https://github.com/rustformers/llm)
* The example `starcoder` binary provided with [ggml](https://github.com/ggerganov/ggml)

As other options become available I will endeavour to update them here (do let me know in the Community tab if I've missed something!).

## Tutorial for using GPT4All-UI

* [Text tutorial, written by **Lucas3DCG**](https://huggingface.co/TheBloke/MPT-7B-Storywriter-GGML/discussions/2#6475d914e9b57ce0caa68888)
* [Video tutorial, by GPT4All-UI's author **ParisNeo**](https://www.youtube.com/watch?v=ds_U0TDzbzI)
<!-- compatibility_ggml end -->
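
For example, with ctransformers, loading one of these files looks roughly like the following. This is a minimal sketch only: it assumes ctransformers can fetch the file directly from this repo, and `model_type="starcoder"` is an assumption based on the model's StarCoder architecture.

```python
# Sketch -- assumes: pip install ctransformers
from ctransformers import AutoModelForCausalLM

# model_file picks the q4_0 quantisation from the table below; adjust as needed.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Redmond-Hermes-Coder-GGML",
    model_file="redmond-hermes-coder.ggmlv3.q4_0.bin",
    model_type="starcoder",  # assumption based on the StarCoder architecture
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction: Write a Python one-liner that sums a list.\n\n"
    "### Response:\n"
)
print(llm(prompt, max_new_tokens=128, temperature=0.7))
```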

## Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| redmond-hermes-coder.ggmlv3.q4_0.bin | q4_0 | 4 | 10.75 GB | 13.25 GB | 4-bit. Smallest size and lowest RAM use, at some cost in accuracy. |
| redmond-hermes-coder.ggmlv3.q4_1.bin | q4_1 | 4 | 11.92 GB | 14.42 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However, has quicker inference than the q5 models. |
| redmond-hermes-coder.ggmlv3.q5_0.bin | q5_0 | 5 | 13.09 GB | 15.59 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
| redmond-hermes-coder.ggmlv3.q5_1.bin | q5_1 | 5 | 14.26 GB | 16.76 GB | 5-bit. Even higher accuracy and resource usage, and slower inference. |
| redmond-hermes-coder.ggmlv3.q8_0.bin | q8_0 | 8 | 20.11 GB | 22.61 GB | 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
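
To fetch a single quantisation file rather than cloning the whole repo, the huggingface_hub library can be used. A minimal sketch; the choice of the q4_0 file is just an example:

```python
# Sketch -- assumes: pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Downloads one file from this repo into the local HF cache and returns its path.
path = hf_hub_download(
    repo_id="TheBloke/Redmond-Hermes-Coder-GGML",
    filename="redmond-hermes-coder.ggmlv3.q4_0.bin",
)
print(f"Downloaded to: {path}")
```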

<!-- footer start -->
## Discord

For further support, and discussions on these models and AI in general, join us at:

[TheBloke AI's Discord server](https://discord.gg/theblokeai)

## Thanks, and how to contribute.

Thanks to the [chirper.ai](https://chirper.ai) team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

**Patreon special mentions**: zynix , ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost , Nathan LeClaire, Iucharbius , Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex , terasurfer , Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.

Thank you to all my generous patrons and donors!

<!-- footer end -->

# Original model card: NousResearch's Redmond Hermes Coder

# Model Card: Redmond-Hermes-Coder 15B

## Model Description

Redmond-Hermes-Coder 15B is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

This model was trained on a WizardCoder base, which itself uses a StarCoder base model.

The model is truly great at code, but it does come with a tradeoff: while far better at code than the original Nous-Hermes built on Llama, it is worse than WizardCoder on pure code benchmarks such as HumanEval.

It comes in at 39% on HumanEval, against 57% for WizardCoder. This is a preliminary experiment, and we are exploring improvements now.

However, it does seem better than WizardCoder at non-code tasks across a variety of areas, including writing.

## Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (its general, roleplay v1&2 and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.

Additional data inputs came from Camel-AI's Biology/Physics/Chemistry and Math datasets, Airoboros' (v1) GPT-4 dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.

## Collaborators

The model fine-tuning and the datasets were a collaboration of efforts and resources from members of Nous Research, including Teknium, Karan4D, Huemin Art, and Redmond AI's generous compute grants.

Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.

Among the contributors of datasets: GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset was provided by Karan4D and HueminArt. The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801. If anyone was left out, please open a thread in the community tab.

## Prompt Format

The model follows the Alpaca prompt format:
```
### Instruction:

### Response:
```

or

```
### Instruction:

### Input:

### Response:
```
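
A small helper covering both variants might look like the following sketch; the `format_alpaca` name is illustrative:

```python
def format_alpaca(instruction: str, input_text: str = "") -> str:
    """Format a prompt in either Alpaca variant, depending on whether
    optional input context is supplied."""
    if input_text:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```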

## Resources for Applied Use Cases:

For an example of a back-and-forth chatbot using Hugging Face transformers and Discord, check out: https://github.com/teknium1/alpaca-discord
For an example of a roleplaying Discord bot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot

## Future Plans

The model is currently being uploaded in FP16 format, and there are plans to convert the model to GGML and GPTQ 4-bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will also try to open discussions about getting the model included in GPT4All.

## Benchmark Results

```
HumanEval: 39%
| Task |Version| Metric |Value | |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|arc_challenge | 0|acc |0.2858|± |0.0132|
| | |acc_norm |0.3148|± |0.0136|
|arc_easy | 0|acc |0.5349|± |0.0102|
| | |acc_norm |0.5097|± |0.0103|
|bigbench_causal_judgement | 0|multiple_choice_grade|0.5158|± |0.0364|
|bigbench_date_understanding | 0|multiple_choice_grade|0.5230|± |0.0260|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3295|± |0.0293|
|bigbench_geometric_shapes | 0|multiple_choice_grade|0.1003|± |0.0159|
| | |exact_str_match |0.0000|± |0.0000|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2260|± |0.0187|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.1957|± |0.0150|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.3733|± |0.0280|
|bigbench_movie_recommendation | 0|multiple_choice_grade|0.3200|± |0.0209|
|bigbench_navigate | 0|multiple_choice_grade|0.4830|± |0.0158|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.4150|± |0.0110|
|bigbench_ruin_names | 0|multiple_choice_grade|0.2143|± |0.0194|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2926|± |0.0144|
|bigbench_snarks | 0|multiple_choice_grade|0.5249|± |0.0372|
|bigbench_sports_understanding | 0|multiple_choice_grade|0.4817|± |0.0159|
|bigbench_temporal_sequences | 0|multiple_choice_grade|0.2700|± |0.0140|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.1864|± |0.0110|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1349|± |0.0082|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.3733|± |0.0280|
|boolq | 1|acc |0.5498|± |0.0087|
|hellaswag | 0|acc |0.3814|± |0.0048|
| | |acc_norm |0.4677|± |0.0050|
|openbookqa | 0|acc |0.1960|± |0.0178|
| | |acc_norm |0.3100|± |0.0207|
|piqa | 0|acc |0.6600|± |0.0111|
| | |acc_norm |0.6610|± |0.0110|
|winogrande | 0|acc |0.5343|± |0.0140|
```

## Model Usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
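
For the unquantised fp16 model, the standard Hugging Face transformers loading pattern should apply. A minimal sketch, assuming a GPU with enough VRAM; the generation parameters are illustrative:

```python
# Sketch -- assumes: pip install transformers torch accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Redmond-Hermes-Coder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; places layers across available devices
)

prompt = (
    "### Instruction:\nWrite a Python function that checks whether a "
    "number is prime.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```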

Compute provided by our project sponsor Redmond AI, thank you!!