This is [StableLM 2 Chat 1.6B](https://huggingface.co/stabilityai/stablelm-2-1_6b-chat), quantized with the help of an importance matrix (imatrix) so that it performs better for its size, with quantization levels available for lower-memory devices. [Kalomaze's "groups_merged.txt"](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) was used as the calibration data for the importance matrix, with the context set to 4,096 (the model's context length according to [their paper](https://drive.google.com/file/d/1JYJHszhS8EFChTbNAf8xmqhKjogWRrQF/view)).
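
For reference, here is a minimal sketch of how this kind of imatrix quantization is typically produced with llama.cpp's tools. The file names are illustrative, not the exact commands used for this repo:

```bash
# Compute an importance matrix from the calibration text, using the
# 4,096-token context mentioned above. (File names are illustrative.)
./imatrix -m stablelm-2-1_6b-chat-f16.gguf -f groups_merged.txt -c 4096 -o imatrix.dat

# Quantize the FP16 model with that importance matrix, e.g. to IQ2_M.
./quantize --imatrix imatrix.dat stablelm-2-1_6b-chat-f16.gguf stablelm-2-1_6b-chat-IQ2_M.gguf IQ2_M
```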
Here's a table with approximate HellaSwag scores (computed over 1,000 tasks). Because the tasks are randomized, the scores may be slightly imprecise:

| Quantization | HellaSwag |
|--------------|-----------|
| IQ1_S        | 35.4%     |
| IQ1_M        | 38.7%     |
| IQ2_XXS      | 51.2%     |
| IQ2_XS       | 51.8%     |
| IQ2_S        | 56.8%     |
| IQ2_M        | 59.3%     |
| Q2_K_S       | 55.2%     |
| Q2_K         | 59.0%     |
| IQ3_XXS      | 60.8%     |
| Q4_0         | 64.0%     |
| Q4_K_M       | 66.0%     |
| Q5_K_M       | 65.8%     |
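
Scores like these can be reproduced (up to task-randomization noise) with llama.cpp's `perplexity` tool. A sketch, assuming a HellaSwag data file prepared in the text format llama.cpp expects:

```bash
# Score one quantized file on 1,000 HellaSwag tasks.
# hellaswag_val_full.txt stands in for the HellaSwag validation set
# converted to llama.cpp's expected format (file name illustrative).
./perplexity -m stablelm-2-1_6b-chat-IQ2_M.gguf -f hellaswag_val_full.txt --hellaswag --hellaswag-tasks 1000
```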
Original model card below.
***