Good underappreciated model. Using Q4_K_L quant with great results.
I know you asked a while back if people could let you know how the quant variants like _L perform. I've been using the Q4_K_L variant of this model for some time now, and I feel it shows very good knowledge and decent logic and reasoning for a model of its size. It isn't perfect of course, but I don't think any model gets it right all the time.
Once I set up a good instruct/prompt and a good author's note, and lowered the temperature to 1.15 from the usually recommended value (I forget exactly what people suggested, but I think it was around 1.8), I've been getting surprisingly good results from this model. I based the modified instruct/prompt on Gemma 2's default in SillyTavern and it seems to work fine.

I feel like this model is a bit underappreciated, actually. It still has some of the usual ministrations ("shivers down the spine" in particular), but it seems more inclined to produce decent-quality writing across most genres, with enough variance that it doesn't all come out the same every time. I suspect the main thing keeping people from liking it is that most probably have the temperature set too high. I still have to fight it going stupid sometimes, and it definitely likes to write for {{user}} at times (possibly related to my novel-style formatting, but chat-style formatting too often degenerates into Discord-style chat RP). All in all I'm pretty happy, and a good author's note combined with a reminder in the character card has helped a lot with that.

The Q4_K_L in particular doesn't seem to have lost much versus the larger quants, which I'm glad of. (Some models turn really stupid at lower quants.) I'm able to get 24K context at a speed I can tolerate, and this model follows my character cards extremely well despite the quant.
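For anyone who wants to try roughly the same settings outside SillyTavern, here's a minimal sketch using llama-cpp-python (my choice for illustration, not what the post above used; the model filename and prompt are placeholders, and 24576 tokens is the 24K context mentioned above):

```python
from llama_cpp import Llama

# Placeholder filename -- point this at your local Q4_K_L GGUF.
llm = Llama(model_path="model-Q4_K_L.gguf", n_ctx=24576)  # 24K context

out = llm(
    "Write the next scene of the story.",  # placeholder prompt
    temperature=1.15,  # lowered from the ~1.8 that's sometimes recommended
    max_tokens=256,
)
print(out["choices"][0]["text"])
```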
That's awesome, I'm glad to hear it's working well for you! 1.8 is a crazy temperature haha, 1.15 sounds much more reasonable :D
Those sound like good use cases, and that's awesome!
I'm not really sure this one needs to be abliterated. It's practically uncensored out of the box: in RP/story mode it goes nuts with no holds barred, and even in assistant mode I've had pretty darned close to zero refusals.
No support yet for Cohere2ForCausalLM; it needs this PR: https://github.com/ggerganov/llama.cpp/pull/10900
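Until that PR is merged and picked up by downstream tools, the GGUF simply fails to load. If you want to check whether your build has the support, here's a quick sketch with llama-cpp-python (the filename is hypothetical):

```python
from llama_cpp import Llama

try:
    # Hypothetical filename for a Cohere2-architecture GGUF.
    llm = Llama(model_path="cohere2-model-Q4_K_M.gguf")
    print("Loaded OK -- this build supports the Cohere2 architecture.")
except ValueError as err:
    # Builds without the PR fail to load the model entirely;
    # llama.cpp's own log output reports the unknown architecture.
    print(f"Load failed -- likely missing Cohere2 support: {err}")
```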
Are you sure it's not working with llama.cpp itself, or with other tools built around llama.cpp? I'm pretty sure I remember being able to load it in kobold.cpp and having an underwhelming experience in the short test I did. Maybe it somehow loaded with only semi-suitable settings, and that's why the experience wasn't what I expected; I had way better results with their previous smaller models.