Prompt/token adjustment to stop "overthinking" in unnecessary cases

#6
by fuzzy-mittenz - opened

I was using the model to great effect inside GPT4All with its new Analyze feature. I was hoping you might be able to shed some light on a possible method for keeping it from being so verbose in its responses when that isn't necessary. Most models since Llama 3.2/Qwen 2.5, for example, are great at questions like this:

"If Philip walks into a bar and orders a round of drinks for all, there being 12 other customers in the bar and drinks being 5 smeckles a piece, and then later on in the night, during happy hour, after a woman with a dog comes into the bar joining the original customers Phil buys another round for all at happy hour prices, half off, how much would Phil spend with a healthy tip? "

Even the 1.5B model usually gets this right; your model tends to go through too many steps to maintain coherence. Even though I can use the JavaScript_Interpreter and Code_execution tools to compute things like the factorial of 101, or use the haversine function to measure the distance between any two points in the world, the model seems to lack a solid long-form single response.
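For context, the haversine calculation mentioned above is the kind of thing the tool call computes; a minimal standalone version looks roughly like this (the function name and the example coordinates are illustrative, not the tool's actual interface):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# e.g. Paris (48.8566, 2.3522) to New York (40.7128, -74.0060),
# roughly 5,800 km
print(haversine_km(48.8566, 2.3522, 40.7128, -74.0060))
```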

PowerInfer org

Thank you for your suggestion. In fact, we have also noticed that the overthinking issue is fairly prominent. We are currently trying some methods to alleviate this problem, or to differentiate the level of thinking based on the difficulty of the question. One approach we are considering is incorporating an assessment of the question's difficulty into the response, and then tailoring the complexity of the response to that difficulty level.
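The difficulty-gating idea could be sketched as a two-stage pipeline like the one below. Everything here (the rubric, the labels, and the token budgets) is a hypothetical illustration, not the team's actual method:

```python
# Hypothetical sketch of difficulty-gated reasoning: first ask the model to
# rate the question, then choose a reasoning-token budget from that rating.
# The rubric, labels, and budgets are illustrative assumptions only.
DIFFICULTY_PROMPT = (
    "Rate the difficulty of the following question as EASY, MEDIUM, or HARD. "
    "Reply with one word only.\n\nQuestion: {question}"
)

BUDGETS = {"EASY": 128, "MEDIUM": 512, "HARD": 2048}  # max reasoning tokens

def reasoning_budget(rating: str) -> int:
    """Map a self-assessed difficulty label to a token budget,
    falling back to MEDIUM for anything unrecognized."""
    return BUDGETS.get(rating.strip().upper(), BUDGETS["MEDIUM"])

print(reasoning_budget("easy"))   # -> 128
print(reasoning_budget("HARD"))   # -> 2048
```

A first call with `DIFFICULTY_PROMPT` would produce the rating, and the second call would generate the actual answer capped at `reasoning_budget(rating)` tokens.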

Well, for smaller models I've found that a simple two-step reasoning method works well with Qwen models, if that helps at all. All in all, though, I've been working pretty hard to get the tokenizer to do what you have it accomplishing, so I really can't complain. Thanks for the awesome model.
