TANGU Quant: a QwenStar/GPT4ALL/PowerInfer (o#/QwQ) Series Reasoner
A final small reasoning model for CPU, built from SmallThinker-3B-Preview-Q8_0-GGUF. We are labeling it Tangu 3B for our GPT4ALL community (a fallen star bound to Earth).
(Our effort to create a pure, CPU-friendly local test-time-compute model was realized by the PowerInfer team before we could finish a more advanced reasoning base model after a month of merging and training in our "QwenStar" project. It seems the universe provides, or at least Hugging Face does.) Offering more test-time reasoning than our other models, it may use more tokens to reach many of the same conclusions, but this makes it more accurate overall. If you're looking for something similar, faster, yet slightly less effective, we'd point you to our Reasoning-Rabbit or Replicant models. If you don't need tool use and simply need something solid and small, go with the Kaiju or THOTH models.

This model was converted to GGUF format from PowerInfer/SmallThinker-3B-Preview using llama.cpp.
Refer to the original model card for more details on the model.
The model is renamed Tangu for personal use and has not yet undergone importance-matrix quantization (for lack of response exploration), but it is so far very functional; other sizes can be found in Bartowski's repository, bartowski/SmallThinker-3B-Preview-GGUF, and by following the original model tree.

Our QwenStar project is mostly for users of GPT4ALL, offering resources for applying tool use to reasoning models like this one: a recursive thought method with not just code inference but actual execution and calculation. Things like factorials or distance estimation, along with much other information nonexistent in an LLM (or SLM), are now available, so without a GPU you can compete with the likes of o1 and o3 inside the GPT4ALL environment with its new behind-the-scenes "Analyzing" function. Combined with RAG/embedding, we believe these powerful features are revolutionary. We also believe that restricting someone's freedoms and opportunities over how they "might" be used is both jealous and unjust, just as the founders and philosophers who brought forth this age of abundance believed. Please comment with unique use cases and other information you find, either here or on our X/Discord (both offer set-up instructions).
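The "actual execution and calculation" idea above can be sketched in a few lines. This is a minimal, hypothetical host-side dispatcher, not GPT4ALL's actual code_interpreter: the tag format, tool names, and JSON payload are illustrative assumptions. The model replies with a symbolic call, the host parses it, computes the exact result natively, and can feed that result back into the conversation.

```python
import json
import math

# Hypothetical tool registry; names and argument shapes are illustrative,
# not GPT4ALL's internal API.
TOOLS = {
    "factorial": lambda args: math.factorial(int(args["n"])),
}

def execute_tool_call(model_reply: str):
    """Parse a symbolic tool call such as
    '<tool_call>{"name": "factorial", "args": {"n": 20}}</tool_call>'
    and return the exactly computed result."""
    start = model_reply.index("<tool_call>") + len("<tool_call>")
    end = model_reply.index("</tool_call>")
    call = json.loads(model_reply[start:end])
    return TOOLS[call["name"]](call["args"])

# A 3B model would likely hallucinate digits of 20!; native execution cannot.
reply = '<tool_call>{"name": "factorial", "args": {"n": 20}}</tool_call>'
print(execute_tool_call(reply))  # 2432902008176640000
```

The point is the division of labor: the small model only has to decide *which* function to call, while the host guarantees the arithmetic.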
Use with GPT4ALL
Jinja "Chat Template"
{{- '<|im_start|>system\n' }}
{% if toolList|length > 0 %}You have access to the following functions:
{% for tool in toolList %}
Use the function '{{tool.function}}' to: '{{tool.description}}'
{% if tool.parameters|length > 0 %}
parameters:
{% for info in tool.parameters %}
{{info.name}}:
type: {{info.type}}
description: {{info.description}}
required: {{info.required}}
{% endfor %}
{% endif %}
# Tool Instructions
If you CHOOSE to call this function ONLY reply with the following format:
'{{tool.symbolicFormat}}'
Here is an example. If the user says, '{{tool.examplePrompt}}', then you reply
'{{tool.exampleCall}}'
After the result you might reply with, '{{tool.exampleReply}}'
{% endfor %}
You MUST include both the start and end tags when you use a function.
You are a helpful, aware AI assistant made by Intelligent Estate who uses these functions to break down, analyze, perform, and verify complex reasoning tasks, verifying your answers with the functions where possible.
{% endif %}
{{- '<|im_end|>\n' }}
{% for message in messages %}
{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{% endfor %}
{% if add_generation_prompt %}
{{ '<|im_start|>assistant\n' }}
{% endif %}
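To see what prompt string the template above actually produces, here is a plain-Python equivalent of its message loop (the tool branch is omitted for brevity, as with an empty toolList). This is a sketch for inspection only; GPT4ALL renders the Jinja template itself.

```python
def render_chat(messages, add_generation_prompt=True):
    """Mimic the chat template's output for an empty tool list:
    a system block, one im_start/im_end block per message, and an
    optional trailing assistant header."""
    out = "<|im_start|>system\n<|im_end|>\n"
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render_chat([{"role": "user", "content": "What is 20 factorial?"}]))
```

If the output you see in your UI's raw prompt view differs from this shape, the template was probably not applied.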
GPT4ALL "System Message"
So far not necessary, but it may be tuned as needed; for suggestions, refer to the Reasoning-Rabbit and Replicant models.
Other models
This model should work well in other UIs; the original model card has usage instructions for them.
Benchmark Performance (without the JavaScript/code_interpreter in GPT4ALL): it should easily reach o1 levels even without a GPU, though you may need patience depending on your setup.
| Model | AIME24 | AMC23 | GAOKAO2024_I | GAOKAO2024_II | MMLU_STEM | AMPS_Hard | math_comp |
|---|---|---|---|---|---|---|---|
| Qwen2.5-3B-Instruct | 6.67 | 45 | 50 | 35.8 | 59.8 | - | - |
| SmallThinker | 16.667 | 57.5 | 64.2 | 57.1 | 68.2 | 70 | 46.8 |
| GPT-4o | 9.3 | - | - | - | 64.2 | 57 | 50 |
Model tree for IntelligentEstate/Tangu-3B-Qwenstar-Q8-GGUF
Base model: Qwen/Qwen2.5-3B