File size: 16,715 Bytes
e1640d7 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 |
<div align="center">
<img src="figures/MiniMaxLogo.png" width="60%" alt="MiniMax-Text-01" />
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.minimaxi.com/en" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MinMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://www.hailuo.ai/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/Chat-_Hailuo AI-FF4040?style=flat-square&labelColor=2C3E50&logo=&logoWidth=16" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://intl.minimaxi.com" style="margin: 2px;">
<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/📜_License-Model_Agreement-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
# MiniMax-Text-01
## 1. Introduction
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods—such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during the inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
<p align="center">
<img width="100%" src="figures/TextBench.png">
</p>
## 2. Model Architecture
The architecture of MiniMax-Text-01 is briefly described as follows:
- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number Layers: 80
- Hybrid Attention: a softmax attention is positioned after every 7 lightning attention.
- Number of attention heads: 64
- Attention head dimension: 128
- Mixture of Experts:
- Number of experts: 32
- Expert hidden dimension: 9216
- Top-2 routing strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064
## 3. Evaluation
### Core Academic Benchmarks
| **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
|-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------|
| **General** | | | | | | | | |
| MMLU<sup>*</sup> | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 |
| MMLU-Pro<sup>*</sup> | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** |
| IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| **Reasoning** | | | | | | | | |
| GPQA<sup>*</sup> _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP<sup>*</sup> _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 |
| **Mathematics** | | | | | | | | |
| GSM8k<sup>*</sup> | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH<sup>*</sup> | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 |
| **Coding** | | | | | | | | |
| MBPP + | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
| HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
<sup>*</sup> Evaluated following a _0-shot CoT_ setting.
### Long Benchmarks
#### 4M Needle In A Haystack Test
<p align="center">
<img width="90%" src="figures/niah.png">
</p>
#### Ruler
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|-------|----|----|-----|-----|-----|------|------|------|----|
| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |
#### LongBench v2
| **Model** | **overall** | **easy** | **hard** | **short** | **medium** | **long** |
|----------------------------|-------------|----------|----------|------------|------------|----------|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| **w/ CoT** | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| Deepseek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** |
| **w/o CoT** | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| Deepseek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** |
| **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 |
#### MTOB
| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
|------------------|----------------|---------------|---------------|------------------|-----------------|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
## 4. Quickstart
Here we provide a simple example of loading the tokenizer and model to generate content.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
# load hf config
hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)
# quantization config, int8 is recommended
quantization_config = QuantoConfig(
weights="int8",
modules_to_not_convert=[
"lm_head",
"embed_tokens",
] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
+ [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)
# set device map
device_map = {
'model.embed_tokens': 'cuda:0',
'model.norm': f'cuda:{world_size - 1}',
'lm_head': f'cuda:{world_size - 1}'
}
# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size
for i in range(world_size):
for j in range(layers_per_device):
device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01")
prompt = "Hello!"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
"MiniMax-Text-01",
torch_dtype="bfloat16",
device_map=device_map,
quantization_config=quantization_config,
trust_remote_code=True,
offload_buffers=True,
)
# generate response
generation_config = GenerationConfig(
max_new_tokens=20,
eos_token_id=200020,
use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
## 5. Chatbot & API
For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers.
Contact us at [[email protected]](mailto:[email protected]).
|