File size: 16,715 Bytes

e1640d7

<div align="center">
  <img src="figures/MiniMaxLogo.png" width="60%" alt="MiniMax-Text-01" />
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="https://www.minimaxi.com/en" target="_blank" style="margin: 2px;">
    <img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MinMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="https://www.hailuo.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/Chat-_Hailuo AI-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgMzc1LjE0IDM3NS4xNCI+PGRlZnM+PHN0eWxlPi5jbHMtMXtmaWxsOnVybCgjdW5uYW1lZC1ncmFkaWVudCk7fTwvc3R5bGU+PGxpbmVhckdyYWRpZW50IGlkPSJ1bm5hbWVkLWdyYWRpZW50IiB4MT0iOC40MiIgeTE9IjEzLjgxIiB4Mj0iNDI5LjY1IiB5Mj0iNDIyLjM3IiBncmFkaWVudFVuaXRzPSJ1c2VyU3BhY2VPblVzZSI+PHN0b3Agb2Zmc2V0PSIwLjA5IiBzdG9wLWNvbG9yPSIjZmZhYjBjIi8+PHN0b3Agb2Zmc2V0PSIwLjMxIiBzdG9wLWNvbG9yPSIjZmY1NTM4Ii8+PHN0b3Agb2Zmc2V0PSIwLjQ2IiBzdG9wLWNvbG9yPSIjZTk0MDVkIi8+PHN0b3Agb2Zmc2V0PSIwLjc1IiBzdG9wLWNvbG9yPSIjZDI2NmRhIi8+PHN0b3Agb2Zmc2V0PSIwLjg5IiBzdG9wLWNvbG9yPSIjZDU4NGVmIi8+PC9saW5lYXJHcmFkaWVudD48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMzc1LjE0LDE4Ny41N0MzNzUuMTQsODQsMjkwLjc0LS4yNiwxODcuMDksMCw4NC4yNi4yNi4yNiw4NC4yNSwwLDE4Ny4wOWMtLjI2LDEwMy42NSw4NCwxODgsMTg3LjU3LDE4OEgzMTAuODJBNjQuMjEsNjQuMjEsMCwwLDAsMzc1LDMxMC45M1YxOTMuODJoMEMzNzUuMDksMTkxLjc5LDM3NS4xNCwxODkuNjcsMzc1LjE0LDE4Ny41N1ptLTI4NCwxMDQuMTdjLTI5Ljg2LTI1LjQ5LTQ4LjI2LTY2LjI3LTQ3LjQtMTA3Ljg1cS4wOS00LjM4LjQ2LTguNzNWMTc1YzQuMzItNDkuNiwzNi4zNy05NS44OCw4MS4yOS0xMTcuMzZTMjI2LjUyLDQwLjIxLDI2Ny44NSw2OHM2Ni4zMiw3OC4yMSw2My40LDEyNy45MmExNzgsMTc4LDAsMCwxLTUuMTQsMzIuMjVjLTEsNC4yLTIuMyw4LjU3LTUuMjgsMTEuNzJzLTguMiw0LjYtMTEuNzMsMi4wOWMtMy4zNy0yLjQxLTMuODctNy4xMi00LjE2LTExLjI1LTIuMzMtMzMuMzctMTEuMjQtNjcuNzYtMzMuNzktOTIuNDdhMTAzLjY3LDEwMy42NywwLDAsMC02Ni4zOC0zMi44NEExMDcuMTksMTA3LjE5LDAsMCwwLDEzMy4yMiwxMjVDMTE2LDEzNy4yNywxMDIuNTUsMTU0Ljg4LDk2LDE3NXMtNS44Niw0Mi42MSwyLjcxLDYxLjkzYTgxLjg5LDgxLjg5LDAsMCwwLDI5LjcxLDM1YzIyLjk0LDE1LjA2LDU0LjMxLDE3LjIsNzguMTQsMy42czM4LjA3LTQzLjEsMzItNjkuODZTMjA1LjQsMTU4LDE3OC4xMSwxNjAuODRjLTQuMTYuNDMtMTAuMTMsMC0xMC4yOC00LjIxLS4xMi0zLjI0LDMuNzctNC45NCw3LTUuNTIsMjcuNjgtNSw1Ny4zNCw5LjA5LDcyLjUzLDMyLjc3czE2LDU1LjQxLDMuNTYsODAuNjYtMzcsNDMuNjktNjQuMzYsNTAuMzVDMTQ5LjY4LDMyMy44NywxMTYuMzEsMzEzLjI1LDkxLjExLDI5MS43NFoiLz48L3N2Zz4=&logoWidth=16" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://intl.minimaxi.com" style="margin: 2px;">
    <img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/📜_License-Model_Agreement-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>


# MiniMax-Text-01

## 1. Introduction

MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods—such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during the inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.

<p align="center">
  <img width="100%" src="figures/TextBench.png">
</p>

## 2. Model Architecture

The architecture of MiniMax-Text-01 is briefly described as follows:
- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number Layers: 80
- Hybrid Attention: a softmax attention is positioned after every 7 lightning attention.
  - Number of attention heads: 64
  - Attention head dimension: 128
- Mixture of Experts:
  - Number of experts: 32
  - Expert hidden dimension: 9216
  - Top-2 routing strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064

## 3. Evaluation

### Core Academic Benchmarks

| **Tasks**                     | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
|-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------|
| **General**                   |                    |                               |                          |                            |                       |                 |                          |                     |
| MMLU<sup>*</sup>                      | 85.7               | 88.3                          | 86.8                     | 86.5                       | 86.1                  | 88.5        | **88.6**                 | 88.5                |
| MMLU-Pro<sup>*</sup>                  | 74.4               | **78.0**                      | 75.8                     | 76.4                       | 71.1                  | 75.9            | 73.3                     | 75.7                |
| SimpleQA                      | **39.0**           | 28.1                          | 23.4                     | 26.6                       | 10.3                  | 24.9            | 23.2                     | 23.7                |
| C-SimpleQA                    | 64.6               | 56.8                          | 59.4                     | 63.3                       | 52.2                  | 64.8            | 54.7                     | **67.4**            |
| IFEval _(avg)_                | 84.1               | **90.1**                      | 89.4                     | 88.4                       | 87.2                  | 87.3            | 86.4                     | 89.1                |
| Arena-Hard                    | **92.4**           | 87.6                          | 85.3                     | 72.7                       | 81.2                  | 91.4            | 63.5                     | 89.1                |
| **Reasoning**                 |                    |                               |                          |                            |                       |                 |                          |                     |
| GPQA<sup>*</sup> _(diamond)_          | 46.0               | **65.0**                      | 59.1                     | 62.1                       | 49.0                  | 59.1            | 50.7                     | 54.4                |
| DROP<sup>*</sup> _(F1)_               | 89.2               | 88.8                          | 89.2                     | 89.3                       | 85.0                  | 91.0        | **92.5**                 | 87.8                |
| **Mathematics**               |                    |                               |                          |                            |                       |                 |                          |                     |
| GSM8k<sup>*</sup>                     | 95.6               | **96.9**                      | 95.2                     | 95.4                       | 95.8                  | 96.7            | 96.7                     | 94.8                |
| MATH<sup>*</sup>                      | 76.6               | 74.1                          | **84.6**                 | 83.9                       | 81.8                  | **84.6**        | 73.8                     | 77.4                |
| **Coding**                    |                    |                               |                          |                            |                       |                 |                          |                     |
| MBPP +                        | 76.2               | 75.1                          | 75.4                     | 75.9                       | 77.0              | **78.8**        | 73.0                     | 71.7                |
| HumanEval                     | 90.2               | **93.7**                      | 86.6                     | 89.6                       | 86.6                  | 92.1            | 89.0                     | 86.9                |

<sup>*</sup> Evaluated following a _0-shot CoT_ setting.

### Long Benchmarks
#### 4M Needle In A Haystack Test
<p align="center">
  <img width="90%" src="figures/niah.png">
</p>

#### Ruler
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|-------|----|----|-----|-----|-----|------|------|------|----|
| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |

#### LongBench v2
| **Model**                  | **overall** | **easy** | **hard** | **short** | **medium** | **long** |
|----------------------------|-------------|----------|----------|------------|------------|----------|
| Human                      | 53.7        | 100.0    | 25.1     | 47.2       | 59.1       | 53.7     |
| **w/ CoT**                 |             |          |          |            |            |          |
| GPT-4o (11-20)             | 51.4        | 54.2     | 49.7     | 59.6       | 48.6       | 43.5     |
| Claude-3.5-Sonnet (10-22)  | 46.7        | 55.2     | 41.5     | 53.9       | 41.9       | 44.4     |
| Deepseek-V3                | -           | -        | -        | -          | -          | -        |
| Qwen2.5-72B-Inst.          | 43.5        | 47.9     | 40.8     | 48.9       | 40.9       | 39.8     |
| **MiniMax-Text-01**        | **56.5**    | **66.1** | **50.5** | **61.7**   | **56.7**   | **47.2** |
| **w/o CoT**                |             |          |          |            |            |          |
| GPT-4o (11-20)             | 50.1        | 57.4     | 45.6     | 53.3       | 52.4       | 40.2     |
| Claude-3.5-Sonnet (10-22)  | 41.0        | 46.9     | 37.3     | 46.1       | 38.6       | 37.0     |
| Deepseek-V3                | 48.7        | -        | -        | -          | -          | -        |
| Qwen2.5-72B-Inst.          | 42.1        | 42.7     | 41.8     | 45.6       | 38.1       | **44.4** |
| **MiniMax-Text-01**        | **52.9**    | **60.9** | **47.9** | **58.9**   | **52.6**   | 43.5     |

#### MTOB
| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
|------------------|----------------|---------------|---------------|------------------|-----------------|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |


## 4. Quickstart
Here we provide a simple example of loading the tokenizer and model to generate content.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)

# quantization config, int8 is recommended
quantization_config =  QuantoConfig(
            weights="int8",
            modules_to_not_convert=[
                "lm_head",
                "embed_tokens",
            ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
            + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
        )

# set device map
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## 5. Chatbot & API
For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers.

Contact us at [[email protected]](mailto:[email protected]).