|
<div align="center"> |
|
<img src="figures/MiniMaxLogo.png" width="60%" alt="MiniMax-Text-01" /> |
|
</div> |
|
<hr> |
|
|
|
<div align="center" style="line-height: 1;"> |
|
<a href="https://www.minimaxi.com/en" target="_blank" style="margin: 2px;"> |
|
<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;"> |
|
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MinMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
</div> |
|
<div align="center" style="line-height: 1;"> |
|
<a href="https://www.hailuo.ai/" target="_blank" style="margin: 2px;"> |
|
<img alt="Chat" src="https://img.shields.io/badge/Chat-_Hailuo AI-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgMzc1LjE0IDM3NS4xNCI+PGRlZnM+PHN0eWxlPi5jbHMtMXtmaWxsOnVybCgjdW5uYW1lZC1ncmFkaWVudCk7fTwvc3R5bGU+PGxpbmVhckdyYWRpZW50IGlkPSJ1bm5hbWVkLWdyYWRpZW50IiB4MT0iOC40MiIgeTE9IjEzLjgxIiB4Mj0iNDI5LjY1IiB5Mj0iNDIyLjM3IiBncmFkaWVudFVuaXRzPSJ1c2VyU3BhY2VPblVzZSI+PHN0b3Agb2Zmc2V0PSIwLjA5IiBzdG9wLWNvbG9yPSIjZmZhYjBjIi8+PHN0b3Agb2Zmc2V0PSIwLjMxIiBzdG9wLWNvbG9yPSIjZmY1NTM4Ii8+PHN0b3Agb2Zmc2V0PSIwLjQ2IiBzdG9wLWNvbG9yPSIjZTk0MDVkIi8+PHN0b3Agb2Zmc2V0PSIwLjc1IiBzdG9wLWNvbG9yPSIjZDI2NmRhIi8+PHN0b3Agb2Zmc2V0PSIwLjg5IiBzdG9wLWNvbG9yPSIjZDU4NGVmIi8+PC9saW5lYXJHcmFkaWVudD48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMzc1LjE0LDE4Ny41N0MzNzUuMTQsODQsMjkwLjc0LS4yNiwxODcuMDksMCw4NC4yNi4yNi4yNiw4NC4yNSwwLDE4Ny4wOWMtLjI2LDEwMy42NSw4NCwxODgsMTg3LjU3LDE4OEgzMTAuODJBNjQuMjEsNjQuMjEsMCwwLDAsMzc1LDMxMC45M1YxOTMuODJoMEMzNzUuMDksMTkxLjc5LDM3NS4xNCwxODkuNjcsMzc1LjE0LDE4Ny41N1ptLTI4NCwxMDQuMTdjLTI5Ljg2LTI1LjQ5LTQ4LjI2LTY2LjI3LTQ3LjQtMTA3Ljg1cS4wOS00LjM4LjQ2LTguNzNWMTc1YzQuMzItNDkuNiwzNi4zNy05NS44OCw4MS4yOS0xMTcuMzZTMjI2LjUyLDQwLjIxLDI2Ny44NSw2OHM2Ni4zMiw3OC4yMSw2My40LDEyNy45MmExNzgsMTc4LDAsMCwxLTUuMTQsMzIuMjVjLTEsNC4yLTIuMyw4LjU3LTUuMjgsMTEuNzJzLTguMiw0LjYtMTEuNzMsMi4wOWMtMy4zNy0yLjQxLTMuODctNy4xMi00LjE2LTExLjI1LTIuMzMtMzMuMzctMTEuMjQtNjcuNzYtMzMuNzktOTIuNDdhMTAzLjY3LDEwMy42NywwLDAsMC02Ni4zOC0zMi44NEExMDcuMTksMTA3LjE5LDAsMCwwLDEzMy4yMiwxMjVDMTE2LDEzNy4yNywxMDIuNTUsMTU0Ljg4LDk2LDE3NXMtNS44Niw0Mi42MSwyLjcxLDYxLjkzYTgxLjg5LDgxLjg5LDAsMCwwLDI5LjcxLDM1YzIyLjk0LDE1LjA2LDU0LjMxLDE3LjIsNzguMTQsMy42czM4LjA3LTQzLjEsMzItNjkuODZTMjA1LjQsMTU4LDE3OC4xMSwxNjAuODRjLTQuMTYuNDMtMTAuMTMsMC0xMC4yOC00LjIxLS4xMi0zLjI0LDMuNzctNC45NCw3LTUuNTIsMjcuNjgtNSw1Ny4zNCw5LjA5LDcyLjUzLDMyLjc3czE2LDU1LjQxLDMuNTYsODAuNjYtMzcsNDMuNjktNjQuMzYsNTAuMzVDMTQ5LjY4LDMyMy44NywxMTYuMzEsMzEzLjI1LDkxLjExLDI5MS43NFoiLz48L3N2Zz4=&logoWidth=16" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
<a href="https://intl.minimaxi.com" style="margin: 2px;"> |
|
<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
</div> |
|
<div align="center" style="line-height: 1;"> |
|
<a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE" style="margin: 2px;"> |
|
<img alt="License" src="https://img.shields.io/badge/📜_License-Model_Agreement-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
</div> |
|
|
|
|
|
# MiniMax-Text-01 |
|
|
|
## 1. Introduction |
|
|
|
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock its long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), MiniMax-Text-01 is trained with a context length of up to 1 million tokens and can handle contexts of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates top-tier performance.
|
|
|
<p align="center"> |
|
<img width="100%" src="figures/TextBench.png"> |
|
</p> |
|
|
|
## 2. Model Architecture |
|
|
|
The architecture of MiniMax-Text-01 is briefly described as follows: |
|
- Total Parameters: 456B |
|
- Activated Parameters per Token: 45.9B |
|
- Number of Layers: 80
|
- Hybrid Attention: a softmax attention layer is placed after every 7 lightning attention layers (see the sketch after this list).
|
- Number of attention heads: 64 |
|
- Attention head dimension: 128 |
|
- Mixture of Experts: |
|
- Number of experts: 32 |
|
- Expert hidden dimension: 9216 |
|
- Top-2 routing strategy |
|
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000 |
|
- Hidden Size: 6144 |
|
- Vocab Size: 200,064 |
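
Taken together, these numbers imply a simple repeating layer pattern. The sketch below is purely illustrative (the constants and helper names are ours, not part of the released code) and only checks the arithmetic implied by the list above, assuming the hybrid pattern repeats uniformly across all 80 layers:

```python
# Illustrative sketch only: reproduces the layer-pattern arithmetic described
# above (constants and helper names are ours, not from the released code).
NUM_LAYERS = 80          # total transformer blocks
HEAD_DIM = 128           # attention head dimension

def attention_type(layer_idx: int) -> str:
    # every 8th block uses softmax attention; the other 7 use lightning attention
    return "softmax" if (layer_idx + 1) % 8 == 0 else "lightning"

layer_types = [attention_type(i) for i in range(NUM_LAYERS)]
rotary_dim = HEAD_DIM // 2   # RoPE is applied to half of each head's dimension

assert layer_types.count("softmax") == 10      # 80 / 8 softmax-attention blocks
assert layer_types.count("lightning") == 70
assert rotary_dim == 64
```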
|
|
|
## 3. Evaluation |
|
|
|
### Core Academic Benchmarks |
|
|
|
| **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** | |
|
|-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------| |
|
| **General** | | | | | | | | | |
|
| MMLU<sup>*</sup> | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 | |
|
| MMLU-Pro<sup>*</sup> | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 | |
|
| SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 | |
|
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** | |
|
| IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 | |
|
| Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 | |
|
| **Reasoning** | | | | | | | | | |
|
| GPQA<sup>*</sup> _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 | |
|
| DROP<sup>*</sup> _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 | |
|
| **Mathematics** | | | | | | | | | |
|
| GSM8k<sup>*</sup> | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 | |
|
| MATH<sup>*</sup> | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 | |
|
| **Coding** | | | | | | | | | |
|
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
|
| HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 | |
|
|
|
<sup>*</sup> Evaluated following a _0-shot CoT_ setting. |
|
|
|
### Long Benchmarks |
|
#### 4M Needle In A Haystack Test |
|
<p align="center"> |
|
<img width="90%" src="figures/niah.png"> |
|
</p> |
|
|
|
#### RULER
|
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | |
|
|-------|----|----|-----|-----|-----|------|------|------|----| |
|
| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - | |
|
| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - | |
|
| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 | |
|
| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - | |
|
| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** | |
|
|
|
#### LongBench v2 |
|
| **Model** | **overall** | **easy** | **hard** | **short** | **medium** | **long** | |
|
|----------------------------|-------------|----------|----------|------------|------------|----------| |
|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 | |
|
| **w/ CoT** | | | | | | | |
|
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 | |
|
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 | |
|
| Deepseek-V3 | - | - | - | - | - | - | |
|
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 | |
|
| **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** | |
|
| **w/o CoT** | | | | | | | |
|
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 | |
|
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 | |
|
| Deepseek-V3 | 48.7 | - | - | - | - | - | |
|
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** | |
|
| **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 | |
|
|
|
#### MTOB |
|
| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** | |
|
|------------------|----------------|---------------|---------------|------------------|-----------------| |
|
| **eng → kalam (ChrF)** | | | | | | |
|
| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - | |
|
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 | |
|
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 | |
|
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 | |
|
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 | |
|
| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** | |
|
| **kalam → eng (BLEURT)** | | | | | | |
|
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - | |
|
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 | |
|
| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** | |
|
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 | |
|
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 | |
|
| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 | |
|
|
|
|
|
## 4. Quickstart |
|
Here we provide a simple example of loading the tokenizer and model and generating content. The snippet below quantizes the weights to int8 and shards the layers across 8 GPUs.
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig |
|
|
|
# load hf config |
|
hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True) |
|
|
|
# quantization config, int8 is recommended |
|
quantization_config = QuantoConfig( |
|
weights="int8", |
|
modules_to_not_convert=[ |
|
"lm_head", |
|
"embed_tokens", |
|
] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)] |
|
+ [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)] |
|
) |
|
|
|
# set device map to shard layers across GPUs (assume 8 GPUs)

world_size = 8

layers_per_device = hf_config.num_hidden_layers // world_size

device_map = {

    'model.embed_tokens': 'cuda:0',

    'model.norm': f'cuda:{world_size - 1}',

    'lm_head': f'cuda:{world_size - 1}'

}

for i in range(world_size):

    for j in range(layers_per_device):

        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
|
|
|
# load tokenizer |
|
tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01") |
|
prompt = "Hello!" |
|
messages = [ |
|
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]}, |
|
{"role": "user", "content": [{"type": "text", "text": prompt}]}, |
|
] |
|
text = tokenizer.apply_chat_template( |
|
messages, |
|
tokenize=False, |
|
add_generation_prompt=True |
|
) |
|
# tokenize and move to device |
|
model_inputs = tokenizer(text, return_tensors="pt").to("cuda") |
|
|
|
# load bfloat16 model, move to device, and apply quantization |
|
quantized_model = AutoModelForCausalLM.from_pretrained( |
|
"MiniMax-Text-01", |
|
torch_dtype="bfloat16", |
|
device_map=device_map, |
|
quantization_config=quantization_config, |
|
trust_remote_code=True, |
|
offload_buffers=True, |
|
) |
|
|
|
# generate response |
|
generation_config = GenerationConfig( |
|
max_new_tokens=20, |
|
eos_token_id=200020, |
|
use_cache=True, |
|
) |
|
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config) |
|
print(f"generated_ids: {generated_ids}") |
|
generated_ids = [ |
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
] |
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
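
# print the decoded response

print(f"response: {response}")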
|
``` |
|
|
|
## 5. Chatbot & API |
|
For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers. |
|
|
|
Contact us at [[email protected]](mailto:[email protected]). |
|
|