# MiniMax M2 Model SGLang Deployment Guide

We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2 as an example.

## System Requirements

- OS: Linux

- Python: 3.9 - 3.12

- GPU:

  - Compute capability 7.0 or higher

  - Memory requirements: 220 GB for weights, plus 240 GB per 1M context tokens

The following are recommended configurations; adjust them to your actual use case (see the sizing sketch after this list):

- 4x 96GB GPUs: supports a context length of up to 400K tokens.

- 8x 144GB GPUs: supports a context length of up to 3M tokens.
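
These figures follow from the memory numbers above. A rough sanity check in shell arithmetic (GB granularity; the 85/100 factor mirrors the `--mem-fraction-static 0.85` flag used in the launch commands below, and real headroom also depends on activations and fragmentation, so treat the results as estimates):

```shell
# KV-cache budget = total VRAM * 0.85 - 220 GB of weights;
# max context ~= budget / (240 GB per 1M tokens)
echo $(( (4 * 96  * 85 / 100 - 220) * 1000000 / 240 ))   # ~441666 tokens -> "up to 400K"
echo $(( (8 * 144 * 85 / 100 - 220) * 1000000 / 240 ))   # ~3162500 tokens -> "up to 3M"
```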

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.
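
For example, with the standard library's venv module:

```shell
python3 -m venv sglang-env
source sglang-env/bin/activate
```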

We recommend installing SGLang from source in a fresh Python environment. MiniMax-M2 support requires SGLang v0.5.4.post3 or later (see Common Issues below), so check out that tag or newer:

```shell
git clone -b v0.5.4.post3 https://github.com/sgl-project/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"
```
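
Before launching, you can confirm that the expected version was installed and that the package imports cleanly:

```shell
pip show sglang            # prints the installed version metadata
python -c "import sglang"  # verifies the package imports without errors
```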

Run the following command to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2 model from Hugging Face.

4-GPU deployment command:

```shell
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2 \
    --tp-size 4 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```
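
Loading the weights can take several minutes at this scale. Once the logs show the server is ready, a quick probe of SGLang's health endpoint confirms it is accepting connections:

```shell
curl http://localhost:8000/health
```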

8-GPU deployment command (the additional `--ep-size 8` flag shards the MoE experts across all 8 GPUs via expert parallelism, on top of tensor parallelism):

```shell
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2 \
    --tp-size 8 \
    --ep-size 8 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```

## Testing Deployment

After startup, you can test the SGLang OpenAI-compatible API with the following command:

```shell
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
            {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
        ]
    }'
```
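
If the request fails, check that the server is up and that the model is registered under the name you are passing; the OpenAI-compatible model list endpoint is useful for this:

```shell
curl http://localhost:8000/v1/models
```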

## Common Issues

### Hugging Face Network Issues

If you cannot reach Hugging Face directly, you can point downloads at a mirror (or configure a proxy) before pulling the model:

```shell
export HF_ENDPOINT=https://hf-mirror.com
```
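
Alternatively, you can pre-download the weights through the mirror and point the server at the local copy. A sketch using the `huggingface-cli` tool that ships with the `huggingface_hub` package:

```shell
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download MiniMaxAI/MiniMax-M2 --local-dir ./MiniMax-M2
# then launch with: --model-path ./MiniMax-M2
```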

### MiniMax-M2 model is not currently supported

This error means the installed SGLang version predates MiniMax-M2 support. Please upgrade to the latest stable version, >= v0.5.4.post3.
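
If you installed from a source checkout as above, re-run the editable install against a newer tag; with a PyPI installation, upgrading in place should work, for example:

```shell
pip install --upgrade "sglang>=0.5.4.post3"
```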