Weight-loading error when running MiniMax-M2.1 with sglang using pipeline parallelism
#18, opened by tuo02
When I run MiniMax-M2.1 on an H200 with the following sglang command:
python3 -m sglang.launch_server --model /volume/models/MiniMaxAI/MiniMax-M2.1/snapshots/17f852dfda7d63c61b1107d47552bb30488ffbee --trust-remote-code --tp-size 2 --pp-size 2 --mem-fraction-static 0.85 --chunked-prefill-size 20480 --page-size 64 --cuda-graph-max-bs 64 --enable-metrics
the following error occurs:
[2026-01-12 02:39:23 PP3 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 2653, in run_scheduler_process
    scheduler = Scheduler(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 316, in __init__
    self.tp_worker = TpModelWorker(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 245, in __init__
    self._model_runner = ModelRunner(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 361, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 436, in initialize
    self.load_model()
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 804, in load_model
    self.model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_loader/loader.py", line 600, in load_model
    self.load_weights_and_postprocess(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_loader/loader.py", line 608, in load_weights_and_postprocess
    model.load_weights(weights)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/minimax_m2.py", line 882, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.37.block_sparse_moe.e_score_correction_bias'
[2026-01-12 02:39:23] Received sigquit from a child process. It usually means the child failed.
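The KeyError above is consistent with a loader that looks up every checkpoint tensor name in the parameters of the current pipeline stage, even though each stage only instantiates a subset of the layers. The sketch below is a hypothetical illustration of that failure pattern, not SGLang's actual code: the function names, the layer split, and the placeholder values are all assumptions made up for the example.

```python
# Hypothetical illustration only; this is NOT SGLang's loader code.
# The layer range and the integer "tensors" are made up for the example.

def owned_param_names(first_layer, last_layer):
    """Parameter names held by a pipeline stage owning layers [first_layer, last_layer)."""
    return {
        f"model.layers.{i}.block_sparse_moe.e_score_correction_bias"
        for i in range(first_layer, last_layer)
    }


def load_weights_unguarded(params_dict, weights):
    """Looks up every checkpoint name directly, like the failing line in the traceback."""
    loaded = {}
    for name, value in weights:
        param = params_dict[name]  # raises KeyError for layers this stage does not own
        loaded[name] = value
    return loaded


def load_weights_guarded(params_dict, weights):
    """Skips checkpoint entries that belong to layers on other pipeline stages."""
    loaded = {}
    for name, value in weights:
        if name not in params_dict:
            continue  # not owned by this stage, so there is nothing to load here
        loaded[name] = value
    return loaded


if __name__ == "__main__":
    # Assumed split: this stage owns layers 0..30, so layer 37 lives elsewhere.
    params = {name: None for name in owned_param_names(0, 31)}
    checkpoint = [
        (f"model.layers.{i}.block_sparse_moe.e_score_correction_bias", i)
        for i in (5, 37)
    ]
    print(sorted(load_weights_guarded(params, checkpoint)))
    try:
        load_weights_unguarded(params, checkpoint)
    except KeyError as exc:
        print("KeyError:", exc)
```

A guard like this would only avoid the crash during loading. It does not by itself make pipeline parallelism work for the model, since the forward pass also has to be written for it.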
Pipeline parallelism (PP) is not currently supported for M2 in either vLLM or SGLang. We recommend using tensor parallelism (TP) or expert parallelism (EP) instead, both of which have been fully implemented and verified.
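One way to act on this recommendation while keeping the same number of GPUs is to drop `--pp-size` and fold its degree into `--tp-size`. The helper below is a minimal sketch of that rewrite on an argument list. The function name `fold_pp_into_tp` is hypothetical, and whether the resulting TP degree is valid for this model and fits in memory is an assumption that should be checked against the deployment guide.

```python
# Hypothetical helper; only the --tp-size and --pp-size flags come from the command above.

def fold_pp_into_tp(args):
    """Return a copy of args without --pp-size, with --tp-size multiplied by the removed PP degree."""
    args = list(args)
    pp = 1
    if "--pp-size" in args:
        i = args.index("--pp-size")
        pp = int(args[i + 1])
        del args[i : i + 2]  # remove the flag and its value
    if "--tp-size" in args:
        j = args.index("--tp-size")
        args[j + 1] = str(int(args[j + 1]) * pp)
    elif pp > 1:
        args += ["--tp-size", str(pp)]
    return args


if __name__ == "__main__":
    original = ["--tp-size", "2", "--pp-size", "2", "--page-size", "64"]
    print(" ".join(fold_pp_into_tp(original)))
```

Applied to the command in the question, this would turn `--tp-size 2 --pp-size 2` into `--tp-size 4` and leave the other flags unchanged.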