Conversation

@pwilkin
Collaborator

@pwilkin pwilkin commented Nov 2, 2025

Adds chat support for MiniMax M2, together with tool calling and simple (non-interleaved) reasoning.

Uses the fixed Unsloth template (https://huggingface.co/unsloth/MiniMax-M2-GGUF).

Includes upstream minja fix: google/minja#87

@pwilkin pwilkin requested a review from ggerganov as a code owner November 2, 2025 16:50
@pwilkin pwilkin requested review from CISC and ggerganov and removed request for ggerganov November 2, 2025 16:50
@github-actions github-actions bot added the testing Everything test related label Nov 2, 2025
@CISC
Collaborator

CISC commented Nov 2, 2025

You should submit it to https://github.com/ochafik/minja

@pwilkin
Collaborator Author

pwilkin commented Nov 2, 2025

> You should submit it to https://github.com/ochafik/minja

Ah, forgot about that. Done.

@pwilkin
Collaborator Author

pwilkin commented Nov 2, 2025

[screenshot]


static void common_chat_parse_minimax_m2(common_chat_msg_parser & builder) {
// Parse thinking tags first - this handles the main reasoning content
// Chat template doesn't seem to handle interleaving thinking, so we don't worry about it either
Collaborator


Are we sure we're using the correct definition of interleaved thinking here? I don't think it means the CoT is interleaved with the content during generation, but rather that it is interleaved in the entire prompt during multi-turn tool-calling sessions. It seems to behave very similarly to gpt-oss. None of my testing (granted, at Q2_XL) indicates that the CoT is interleaved during generation. It's also only applied if the last message is a tool response.

Using the proposed fix for tool response support by @ochafik, it works as is if I pass reasoning_content with the assistant messages. Without this fix, the tool messages are transformed into user messages by the polyfill.

Template Example
curl -X POST http://localhost:8080/apply-template \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a weather man"
      },
      {
        "role": "user",
        "content": "Can you compare the weather at New York and Los Angeles?"
      },
      {
        "role": "assistant",
        "reasoning_content": "I need to get the weather of New York and Los Angeles, let me do New York first.",
        "tool_calls": [
          {
            "id": "1",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"New York\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "1",
        "content": "50 F"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a specified city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "The city name, e.g. San Francisco"
              }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
]~b]system
You are a weather man

# Tools
You may call one or more tools to assist with the user query.
Here are the tools available in JSONSchema format:

<tools>
<tool>{"name": "get_weather", "description": "Get the current weather for a specified city", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The city name, e.g. San Francisco"}}, "required": ["city"]}}</tool>
</tools>

When making tool calls, use XML format to invoke tools and pass parameters:

<minimax:tool_call>
<invoke name="tool-name-1">
<parameter name="param-key-1">param-value-1</parameter>
<parameter name="param-key-2">param-value-2</parameter>
...
</invoke>
</minimax:tool_call>[e~[
]~b]user
Can you compare the weather at New York and Los Angeles?[e~[
]~b]ai
<think>
I need to get the weather of New York and Los Angeles, let me do New York first.
</think>


<minimax:tool_call>
<invoke name="get_weather">
<parameter name="city">New York</parameter>
</invoke>
</minimax:tool_call>[e~[
]~b]tool
<response>50 F</response>[e~[
]~b]ai
<think>


It does place the burden of returning reasoning_content on the clients.

Collaborator Author


@aldehir That's actually a good clarification - I was somehow convinced that interleaved reasoning meant content blocks with multiple reasoning / content chunks intertwined (I think the Anthropic protocol allows something like that). We shouldn't have a problem with it if it's just tool calls intertwined with reasoning blocks.

Collaborator Author


@hksdpc255 please take a look at this discussion, since I feel you're repeating the same error (using --reasoning-format none and literally outputting the opening <think> tag).


@hksdpc255 hksdpc255 Nov 5, 2025


@pwilkin Thanks for pointing that out. I actually had the same misunderstanding about interleaved thinking at first.

Because of that, I initially implemented full support for reasoning and normal content being interleaved during generation. Later I realized that this wasn’t really required in our current setup. But since I already had a custom test harness for it, I verified that my implementation can indeed handle such interleaved reasoning/content streams. It might still be useful in the future if models start emitting that pattern more often.

As for --reasoning-format none, my understanding was that it means not to treat reasoning specially, but to include it directly in the normal assistant message. This interpretation seemed consistent with how some chat templates (like GLM 4.5 / 4.6 and MiniMax M2) automatically detect <think> blocks in the main content, extract them into reasoning_content, and remove them from the visible answer. That behavior is quite helpful for clients that don’t support returning reasoning_content back to the server — which I believe is the case for most code agents.

I’m currently using --reasoning-format none to serve the Zed editor, and in that setup, MiniMax M2 performs impressively well on fairly complex tasks.

However, I might have misunderstood the actual purpose of --reasoning-format none. If so, I’d really appreciate clarification. And if it’s not meant for this kind of use case, I think introducing a new --reasoning-format mode to explicitly support it would make a lot of sense.

@hksdpc255

hksdpc255 commented Nov 3, 2025

Could you take a look at PR #16932? I've already implemented MiniMax M2 tool calling there.

hksdpc255 added a commit to hksdpc255/llama.cpp that referenced this pull request Nov 3, 2025
@pwilkin
Collaborator Author

pwilkin commented Nov 4, 2025

@CISC @aldehir I think this is ready for final review. There are apparently problems with the core MiniMax model (see #16945), but let's not wait for that, because I really want to get a big refactoring of the chat files going; they're a complete mess.

Interpause added a commit to Interpause/llama.cpp that referenced this pull request Nov 4, 2025
commit 23d4bb7
Author: Piotr Wilkin <[email protected]>
Date:   Tue Nov 4 19:07:49 2025 +0100

    Add proper handling of optional parameters with test

commit 9481289
Author: Piotr Wilkin <[email protected]>
Date:   Sun Nov 2 19:30:35 2025 +0100

    Whitespace.

commit 1a351a0
Author: Piotr Wilkin <[email protected]>
Date:   Sun Nov 2 17:34:47 2025 +0100

    Use Unsloth template, add extra test parameters for ignoring additional whitespace

commit de67255
Author: Piotr Wilkin <[email protected]>
Date:   Sat Nov 1 22:33:40 2025 +0100

    On the other hand, this is probably safer

commit 4e58382
Author: Piotr Wilkin <[email protected]>
Date:   Sat Nov 1 22:32:20 2025 +0100

    No newline after <think>

commit e21f87e
Author: Piotr Wilkin <[email protected]>
Date:   Sat Nov 1 22:19:48 2025 +0100

    Minimax M2 chat template support
@pwilkin
Collaborator Author

pwilkin commented Nov 6, 2025

Superseded by #16932

@pwilkin pwilkin closed this Nov 6, 2025