qwen3-coder tool call parser #15019
Note the excessive 0s in the test 6 response.
╰─ ./test_tool_calls.sh
Test 1: Simple single parameter function
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_current_time","arguments":"{\"timezone\":\"UTC\"}"},"id":"hjjtHH2uyKY2JE1iOAMpO8geQI3a7Kq8"}]}}],"created":1754076607,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":23,"prompt_tokens":291,"total_tokens":314},"id":"chatcmpl-Oh1CUFa7Uje5DBZdlPn2ymvUHHV1VfGO","timings":{"prompt_n":291,"prompt_ms":384.05,"prompt_per_token_ms":1.3197594501718213,"prompt_per_second":757.7138393438355,"predicted_n":23,"predicted_ms":386.682,"predicted_per_token_ms":16.81226086956522,"predicted_per_second":59.48039991517578}}
---
Test 2: Multiple parameters with different types
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"search_web","arguments":"{\"query\":\"Python machine learning tutorials\",\"max_results\":10,\"safe_search\":true}"},"id":"KaUBUGLOC8tJ7rRejdwoGxgnTIgX7Pcx"}]}}],"created":1754076608,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":44,"prompt_tokens":352,"total_tokens":396},"id":"chatcmpl-tbE2CyT6u51I5w5hvmTkz5IAWIEqfopF","timings":{"prompt_n":313,"prompt_ms":380.837,"prompt_per_token_ms":1.2167316293929713,"prompt_per_second":821.8739250650541,"predicted_n":44,"predicted_ms":785.807,"predicted_per_token_ms":17.85925,"predicted_per_second":55.993392779652}}
---
Test 3: Multiple tools available
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"calculate","arguments":"{\"expression\":25}"},"id":"gyHKa9Nu1RXaemgir1NbNwk2kJkhUN93"}]}}],"created":1754076609,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":27,"prompt_tokens":444,"total_tokens":471},"id":"chatcmpl-vDHl8fm1kgOywcEFRWNN4ib6lQu3WwWu","timings":{"prompt_n":405,"prompt_ms":447.868,"prompt_per_token_ms":1.105846913580247,"prompt_per_second":904.2842980521045,"predicted_n":27,"predicted_ms":421.646,"predicted_per_token_ms":15.61651851851852,"predicted_per_second":64.0347590158569}}
---
Test 4: Complex parameters with nested objects
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"send_email","arguments":"{\"to\":\"[email protected]\",\"subject\":\"Meeting Tomorrow\",\"body\":\"Hi John, just confirming our meeting scheduled for tomorrow. Best regards!\",\"attachments\":[]}"},"id":"0cbmSr1KCrjnI2i05u9d2kFtDL6mxy2i"}]}}],"created":1754076611,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":67,"prompt_tokens":395,"total_tokens":462},"id":"chatcmpl-z1IWL8zTqJ0NYAUgT0QCyClvpDGYFSW1","timings":{"prompt_n":356,"prompt_ms":391.67,"prompt_per_token_ms":1.1001966292134833,"prompt_per_second":908.9284346516199,"predicted_n":67,"predicted_ms":1065.706,"predicted_per_token_ms":15.906059701492536,"predicted_per_second":62.86912150255324}}
---
Test 5: No tools needed scenario
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"The capital of France is Paris."}}],"created":1754076611,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":8,"prompt_tokens":290,"total_tokens":298},"id":"chatcmpl-uCvewROkNZys70V5XinqN3IX1au86OqK","timings":{"prompt_n":251,"prompt_ms":317.8,"prompt_per_token_ms":1.2661354581673308,"prompt_per_second":789.80490874764,"predicted_n":8,"predicted_ms":106.817,"predicted_per_token_ms":13.352125,"predicted_per_second":74.89444564067517}}
---
Test 6: Tool with enum parameter
{"choices":[{"finish_reason":"length","index":0,"message":{"role":"assistant","content":"<tool_call>\n<function=set_temperature>\n<parameter=temperature>\n}}],"created":1754076945,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":16384,"prompt_tokens":339,"total_tokens":16723},"id":"chatcmpl-d1tMubhsORcuT4orGUXXflfsnYMPo5Qd","timings":{"prompt_n":300,"prompt_ms":375.494,"prompt_per_token_ms":1.2516466666666668,"prompt_per_second":798.9475198005827,"predicted_n":16384,"predicted_ms":333558.829,"predicted_per_token_ms":20.358815246582033,"predicted_per_second":49.11877178942848}}
---
Test 7: Tool call with reasoning
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_weather_forecast","arguments":"{\"location\":\"Seattle\",\"date\":2023}"},"id":"99zJfjF9AbRY0E4xuxIkwEo41HP1V9JT"}]}}],"created":1754076946,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":42,"prompt_tokens":341,"total_tokens":383},"id":"chatcmpl-rYYzcsvPxy662qJ1YUOW0Nwcp7J6ChKd","timings":{"prompt_n":302,"prompt_ms":413.176,"prompt_per_token_ms":1.3681324503311258,"prompt_per_second":730.9233837396171,"predicted_n":42,"predicted_ms":664.164,"predicted_per_token_ms":15.81342857142857,"predicted_per_second":63.2373931739751}}% |
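Test 6 above ended with `finish_reason` set to `length` after 16384 completion tokens and left raw `<tool_call>` markup in `content`, while the other tool tests returned a parsed `tool_calls` array. A minimal sketch for triaging such responses automatically follows. The function name `triage` and its category labels are hypothetical and not part of llama.cpp or the test script.

```python
def triage(response: dict) -> str:
    """Classify one chat.completion response like those printed above.

    The category names are illustrative labels only.
    """
    choice = response["choices"][0]
    message = choice["message"]
    content = message.get("content") or ""  # content is null for pure tool calls

    if choice["finish_reason"] == "length":
        return "truncated"           # generation ran into the token limit
    if "<tool_call>" in content:
        return "unparsed_tool_call"  # raw markup leaked into the text content
    if message.get("tool_calls"):
        return "tool_call"
    return "text"


# Abbreviated stand-ins for the Test 5 and Test 6 responses shown above.
test5 = {"choices": [{"finish_reason": "stop",
                      "message": {"role": "assistant",
                                  "content": "The capital of France is Paris."}}]}
test6 = {"choices": [{"finish_reason": "length",
                      "message": {"role": "assistant",
                                  "content": "<tool_call>\n<function=set_temperature>\n"}}]}
print(triage(test5), triage(test6))
```

Checking `finish_reason` first is a deliberate choice: a truncated response may also contain leaked markup, and the truncation is the more useful signal when a generation runs away.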
#!/bin/bash
# Test 1: Simple single parameter function
echo "Test 1: Simple single parameter function"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "What is the current time?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_time",
"description": "Get the current time",
"parameters": {
"type": "object",
"properties": {
"timezone": {
"type": "string",
"description": "The timezone to get the time for"
}
},
"required": ["timezone"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 2: Multiple parameters with different types
echo "Test 2: Multiple parameters with different types"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "Search for Python tutorials about machine learning"}
],
"tools": [
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return"
},
"safe_search": {
"type": "boolean",
"description": "Enable safe search filtering"
}
},
"required": ["query"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 3: Multiple tools available
echo "Test 3: Multiple tools available"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "Calculate 25 * 4 and then convert the result to hexadecimal"}
],
"tools": [
{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform mathematical calculations",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "The mathematical expression to evaluate"
}
},
"required": ["expression"]
}
}
},
{
"type": "function",
"function": {
"name": "convert_number",
"description": "Convert a number between different bases",
"parameters": {
"type": "object",
"properties": {
"number": {
"type": "integer",
"description": "The number to convert"
},
"from_base": {
"type": "integer",
"description": "The base to convert from (2-36)"
},
"to_base": {
"type": "integer",
"description": "The base to convert to (2-36)"
}
},
"required": ["number", "from_base", "to_base"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 4: Complex parameters with nested objects
echo "Test 4: Complex parameters with nested objects"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "Send an email to [email protected] about the meeting tomorrow"}
],
"tools": [
{
"type": "function",
"function": {
"name": "send_email",
"description": "Send an email message",
"parameters": {
"type": "object",
"properties": {
"to": {
"type": "string",
"description": "Recipient email address"
},
"subject": {
"type": "string",
"description": "Email subject"
},
"body": {
"type": "string",
"description": "Email body content"
},
"attachments": {
"type": "array",
"description": "List of file paths to attach",
"items": {
"type": "string"
}
}
},
"required": ["to", "subject", "body"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 5: No tools needed scenario
echo "Test 5: No tools needed scenario"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather information for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and country"
}
},
"required": ["location"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 6: Tool with enum parameter
echo "Test 6: Tool with enum parameter"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "Set the temperature to 72 degrees"}
],
"tools": [
{
"type": "function",
"function": {
"name": "set_temperature",
"description": "Set the temperature of a device",
"parameters": {
"type": "object",
"properties": {
"temperature": {
"type": "number",
"description": "The temperature value"
},
"unit": {
"type": "string",
"description": "The temperature unit",
"enum": ["celsius", "fahrenheit", "kelvin"]
}
},
"required": ["temperature", "unit"]
}
}
}
]
}'
echo -e "\n\n---\n"
# Test 7: Tool call with reasoning
echo "Test 7: Tool call with reasoning"
curl -X POST http://127.0.0.1:9999/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-30b-a3b-instruct",
"messages": [
{"role": "user", "content": "I need to know if it will rain tomorrow in Seattle. Can you check the weather forecast?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather_forecast",
"description": "Get weather forecast for a specific location and date",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name"
},
"date": {
"type": "string",
"description": "The date in YYYY-MM-DD format"
}
},
"required": ["location", "date"]
}
}
}
]
}'
Okay, it does work with:
./llama-server --port 9999 --flash-attn --metrics --model Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-Q8_0.gguf \
--temp 0.7 \
--top-k 20 \
--top-p 0.7 \
--repeat-penalty 1.5 \
--presence_penalty 0.2 \
--n-predict 16384 \
--ctx-size 100000 \
--chat-template-file ../../models/templates/Qwen3-Coder.jinja \
--jinja
╰─ ./test_tool_calls.sh
Test 1: Simple single parameter function
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_current_time","arguments":"{\"timezone\":\"UTC\"}"},"id":"75b5wIBMpPzb6GuJiIVL8WmSjTxv4dPw"}]}}],"created":1754078261,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":23,"prompt_tokens":336,"total_tokens":359},"id":"chatcmpl-rYut7ARH9UuojO2rgC9abRHmPXl0b1cY","timings":{"prompt_n":336,"prompt_ms":415.761,"prompt_per_token_ms":1.2373839285714285,"prompt_per_second":808.1566092057697,"predicted_n":23,"predicted_ms":360.854,"predicted_per_token_ms":15.689304347826086,"predicted_per_second":63.737688926823594}}
---
Test 2: Multiple parameters with different types
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"search_web","arguments":"{\"query\":\"Python Machine Learning tutorial\",\"max_results\":10,\"safe_search\":true}"},"id":"AI6eUIRsqwsTf4lmPVzM5pplsPhjpqfr"}]}}],"created":1754078262,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":44,"prompt_tokens":397,"total_tokens":441},"id":"chatcmpl-QGyPqENZCvVG16BM0nMu9uNAZvEkfMQ8","timings":{"prompt_n":358,"prompt_ms":422.501,"prompt_per_token_ms":1.1801703910614525,"prompt_per_second":847.3352725792365,"predicted_n":44,"predicted_ms":791.181,"predicted_per_token_ms":17.981386363636364,"predicted_per_second":55.613064520002375}}
---
Test 3: Multiple tools available
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"calculate","arguments":"{\"expression\":25}"},"id":"6Z9h8niBG6XHUSUuzrAjjC5aOS2xIq2a"}]}}],"created":1754078263,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":27,"prompt_tokens":489,"total_tokens":516},"id":"chatcmpl-RNZIMerp9vfIgf6sri36IttP3zBZwcUT","timings":{"prompt_n":450,"prompt_ms":490.258,"prompt_per_token_ms":1.089462222222222,"prompt_per_second":917.8840528864395,"predicted_n":27,"predicted_ms":437.315,"predicted_per_token_ms":16.19685185185185,"predicted_per_second":61.74039308050262}}
---
Test 4: Complex parameters with nested objects
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"send_email","arguments":"{\"to\":\"[email protected]\",\"subject\":\"Meeting Tomorrow\",\"body\":\"Hi John, just a reminder that we have our team meetiing scheduled for tomorrow. Best regards.\",\"attachments\":[]}"},"id":"qAUv2db10NkFpzSSkf095w7MVFeiYIkJ"}]}}],"created":1754078264,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":74,"prompt_tokens":440,"total_tokens":514},"id":"chatcmpl-cYU4A50fQgpcreblZX9MOh7OJ4BJbD2i","timings":{"prompt_n":401,"prompt_ms":426.048,"prompt_per_token_ms":1.0624638403990025,"prompt_per_second":941.2085023283762,"predicted_n":74,"predicted_ms":1193.888,"predicted_per_token_ms":16.13362162162162,"predicted_per_second":61.982363504784374}}
---
Test 5: No tools needed scenario
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"The capital ofFranceis Paris.\n","tool_calls":[{"type":"function","function":{"name":"get_weather","arguments":"{\"location\":\"Paris, france\"}"},"id":"cHYC6Sez5SgbzGPAJJ0qVHvVJC6faLbX"}]}}],"created":1754078265,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":31,"prompt_tokens":335,"total_tokens":366},"id":"chatcmpl-GQTiQ2H2esGeiwA6sGxqo3YYYwQZwkrt","timings":{"prompt_n":296,"prompt_ms":361.213,"prompt_per_token_ms":1.2203141891891893,"prompt_per_second":819.461093592977,"predicted_n":31,"predicted_ms":510.389,"predicted_per_token_ms":16.464161290322583,"predicted_per_second":60.7379861243091}}
---
Test 6: Tool with enum parameter
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"set_temperature","arguments":"{\"temperature\":72,\"unit\":\"fahrenheit\"}"},"id":"AePf5xOZZ5neOKCf9EFQwEw3SaFi7WM0"}]}}],"created":1754078267,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":72,"prompt_tokens":384,"total_tokens":456},"id":"chatcmpl-vDwwx9lTc3Bo83wAMUQAXFEp6JUwbB0L","timings":{"prompt_n":345,"prompt_ms":386.195,"prompt_per_token_ms":1.1194057971014493,"prompt_per_second":893.3310892165874,"predicted_n":72,"predicted_ms":1251.584,"predicted_per_token_ms":17.383111111111113,"predicted_per_second":57.52710165678052}}
---
Test 7: Tool call with reasoning
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_weather_forecast","arguments":"{\"location\":\"Seattle\",\"date\":2023}"},"id":"zm3TKKIFa6U1ZoL47n9Zpw6J7j7HjdnQ"}]}}],"created":1754078268,"model":"gpt-3.5-turbo","system_fingerprint":"b6058-0f5ccd6f","object":"chat.completion","usage":{"completion_tokens":42,"prompt_tokens":386,"total_tokens":428},"id":"chatcmpl-mA8Xy5VQFj9K43j8QauyWD7tvi0GGSG8","timings":{"prompt_n":347,"prompt_ms":401.396,"prompt_per_token_ms":1.1567608069164266,"prompt_per_second":864.482954488834,"predicted_n":42,"predicted_ms":642.277,"predicted_per_token_ms":15.292309523809525,"predicted_per_second":65.3923462929546}}% |
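In both runs, Test 3 returned `"expression":25` and Test 7 returned `"date":2023`, although the script declares both parameters as `string`. The thread does not say whether this comes from the parser or from the model. The sketch below only shows one way to flag such mismatches by comparing the returned arguments with the declared schema. The helper `type_mismatches` is hypothetical, and it checks top-level primitive types only.

```python
import json

# JSON Schema primitive type -> accepted Python types.
_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def type_mismatches(tool: dict, arguments_json: str) -> list[str]:
    """Return a message for each argument whose type differs from the schema."""
    properties = tool["function"]["parameters"]["properties"]
    problems = []
    for name, value in json.loads(arguments_json).items():
        expected = properties.get(name, {}).get("type")
        allowed = _TYPES.get(expected)
        if allowed is None:
            continue  # undeclared parameter or unknown type: not checked here
        ok = isinstance(value, allowed)
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            ok = False
        if not ok:
            problems.append(
                f"{name}: expected {expected}, got {type(value).__name__}"
            )
    return problems


# The "calculate" tool from Test 3, reduced to the fields the check reads.
calculate = {"function": {"name": "calculate", "parameters": {
    "type": "object",
    "properties": {"expression": {"type": "string"}},
}}}
print(type_mismatches(calculate, '{"expression":25}'))
```

Running the same check over every response would separate type problems from the formatting problems discussed in this thread.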
Remove this. This model has a native 256k context window.
It's also kinda weird that it only tries to write |
I think there's still an issue with the parser not handling the weird format (whitespace and newlines), which seems to occur once the context window is filled up a bit (30k+). I'll fix it asap. Nonetheless, the model also appears to behave weirdly. But step by step. The only real issue with the model that I'm sure has nothing to do with the parser is the repetitive number issue.
I'm pretty sure that's just because "something" is eating the rest of the include directive... in the end everything might be fine. Currently, there are multiple possible error sources. 🤣
#14962 does make everything "work". It still prints out the tool calls and switches randomly between text + tool call, XML, and JSON, but I guess that's just the model and/or quant being weird.
@bold84 Thanks for the effort! Hopefully your XML work turns out to be helpful as well eventually. 😊 |
Using the Hermes template, on which the model wasn't trained, makes everything work. The question is how well. See: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/qwen3coder_tool_parser.py If you benchmark both templates, I would not be surprised if you would see lower performance with the Hermes template. |
I tried merging master/#14962 with this PR but it's back to swallowing stuff after the |
I have a few more changes i haven't pushed yet. But one thing is sure, the 30B model isn't strong with tool calls when there's 30k+ tokens in the context window. |
Could this be the issue with the swallowing? Like it's parsing stuff and then the |
It's definitely something about |
I used the model with vllm a bit more, with the FP8 quant and Qwen's Python tool parser, in crush and opencode. And as you can see, it's tricky to debug this. This is the main reason I decided to close this PR: there's no real value in it if Qwen3-Coder 30B A3B isn't capable enough.
I'm just hoping for a significant update or fix from Unsloth and/or Qwen. The non-coder non-thinking 2507 30B seemed to work fine for my small snake game in C and ncurses test.
@bold84 How do you feel about putting this up for a merge? The discussion at #15703 has pretty much stalled and there's seemingly no progress with implementing the "better" way. Maybe just update the template in this PR to satisfy @ggerganov's requirements, remove the (obsolete) disclaimers about parameters, and be done with it?
Using the upstream template from https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/chat_template.jinja and the upstream recommended parameters works fine.
On the contrary, I tried with the jinja chat template and the latest llama.cpp master branch and it did NOT work for me. After switching back to this branch, it works fine again.
Well yes, I meant with this PR. There were concerns about the updated upstream template with this PR. |
I will close this PR. Feel free to create a new PR based on the code. |
Darn, I've been using this branch while eagerly feeding my OCD refreshing the discussion thread at #15703 which unfortunately is two weeks stale. The existing master llama.cpp doesn't work even with the new template. Were there any side discussions about standardizing XML Tool Call? If not, then I agree with @marceldev89 about merging the branch. |
Saaaad. Like I said in a previous comment, the updated template works fine with this PR. |
Fine.. I will put it up for merge, but I won't be fixing issues. :-) |
Yes that's alright. If anything pops up then I'll try to keep up with it. Thanks 🙏 |
Regarding #15703: |
Synced the PR with master and the bundled template with upstream. @ggerganov This works flawlessly with the upstream recommended parameters and template (disregard the disclaimers in the PR). Would be nice to get this reviewed and merged in. 😊
I'll give my 2 cents as a user here... Ofc a base But Qwen Coder is a major use case right now, and llama.cpp is one of the main backends... If it's going to be a long delay to get this working correctly (which it looks like it is), then why not patch in the working solution above as a stopgap, and unpatch it if and when the clean solution materializes? That way, from the user's perspective, it's the most accessible and useful. Thanks.
Let's see if @ochafik will have some capacity to chime in on the change. In the meantime, we should see if anyone would be interested in "adopting" the |
@ggerganov whoops, seems like merging master into this triggered an auto review request |
Using this branch with latest jinja and
I'm using qwen-code CLI, in case that matters at all. Here is my start command for llama-server:
GPU is RTX 5090 |
@Xenograph I haven't experienced any crashes at all. Maybe try without the |
That might've been it, no crashes so far. |
Spoke too soon, it took a while but I eventually got another crash with the same message. This time with standard jinja.
Hi, I also experienced crashes with the latest commit. Weirdly, there's no error message at all. Is there any debug option? My command:
The error shown in qwen-code:
The output from llama-server:
And then it just crashed... |
I just tried it with qwen-code and it's working fine. Maybe check the sha256sum of the GGUF? qwen-code output
|
This pull request resolves #15012 and introduces comprehensive support for the Qwen3-Coder model family's XML-based tool-calling format. It includes a new, robust XML parser and updated chat template detection logic to ensure reliable function calling.
Key Changes:
New XML Parser (common/chat-parser.cpp):
Chat Template Detection (common/chat.h, common/chat.cpp): QWEN3_CODER_XML format is applied consistently, even when no tools are explicitly provided in the request.
Comprehensive tests (tests/test-chat.cpp):
Usage Notes:
Model Behavior: During testing, it was observed that the 30B version of the Qwen3-Coder models can sometimes generate numeric values with excessive precision (e.g., 72.0000...). This is a model-level sampling issue, not a bug in the implementation. It can be effectively mitigated by using the following sampling parameters:
Model Performance: The implementation has been tested only with the 30B parameter model (different 8-bit quantizations). The 30B GGUF variant of Qwen3-Coder has shown inconsistent performance with tool calling and may not function reliably. I tested the FP8 variant of the model with vllm and there were no issues at all.
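To make the XML-based tool-calling format concrete, here is a minimal Python sketch that turns such output into OpenAI-style `name`/`arguments` pairs. It is not the C++ parser from this PR. The opening tags (`<tool_call>`, `<function=...>`, `<parameter=...>`) are taken from the raw test 6 response earlier in the thread. The closing tags (`</parameter>`, `</function>`, `</tool_call>`) and the helper name `parse_tool_calls` are assumptions made for illustration.

```python
import json
import re

# Assumed tag layout; only the opening tags appear verbatim in this thread.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls from model output.

    Parameter values are kept as stripped strings. Converting them to
    numbers or booleans would require the tool schema and is left out.
    """
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            args = {key: value.strip() for key, value in PARAMETER_RE.findall(body)}
            calls.append({"name": name, "arguments": json.dumps(args)})
    return calls


sample = (
    "<tool_call>\n<function=set_temperature>\n"
    "<parameter=temperature>\n72\n</parameter>\n"
    "<parameter=unit>\nfahrenheit\n</parameter>\n"
    "</function>\n</tool_call>"
)
print(parse_tool_calls(sample))
```

Stripping each value is what makes the sketch tolerant of the extra whitespace and newlines mentioned in the discussion. An incomplete block, such as the truncated test 6 output, simply yields no calls.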