
@hksdpc255

@hksdpc255 hksdpc255 commented Sep 9, 2025

UPDATE: Use my new PR #16932

This PR introduces an enhanced implementation of tool calling for GLM-4.5, building upon the existing contributions by @dhandhalyabhavik and @susmitds (see PR #15186).

Key improvements include:

  1. Grammar-constrained tool-call outputs
    The model's tool-call messages are now enforced by a formal grammar, so generated calls are always well-formed and can be parsed reliably (see the sketch after this list).

  2. Streaming support for tool-call parsing
    The parser now handles tool-call messages incrementally as they are generated, enabling more responsive, real-time interactions during inference.
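
To give a sense of what item 1 enforces, here is a minimal GBNF sketch of the tool-call shape, embedded the way llama.cpp carries grammar strings in C++. This is an illustration only: the rule names are hypothetical, and the grammar the PR actually generates is derived from the tool schemas and is more involved.

// Hypothetical sketch; not the grammar generated by this PR.
static const char * GLM_TOOL_CALL_GBNF_SKETCH = R"(
root ::= "<tool_call>" func-name "\n" arg+ "</tool_call>"
func-name ::= [A-Za-z0-9_.-]+
arg ::= "<arg_key>" text "</arg_key>" "\n" "<arg_value>" text "</arg_value>" "\n"
text ::= [^<]*
)";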

Use this Jinja template while testing:

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- macro mask_user_text(content) -%}
{%- set str_user_text = content|string -%}
{%- set str_user_text = str_user_text.replace('<think>', '<|start_of_thought|>') -%}
{%- set str_user_text = str_user_text.replace('</think>', '<|end_of_thought|>') -%}
{%- set str_user_text = str_user_text.replace('<tool_call>', '<|tool_calls_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</tool_call>', '<|tool_call_end|>') -%}
{%- set str_user_text = str_user_text.replace('<arg_key>', '<|tool_call_argument_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</arg_key>', '<|tool_call_argument_end|>') -%}
{%- set str_user_text = str_user_text.replace('<arg_value>', '<|tool_call_value_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</arg_value>', '<|tool_call_value_end|>') -%}
{{- str_user_text -}}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1, last_user_has_nothink=false) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {%- set user_content = visible_text(m.content) -%}
		{%- set ns.last_user_index = loop.index0 -%}
		{%- set _clean = user_content | trim -%}
		{%- set ns.last_user_has_nothink = _clean.endswith('/nothink') -%}
    {%- endif %}
{%- endfor %}
{%- if ns.last_user_has_nothink -%}
    {%- set enable_thinking = false -%}
{%- endif -%}
{# ===========================
   JSON helpers
   =========================== #}
{% macro unescape_json_string(u_str) -%}
{%- set une = namespace(buf='', s='', slen=0, consume=0, esc='') -%}
{%- set une.s = u_str | trim -%}
{%- if une.s.startswith('"') and une.s.endswith('"') and une.s|length >= 2 -%}
{%- set une.slen = une.s|length - 1 -%}
{# hex map for manual hex -> int conversion #}
{%- set hexmap = {'0':0,'1':1,'2':2,'3':3,'4':4,'5':5,'6':6,'7':7,'8':8,'9':9,'a':10,'b':11,'c':12,'d':13,'e':14,'f':15,'A':10,'B':11,'C':12,'D':13,'E':14,'F':15} -%}
{%- for ich in range(1, une.slen) -%}
{%- if ich >= une.consume -%}
    {%- set uch = une.s[ich:ich+1] -%}
    {%- if uch != '\\' -%}
        {%- set une.buf = une.buf + uch -%}
    {%- else -%}
        {# found backslash, look ahead #}
        {%- set jch = ich + 1 -%}
        {%- if jch >= une.slen -%}
          {%- set une.buf = une.buf + '\ufffd' -%} {# lonely backslash -> replacement #}
        {%- else -%}
          {%- set une.esc = une.s[jch:jch+1] -%}
          {%- if une.esc in ['"', '\\', '/', 'b', 'f', 'n', 'r', 't'] -%}
            {%- if une.esc == '"' -%}{%- set outch = '"' -%}{%- elif une.esc == '\\' -%}{%- set outch = '\\' -%}
            {%- elif une.esc == '/' -%}{%- set outch = '/' -%}{%- elif une.esc == 'b' -%}{%- set outch = '\b' -%}
            {%- elif une.esc == 'f' -%}{%- set outch = '\f' -%}{%- elif une.esc == 'n' -%}{%- set outch = '\n' -%}
            {%- elif une.esc == 'r' -%}{%- set outch = '\r' -%}{%- elif une.esc == 't' -%}{%- set outch = '\t' -%}
            {%- endif -%}
            {%- set une.buf = une.buf + outch -%}
            {%- set une.consume = jch + 1 -%}  {# next loop ich will skip until >= une.consume #}
          {%- elif une.esc == 'u' -%}
            {# attempt to read up to 4 hex digits starting at jch+1 #}
            {%- set kch = jch + 1 -%}
            {%- set hexpart = '' -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {# hex escape not supported: minja does not support '%c'%var, '\\uXXXX'|format, or var|chr #}
            {%- if hexpart == '' -%}
              {# no hex digits -> replacement #}
              {%- set une.buf = une.buf + '\ufffd' -%}
              {%- set une.consume = jch + 1 -%}
            {%- else -%}
              {%- set une.consume = kch -%}
              {%- set une.buf = une.buf + '\ufffd' -%}
            {%- endif -%}
          {%- else -%}
            {# unknown escape: be lenient -> drop backslash, keep next char #}
            {%- set une.buf = une.buf + une.esc -%}
            {%- set une.consume = jch + 1 -%}
          {%- endif -%}
        {%- endif -%}
    {%- endif -%}
{%- endif -%}
{%- endfor -%}
{{ une.buf }}
{%- else -%}
{{ u_str }}
{%- endif -%}
{%- endmacro %}
{% macro emit_json_kv_from_object(json_str) -%}
{%- set ss = json_str | trim -%}
{%- set sslen = ss | length -%}
{%- set inner = ss[1:sslen-1] -%}
{# split top-level members on commas, respecting strings/nesting #}
{%- set ns = namespace(buf='', parts='', in_str=false, esc=false, depth_obj=0, depth_arr=0) -%}
{%- for ch in inner -%}
    {%- if ns.in_str -%}
        {%- if ns.esc -%}
            {%- set ns.esc = false -%}
        {%- elif ch == '\\' -%}
            {%- set ns.esc = true -%}
        {%- elif ch == '"' -%}
            {%- set ns.in_str = false -%}
        {%- endif -%}
        {%- set ns.buf = ns.buf + ch -%}
    {%- else -%}
        {%- if ch == '"' -%}
            {%- set ns.in_str = true -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '{' -%}
            {%- set ns.depth_obj = ns.depth_obj + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '}' -%}
            {%- set ns.depth_obj = ns.depth_obj - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '[' -%}
            {%- set ns.depth_arr = ns.depth_arr + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ']' -%}
            {%- set ns.depth_arr = ns.depth_arr - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ',' and ns.depth_obj == 0 and ns.depth_arr == 0 -%}
            {%- set ns.parts = ns.parts + ns.buf + '\x1F' -%}
            {%- set ns.buf = '' -%}
        {%- else -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- set ns.parts = ns.parts + ns.buf -%}
{# split each member on the first top-level colon into key/value #}
{%- for pair in ns.parts.split('\x1F') if pair | trim -%}
    {%- set p = pair | trim -%}
    {%- set st = namespace(buf='', in_str=false, esc=false, depth_obj=0, depth_arr=0, seen_colon=false, k='', v='') -%}
    {%- for ch in p -%}
        {%- if st.in_str -%}
            {%- if st.esc -%}
                {%- set st.esc = false -%}
            {%- elif ch == '\\' -%}
                {%- set st.esc = true -%}
            {%- elif ch == '"' -%}
                {%- set st.in_str = false -%}
            {%- endif -%}
            {%- set st.buf = st.buf + ch -%}
        {%- else -%}
            {%- if ch == '"' -%}
                {%- set st.in_str = true -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '{' -%}
                {%- set st.depth_obj = st.depth_obj + 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '}' -%}
                {%- set st.depth_obj = st.depth_obj - 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '[' -%}
                {%- set st.depth_arr = st.depth_arr + 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == ']' -%}
                {%- set st.depth_arr = st.depth_arr - 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == ':' and not st.seen_colon and st.depth_obj == 0 and st.depth_arr == 0 -%}
                {%- set st.k = st.buf | trim -%}
                {%- set st.buf = '' -%}
                {%- set st.seen_colon = true -%}
            {%- else -%}
                {%- set st.buf = st.buf + ch -%}
            {%- endif -%}
        {%- endif -%}
    {%- endfor -%}
    {%- set st.v = st.buf | trim -%}
    {# dequote key if it's a JSON string #}
    {%- set key = st.k | trim -%}
<arg_key>{{ unescape_json_string(key) }}</arg_key>
{% set val_str = st.v | trim | safe %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length > 1 else unescape_json_string(val_str) }}</arg_value>{{- '\n' -}}
{%- endfor -%}
{%- endmacro %}
{% macro emit_json_items_from_array(_json_str) -%}
{%- set json_str = _json_str | trim -%}
{%- set json_str_len = json_str | length -%}
{%- if "[" in json_str[1:json_str_len] -%}
{%- set inner = json_str[1:json_str_len-1] -%}
{%- set ns = namespace(buf='', parts='', in_str=false, esc=false, depth_obj=0, depth_arr=0) -%}
{%- for ch in inner -%}
    {%- if ns.in_str -%}
        {%- if ns.esc -%}
            {%- set ns.esc = false -%}
        {%- elif ch == '\\' -%}
            {%- set ns.esc = true -%}
        {%- elif ch == '"' -%}
            {%- set ns.in_str = false -%}
        {%- endif -%}
        {%- set ns.buf = ns.buf + ch -%}
    {%- else -%}
        {%- if ch == '"' -%}
            {%- set ns.in_str = true -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '{' -%}
            {%- set ns.depth_obj = ns.depth_obj + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '}' -%}
            {%- set ns.depth_obj = ns.depth_obj - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '[' -%}
            {%- set ns.depth_arr = ns.depth_arr + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ']' -%}
            {%- set ns.depth_arr = ns.depth_arr - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ',' and ns.depth_obj == 0 and ns.depth_arr == 0 -%}
            {%- set ns.parts = ns.parts + ns.buf + '\x1F' -%}
            {%- set ns.buf = '' -%}
        {%- else -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- set ns.parts = ns.parts + ns.buf -%}

{%- for item in ns.parts.split('\x1F') if item | trim -%}
<arg_key>{{ loop.index0 }}</arg_key>
{% set val_str = item | trim | safe %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length > 1 else unescape_json_string(val_str) }}</arg_value>
{%- endfor -%}
{%- else -%}
{{ emit_json_kv_from_object(json_str[1:json_str_len-1]) }}
{%- endif -%}
{%- endmacro %}
{% macro emit_json_from_string(s) -%}
    {%- set t = s | trim -%}
    {%- if t.startswith('{') and t.endswith('}') -%}
        {{ emit_json_kv_from_object(t) }}
    {%- elif t.startswith('[') and t.endswith(']') -%}
        {{ emit_json_items_from_array(t) }}
    {%- else -%}
        {{ s }}
    {%- endif -%}
{%- endmacro %}
{% macro emit_args(args) -%}
{%- if args is string -%}
{{ emit_json_from_string(args) }}
{%- elif args is mapping -%}
{%- for k, v in args | items -%}
<arg_key>{{ k }}</arg_key>
{%- if v is string -%}
{% set val_str = v %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length < 2 else val_str|string|tojson }}</arg_value>
{%- else -%}
{% set val_str = v | tojson %}
<arg_value>{{ val_str }}</arg_value>
{%- endif -%}

{%- endfor -%}
{%- elif args is iterable and args is not string -%}
{# native list case (some runtimes pass lists, not JSON strings) #}
{%- for v in args -%}
<arg_key>{{ loop.index0 }}</arg_key>
{% set val_str = v | tojson if v is mapping or (v is iterable and v is not string) else v %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length < 2 else val_str|string|tojson }}</arg_value>
{%- endfor -%}
{%- else -%}
{{ args | tojson }}
{%- endif -%}
{%- endmacro %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ mask_user_text(user_content) }}
{%- set _uc = user_content | trim -%}
{%- set last_user_index_safe = ns.last_user_index | default(-1, true) -%}
{%- if loop.index0 == last_user_index_safe
      and (enable_thinking is defined and not enable_thinking)
      and not _uc.endswith('/nothink') -%}
{{ '\n/nothink' }}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string and 0 != m.reasoning_content.strip()|length %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set think_parts = content.split('</think>') %}
        {%- if content.endswith('</think>') %}
            {%- set before_end_think = think_parts %}
            {%- set after_end_think = '' %}
        {%- else %}
            {%- set think_parts_len = think_parts|length %}
            {%- set before_end_think = think_parts[:think_parts_len - 1] %}
            {%- set after_end_think = think_parts[think_parts_len - 1] %}
        {%- endif %}
        {% set nsreasoning = namespace(content='') %}
        {%- for before_end_think_part in before_end_think %}
            {%- set think_start_parts = before_end_think_part.split('<think>') %}
            {%- set think_start_parts_len = think_start_parts|length %}
            {%- set nsreasoning.content = nsreasoning.content + think_start_parts[think_start_parts_len - 1].lstrip('\n') %}
        {%- endfor %}
        {%- set reasoning_content = nsreasoning.content %}
        {%- set content = after_end_think.lstrip('\n') %}
    {%- endif %}
{%- endif %}
{%- set last_user_index_safe = ns.last_user_index | default(-1, true) -%}
{%- if last_user_index_safe >= 0 and loop.index0 > last_user_index_safe and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- set f = tc.function if tc.function else tc -%}
{{ '\n<tool_call>' + f.name }}
{{ emit_args(f.arguments) }}
{{- '</tool_call>' }}
{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{%- if m.content != '' -%}
{{-  emit_args(m.content) }}
{%- endif %}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
{{- '<tool_response>' }}
{{ tr.output if tr.output is defined else tr }}
{{- '</tool_response>' }}{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}

Although not yet implemented, I'm planning the following improvements:

  1. Patch the Jinja template in common_chat_params_init_glm_4_5 so that it is compatible with the original Unsloth GGUF chat template, and potentially even with the official chat template.

  2. Add dedicated unit tests for grammar enforcement and streaming parsing.

Testing and feedback are welcome.

Suggested commit message after squashing the commits:

common: add GLM-4.5 tool calling support

- Add COMMON_CHAT_FORMAT_GLM_4_5 format enum
- Add template detection based on <arg_key> and <arg_value> tags
- Fix null content handling in message parsing and serialization
- Ensure GLM-4.5 detection runs before Hermes to avoid misidentification
- Implement GLM-4.5 tool call parser

Co-authored-by: Bhavik Dhandhalya <[email protected]>
Co-authored-by: Susmit Das <[email protected]>

@hksdpc255
Author

Use --reasoning-format none if your OpenAI-compatible client does not support sending reasoning_content back to the server.

@sbrnaderi

Got a runtime error:

<arg_value># Do this and that: <10
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Partial parse: Expected </arg_value> after <arg_value>

Looks like it is happening because of the "<10" characters in the generated text during function-call parsing. Probably it is trying to parse <10 as the beginning of an XML tag?

@hksdpc255
Author


@sbrnaderi From the log you provided, there isn’t anything unexpected. The JSON parse error occurs because I first try to parse arg_value as JSON; if that fails, it is parsed as a raw string. The failure log cannot be suppressed due to the design of llama.cpp.
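
In other words, the parsing strategy is roughly the following (a minimal sketch using nlohmann::json, which llama.cpp vendors; this is not the exact code from the PR):

#include <nlohmann/json.hpp>
#include <string>

// Try the <arg_value> content as JSON first; on failure, fall back to a
// raw string. The "Failed to parse up to error" lines in the log come
// from the first attempt and are expected for non-JSON values.
static nlohmann::json parse_arg_value(const std::string & raw) {
    auto parsed = nlohmann::json::parse(raw, /* cb = */ nullptr, /* allow_exceptions = */ false);
    if (!parsed.is_discarded()) {
        return parsed; // well-formed JSON value
    }
    return nlohmann::json(raw); // raw string, e.g. "# Do this and that: <10"
}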

@sbrnaderi

@hksdpc255 so, you are trying to parse the XML format from the GLM model into JSON, but I think what goes wrong here is that the "<10" part of the text is recognised as an XML tag. No?


@hksdpc255
Author

@sbrnaderi Would you be able to share more logs or your prompt? The log you shared doesn't seem to show any problem, and additional details would help me figure out what's going wrong.

@hksdpc255
Author

@sbrnaderi I believe your issue is fixed by the latest commit.

@sbrnaderi

@hksdpc255 thanks, I will try your new commit.

@ai-christianson

I'm running this PR with the supplied chat template and it is working 👍

@MikeLP

MikeLP commented Sep 15, 2025

I also checked this PR and everything works perfectly with the provided Jinja template.

@DKingAlpha

DKingAlpha commented Sep 30, 2025

Parsing JSON in <arg_value> is pretty much broken on the current branch. The original patch crashes when a streaming response ends with <arg_value> and no trailing content.

This patch fixes the crash:

--- a/common/json-partial.cpp	2025-10-01 03:17:14.681184368 +0800
+++ b/common/json-partial.cpp	2025-10-01 03:15:35.623175731 +0800
@@ -183,7 +183,7 @@
                 } else if (can_parse(str + "\"" + closing)) {
                     // Was inside an object value string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\"" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
                     // Was inside an object value string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"" + closing;
                 } else {
@@ -202,7 +202,7 @@
                 } else if (can_parse(str + "\"" + closing)) {
                     // Was inside an array value string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\"" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
                     // Was inside an array value string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"" + closing;
                 } else if (!was_maybe_number() && can_parse(str + ", 1" + closing)) {
@@ -227,7 +227,7 @@
                 } else if (can_parse(str + "\": 1" + closing)) {
                     // Was inside an object key string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\": 1" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\": 1" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\": 1" + closing)) {
                     // Was inside an object key string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\": 1" + closing;
                 } else {
@@ -253,7 +253,7 @@
             if (can_parse(str + "\"")) {
                 // Was inside an string
                 str += (out.healing_marker.json_dump_marker = magic_seed) + "\"";
-            } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
+            } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
                 // Was inside an string after an escape
                 str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"";
             } else {

Besides, thanks for your PR. It's special because it works with complex JSON schemas, with a quick hack like this:

--- a/common/json-schema-to-grammar.cpp	2025-10-01 00:22:00.744098340 +0800
+++ b/common/json-schema-to-grammar.cpp	2025-10-01 00:19:48.692716944 +0800
@@ -944,6 +944,9 @@
             return _add_rule(rule_name, out.str());
         } else if (schema.empty() || schema_type == "object") {
             return _add_rule(rule_name, _add_primitive("object", PRIMITIVE_RULES.at("object")));
+        } else if (schema_type.is_null() && schema.contains("not") && schema["not"].is_object() && schema["not"].empty()) {
+            // librechat returns not:{}, which does nothing.
+            return "";
         } else {
             if (!schema_type.is_string() || PRIMITIVE_RULES.find(schema_type.get<std::string>()) == PRIMITIVE_RULES.end()) {
                 _errors.push_back("Unrecognized schema: " + schema.dump());

LibreChat passed a scrambled schema including {"not":{}}; this patch ignores it.

@hksdpc255
Author

hksdpc255 commented Oct 1, 2025

@DKingAlpha Thanks for pointing that out! It seems my compiler adds some extra padding to the string object, which ends up masking the string array underflow crash.
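
For reference, the class of bug being fixed, as a minimal illustration (not the actual patched code):

#include <string>

// On an empty string, str.length() - 1 wraps around to SIZE_MAX, so
// str[str.length() - 1] indexes far out of bounds: undefined behavior.
// Whether it actually crashes depends on compiler and allocator details,
// which is why one build aborts while another appears to work.
static bool ends_with_backslash(const std::string & str) {
    return !str.empty() && str[str.length() - 1] == '\\';
}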

@hksdpc255
Author

@DKingAlpha I took a deeper look at your patch and had a question. It modifies some sections of common/json-partial.cpp that my PR hadn't touched. Did the original code, without my PR, crash for you?

@DKingAlpha

DKingAlpha commented Oct 1, 2025


No

I am using clang-20, if that helps to reproduce.

Either this function (try_consume_json) is designed to run only on non-empty strings, which means you need to change your code, or it's a bug in that code that was simply never triggered before. I lean toward the latter.

@hksdpc255
Author

hksdpc255 commented Oct 1, 2025


@DKingAlpha Would it still crash if you only patched the sections that my PR actually changed?

@@ -253,7 +253,7 @@
             if (can_parse(str + "\"")) {
                 // Was inside an string
                 str += (out.healing_marker.json_dump_marker = magic_seed) + "\"";
-            } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
+            } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
                 // Was inside an string after an escape
                 str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"";
             } else {

@DKingAlpha


Line 253 is exactly the location that crashed on my side, but I patched all the other .length() - 1 accesses together. They all look fishy, so I really can't say.

I mean, even without running into it, a static manual review alone says the access should be guarded.

@hksdpc255
Author

hksdpc255 commented Oct 1, 2025

@DKingAlpha I believe

if (!healing_marker.empty() && !err_loc.stack.empty()) {

ensures that str is not empty. I'm considering changing my code from

if (!healing_marker.empty() && err_loc.stack.empty())

to

if (err_loc.position != 0 && !healing_marker.empty() && err_loc.stack.empty())

What do you think about this change?

@aaronnewsome

I don't have any detailed logs from the llama.cpp crash that opencode causes, but here's the tail of the container log from the latest crash today. This is with Unsloth's Q4 GLM-4.5-Air and the hksdpc255:master llama.cpp:

srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21156 | selected slot by lcs similarity, lcs_len = 24546, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21405 | processing task
slot update_slots: id  0 | task 21405 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 24575
slot update_slots: id  0 | task 21405 | n_past = 24546, memory_seq_rm [24546, end)
slot update_slots: id  0 | task 21405 | prompt processing progress, n_past = 24575, n_tokens = 29, progress = 0.001180
slot update_slots: id  0 | task 21405 | prompt done, n_past = 24575, n_tokens = 29
slot      release: id  0 | task 21405 | stop processing: n_past = 24643, truncated = 0
slot print_timing: id  0 | task 21405 | 
prompt eval time =     200.19 ms /    29 tokens (    6.90 ms per token,   144.86 tokens per second)
       eval time =    2013.96 ms /    69 tokens (   29.19 ms per token,    34.26 tokens per second)
      total time =    2214.15 ms /    98 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21405 | selected slot by lcs similarity, lcs_len = 24643, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21475 | processing task
slot update_slots: id  0 | task 21475 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 25164
slot update_slots: id  0 | task 21475 | n_past = 24643, memory_seq_rm [24643, end)
slot update_slots: id  0 | task 21475 | prompt processing progress, n_past = 25164, n_tokens = 521, progress = 0.020704
slot update_slots: id  0 | task 21475 | prompt done, n_past = 25164, n_tokens = 521
slot      release: id  0 | task 21475 | stop processing: n_past = 25220, truncated = 0
slot print_timing: id  0 | task 21475 | 
prompt eval time =    1958.69 ms /   521 tokens (    3.76 ms per token,   265.99 tokens per second)
       eval time =    1674.96 ms /    57 tokens (   29.39 ms per token,    34.03 tokens per second)
      total time =    3633.65 ms /   578 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21475 | selected slot by lcs similarity, lcs_len = 25220, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21533 | processing task
slot update_slots: id  0 | task 21533 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 25510
slot update_slots: id  0 | task 21533 | n_past = 25220, memory_seq_rm [25220, end)
slot update_slots: id  0 | task 21533 | prompt processing progress, n_past = 25510, n_tokens = 290, progress = 0.011368
slot update_slots: id  0 | task 21533 | prompt done, n_past = 25510, n_tokens = 290
slot      release: id  0 | task 21533 | stop processing: n_past = 25981, truncated = 0
slot print_timing: id  0 | task 21533 | 
prompt eval time =    1271.89 ms /   290 tokens (    4.39 ms per token,   228.01 tokens per second)
       eval time =   14239.01 ms /   472 tokens (   30.17 ms per token,    33.15 tokens per second)
      total time =   15510.89 ms /   762 tokens
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: now finding less tool calls!
/root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/start-llama: line 25:   123 Aborted                 (core dumped) llama-server --model /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf --alias GLM-4.5-Air-UD-Q4_K_XL --threads -1 --ctx-size 131072 --n-gpu-layers 49 --temp 1.0 --min-p 0.0 --top-p 0.95 --top-k 40 --repeat-penalty 1.05 --context-shift --host 0.0.0.0 --reasoning-format none --flash-attn off -hf ggml-org/gemma-3-12b-it-GGUF --no-mmproj-offload --jinja --chat-template-file /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/glm-45.jinja

Not every failed diff edit causes the server to crash, but when it does crash, the most common error I see is: Invalid diff: now finding less tool calls!

@hksdpc255
Author

hksdpc255 commented Oct 12, 2025

@aaronnewsome You can run llama-server with -lv 1 to enable verbose logging. This will output detailed information to stdout, which might help diagnose the issue.

@matbrez

matbrez commented Oct 12, 2025

opencode edits are failing because its edit tool does not return any content, and the template then replaces that empty result with SYSTEM ERROR: This tool is not working due to internal error, try another tool or give a direct response.

I replaced

{%- if m.content == '' -%}
{{- 'SYSTEM ERROR: This tool is not working due to internal error, try another tool or give a direct response.' }}
{%- else -%}
{{-  emit_args(m.content) }}
{%- endif %}

with

{%- if m.content != '' -%}
{{-  emit_args(m.content) }}
{%- endif %}

and so far I haven't noticed any issues.

@aaronnewsome

Here's what I see with -lv 1 before the crash:

que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 213
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 214, front = 0
slot update_slots: id  0 | task 137 | slot decode token, n_ctx = 131072, n_past = 61190, n_cache_tokens = 61190, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"}"}}]}}],"created":1760321159,"id":"chatcmpl-MJPawRpSZfxOmAILiSbqfJMTR9rBoGCG","model":"glm-4.5-air","system_fingerprint":"b6735-bbb592d5","object":"chat.completion.chunk"}

srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
Parsing input with format GLM 4.5: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing object key - unexpected end of input; expected string literal: <<<{>>>
srv          send: sending result for task id = 137
srv          send: task id = 137 pushed to result queue
slot process_toke: id  0 | task 137 | stopped by EOS
slot process_toke: id  0 | task 137 | n_decoded = 77, n_remaining = -1, next token: 151338 '<|observation|>'
slot      release: id  0 | task 137 | stop processing: n_past = 61190, truncated = 0
slot print_timing: id  0 | task 137 | 
prompt eval time =     395.94 ms /    46 tokens (    8.61 ms per token,   116.18 tokens per second)
       eval time =    3957.22 ms /    77 tokens (   51.39 ms per token,    19.46 tokens per second)
      total time =    4353.17 ms /   123 tokens
srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: now finding less tool calls!
/root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/start-llama: line 26:   121 Aborted                 (core dumped) llama-server --model /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf --alias GLM-4.5-Air-UD-Q4_K_XL -lv 1 --threads -1 --ctx-size 131072 --n-gpu-layers 49 --temp 1.0 --min-p 0.0 --top-p 0.95 --top-k 40 --repeat-penalty 1.05 --context-shift --host 0.0.0.0 --reasoning-format none --flash-attn off -hf ggml-org/gemma-3-12b-it-GGUF --no-mmproj-offload --jinja --chat-template-file /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/glm-45.jinja

@hksdpc255
Author

@matbrez Thanks for the suggestion! I’ve applied your changes and they look good.

@hksdpc255
Author

@aaronnewsome Could you provide more logs so that they include three occurrences of srv update_chat_: Parsing chat message:?

@aaronnewsome

This one has 3 occurrences; this is the tail (last 420 lines) before the crash:

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>

Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal: <<<[>>>
Partial parse: Expected </tool_call> after </arg_value>
srv          send: sending result for task id = 5702
srv          send: task id = 5702 pushed to result queue
slot process_toke: id  0 | task 5702 | n_decoded = 382, n_remaining = -1, next token:   198 '
'
srv  update_slots: run slots completed
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 6084
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 6085, front = 0
slot update_slots: id  0 | task 5702 | slot decode token, n_ctx = 131072, n_past = 50077, n_cache_tokens = 50077, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Parsing input with format GLM 4.5: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal: <<<[>>>
srv          send: sending result for task id = 5702
srv          send: task id = 5702 pushed to result queue
slot process_toke: id  0 | task 5702 | n_decoded = 383, n_remaining = -1, next token: 151353 '</tool_call>'
srv  update_slots: run slots completed
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 6085
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 6086, front = 0
slot update_slots: id  0 | task 5702 | slot decode token, n_ctx = 131072, n_past = 50078, n_cache_tokens = 50078, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"}"}}]}}],"created":1760666747,"id":"chatcmpl-Wf0yEVfFtQE4EjJcnnhbKFv0cctgOKNf","model":"glm-4.6","system_fingerprint":"b6735-bbb592d5","object":"chat.completion.chunk"}

srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Parsing input with format GLM 4.5: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal: <<<[>>>
srv          send: sending result for task id = 5702
srv          send: task id = 5702 pushed to result queue
slot process_toke: id  0 | task 5702 | stopped by EOS
slot process_toke: id  0 | task 5702 | n_decoded = 384, n_remaining = -1, next token: 151338 '<|observation|>'
slot      release: id  0 | task 5702 | stop processing: n_past = 50078, truncated = 0
slot print_timing: id  0 | task 5702 | 
prompt eval time =    1844.26 ms /   363 tokens (    5.08 ms per token,   196.83 tokens per second)
       eval time =   16828.28 ms /   384 tokens (   43.82 ms per token,    22.82 tokens per second)
      total time =   18672.54 ms /   747 tokens
srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Parsing input with format GLM 4.5: 
<think></think>
<tool_call>write_file
<arg_key>file_path</arg_key>
<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>
<arg_key>content</arg_key>
<arg_value>[VisualAdvisor]
# Main configuration for Visual Advisor

# Date Range Settings
start_date = 2025-10-01
end_date = 2025-10-17
model = GLM-4.5-Air-UD-Q6_K_XL

# Video Settings
video_width = 1920
video_height = 1080
video_fps = 30
duration_per_slide = 8
transition_duration = 1

# Chart Settings
chart_scale = 2
color_scheme = default
max_clients_display = 15
chart_types = pie,bar,heatmap,timeline

# Audio Settings
tts_engine = kokoro
audio_format = mp4
audio_bitrate = 192k
sample_rate = 44100
rest_seconds = 0.4

# Output Settings
output_directory = output
charts_subdir = charts
audio_subdir = audio
videos_subdir = videos

# Font Settings
font_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf
fallback_font = default

# Color Scheme (Hex codes)
colors.primary = #1f77b4
colors.secondary = #ff7f0e
colors.success = #2ca02c
colors.warning = #ffbb78
colors.danger = #d62728
colors.info = #17becf
colors.background = #333333
colors.text = #D4D4D4FF
colors.accent = #fb4e14

# Processing Settings
cleanup_temp_files = true
temp_file_age_days = 7
log_level = INFO</arg_value>
</tool_call>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal: <<<[>>>
Partial parse: JSON
Parsed message: {"role":"assistant","content":"\n<tool_call>write_file\n<arg_key>file_path</arg_key>\n<arg_value>/mnt/data/Dropbox/Documents/LLM/ProductAdvisor/visual-advisor/config.ini</arg_value>\n<arg_key>content</arg_key>\n<arg_value>[VisualAdvisor]\n# Main configuration for Visual Advisor\n\n# Date Range Settings\nstart_date = 2025-10-01\nend_date = 2025-10-17\nmodel = GLM-4.5-Air-UD-Q6_K_XL\n\n# Video Settings\nvideo_width = 1920\nvideo_height = 1080\nvideo_fps = 30\nduration_per_slide = 8\ntransition_duration = 1\n\n# Chart Settings\nchart_scale = 2\ncolor_scheme = default\nmax_clients_display = 15\nchart_types = pie,bar,heatmap,timeline\n\n# Audio Settings\ntts_engine = kokoro\naudio_format = mp4\naudio_bitrate = 192k\nsample_rate = 44100\nrest_seconds = 0.4\n\n# Output Settings\noutput_directory = output\ncharts_subdir = charts\naudio_subdir = audio\nvideos_subdir = videos\n\n# Font Settings\nfont_path = /usr/local/fonts/fonts-main/ofl/architectsdaughter/ArchitectsDaughter-Regular.ttf\nfallback_font = default\n\n# Color Scheme (Hex codes)\ncolors.primary = #1f77b4\ncolors.secondary = #ff7f0e\ncolors.success = #2ca02c\ncolors.warning = #ffbb78\ncolors.danger = #d62728\ncolors.info = #17becf\ncolors.background = #333333\ncolors.text = #D4D4D4FF\ncolors.accent = #fb4e14\n\n# Processing Settings\ncleanup_temp_files = true\ntemp_file_age_days = 7\nlog_level = INFO</arg_value>\n</tool_call>"}
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: now finding less tool calls!
/root/.ollama/models/GLM-4.5-Air-UD-Q4_K_XL/start-llama: line 24:   126 Aborted                 (core dumped) llama-server --model /root/.ollama/models/GLM-4.5-Air-UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf --log-verbosity 1 --alias GLM-4.5-Air-UD-Q4_K_XL --threads -1 --ctx-size 131072 --n-gpu-layers 99 --temp 1.0 --min-p 0.0 --top-p 0.95 --top-k 40 --repeat-penalty 1.05 --context-shift --host 0.0.0.0 --reasoning-format auto --flash-attn off --jinja --chat-template-file /root/.ollama/models/GLM-4.5-Air-UD-Q4_K_XL/glm-45.jinja

@hksdpc255
Author

@aaronnewsome Thanks! I see what’s going wrong. I’ll work on a fix.

@hksdpc255
Author

@aaronnewsome Try now.

@Xe

Xe commented Oct 22, 2025

Trying to use this with the GLM 4.5 Air model yields 500s on tool calling with responses like this:

got exception: {"code":500,"message":"Unknown method: items at row 75, column 22:\n{% set _args = tc.arguments %}\n{% for k, v in _args.items() %}\n                     ^\n<arg_key>{{ k }}</arg_key>\n at row 75, column 1:\n{% set _args = tc.arguments %}\n{% for k, v in _args.items() %}\n^\n<arg_key>{{ k }}</arg_key>\n at row 69, column 29:\n{% if m.tool_calls %}\n{% for tc in m.tool_calls %}\n                            ^\n{%- if tc.function %}\n at row 69, column 1:\n{% if m.tool_calls %}\n{% for tc in m.tool_calls %}\n^\n{%- if tc.function %}\n at row 68, column 22:\n{%- endif -%}\n{% if m.tool_calls %}\n                     ^\n{% for tc in m.tool_calls %}\n at row 68, column 1:\n{%- endif -%}\n{% if m.tool_calls %}\n^\n{% for tc in m.tool_calls %}\n at row 48, column 35:\n{{- '/nothink' if (enable_thinking is defined and not enable_thinking and not content.endswith(\"/nothink\")) else '' -}}\n{%- elif m.role == 'assistant' -%}\n                                  ^\n<|assistant|>\n at row 45, column 1:\n{% for m in messages %}\n{%- if m.role == 'user' -%}<|user|>\n^\n{% set content = visible_text(m.content) %}{{ content }}\n at row 44, column 24:\n{%- endfor %}\n{% for m in messages %}\n                       ^\n{%- if m.role == 'user' -%}<|user|>\n at row 44, column 1:\n{%- endfor %}\n{% for m in messages %}\n^\n{%- if m.role == 'user' -%}<|user|>\n at row 1, column 1:\n[gMASK]<sop>\n^\n{%- if tools -%}\n","type":"server_error"}

In case it matters, I'm running this on a DGX Spark.

@hksdpc255
Author

hksdpc255 commented Oct 23, 2025

This looks great @hksdpc255 . Bought an AMD AI Ryzen AI Max+ 395 just to run glm-4.5-air for zed and it is not tool calling. Let us know what we can do to help

@odellus Why? I tested this on an older version of Zed before.

@hksdpc255
Author

@odellus @Xe These issues seem off-topic for this PR. You can ask such questions in GitHub Discussions instead.

@hksdpc255
Author

@odellus @Xe Further clarification on why it is off-topic: your log shows you’re not using the Jinja template from the OP.

llama.cpp’s Jinja2 renderer doesn’t support .items() on dict, so it will fail in that case.

The issue is already fixed with the updated template I provided in the OP.

If you prefer to use your own Jinja template, either replace .items() with the | items filter (i.e. {% for k, v in _args | items %}), or open a new issue or discussion, since supporting the .items() method would require changes to the upstream Jinja2 renderer.

@hksdpc255 hksdpc255 changed the title from "common: Yet another add GLM-4.5 tool calling support" to "common: Yet another add GLM-4.5/GLM-4.6 tool calling support" Oct 24, 2025
@odellus

odellus commented Oct 24, 2025


I can verify it works with --jinja --chat-template-file given-glm45-template.j2

Steps to reproduce [working]:

  1. git clone https://github.com/hksdpc255/llama.cpp.git
  2. cd llama.cpp
  3. Save the Jinja chat template file at the repo root: path/to/llama.cpp/glm45_template.j2
  4. <build from source; I am building against Vulkan, instructions in the README>
  5. Run the server with the following command:
build/bin/llama-server \
  -c 120000 \
  --port 1234 \
   --api-key $API_KEY \
  -fa 1 \
  -ngl 99 \
  -b 2048 \
  -t 12 \
  -hf unsloth/GLM-4.5-Air-GGUF:Q4_K_M \
  --jinja --chat-template-file glm45_template.j2  # THIS IS REALLY IMPORTANT

A quick test against the endpoint in Zed shows tool calling and suppression of thinking tokens are working with this PR.


@odellus

odellus commented Nov 1, 2025

Maybe this belongs in a discussion, but I am able to get Qwen3-Coder-30B working on this PR with no modifications to the llama.cpp source.

@Mushoz

Mushoz commented Nov 2, 2025

Why is this being closed? I was actually looking forward to this landing :D

@hksdpc255
Author

This branch is no longer maintained.
I’ve written a patch that won’t conflict with newer model updates if you still want to use it: https://github.com/hksdpc255/llama.cpp/tree/glm45_toolsupport

I’ve also reworked a more robust implementation here: #16932 — testing and feedback are welcome.

@hksdpc255 hksdpc255 deleted the branch ggml-org:master November 2, 2025 09:58
@hksdpc255 hksdpc255 closed this Nov 2, 2025