shun095
Contributor

@shun095 shun095 commented Aug 30, 2025

Fixes: #15681 and #15713

I'm still getting used to contributing on GitHub, so please let me know if there are any issues or if I should make adjustments.

This PR was initially a work in progress without tests. I've now added the tests, and the PR is ready for review.

// By leveraging try_consume_regex()/try_find_regex() throwing
// common_chat_msg_partial_exception for these partial tokens,
// processing is interrupted and the tokens are not passed to add_content().
if (auto res = builder.try_consume_regex(start_think_regex)) {
Contributor Author

@shun095 shun095 Aug 30, 2025

Please let me know if there is a better way to implement this.

Collaborator

This isn't a corner of the code I'm intimately familiar with, but I see similar usage elsewhere to parse tool calls that may be partial, so I think this looks like the right approach.
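
For readers unfamiliar with the pattern discussed here, the following is a minimal, self-contained sketch of the underlying idea: while streaming, any trailing text that could still turn out to be the start of a special tag is held back rather than emitted as content. It does not use the llama.cpp builder API. The PR relies on `try_consume_regex()`/`try_find_regex()` throwing `common_chat_msg_partial_exception`, whereas this sketch swaps in a plain suffix check, and the helper names `partial_tag_suffix_len` and `streamable_content` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of llama.cpp): returns the length of the
// longest suffix of `text` that is a proper prefix of `tag`, or 0 if none.
static std::size_t partial_tag_suffix_len(const std::string & text, const std::string & tag) {
    if (tag.empty()) {
        return 0;
    }
    // Only proper prefixes count; a complete tag is left to the real matcher.
    const std::size_t max_len = std::min(text.size(), tag.size() - 1);
    for (std::size_t len = max_len; len > 0; --len) {
        if (text.compare(text.size() - len, len, tag, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}

// Hypothetical helper: the part of the buffered output that is safe to
// stream now, with any possibly-partial tag at the end withheld.
static std::string streamable_content(const std::string & buffered, const std::string & tag) {
    return buffered.substr(0, buffered.size() - partial_tag_suffix_len(buffered, tag));
}
```

With the buffer `Hello <thi` and the tag `<think>`, this sketch withholds `<thi` and streams only `Hello `; once more tokens arrive, the buffer can be re-examined.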

@shun095 shun095 marked this pull request as draft August 30, 2025 08:16
@shun095 shun095 marked this pull request as ready for review August 31, 2025 15:59
@github-actions github-actions bot added the testing Everything test related label Aug 31, 2025
@shun095 shun095 force-pushed the fix_granite_streaming_parser branch from 4a1699b to 6967aec Compare August 31, 2025 16:15
@shun095 shun095 force-pushed the fix_granite_streaming_parser branch from 6967aec to f16589b Compare September 3, 2025 12:52
@gabe-l-hart gabe-l-hart self-requested a review September 19, 2025 14:52
@gabe-l-hart
Collaborator

Hi @shun095, thanks for investigating this! I'll get it on the TODO list for review soon.

Collaborator

@gabe-l-hart gabe-l-hart left a comment

This looks like a great fix! I want to make sure we don't break tool calling for 3.0/3.1/3.2, so will dig there and make sure we don't have to cover single-tool-call responses.

if (tool_calls_data.json.is_array()) {
if (!builder.add_tool_calls(tool_calls_data.json)) {
builder.add_content("<|tool_call|>" + tool_calls_data.json.dump());
if (auto tool_call = builder.try_consume_json_with_dumped_args({{{"arguments"}}})) {
Collaborator

Some of the earlier 3-series granite models have different tool calling behavior. I'm trying to verify whether any of them would return a single tool-call object (versus an array), but we may need to retain the clause that checks is_array.
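
To illustrate the distinction being raised, here is a small hedged sketch of how a parser could tell an array payload from a single tool-call object by inspecting the first non-whitespace character after the `<|tool_call|>` marker. This is not the code in the PR and does not use the llama.cpp JSON handling; `tool_call_shape` and `classify_tool_call_payload` are hypothetical names introduced only for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical classification of the text that follows "<|tool_call|>".
enum class tool_call_shape { array, single_object, other };

// Hypothetical helper (not part of llama.cpp): looks at the first
// non-whitespace character to decide which JSON shape is starting.
static tool_call_shape classify_tool_call_payload(const std::string & payload) {
    for (unsigned char c : payload) {
        if (std::isspace(c)) {
            continue; // skip leading whitespace
        }
        if (c == '[') {
            return tool_call_shape::array;
        }
        if (c == '{') {
            return tool_call_shape::single_object;
        }
        return tool_call_shape::other;
    }
    return tool_call_shape::other; // empty or whitespace-only payload
}
```

A parser that needed to support both shapes could branch on this result and wrap a single object in a one-element array before handing it to the tool-call handling.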

Collaborator

@gabe-l-hart gabe-l-hart left a comment

Confirmed that older granite models should not return single tool calls, so this is good to go. I've confirmed the fix locally as well. Thanks!

@gabe-l-hart gabe-l-hart merged commit f432d8d into ggml-org:master Sep 19, 2025
48 checks passed
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
* fix(chat): fix streaming parser for granite models

* tests: add test cases for Granite models chat parser
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
* fix(chat): fix streaming parser for granite models

* tests: add test cases for Granite models chat parser
Labels

testing Everything test related

Development

Successfully merging this pull request may close these issues.

Misc. bug: Granite chat parser doesn't stream content section

2 participants