-
Notifications
You must be signed in to change notification settings - Fork 13.4k
chat: Fix streaming parser for granite models #15682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chat: Fix streaming parser for granite models #15682
Conversation
// By leveraging try_consume_regex()/try_find_regex() throwing | ||
// common_chat_msg_partial_exception for these partial tokens, | ||
// processing is interrupted and the tokens are not passed to add_content(). | ||
if (auto res = builder.try_consume_regex(start_think_regex)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know if there are some better way to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a corner of the code I'm intimately familiar with, but I see similar usage elsewhere to parse tool calls that may be partial, so I think this looks like the right approach.
4a1699b
to
6967aec
Compare
6967aec
to
f16589b
Compare
Hi @shun095, thanks for investigating this! I'll get it on the TODO list for review soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great fix! I want to make sure we don't break tool calling for 3.0/3.1/3.2
, so will dig there and make sure we don't have to cover single-tool-call responses.
// By leveraging try_consume_regex()/try_find_regex() throwing | ||
// common_chat_msg_partial_exception for these partial tokens, | ||
// processing is interrupted and the tokens are not passed to add_content(). | ||
if (auto res = builder.try_consume_regex(start_think_regex)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a corner of the code I'm intimately familiar with, but I see similar usage elsewhere to parse tool calls that may be partial, so I think this looks like the right approach.
if (tool_calls_data.json.is_array()) { | ||
if (!builder.add_tool_calls(tool_calls_data.json)) { | ||
builder.add_content("<|tool_call|>" + tool_calls_data.json.dump()); | ||
if (auto tool_call = builder.try_consume_json_with_dumped_args({{{"arguments"}}})) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the earlier 3-series granite models have different tool calling behavior. I'm trying to verify whether any of them would return a single tool-call object (versus an array), but we may need to retain the clause that checks is_array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that older granite models should not return single tool calls, so this is good to go. I've confirmed the fix locally as well. Thanks!
* fix(chat): fix streaming parser for granite models * tests: add test cases for Granite models chat parser
* fix(chat): fix streaming parser for granite models * tests: add test cases for Granite models chat parser
Fixes: #15681 and #15713
I'm still getting used to contributing on GitHub, so please let me know if there are any issues or if I should make adjustments.
This PR is still in progress and I'll add tests later.I've now added the tests, and the PR is ready for review.