model : gpt-oss add response_format support #15494
Conversation
@aldehir Thanks for the change; my chat completion request with response_format now works with the llama.cpp backend. One question: would the grammar rule also affect the reasoning token generation of gpt-oss, i.e., force the reasoning tokens to be generated in the JSON schema format? That would certainly impact performance.
@samshipengs with …
@aldehir I haven't looked at the …
@samshipengs Ah, ok. The grammar for gpt-oss when using `response_format` is harmony-aware: the analysis channel is left unconstrained and only the final channel is constrained to the schema, so reasoning tokens are not forced into the JSON format. If you're finding reasoning traces in your structured output, I would verify you are passing in …
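To illustrate the idea, here is a hedged sketch of what a harmony-aware GBNF wrapper can look like; the rule names and special-token spellings are illustrative, not the exact rules from this PR:

```
# Sketch: leave the analysis channel free-form and apply the schema only
# to the final channel. `schema-root` stands for the rule generated from
# the response_format JSON schema.
root     ::= analysis? start final
analysis ::= "<|channel|>analysis<|message|>" ( [^<] | "<" [^|] )* "<|end|>"
start    ::= "<|start|>assistant"
final    ::= "<|channel|>final<|message|>" schema-root
```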
@aldehir I was using … I noticed that if I don't use structured output (i.e., don't pass response_format), the model seems to give a more sensible answer (I'm looking at the final channel of the harmony-format response) compared to the output parsed when passing a Pydantic model in response_format. Is the grammar-based constrained decoding in llama.cpp done with GBNF? Do we know if OpenAI (for their commercial models) uses the same constrained decoding technique?
@samshipengs The grammar is defined in GBNF, but I don't know the specifics of the constrained decoding implementation. If you can provide an example of such a task, I can look further into it.
This reverts commit 32732f2.
Add `response_format` support to `gpt-oss` models.

The generic grammar implementation is not great for `gpt-oss`, so this PR wraps the `response_format` schema in a harmony-aware grammar so the model can answer properly.

Fixes #15276

curl example

Note the weirdness around landmarks.
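The collapsed example above isn't reproduced in this scrape; below is a minimal sketch of such a request against llama.cpp's OpenAI-compatible endpoint. The port, prompt, and landmark schema are illustrative assumptions, not the original example:

```sh
# Illustrative request: ask for structured output about a landmark,
# constrained by a JSON schema via response_format.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Name a famous landmark and the city it is in."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "landmark",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "city": {"type": "string"}
          },
          "required": ["name", "city"]
        }
      }
    }
  }'
```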