
model : add reasoning/tool parsing to Llama 3.x Nemotron #15083


Draft: aldehir wants to merge 1 commit into master from model/llama-nemotron-reasoning

aldehir commented Aug 5, 2025

This PR adds reasoning and tool parsing to the Llama 3.x Nemotron models.

Context:

  • The generic parser does not handle the <think></think> tags, and I believe the sampling settings are also preventing the model from reasoning.
  • The think tags are not dedicated special tokens, so during streaming the partial tag (e.g. <think) is emitted as content until the reasoning parser has seen enough text to match the full tag.
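Because <think> arrives as several ordinary tokens, a streaming parser has to hold back any trailing text that could still grow into the tag. A minimal C++ sketch of that check, assuming a plain std::string buffer; partial_tag_suffix and safe_to_emit are hypothetical names for illustration, not llama.cpp functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest suffix of `buf` that is a proper
// prefix of `tag`. Those trailing characters might still become the full tag,
// so a streaming parser should not emit them yet.
std::size_t partial_tag_suffix(const std::string & buf, const std::string & tag) {
    assert(!tag.empty());
    const std::size_t max_len = std::min(buf.size(), tag.size() - 1);
    for (std::size_t len = max_len; len > 0; --len) {
        // Compare the last `len` chars of buf with the first `len` chars of tag.
        if (buf.compare(buf.size() - len, len, tag, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}

// Hypothetical helper: the portion of `buf` that is safe to stream right now.
std::string safe_to_emit(const std::string & buf, const std::string & tag) {
    return buf.substr(0, buf.size() - partial_tag_suffix(buf, tag));
}
```

With tag "<think>", safe_to_emit("Hello <th", tag) returns "Hello " and keeps "<th" buffered, while "a < b" is emitted unchanged because " b" cannot start the tag. A complete tag is not a proper prefix, so it is not held back here; recognizing it is left to the reasoning parser.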

Implementation:

  • Added COMMON_CHAT_FORMAT_LLAMA_3_X_NEMOTRON and associated init/parse functions.
  • Added try_consume_partial_literal() to defer parsing when the remaining input is only a prefix of the expected literal. I'm not sure this is the best approach, and the name could be better.
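The partial-literal idea can be sketched as a tri-state match: full match, partial match (wait for more input), or no match. This is a hypothetical illustration, not the code in this PR: literal_match, parsed_msg and parse_reasoning are made-up names, try_consume_partial_literal only borrows the name from the description (its real signature may differ), and the sketch assumes the reasoning block, when present, starts at the very beginning of the output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result of matching a literal against possibly incomplete input.
enum class literal_match { full, partial, none };

// Illustrative version, not the PR's implementation:
//  full    - whole literal present at `pos`; `pos` advances past it.
//  partial - input ends while still matching a prefix of `lit` (e.g. "<thi").
//  none    - mismatch; `pos` is unchanged.
literal_match try_consume_partial_literal(const std::string & input, std::size_t & pos,
                                          const std::string & lit) {
    assert(pos <= input.size());
    std::size_t i = 0;
    while (i < lit.size() && pos + i < input.size()) {
        if (input[pos + i] != lit[i]) {
            return literal_match::none;
        }
        ++i;
    }
    if (i == lit.size()) {
        pos += i;
        return literal_match::full;
    }
    return literal_match::partial;
}

// Hypothetical parsed message.
struct parsed_msg {
    std::string reasoning;
    std::string content;
    bool        needs_more_input = false;
};

// Splits "<think>...</think>rest" into reasoning and content.
// `is_partial` is true while the response is still streaming.
parsed_msg parse_reasoning(const std::string & input, bool is_partial) {
    parsed_msg msg;
    std::size_t pos = 0;
    const literal_match open = try_consume_partial_literal(input, pos, "<think>");
    if (open == literal_match::partial && is_partial) {
        // Could still become "<think>": emit nothing instead of leaking "<thi".
        msg.needs_more_input = true;
        return msg;
    }
    if (open != literal_match::full) {
        // No reasoning block, or an incomplete fragment in a finished response.
        msg.content = input;
        return msg;
    }
    const std::string close = "</think>";
    const std::size_t end = input.find(close, pos);
    if (end == std::string::npos) {
        // Reasoning still in progress; a trailing partial "</think>" is not held back here.
        msg.reasoning = input.substr(pos);
        return msg;
    }
    msg.reasoning = input.substr(pos, end - pos);
    msg.content   = input.substr(end + close.size());
    return msg;
}
```

Deferring only while is_partial is true is the key design choice: mid-stream, "<thi" may still become a tag, but in a finished response it is just text and is returned as content.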

github-actions bot added the testing (Everything test related) label Aug 5, 2025
aldehir force-pushed the model/llama-nemotron-reasoning branch from 95f4c09 to 969368a August 7, 2025 00:54
aldehir marked this pull request as draft August 9, 2025 00:54