-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Model] Support Qwen3 models with enable_thinking field #686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for Qwen3 models by introducing a new enable_thinking field and related changes across the API protocols, conversation handling, configuration, tests, and examples.
- New tests and constants for Qwen3 configuration are introduced.
- The chat completion API and conversation methods now support an extra_body.enable_thinking flag.
- Examples and documentation have been updated to demonstrate the new Qwen3 functionality.
Reviewed Changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/conversation.test.ts | Added tests to verify Qwen3-specific behavior with empty thinking blocks. |
| tests/constants.ts | Introduced new Qwen3 config JSON with enable_thinking support, though conv_template name remains "qwen2". |
| src/openai_api_protocols/chat_completion.ts | Added extra_body field with enable_thinking flag. |
| src/llm_chat.ts | Updated message appending logic to conditionally disable thinking tokens. |
| src/engine.ts | Forwarded the enable_thinking flag from the extra_body field. |
| src/conversation.ts | Added methods for appending empty thinking headers and managing their lifecycle. |
| src/config.ts | Updated GenerationConfig and prebuiltAppConfig with Qwen3 models. |
| examples/simple-chat-ts/src/simple_chat.ts | Configured extra_body for Qwen3 models in the simple chat example. |
| examples/qwen3/src/qwen3_example.ts | Provided example usage of Qwen3 models with varying enable_thinking configurations. |
| examples/qwen3/src/qwen3_example.html | Updated HTML wrapper to load the new Qwen3 example. |
| examples/qwen3/README.md | Updated documentation with instructions for running Qwen3 demos. |
Files not reviewed (2)
- examples/qwen3/package.json: Language not supported
- package.json: Language not supported
Comments suppressed due to low confidence (1)
tests/constants.ts:271
- [nitpick] The conv_template name in the Qwen3 configuration is set to "qwen2", which may be confusing. Consider updating it to "qwen3" for consistency with the model type.
"name": "qwen2",
### Change - The only change is #686, which - Add prebuilt models: - Qwen3-0.6B: `q0f16, q0f32, q4f16_1, q4f32_1` - Other Qwen3: `{1.7B, 4B, 8B} x {q4f16_1, q4f32_1}` - Support `extra_body: {enable_thinking: false}` for qwen3 models to toggle thinking - See `examples/qwen3` for more on Qwen3 usage - Also bumped `web-tokenizers` package to `0.1.6` to resolve rust-related issues ### TVMjs - No change, version `0.18.0-dev2` just like 0.2.71
- This PR adds the following Qwen3 models to WebLLM's prebuilt models:
- Qwen3-0.6B: `q0f16, q0f32, q4f16_1, q4f32_1`
- Other Qwen3: `{1.7B, 4B, 8B} x {q4f16_1, q4f32_1}`
- In addition, we add `extra_body` field and
`extra_body.enable_thinking` field to support switching between thinking
and non-thinking mode.
- We also bumped web-tokenizer to 0.1.6, which resolves newly converted MLC models throwing rust-related error
### Change - The only change is mlc-ai#686, which - Add prebuilt models: - Qwen3-0.6B: `q0f16, q0f32, q4f16_1, q4f32_1` - Other Qwen3: `{1.7B, 4B, 8B} x {q4f16_1, q4f32_1}` - Support `extra_body: {enable_thinking: false}` for qwen3 models to toggle thinking - See `examples/qwen3` for more on Qwen3 usage - Also bumped `web-tokenizers` package to `0.1.6` to resolve rust-related issues ### TVMjs - No change, version `0.18.0-dev2` just like 0.2.71

Overview
q0f16, q0f32, q4f16_1, q4f32_1{1.7B, 4B, 8B} x {q4f16_1, q4f32_1}extra_bodyfield andextra_body.enable_thinkingfield to support switching between thinking and non-thinking mode. To prevent Qwen3 from thinking, use:examples/qwen3/no_thinkand/thinkin the promptInternal notes
enable_thinkingis achieved by:extra_bodyandenable_thinkingfield toChatCompletionRequestenable_thinkingfield toGenerationConfigthat forwards the value inengine.tsllm_chat.ts, whenprefillStep()andenable_thinkingis false, we callconversation.appendEmptyThinkingReplyHeader(), instead of the normalappendReplyHeader()conversation.ts, adjustgetPromptArrayInternal()to support the reply header with an empty thinking block, using a fieldisLastMessageEmptyThinkingReplyHeadertests/conversation.test.tsFuture work
const emptyThinkingBlockStr = "<think>\n\n</think>\n\n";. This should be configurable per-model in the future. Perhaps make it a part of theConvConfigcompareConversationObject()inengine.tsto allow missing several last messages (in this case, the message without the thinking tokens), so that in longer conversations, those that already stripped the thinking tokens can reuse KV