iamlemec
Collaborator

@iamlemec iamlemec commented Aug 1, 2025

This adds GGUF conversion support for the Qwen3-Embedding class of models and makes pooling work properly by default. I'm not sure how the official Qwen GGUFs were produced, but I'll try to get them updated if this gets merged. More details:

  • Though the pre-tokenizer is in fact the same qwen2 as usual, the HF tokenizer adds an EOT token that makes the checksum different from the usual text generation models
  • For some reason the tensor names are not prefixed with "model.", unlike most other models in this class
  • Actually loads the default pooling mode so it works out of the box
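
To illustrate the first two points, here is a minimal Python sketch, not the actual conversion code. It assumes the pre-tokenizer check works by hashing the token ids produced for a fixed test string with SHA-256. The toy tokenizer (toy_encode), the helper normalize_tensor_name, the example tensor name, and the EOT id are all hypothetical and exist only for illustration.

```python
from hashlib import sha256
from typing import List, Optional


def toy_encode(text: str, eot_id: Optional[int] = None) -> List[int]:
    """Toy stand-in for a HF tokenizer: one id per character (hypothetical)."""
    ids = [ord(ch) for ch in text]
    if eot_id is not None:
        # Mimics a tokenizer that appends an end-of-text token to every input.
        ids.append(eot_id)
    return ids


def token_checksum(ids: List[int]) -> str:
    """Hash the token id list (assumed scheme: sha256 over str(ids))."""
    return sha256(str(ids).encode()).hexdigest()


def normalize_tensor_name(name: str, prefix: str = "model.") -> str:
    """Add the prefix when a checkpoint omits it (hypothetical helper)."""
    return name if name.startswith(prefix) else prefix + name


check_text = "hello world"
plain = token_checksum(toy_encode(check_text))
# 999 is an arbitrary made-up id standing in for an EOT token.
with_eot = token_checksum(toy_encode(check_text, eot_id=999))

# Same underlying tokenization, but one extra trailing token changes the hash.
print(plain == with_eot)
print(normalize_tensor_name("layers.0.self_attn.q_proj.weight"))
```

The point of the sketch is that a single appended token is enough to change the checksum even though the pre-tokenizer itself is unchanged, so the converter needs to recognize the new hash as well.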

@github-actions github-actions bot added the python python script changes label Aug 1, 2025
Member

@ggerganov ggerganov left a comment

Thanks for looking into this - it would be great to get this model supported.

@CISC CISC merged commit 339bd02 into ggml-org:master Aug 2, 2025
49 of 50 checks passed
@CISC
Collaborator

CISC commented Aug 2, 2025

Ooops, was a bit trigger happy, forgot convert_hf_to_gguf_update.py...

@iamlemec care to add the model there in a new PR?

Edit: It has to go in pre_computed_hashes (since qwen2 is in models already), which means the entry in convert_hf_to_gguf.py will be moved towards the top as well.
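
A rough sketch of the idea, under stated assumptions: when a pre-tokenizer name already has an entry whose checksum is generated from a model repo, a second checksum for the same name is recorded as a fixed entry and looked up directly. The dict layout, field names, repo, and hash value below are placeholders, not the real entries from the scripts.

```python
from typing import Dict, List, Optional

# Placeholder table: field names and values are assumptions for illustration.
pre_computed_hashes: List[Dict[str, str]] = [
    {
        "name": "qwen2",
        "repo": "example-org/example-embedding-model",
        "chkhsh": "<placeholder-hash>",
    },
]


def lookup_pre_tokenizer(chkhsh: str) -> Optional[str]:
    """Return the pre-tokenizer name for a known checksum, else None."""
    for entry in pre_computed_hashes:
        if entry["chkhsh"] == chkhsh:
            return entry["name"]
    return None


print(lookup_pre_tokenizer("<placeholder-hash>"))
print(lookup_pre_tokenizer("some-unknown-hash"))
```

With this layout, two different checksums can map to the same pre-tokenizer name without the generated list needing a duplicate name.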

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 2, 2025
@iamlemec
Collaborator Author

iamlemec commented Aug 2, 2025

@CISC Ah I see, will have that up in a moment. See #15030.

@Mushoz

Mushoz commented Aug 27, 2025

@iamlemec Did you ever speak to Qwen about the official GGUFs? Are they okay to use in their current form or would they have to be updated?

@iamlemec
Collaborator Author

@Mushoz looks like they uploaded new GGUFs to HF! Just tested the 0.6B version and it works as expected with correct pooling.

@Mushoz

Mushoz commented Aug 28, 2025

@iamlemec which ones are you talking about?

I am looking at this one: https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF/tree/main
But that was last updated on the 14th of July, well before this PR was merged.

The same holds true for the 8B version: https://huggingface.co/Qwen/Qwen3-Embedding-8B-GGUF/tree/main

@iamlemec
Collaborator Author

@Mushoz Huh, you're right. Maybe I missed that update before and was still using the old ones? Anyway, the ones you linked to have the pooling type properly specified and seem to work as expected!
