
Conversation

@samkoesnadi (Contributor)

  • Added README
  • Fixed the clip quantize implementation in the clip.cpp file

For my Qwen2VL 2B experiment, the quantization reduces the file size from 1.4 GB to 400 MB, so it gives a massive benefit in file size. I have heard that others are working on a generic API for visual LMs, but that implementation may still take some time; this can serve as a solution until there is further progress on that front.
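For context, the quantization is driven by the clip_model_quantize entry point declared in clip.h. Below is a minimal sketch of a driver, assuming that entry point; the file names and the Q4_1 choice are illustrative, not the exact CLI added by this PR:

#include "clip.h"
#include "ggml.h"
#include <cstdio>

int main(void) {
    // hypothetical input/output paths for a visual projector model
    const char * fname_inp = "mmproj-model-f16.gguf";
    const char * fname_out = "mmproj-model-q4_1.gguf";
    // itype is passed through as a ggml_type value; Q4_1 is one of the
    // quantization types clip.cpp accepts
    if (!clip_model_quantize(fname_inp, fname_out, GGML_TYPE_Q4_1)) {
        fprintf(stderr, "failed to quantize %s\n", fname_inp);
        return 1;
    }
    return 0;
}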

@monatis @HimariO @jeffbolznv @ggerganov poking you guys, just in case you have the time to review. I think this will benefit many people :)

#include "llama.h"
#include "ggml.h"

void print_usage(int argc, char ** argv) {
Collaborator:
You can mark this function static, otherwise GCC fails with "no previous prototype" under -Wmissing-prototypes. You can also consume argc with a simple (void) cast. Both are only warnings, but the -Werror flag treats all warnings as errors.
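A minimal sketch of what the suggested fix could look like (the usage text is an illustrative placeholder, not the actual code from this PR):

#include <cstdio>

// static gives the helper internal linkage, so GCC's -Wmissing-prototypes
// warning ("no previous prototype for 'print_usage'") no longer fires.
static void print_usage(int argc, char ** argv) {
    (void) argc; // explicitly consume argc so -Wunused-parameter stays quiet under -Werror
    printf("usage: %s <input.gguf> <output.gguf> <type>\n", argv[0]);
}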

@samkoesnadi (Contributor, Author):

I have just fixed this. Thanks! :D

@monatis (Collaborator) commented Feb 5, 2025

LGTM, waiting for the tests to go green then merging. Thanks!

@monatis monatis merged commit 1ec2080 into ggml-org:master Feb 5, 2025
46 checks passed
@samkoesnadi samkoesnadi deleted the feat/clip-quantize branch February 5, 2025 08:33
@SmallAndSoft

I hope this does not complicate and delay computing CLIP on the GPU.

@samkoesnadi (Contributor, Author)

> I hope this does not complicate and delay computing CLIP on the GPU.

I have tested this on the CPU, and processing time is roughly the same as with the unquantized version. Since llama.cpp does not currently run CLIP on the GPU, I don't dare to say much about that. I would love to see a trial run if you have the time.

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
…-org#11644)

* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file

* Fixed the gcc warning regarding minor linting

* Removed trailing whitespace
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
@cmj18 commented Mar 13, 2025

Hello, a quick question: for the Qwen2VL model, after I apply q4 quantization to CLIP, does image-related processing get faster, e.g. the image encoder? Or does it only reduce the model size? Thanks!
