Added quantization for the visual projector LLAVA, Qwen2VL #11644
* Added README * Fixed the clip quantize implementation in the file
examples/llava/clip-quantize-cli.cpp
#include "llama.h"
#include "ggml.h"

void print_usage(int argc, char ** argv) {
You can mark this function static; otherwise GCC fails with "no previous prototype". You can also consume argc with a simple void macro. Both are really just warnings, but the -Werror flag treats all warnings as errors.
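A minimal sketch of the two fixes suggested above, not the code that was actually merged. The `UNUSED` macro, the `usage_text` helper, and the placeholder usage string are hypothetical names introduced for illustration; the warning flags named in the comments are assumptions about the build configuration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical "void macro": casting to void marks a parameter as
// intentionally unused, which silences -Wunused-parameter.
#define UNUSED(x) (void)(x)

// Hypothetical helper that builds a placeholder usage string.
// `static` gives the function internal linkage, so no separate
// prototype is needed (assumed flag: -Wmissing-declarations).
static std::string usage_text(const char * prog) {
    return std::string("usage: ") + prog + " <args>\n";
}

// Same signature as in the diff, now static and with argc consumed.
static void print_usage(int argc, char ** argv) {
    UNUSED(argc);
    assert(argv != nullptr && argv[0] != nullptr);
    fprintf(stderr, "%s", usage_text(argv[0]).c_str());
}
```

With `static` in place the function is visible only inside its translation unit, which is usually what a CLI example file wants anyway.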
I have just fixed this. Thanks! :D
LGTM, waiting for the tests to go green, then merging. Thanks!
I hope this does not complicate or delay computing CLIP on the GPU.
I have tested this on CPU, and it has roughly the same processing time as the unquantized version. Since running CLIP on the GPU is currently not supported by llama.cpp, I don't dare to say much about it. I would love to see a trial run if you have the time.
…-org#11644) * Added quantization for visual projector * Added README * Fixed the clip quantize implementation in the file * Fixed the gcc warning regarding minor linting * Removed trailing whitespace
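Hello, a question: for the Qwen2VL model, after I quantize CLIP to q4, does image-related processing get faster, for example the image encoder speed? Or does it only reduce the model size? Thanks!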
For my Qwen2VL 2B experiment, quantization reduces the file size from 1.4 GB to 400 MB, so the benefit in file size is massive. I have heard that others are working on a generic API for visual LMs, but that implementation might still take some time. This can serve as a solution until there is further progress on that front.
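As a rough sanity check on the size reduction reported above, here is a back-of-the-envelope sketch. It assumes, without confirmation from this thread, that the source projector is stored as F16 (16 bits per weight), that the target is Q4_0 (blocks of 32 weights stored in 18 bytes, i.e. 4.5 bits per weight), and that every tensor is requantized. In practice some tensors usually stay at higher precision, so real files come out somewhat larger. The function names are hypothetical.

```cpp
#include <cassert>

// Bits per weight of a block format: payload bits divided by weights per block.
static double bits_per_weight(int block_bytes, int weights_per_block) {
    assert(weights_per_block > 0);
    return 8.0 * block_bytes / weights_per_block;
}

// Estimated file size if all weights move from src_bpw to dst_bpw
// (assumption: metadata and unquantized tensors are ignored).
static double estimated_size_mb(double src_size_mb, double src_bpw, double dst_bpw) {
    assert(src_bpw > 0.0);
    return src_size_mb * dst_bpw / src_bpw;
}
```

Under these assumptions, `estimated_size_mb(1400.0, 16.0, bits_per_weight(18, 32))` gives about 394 MB, which is in line with the roughly 400 MB observed.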
@monatis @HimariO @jeffbolznv @ggerganov poking you guys, just in case you have the time to review. I think this will benefit many people :)