
Conversation

@samkoesnadi (Contributor)

  • Added README
  • Fixed the clip quantize implementation in the clip.cpp file

For my Qwen2VL 2B experiment, the quantization reduces the file size from 1.4 GB to 400 MB, so it gives a massive benefit in file size. I have heard that others are working on a generic API for visual LMs, but that implementation may still take some time; this can serve as a solution until there is further progress on that front.
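For context, the quantization is driven by the clip_model_quantize entry point declared in clip.h. Below is a minimal sketch of a driver, assuming that entry point; the file names and the Q4_1 choice are illustrative, not the exact CLI added by this PR:

#include "clip.h"
#include "ggml.h"
#include <cstdio>

int main(void) {
    // hypothetical input/output paths for a visual projector model
    const char * fname_inp = "mmproj-model-f16.gguf";
    const char * fname_out = "mmproj-model-q4_1.gguf";
    // itype is passed through as a ggml_type value; Q4_1 is one of the
    // quantization types clip.cpp accepts
    if (!clip_model_quantize(fname_inp, fname_out, GGML_TYPE_Q4_1)) {
        fprintf(stderr, "failed to quantize %s\n", fname_inp);
        return 1;
    }
    return 0;
}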

@monatis @HimariO @jeffbolznv @ggerganov poking you guys, just in case you have the time to review. I think this will benefit many people :)

#include "llama.h"
#include "ggml.h"

void print_usage(int argc, char ** argv) {
Collaborator:
You can mark this function static, otherwise GCC fails with "no previous prototype" under -Wmissing-prototypes. You can also consume argc with a simple (void) cast. Both are only warnings, but the -Werror flag treats all warnings as errors.
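A minimal sketch of what the suggested fix could look like (the usage text is an illustrative placeholder, not the actual code from this PR):

#include <cstdio>

// static gives the helper internal linkage, so GCC's -Wmissing-prototypes
// warning ("no previous prototype for 'print_usage'") no longer fires.
static void print_usage(int argc, char ** argv) {
    (void) argc; // explicitly consume argc so -Wunused-parameter stays quiet under -Werror
    printf("usage: %s <input.gguf> <output.gguf> <type>\n", argv[0]);
}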

@samkoesnadi (Contributor, Author):

I have just fixed this. Thanks! :D

@monatis (Collaborator) commented Feb 5, 2025

LGTM, waiting for the tests to go green then merging. Thanks!

@monatis monatis merged commit 1ec2080 into ggml-org:master Feb 5, 2025
46 checks passed
@samkoesnadi samkoesnadi deleted the feat/clip-quantize branch February 5, 2025 08:33
@SmallAndSoft

I hope this does not complicate and delay computing CLIP on the GPU.

@samkoesnadi (Contributor, Author)

> I hope this does not complicate and delay computing CLIP on the GPU.

I have tested this on the CPU, and processing time is roughly the same as with the unquantized version. Since llama.cpp does not currently run CLIP on the GPU, I don't dare to say much about that. I would love to see a trial run if you have the time.

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
…-org#11644)

* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file

* Fixed the gcc warning regarding minor linting

* Removed trailing whitespace
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
@cmj18 commented Mar 13, 2025

Hello, a quick question: for the Qwen2VL model, after I apply q4 quantization to CLIP, does image-related processing get faster, e.g. the image encoder? Or does it only reduce the model size? Thanks!
