MobileVLM native implementation #4954
```c
// ggml_conv_depthwise
struct ggml_tensor * ggml_conv_depthwise_2d(
```
Can the ggml_conv_depthwise_2d operator be represented as a combination of existing operators? For example, ggml_conv_2d is expressed via ggml_im2col, ggml_mul_mat and ggml_reshape:
If yes, it would be a better approach since it would directly allow GPU offload support.
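For reference, here is a rough sketch of what such a composition looks like for a plain conv2d. This is an illustrative example only, not the code merged in this PR, and the `ggml_im2col` signature assumed here may differ between ggml versions:

```c
// Illustrative sketch: a 2D convolution expressed as im2col + mul_mat + reshape.
// NOTE: the ggml_im2col signature assumed here (..., s0, s1, p0, p1, d0, d1, is_2D)
// is an assumption and may differ from the current ggml headers.
static struct ggml_tensor * conv_2d_via_im2col(
        struct ggml_context * ctx,
        struct ggml_tensor  * kernel,  // [KW, KH, IC, OC]
        struct ggml_tensor  * input,   // [IW, IH, IC, N]
        int s0, int s1, int p0, int p1, int d0, int d1) {
    // unfold the input into patches: [KW*KH*IC, OW, OH, N]
    struct ggml_tensor * col = ggml_im2col(ctx, kernel, input, s0, s1, p0, p1, d0, d1, true);

    // flatten the patches and the kernel so the convolution becomes one matrix multiply
    struct ggml_tensor * patches = ggml_reshape_2d(ctx, col,
            col->ne[0], col->ne[1] * col->ne[2] * col->ne[3]);             // [K, OW*OH*N]
    struct ggml_tensor * weights = ggml_reshape_2d(ctx, kernel,
            kernel->ne[0] * kernel->ne[1] * kernel->ne[2], kernel->ne[3]); // [K, OC]

    struct ggml_tensor * out = ggml_mul_mat(ctx, patches, weights);        // [OW*OH*N, OC]

    // restore a 4-D layout and move OC next to the spatial dimensions
    out = ggml_reshape_4d(ctx, out, col->ne[1], col->ne[2], col->ne[3], kernel->ne[3]); // [OW, OH, N, OC]
    return ggml_cont(ctx, ggml_permute(ctx, out, 0, 1, 3, 2));             // [OW, OH, OC, N]
}
```

A depthwise variant could follow the same pattern, e.g. by treating each channel as its own batch entry before the im2col step.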
Reviewing the code implementation in PyTorch, it seems that this operation (ggml_conv_depthwise_2d) is a conv2d.
@ggerganov Thanks for your advice. We removed the ggml_conv_depthwise_2d native implementation and reconstructed it from ggml_im2col, ggml_mul_mat and ggml_reshape.
Nice! Does it run on the GPU now or is it still missing some operators?
@ggerganov It can't run on the GPU currently. There are two problems: 1. ggml-cuda doesn't support the pool2d operator; 2. ggml_cuda_can_mul_mat requires ne0>=32, ne1>=32, ne2>=3, but a depthwise conv with a 3x3 kernel has ne[0] = 9. What is your opinion on these two problems?
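For context, the second point is just a shape heuristic; here is a paraphrased sketch (hypothetical names and values, not the actual ggml-cuda source) of why the depthwise case gets rejected:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical stand-in for the ggml_cuda_can_mul_mat shape heuristic quoted above;
// the real check lives in ggml-cuda.cu and may differ in detail.
static bool can_offload_mul_mat(int64_t ne0, int64_t ne1, int64_t ne2) {
    return ne0 >= 32 && ne1 >= 32 && ne2 >= 3;
}

int main(void) {
    // A 3x3 depthwise kernel produces a mul_mat operand with ne[0] = KH * KW = 9,
    // so the ">= 32" threshold rejects the offload even though the multiplication
    // itself is valid (the other dimensions here are purely illustrative).
    printf("offloadable: %d\n", can_offload_mul_mat(9, 64, 3));
    return 0;
}
```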
We should extend support for these eventually
OK, we'll try to extend support for it.
Really looking forward to the full offload support, great to see.
ggerganov left a comment
Fix the editor config checks
Thanks, it's done.
@XiaotaoChen I don't think your example would actually work: you use -p "" as the template, but llava-cli uses -p "" only as the user question and hard-codes the Vicuna template for llava-1.5.
P.S. From the discussion it sounds like only minimal work is missing for full GPU offload; is that still planned?
To add GPU support to this new model, it is necessary to create 3 new kernels for
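For illustration, here is a minimal element-wise CUDA kernel sketch for hardswish, one of the new operators listed in the TODO section of the README below. This is a hypothetical example under that assumption, not the kernel that eventually lands in ggml-cuda:

```cuda
#include <cuda_runtime.h>

// hardswish(x) = x * clamp(x + 3, 0, 6) / 6, applied element-wise.
// Hypothetical sketch; ggml-cuda's real kernels follow its own launch conventions.
__global__ void hardswish_f32(const float * x, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = x[i] * fminf(fmaxf(x[i] + 3.0f, 0.0f), 6.0f) / 6.0f;
}

// Example launch: one thread per element, 256 threads per block.
static void hardswish_f32_cuda(const float * x, float * dst, const int k, cudaStream_t stream) {
    const int num_blocks = (k + 255) / 256;
    hardswish_f32<<<num_blocks, 256, 0, stream>>>(x, dst, k);
}
```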
* MobileVLM native implementation
* delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake
* move android script to example/llava directory
* Fix the editor config checks

Co-authored-by: Chenxiaotao03 <[email protected]>
MobileVLM
Currently this implementation supports MobileVLM-v1.7 variants.
For more information, please go to Meituan-AutoML/MobileVLM.
The implementation is based on llava, and is compatible with both llava and MobileVLM. The usage is basically the same as llava.
Usage
Build with cmake or run `make llava-cli` to build it.

After building, run `./llava-cli` to see the usage. For example:

```sh
./llava-cli -m MobileVLM-1.7B/ggml-model-q4_k.gguf \
    --mmproj MobileVLM-1.7B/mmproj-model-f16.gguf \
    --image path/to/an/image.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:"
```

Model conversion

1. Clone `mobileVLM-1.7B` and `clip-vit-large-patch14-336` locally.
2. Use `llava-surgery.py` to split the LLaVA model into its LLaMA and multimodal projector constituents.
3. Use `convert-image-encoder-to-gguf.py` with `--projector-type ldp` to convert the LLaVA image encoder to GGUF:

```sh
python ./examples/llava/convert-image-encoder-to-gguf \
    -m path/to/clip-vit-large-patch14-336 \
    --llava-projector path/to/MobileVLM-1.7B/llava.projector \
    --output-dir path/to/MobileVLM-1.7B \
    --projector-type ldp
```

4. Use `convert.py` to convert the LLaMA part of LLaVA to GGUF.
5. Use `quantize` to convert the LLaMA part's data type from `fp16` to `q4_k`.

Now both the LLaMA part and the image encoder are in the `MobileVLM-1.7B` directory.

Android compile and run
compile

Refer to `android/build_64.sh`:

```sh
mkdir android/build_64
cd android/build_64
../build_64.sh
```

run on Android

Refer to `android/adb_run.sh`; modify the resources' `name` and `path`.

Some results on Android with a Snapdragon 888 chip

case 1
input
```sh
/data/local/tmp/llava-cli \
    -m /data/local/tmp/ggml-model-q4_k.gguf \
    --mmproj /data/local/tmp/mmproj-model-f16.gguf \
    -t 4 \
    --image /data/local/tmp/demo.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? \nAnswer the question using a single word or phrase. ASSISTANT:"
```

output
case 2
input
```sh
/data/local/tmp/llava-cli \
    -m /data/local/tmp/ggml-model-q4_k.gguf \
    --mmproj /data/local/tmp/mmproj-model-f16.gguf \
    -t 4 \
    --image /data/local/tmp/cat.jpeg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWhat is in the image? ASSISTANT:"
```

output
TODO
- Support non-CPU backend for the new operators, such as `depthwise`, `hardswish`, `hardsigmoid`
- Optimize LDP projector performance
- Run MobileVLM on `Jetson Orin`
- Support more model variants, such as `MobileVLM-3B`.

contributor