
Commit 11c3f1e

move android script to example/llava directory

Chenxiaotao03 committed
1 parent 9303bbf commit 11c3f1e

7 files changed: +654 -0 lines changed
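For context, the two logs below were captured by running the quantized MobileVLM model with llava-cli directly on an Android device under /data/local/tmp. The following is a minimal sketch of how such a run can be reproduced over adb; the host-side file locations and the push/chmod steps are assumptions, while the binary name, flags, and device paths match the command recorded in the logs.

# Host side: push the model, the multimodal projector, the test image and the
# cross-compiled llava-cli binary to the device (host-side paths are hypothetical).
adb push ggml-model-q4_k.gguf /data/local/tmp/
adb push mmproj-model-f16.gguf /data/local/tmp/
adb push cat.jpeg /data/local/tmp/
adb push llava-cli /data/local/tmp/
adb shell chmod +x /data/local/tmp/llava-cli

# Device side (inside `adb shell`): the invocation recorded in the first log,
# with the prompt quoted and 4 CPU threads.
cd /data/local/tmp
./llava-cli \
    -m /data/local/tmp/ggml-model-q4_k.gguf \
    --mmproj /data/local/tmp/mmproj-model-f16.gguf \
    -t 4 \
    --image /data/local/tmp/cat.jpeg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>
What is in the image? ASSISTANT:"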
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
cd /data/local/tmp /data/local/tmp/llava-cli -m /data/local/tmp/ggml-model-q4_k.gguf --mmproj /data/local/tmp/mmproj-model-f16.gguf -t 4 --image /data/local/tmp/cat.jpeg -p A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>
What is in the image? ASSISTANT:
llama_model_loader: loaded meta data with 22 key-value pairs and 219 tensors from /data/local/tmp/ggml-model-q4_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mobileVLM
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 24
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 16
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 16
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 14
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 49 tensors
llama_model_loader: - type q4_K: 161 tensors
llama_model_loader: - type q5_K: 8 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q4_K - Small
llm_load_print_meta: model params = 1.36 B
llm_load_print_meta: model size = 755.81 MiB (4.65 BPW)
llm_load_print_meta: general.name = mobileVLM
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/25 layers to GPU
llm_load_tensors: CPU buffer size = 755.81 MiB
...........................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: graph splits (measure): 1
llama_new_context_with_model: CPU compute buffer size = 80.00 MiB

clip_model_load: model name: openai/clip-vit-large-patch14-336
clip_model_load: description: image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 397
clip_model_load: n_kv: 19
clip_model_load: ftype: f16

clip_model_load: loaded meta data with 19 key-value pairs and 397 tensors from /data/local/tmp/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
clip_model_load: - kv 3: clip.has_llava_projector bool = true
clip_model_load: - kv 4: general.file_type u32 = 1
clip_model_load: - kv 5: general.name str = openai/clip-vit-large-patch14-336
clip_model_load: - kv 6: general.description str = image encoder for LLaVA
clip_model_load: - kv 7: clip.projector_type str = ldp
clip_model_load: - kv 8: clip.vision.image_size u32 = 336
clip_model_load: - kv 9: clip.vision.patch_size u32 = 14
clip_model_load: - kv 10: clip.vision.embedding_length u32 = 1024
clip_model_load: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_model_load: - kv 12: clip.vision.projection_dim u32 = 768
clip_model_load: - kv 13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_model_load: - kv 15: clip.vision.block_count u32 = 23
clip_model_load: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv 18: clip.use_gelu bool = false
clip_model_load: - type f32: 247 tensors
clip_model_load: - type f16: 150 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 1
clip_model_load: model size: 591.67 MB
clip_model_load: metadata size: 0.15 MB
clip_model_load: params backend buffer size = 591.67 MB (397 tensors)
clip_model_load: compute allocated memory: 32.89 MB

encode_image_with_clip: image encoded in 56206.88 ms by CLIP ( 390.33 ms per image patch)
In the image, a cat is laying down in an open field.
llama_print_timings: load time = 62582.57 ms
llama_print_timings: sample time = 5.19 ms / 16 runs ( 0.32 ms per token, 3083.45 tokens per second)
llama_print_timings: prompt eval time = 50748.39 ms / 232 tokens ( 218.74 ms per token, 4.57 tokens per second)
llama_print_timings: eval time = 4449.44 ms / 16 runs ( 278.09 ms per token, 3.60 tokens per second)
llama_print_timings: total time = 111847.09 ms / 248 tokens
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
cd /data/local/tmp /data/local/tmp/llava-cli -m /data/local/tmp/ggml-model-q4_k.gguf --mmproj /data/local/tmp/mmproj-model-f16.gguf -t 4 --image /data/local/tmp/demo.jpg -p A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>
Who is the author of this book?
Answer the question using a single word or phrase. ASSISTANT:
llama_model_loader: loaded meta data with 22 key-value pairs and 219 tensors from /data/local/tmp/ggml-model-q4_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mobileVLM
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 24
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 16
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 16
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 14
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 49 tensors
llama_model_loader: - type q4_K: 161 tensors
llama_model_loader: - type q5_K: 8 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q4_K - Small
llm_load_print_meta: model params = 1.36 B
llm_load_print_meta: model size = 755.81 MiB (4.65 BPW)
llm_load_print_meta: general.name = mobileVLM
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/25 layers to GPU
llm_load_tensors: CPU buffer size = 755.81 MiB
...........................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: graph splits (measure): 1
llama_new_context_with_model: CPU compute buffer size = 80.00 MiB

clip_model_load: model name: openai/clip-vit-large-patch14-336
clip_model_load: description: image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 397
clip_model_load: n_kv: 19
clip_model_load: ftype: f16

clip_model_load: loaded meta data with 19 key-value pairs and 397 tensors from /data/local/tmp/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
clip_model_load: - kv 3: clip.has_llava_projector bool = true
clip_model_load: - kv 4: general.file_type u32 = 1
clip_model_load: - kv 5: general.name str = openai/clip-vit-large-patch14-336
clip_model_load: - kv 6: general.description str = image encoder for LLaVA
clip_model_load: - kv 7: clip.projector_type str = ldp
clip_model_load: - kv 8: clip.vision.image_size u32 = 336
clip_model_load: - kv 9: clip.vision.patch_size u32 = 14
clip_model_load: - kv 10: clip.vision.embedding_length u32 = 1024
clip_model_load: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_model_load: - kv 12: clip.vision.projection_dim u32 = 768
clip_model_load: - kv 13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_model_load: - kv 15: clip.vision.block_count u32 = 23
clip_model_load: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv 18: clip.use_gelu bool = false
clip_model_load: - type f32: 247 tensors
clip_model_load: - type f16: 150 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 1
clip_model_load: model size: 591.67 MB
clip_model_load: metadata size: 0.15 MB
clip_model_load: params backend buffer size = 591.67 MB (397 tensors)
clip_model_load: compute allocated memory: 32.89 MB

encode_image_with_clip: image encoded in 21306.56 ms by CLIP ( 147.96 ms per image patch)
Susan Wise Bauer
llama_print_timings: load time = 23592.53 ms
llama_print_timings: sample time = 1.58 ms / 6 runs ( 0.26 ms per token, 3799.87 tokens per second)
llama_print_timings: prompt eval time = 12890.41 ms / 246 tokens ( 52.40 ms per token, 19.08 tokens per second)
llama_print_timings: eval time = 440.90 ms / 6 runs ( 73.48 ms per token, 13.61 tokens per second)
llama_print_timings: total time = 34976.43 ms / 252 tokens
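The per-token figures printed in the timing summaries are straightforward derivations from the totals; as a worked check on the second run:

prompt eval: 12890.41 ms / 246 tokens ≈ 52.40 ms per token, i.e. 1000 / 52.40 ≈ 19.08 tokens per second
eval: 440.90 ms / 6 runs ≈ 73.48 ms per token, i.e. 1000 / 73.48 ≈ 13.61 tokens per second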
