
Conversation

slaren (Member) commented on Feb 21, 2024

Apply the same solution as with falcon to allow offloading the output tensor.

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes

| model | size | params | backend | ngl | test | t/s master | t/s PR | speedup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gemma 7B Q4_K - Medium | 4.77 GiB | 8.54 B | CUDA | 99 | pp 512 | 1996.45 ± 206.39 | 3959.67 ± 110.36 | 1.98 |
| gemma 7B Q4_K - Medium | 4.77 GiB | 8.54 B | CUDA | 99 | tg 128 | 60.60 ± 0.26 | 110.51 ± 0.30 | 1.82 |

@JohannesGaessler for some reason, compare-llama-bench.py does not work in this case. Do you know what the reason may be?

Nvm, I think this is because this change caused the reported model size/params count to change.

Traceback (most recent call last):
  File "/home/diego/code/llama.cpp/scripts/compare-llama-bench.py", line 305, in <module>
    gpu_blas = bool(rows_full[0][KEY_PROPERTIES.index("gpu_blas")])
IndexError: list index out of range
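
(As an aside: the traceback itself is just an unguarded index into an empty result set. A minimal guard sketch for that line, hypothetical rather than the script's actual code, assuming `rows_full` and `KEY_PROPERTIES` as defined in compare-llama-bench.py:)

```python
import sys

# Hypothetical hardening of the failing line: rows_full comes back empty
# when the comparison query matches nothing, so rows_full[0] raises
# IndexError. Exiting with a message makes the failure mode obvious.
if not rows_full:
    sys.exit("No rows match between the two builds; "
             "did a join property such as model_size change?")
gpu_blas = bool(rows_full[0][KEY_PROPERTIES.index("gpu_blas")])
```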

This query seems to return 0 rows:

SELECT tb.cpu_info, tb.gpu_info, tb.n_gpu_layers, tb.main_gpu, tb.cuda, tb.opencl, tb.metal, tb.gpu_blas, tb.blas, tb.model_filename, tb.model_type, tb.model_size, tb.model_n_params, tb.n_batch, tb.n_threads, tb.type_k, tb.type_v, tb.no_kv_offload, tb.mul_mat_q, tb.tensor_split, tb.n_prompt, tb.n_gen, tb.n_prompt, tb.n_gen, AVG(tb.avg_ts), AVG(tc.avg_ts) FROM test tb JOIN test tc ON tb.cpu_info = tc.cpu_info AND tb.gpu_info = tc.gpu_info AND tb.n_gpu_layers = tc.n_gpu_layers AND tb.main_gpu = tc.main_gpu AND tb.cuda = tc.cuda AND tb.opencl = tc.opencl AND tb.metal = tc.metal AND tb.gpu_blas = tc.gpu_blas AND tb.blas = tc.blas AND tb.model_filename = tc.model_filename AND tb.model_type = tc.model_type AND tb.model_size = tc.model_size AND tb.model_n_params = tc.model_n_params AND tb.n_batch = tc.n_batch AND tb.n_threads = tc.n_threads AND tb.type_k = tc.type_k AND tb.type_v = tc.type_v AND tb.no_kv_offload = tc.no_kv_offload AND tb.mul_mat_q = tc.mul_mat_q AND tb.tensor_split = tc.tensor_split AND tb.n_prompt = tc.n_prompt AND tb.n_gen = tc.n_gen AND tb.build_commit = '89febfed' AND tc.build_commit = '22ca4ddb' GROUP BY tb.cpu_info, tb.gpu_info, tb.n_gpu_layers, tb.main_gpu, tb.cuda, tb.opencl, tb.metal, tb.gpu_blas, tb.blas, tb.model_filename, tb.model_type, tb.model_size, tb.model_n_params, tb.n_batch, tb.n_threads, tb.type_k, tb.type_v, tb.no_kv_offload, tb.mul_mat_q, tb.tensor_split, tb.n_prompt, tb.n_gen, tb.n_gen, tb.n_prompt ORDER BY tb.cpu_info, tb.gpu_info, tb.n_gpu_layers, tb.main_gpu, tb.cuda, tb.opencl, tb.metal, tb.gpu_blas, tb.blas, tb.model_filename, tb.model_type, tb.model_size, tb.model_n_params, tb.n_batch, tb.n_threads, tb.type_k, tb.type_v, tb.no_kv_offload, tb.mul_mat_q, tb.tensor_split, tb.n_prompt, tb.n_gen, tb.n_gen, tb.n_prompt;

This is the data in the tables:

sqlite> select * from test;
89febfed|2230|1|0|0|0|0|0|1|1|13th Gen Intel(R) Core(TM) i9-13900K|NVIDIA GeForce RTX 3090 Ti|models/gemma-7b-it-Q4_K_M.gguf|gemma 7B Q4_K - Medium|5121183744|8538074112|512|16|f16|f16|99|layer|0|0|1|0.00|1|512|0|2024-02-21T20:40:28Z|268801109|48015341|1944.929686|281.264199
89febfed|2230|1|0|0|0|0|0|1|1|13th Gen Intel(R) Core(TM) i9-13900K|NVIDIA GeForce RTX 3090 Ti|models/gemma-7b-it-Q4_K_M.gguf|gemma 7B Q4_K - Medium|5121183744|8538074112|512|16|f16|f16|99|layer|0|0|1|0.00|1|0|128|2024-02-21T20:40:30Z|2103439045|12619252|60.854476|0.364524
22ca4ddb|2231|1|0|0|0|0|0|1|1|13th Gen Intel(R) Core(TM) i9-13900K|NVIDIA GeForce RTX 3090 Ti|models/gemma-7b-it-Q4_K_M.gguf|gemma 7B Q4_K - Medium|5563772928|9324899328|512|16|f16|f16|99|layer|0|0|1|0.00|1|512|0|2024-02-21T20:40:42Z|128998178|2139241|3969.902909|64.418657
22ca4ddb|2231|1|0|0|0|0|0|1|1|13th Gen Intel(R) Core(TM) i9-13900K|NVIDIA GeForce RTX 3090 Ti|models/gemma-7b-it-Q4_K_M.gguf|gemma 7B Q4_K - Medium|5563772928|9324899328|512|16|f16|f16|99|layer|0|0|1|0.00|1|0|128|2024-02-21T20:40:43Z|1162848942|3929889|110.075492|0.371931
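
The rows confirm it: model_size (5121183744 vs 5563772928) and model_n_params (8538074112 vs 9324899328) differ between the two builds, so the self-join on those columns matches nothing. A small diagnostic sketch to spot which join keys diverge, assuming the dump is saved as llama-bench.sqlite (hypothetical filename) with the schema shown above:

```python
import sqlite3

# Join-key columns from the comparison query above (subset for brevity).
PROPS = ["cpu_info", "gpu_info", "n_gpu_layers", "model_filename",
         "model_type", "model_size", "model_n_params"]

con = sqlite3.connect("llama-bench.sqlite")
cur = con.cursor()

def props_for(commit: str) -> dict:
    # All rows for one build share the same join properties,
    # so looking at a single row is enough.
    cols = ", ".join(PROPS)
    row = cur.execute(
        f"SELECT {cols} FROM test WHERE build_commit = ? LIMIT 1;",
        (commit,)).fetchone()
    return dict(zip(PROPS, row))

a = props_for("89febfed")  # master
b = props_for("22ca4ddb")  # PR
for key in PROPS:
    if a[key] != b[key]:
        print(f"{key}: {a[key]} != {b[key]}")

# Expected output given the rows above:
#   model_size: 5121183744 != 5563772928
#   model_n_params: 8538074112 != 9324899328
```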

slaren merged commit ba2135c into master on Feb 21, 2024
slaren deleted the sl/gemma-offload-output branch on February 21, 2024 at 21:18
cebtenzzre pushed a commit to nomic-ai/llama.cpp that referenced this pull request Feb 21, 2024
cebtenzzre added a commit to nomic-ai/gpt4all that referenced this pull request Feb 21, 2024
cebtenzzre (Collaborator) commented:

This change breaks CPU inference of the Q4_0 quant, with or without #5650:

llm_load_tensors: ggml ctx size =    0.10 MiB
ggml_new_object: not enough space in the context's memory pool (needed 101968, available 101600)
[1]    60495 segmentation fault (core dumped)
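
A bit of arithmetic on that log, under the assumption (not confirmed in this thread) that the newly duplicated output tensor is the unaccounted-for object:

```python
# Shortfall reported by ggml_new_object.
needed, available = 101_968, 101_600
print(needed - available)  # 368 bytes: on the order of one extra ggml
                           # object header plus tensor struct that the
                           # context size estimate did not include
```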

slaren (Member, Author) commented on Feb 21, 2024

Should be fixed in #5651

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024