
Conversation

@qnixsynapse
Collaborator

@qnixsynapse commented Mar 13, 2025

Commit 08d5986 implemented an optimization for Q4_0 tensors on Intel GPUs by reordering the Q4_0 blocks to separate the quantized weights from the dequantization scales.

However, because that commit sets the extras in the init_tensor function, it did not check whether the tensor type is actually Q4_0, which resulted in a memory leak.

This change adds a condition to prevent the memory leak.

P.S. This is not a permanent solution; we should stop setting extras inside the init_tensor function.

Tested with both non-Q4_0 and Q4_0 models.
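
For reference, a minimal sketch of the guard this PR adds. The type and function names below (ggml_tensor_extra_gpu, ggml_backend_sycl_buffer_context, the helper itself) are simplified stand-ins for the SYCL backend's internals and may not match the actual source:

```cpp
#include <vector>
#include "ggml.h"  // ggml_tensor, GGML_TYPE_Q4_0

// Hypothetical, simplified stand-ins for the SYCL backend's internal types.
struct ggml_tensor_extra_gpu {
    // reordered quantized weights / dequantization scale pointers would live here
};

struct ggml_backend_sycl_buffer_context {
    std::vector<ggml_tensor_extra_gpu *> tensor_extras;
};

// The fix in spirit: only allocate an extra when the tensor really is Q4_0,
// so non-Q4_0 tensors never receive an extra that nothing later frees.
static void sycl_init_tensor_extra(ggml_backend_sycl_buffer_context * ctx, ggml_tensor * tensor) {
    if (tensor->type == GGML_TYPE_Q4_0) {
        ggml_tensor_extra_gpu * extra = new ggml_tensor_extra_gpu{};
        tensor->extra = extra;
        ctx->tensor_extras.push_back(extra); // tracked so the buffer can release it
    }
}
```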

@github-actions bot added the ggml and SYCL labels Mar 13, 2025
@slaren
Member

slaren commented Mar 13, 2025

The extras should also be freed in the reset function of the buffer interface, otherwise this will still leak extras when Q4_0 tensors are allocated in a compute buffer (e.g. for KV quantization).
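
A rough sketch of what that could look like, reusing the hypothetical stand-in types from the sketch above (again, assumed names, not the actual backend code):

```cpp
// Sketch only: when the buffer is reset, release every extra it handed out,
// so extras of Q4_0 tensors allocated in a compute buffer (e.g. for KV
// quantization) are reclaimed instead of leaking on each reallocation.
static void sycl_buffer_reset(ggml_backend_sycl_buffer_context * ctx) {
    for (ggml_tensor_extra_gpu * extra : ctx->tensor_extras) {
        delete extra;
    }
    ctx->tensor_extras.clear();
}
```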

@qnixsynapse
Collaborator Author

@NeoZhangJianyu Can you review this PR please?

Collaborator

@NeoZhangJianyu left a comment


It's great!
Thank you!

@NeoZhangJianyu merged commit b3c9a65 into ggml-org:master Mar 17, 2025
47 checks passed
@qnixsynapse deleted the fix/memory_leak branch March 17, 2025 02:28
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface

Labels

ggml: changes relating to the ggml tensor library for machine learning
SYCL: https://en.wikipedia.org/wiki/SYCL - GPU programming language

Projects

None yet


3 participants