ggerganov (Member) commented Aug 29, 2025

ref #15602 (comment)

  • Avoid Vcur = ggml_cont_3d(...) when the QKV weights are fused into a single tensor
  • Make llama_kv_cache::cpy_k and llama_kv_cache::cpy_v more readable
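The idea behind the first bullet can be illustrated with a small model of the memory layout. This is a minimal Python sketch, not the ggml or llama.cpp implementation. It assumes a fused QKV activation laid out per token as [Q | K | V], measures strides in elements rather than bytes, and uses hypothetical names (View3D, view_v, cont, store_direct). It shows that V can be read as a strided view over the fused buffer and written into a KV cache head by head. The cache ends up with the same contents as when V is first made contiguous, as long as each head's values are contiguous.

```python
# Minimal sketch with hypothetical names: V as a strided view into a fused
# QKV buffer, stored into a KV cache without first making all of V contiguous.
from dataclasses import dataclass
from typing import List


@dataclass
class View3D:
    """Strided view over a flat buffer, loosely modelled on a ggml view.

    ne0 = head_dim, ne1 = number of heads, ne2 = number of tokens.
    Strides are in elements (ggml uses bytes). Elements along ne0 are assumed
    contiguous, i.e. each head's values sit next to each other in memory.
    """
    buf: List[float]
    offset: int
    ne0: int
    ne1: int
    ne2: int
    nb1: int  # stride between heads of the same token
    nb2: int  # stride between tokens

    def head(self, t: int, h: int) -> List[float]:
        """Return the contiguous values of head h for token t."""
        start = self.offset + t * self.nb2 + h * self.nb1
        return self.buf[start:start + self.ne0]


def view_v(qkv: List[float], head_dim: int, n_head: int, n_head_kv: int,
           n_tokens: int) -> View3D:
    """View V inside a fused activation laid out per token as [Q | K | V]."""
    n_embd_q = head_dim * n_head
    n_embd_kv = head_dim * n_head_kv
    return View3D(qkv, n_embd_q + n_embd_kv, head_dim, n_head_kv, n_tokens,
                  nb1=head_dim, nb2=n_embd_q + 2 * n_embd_kv)


def cont(v: View3D) -> List[float]:
    """Materialize the whole view densely (what a cont-style copy produces)."""
    return [x for t in range(v.ne2) for h in range(v.ne1) for x in v.head(t, h)]


def store_direct(v: View3D, cache: List[float], first_cell: int) -> None:
    """Copy each contiguous head straight into the cache rows, one head at a
    time, without building a contiguous copy of the whole tensor first."""
    row = v.ne0 * v.ne1
    for t in range(v.ne2):
        for h in range(v.ne1):
            dst = (first_cell + t) * row + h * v.ne0
            cache[dst:dst + v.ne0] = v.head(t, h)


head_dim, n_head, n_head_kv, n_tokens, n_cells = 4, 4, 2, 3, 8
row_qkv = head_dim * (n_head + 2 * n_head_kv)           # 32 values per token
qkv = [float(i) for i in range(n_tokens * row_qkv)]
v = view_v(qkv, head_dim, n_head, n_head_kv, n_tokens)

row_v = head_dim * n_head_kv                             # 8 values per cache cell
cache_a = [0.0] * (n_cells * row_v)
cache_a[2 * row_v:(2 + n_tokens) * row_v] = cont(v)      # via a contiguous temporary
cache_b = [0.0] * (n_cells * row_v)
store_direct(v, cache_b, first_cell=2)                   # directly from the view

print(cache_a == cache_b)  # True
print(cache_b[2 * row_v])  # 24.0
```

In this sketch the direct path skips the intermediate buffer that a cont-style copy allocates. The trade-off is that the cache write must accept strided input, which only works cheaply because each head is contiguous.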

CISC mentioned this pull request Aug 29, 2025
ggerganov marked this pull request as ready for review September 7, 2025 17:24
ggerganov force-pushed the gg/model-avoid-cont3d branch from f15d515 to 3dec397 September 8, 2025 06:47
CISC (Collaborator) left a comment


Tested with CodeQwen1.5, Phi2, jina-embeddings-v3 and PLaMo2.

CISC (Collaborator) commented Sep 8, 2025

ggerganov (Member, Author)

CISC (Collaborator) commented Sep 8, 2025

Hmmm, https://github.com/ggml-org/ci/blob/results/llama.cpp/60/d6e7c6fd8bacac0892b8722f5d5c585139cb43/ggml-4-x86-cuda-v100/stdall#L1957

This is due to #15687

Ah, I get a segfault locally though at the first REPEAT test after ARGMAX.

ggerganov (Member, Author)

On my end, all tests except GET_ROWS and the new IM2COL_3D are passing.

CISC (Collaborator) commented Sep 8, 2025

On my end, all tests except GET_ROWS and the new IM2COL_3D are passing.

Nvm, it must have been some other issue pre-rebase. I pulled the latest changes, applied #15868, and everything is fine now.

Edit: Eh, almost. I got GGML_ASSERT(ggml_is_contiguous(src0)) on PAD, but that's surely not related. It's the pad_ext test with v == true. Fixed in #15869.

ggerganov merged commit cf0e3ba into master Sep 8, 2025
52 of 55 checks passed
ggerganov deleted the gg/model-avoid-cont3d branch September 8, 2025 07:25
njsyw1997 pushed a commit to aizip/llama.cpp that referenced this pull request Sep 10, 2025
* model : avoid ggml_cont_3d for fused QKV weights

ggml-ci

* kv-cache : make cpy_k and cpy_v implementation more readable

ggml-ci

* cont : add comments

ggml-ci

* cont : minor fix [no ci]

* cont : one more fix

* cont : clarity

ggml-ci

* kv-cache : require contiguous heads of k_cur and v_cur

ggml-ci