Conversation

@IMbackK (Collaborator) commented Mar 4, 2025

This refactors mmqv to unify the handling of parameters between host- and device-side code, avoiding duplication in calculating nwarps and rows_per_cuda_block. It also explicitly handles wave_size != 32, with the minor benefit of getting us out of shared memory and into warp-level primitives one iteration earlier.
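A minimal sketch of the unification idea, assuming hypothetical helper names and heuristics (`calc_nwarps`, `calc_rows_per_cuda_block`, and the `ncols_y` thresholds below are illustrative, not the PR's actual code): marking the helpers `constexpr __host__ __device__` lets the launch wrapper and the kernel derive the launch geometry from a single source of truth, so the two sides cannot drift apart.

```cpp
#include <cuda_runtime.h>

// Shared helpers, callable from both host launch code and device code.
// The concrete heuristics here are assumptions for demonstration only.
static constexpr __host__ __device__ int calc_nwarps(const int ncols_y) {
    return ncols_y <= 4 ? 4 : 2;
}

static constexpr __host__ __device__ int calc_rows_per_cuda_block(const int ncols_y) {
    return ncols_y == 1 ? 1 : 2;
}

template <int ncols_y>
__global__ void mul_mat_vec_q(/* ... */) {
    // The kernel recomputes the same values the host used to size the launch.
    constexpr int nwarps              = calc_nwarps(ncols_y);
    constexpr int rows_per_cuda_block = calc_rows_per_cuda_block(ncols_y);
    (void) nwarps; (void) rows_per_cuda_block;
    // ... kernel body ...
}

template <int ncols_y>
static void launch_mul_mat_vec_q(const int nrows_x, cudaStream_t stream) {
    constexpr int rows_per_block = calc_rows_per_cuda_block(ncols_y);
    const dim3 block_nums((nrows_x + rows_per_block - 1) / rows_per_block, 1, 1);
    // 32 here for brevity; a wave-size-aware version would use the device's warp size.
    const dim3 block_dims(32, calc_nwarps(ncols_y), 1);
    mul_mat_vec_q<ncols_y><<<block_nums, block_dims, 0, stream>>>(/* ... */);
}
```

On the wave_size != 32 point, a hedged sketch of why wider waves help (again with assumed names; ggml's HIP compatibility layer is assumed to map `__shfl_xor_sync` onto the 64-lane wavefront shuffle): sizing the butterfly reduction by the real wave size means the shuffle loop covers twice as many lanes on 64-wide hardware, so the cross-warp part of the reduction needs one fewer round trip through shared memory.

```cpp
// Wave-size-aware reduction sketch; not the PR's actual code.
template <int warp_size>
static __device__ float warp_reduce_sum(float x) {
#pragma unroll
    for (int offset = warp_size/2; offset > 0; offset >>= 1) {
        // With warp_size == 64 (GCN/CDNA) this covers twice the lanes that a
        // 32-wide warp would, entering the shuffle phase one iteration earlier.
        x += __shfl_xor_sync(0xffffffff, x, offset, warp_size);
    }
    return x;
}
```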

@github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 4, 2025
…s in device code, even though that should not be a problem.
@JohannesGaessler (Collaborator) left a comment

Please use an enum instead of an int to determine the table.
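For illustration, a sketch of what the suggestion amounts to (the type and enumerator names are assumptions based on the discussion, not necessarily the merged identifiers): an enum documents the valid table choices and lets the compiler warn about an unhandled selector, where a bare `int` would silently accept any value.

```cpp
// Hypothetical sketch of the review suggestion; names are assumed.
enum mmvq_parameter_table_id {
    MMVQ_PARAMETERS_GENERIC = 0, // default parameter table
    MMVQ_PARAMETERS_GCN,         // e.g. a table tuned for 64-wide waves
};

static constexpr int calc_nwarps(const int ncols_y, const mmvq_parameter_table_id table_id) {
    switch (table_id) {
        case MMVQ_PARAMETERS_GENERIC:
            return ncols_y <= 4 ? 4 : 2; // assumed values, for illustration
        case MMVQ_PARAMETERS_GCN:
            return 2;                    // assumed values, for illustration
    }
    return 1;
}
```

With `-Wswitch` warnings enabled, adding a new table to the enum then flags every selection site that has not been updated.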

@IMbackK requested a review from JohannesGaessler March 6, 2025 21:48
Co-authored-by: Johannes Gäßler <[email protected]>
@IMbackK merged commit 10f2e81 into ggml-org:master Mar 11, 2025
47 checks passed
ishaangandhi pushed a commit to ishaangandhi/llama.cpp that referenced this pull request Mar 12, 2025
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (ggml-org#12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <[email protected]>
jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
