Conversation

slaren (Member) commented Aug 31, 2025

Exposes `ggml_backend_sched_split_graph()` to allow splitting the graph without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
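A minimal sketch of how the newly exposed function might be used. This assumes the ggml backend scheduler API from `ggml-backend.h`; the helper `check_splits` and its use here are illustrative, not code from this PR.

```c
#include "ggml-backend.h"

// Sketch: split a graph across backends WITHOUT allocating compute buffers,
// e.g. to inspect how ops would be scheduled before committing any memory
// (the use case behind the automatic Flash Attention check).
static void check_splits(ggml_backend_sched_t sched, struct ggml_cgraph * graph) {
    // Newly exposed in this PR: computes the backend splits for `graph`,
    // but unlike a full alloc/compute path it does not allocate compute buffers.
    ggml_backend_sched_split_graph(sched, graph);

    // After splitting, the scheduler state can be inspected, e.g. how many
    // splits were produced (a proxy for cross-backend fallbacks).
    int n_splits = ggml_backend_sched_get_n_splits(sched);
    (void) n_splits;
}
```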

@slaren slaren merged commit 9777032 into master Aug 31, 2025
42 of 48 checks passed
@slaren slaren deleted the sl/fix-fattn-reserve branch August 31, 2025 13:49
github-actions bot added the `ggml` label (changes relating to the ggml tensor library for machine learning) Aug 31, 2025
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 7, 2025
2 participants