
Conversation

l3utterfly (Contributor)

Currently, saving/loading the KV cache of recurrent memory crashes because layers can be null.

This mainly applies to the new LiquidAI/LFM2 models.

Tested with: https://huggingface.co/LiquidAI/LFM2-350M-GGUF

handle saving/loading null layers in recurrent memory
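
To illustrate the failure mode, here is a minimal sketch with illustrative names (not the exact code in llama-memory-recurrent.cpp): in hybrid architectures such as LFM2, only some layers carry recurrent state, so the per-layer state entries can be null and the save/load loops have to skip them instead of dereferencing them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a per-layer recurrent state buffer; in hybrid
// models (e.g. LFM2), attention-only layers leave this entry null.
struct layer_state {
    const uint8_t * data = nullptr; // null => this layer has no recurrent state
    size_t          size = 0;
};

// Serialize only the layers that actually hold recurrent state; before the
// fix, dereferencing a null layer in a loop like this is what crashed.
size_t write_recurrent_state(const std::vector<layer_state> & layers, std::vector<uint8_t> & out) {
    size_t n_written = 0;
    for (const layer_state & l : layers) {
        if (l.data == nullptr) {
            continue; // null layer: nothing to save, skip rather than crash
        }
        out.insert(out.end(), l.data, l.data + l.size);
        n_written += l.size;
    }
    return n_written;
}
```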
ggerganov requested a review from compilade on July 14, 2025
compilade (Collaborator) left a comment


Thanks @l3utterfly! I've tested this with a Jamba model and llama-save-load-state; it was indeed failing before and is fixed by this change.

I'll add a test case to #14139 (once I also add variants for hybrid models) to help automatically detecting this kind of regression with hybrid architectures in the future.
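
For context, the round-trip that llama-save-load-state exercises reduces to the public llama_state_* API. A minimal sketch (error handling omitted, ctx assumed to be an initialized llama_context):

```cpp
#include <vector>
#include "llama.h"

// Minimal save/restore round-trip using the public llama_state_* API;
// this is the path that previously crashed when the recurrent memory of a
// hybrid model contained null layers.
void save_and_restore(llama_context * ctx) {
    // serialize the full context state, including the recurrent memory
    std::vector<uint8_t> state(llama_state_get_size(ctx));
    const size_t written = llama_state_get_data(ctx, state.data(), state.size());

    // ... later: restore the saved state into the context
    llama_state_set_data(ctx, state.data(), written);
}
```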

ggerganov merged commit 7233358 into ggml-org:master on Jul 23, 2025
47 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 23, 2025
* origin/master: (49 commits)
ci : correct label refactor->refactoring (ggml-org#14832)
CUDA: fix quantized KV cache + multiple sequences (ggml-org#14822)
tests : add non-cont K,V FA tests
memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)
ggml: fix loongarch quantize_row_q8_1 error (ggml-org#14827)
CANN: weight format to NZ for Ascend310P3 (ggml-org#14407)
CUDA: add fused rms norm (ggml-org#14800)
ggml : model card yaml tab->2xspace (ggml-org#14819)
vulkan: fix rms_norm_mul to handle broadcasting dim0 (ggml-org#14817)
llama : add model type detection for rwkv7 7B&14B (ggml-org#14816)
imatrix: add option to display importance score statistics for a given imatrix file (ggml-org#12718)
Mtmd: add a way to select device for vision encoder (ggml-org#14236)
cuda : implement bf16 cpy ops and enable bf16 cont (ggml-org#14763)
opencl: remove unreachable `return` (ggml-org#14806)
server : allow setting `--reverse-prompt` arg (ggml-org#14799)
cuda: remove linking to cublasLt (ggml-org#14790)
opencl: fix `im2col` when `KW!=KH` (ggml-org#14803)
opencl: add conv2d kernel (ggml-org#14403)
sycl: Fix im2col (ggml-org#14797)
kleidiai: add support for get_rows (ggml-org#14676)
...
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)

* Update llama-memory-recurrent.cpp

handle saving/loading null layers in recurrent memory

* fixed styling issues and updated comments

* fix styling issue

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
l3utterfly deleted the rmem-save-load-fix branch on August 24, 2025