
Conversation

Collaborator

@Green-Sky Green-Sky commented Aug 31, 2025

The goal is to write directly to disk, instead of writing a copy to memory first.

sd.cpp uses this, and models can get quite large now (40 GiB+), so writing an extra 40 GiB to RAM is meh.

Split commits for easier review.
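
Roughly, the shape of the refactor (judging from the commit list and the diff below; only gguf_writer_base and gguf_writer_file appear in the PR itself, everything else here is an assumed sketch, not the actual code):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// common interface: callers emit bytes without caring where they end up
struct gguf_writer_base {
    virtual ~gguf_writer_base() = default;
    virtual void write(const void * data, size_t size) = 0;
};

// buffer based writer: stages the whole file in RAM (the old behavior)
struct gguf_writer_buf final : public gguf_writer_base {
    std::vector<uint8_t> buf;
    void write(const void * data, size_t size) override {
        const uint8_t * p = static_cast<const uint8_t *>(data);
        buf.insert(buf.end(), p, p + size);
    }
};

// file based writer: streams directly to disk, no extra in-memory copy
struct gguf_writer_file final : public gguf_writer_base {
    FILE * file;
    explicit gguf_writer_file(FILE * f) : file(f) {}
    void write(const void * data, size_t size) override {
        fwrite(data, 1, size, file);
    }
};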

@Green-Sky Green-Sky changed the title from gguf: gguf_writer refactor, to gguf: gguf_writer refactor Aug 31, 2025
@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Aug 31, 2025
@Green-Sky Green-Sky force-pushed the gguf_write_refactor branch from 634ec9f to 992a16b Compare August 31, 2025 12:53
@Green-Sky Green-Sky marked this pull request as ready for review August 31, 2025 12:57
@Green-Sky Green-Sky requested a review from slaren September 1, 2025 12:17
Collaborator

@JohannesGaessler JohannesGaessler left a comment


The new code should already be covered by test-gguf.

}

// file based writer
struct gguf_writer_file final : public gguf_writer_base {
Collaborator


Please make it so that gguf_writer_file stops trying to write to the file once ok is false. Also please move it upwards so that the writer structs are all next to each other.

Collaborator Author


Please make it so that gguf_writer_file stops trying to write to the file once ok is false.

Not a perfect solution, since it does not actually break out of writing the file, but it effectively turns all subsequent writes into no-ops.
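
A minimal sketch of what that looks like (assumed shape and names, not the PR's actual code):

#include <cstddef>
#include <cstdio>

struct gguf_writer_file_sketch {
    FILE * file = nullptr;
    bool   ok   = true; // sticky error flag

    void write(const void * data, size_t size) {
        if (!ok) {
            return; // an earlier write failed, so every later write is a no-op
        }
        ok = fwrite(data, 1, size, file) == size;
    }
};

The writer still walks through the remaining write calls, but they cost nothing, and the final ok flag reports the failure.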

Also please move it upwards so that the writer structs are all next to each other.

done.

@JohannesGaessler
Collaborator

I forgot: please rebase onto the newest master commit to fix the CI.

@Green-Sky
Collaborator Author

I forgot: please rebase onto the newest master commit to fix the CI.

Yes, that was my intention. CI just takes forever so I waited.

curl: (35) schannel: next InitializeSecurityContext failed: CRYPT_E_REVOCATION_OFFLINE (0x80092013) - The revocation function was unable to check revocation because the revocation server was offline.

Aaand looks like I had bad luck with the Windows runners. Will rerun the failed CI once the still-running tasks have finished.

@Green-Sky Green-Sky force-pushed the gguf_write_refactor branch 3 times, most recently from 0596b1b to 90eb492 Compare September 3, 2025 12:54
@Green-Sky
Collaborator Author

Looks like the llama2c test is erroring in a way that the gguf test does not catch.

I am going to change the code again and use exceptions; they are a better fit for early exit on error, with what() carrying the error message.
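
Roughly, the exception-based shape (a hypothetical sketch, not the PR's code):

#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// a failed write throws, unwinding out of the whole write path at once;
// the caller catches a single std::runtime_error and reads what()
static void write_all(FILE * f, const void * data, size_t size) {
    if (fwrite(data, 1, size, f) != size) {
        throw std::runtime_error("unexpected error writing " + std::to_string(size) + " bytes to gguf file");
    }
}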

@Green-Sky Green-Sky marked this pull request as draft September 3, 2025 14:07
@Green-Sky Green-Sky marked this pull request as ready for review September 4, 2025 13:50
@Green-Sky
Collaborator Author

Green-Sky commented Sep 4, 2025

Ok, I added the exceptions, for nicer error handling internally.
Also, the issue convert-llama2c had was that fputc() returns the written character converted to unsigned char, so it returned 226 instead of the -30 value we put into it. Simply casting the result back to int8_t would have been bad, since fputc() can return EOF (an int) on error, which does not fit into 8 bits (making the conversion implementation-defined), so I instead cast the input value to uint8_t first, to make the implicit conversion explicit.
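
A minimal illustration of the pitfall and the fix (hypothetical code, not the actual convert-llama2c source):

#include <cstdint>
#include <cstdio>

bool write_byte(FILE * f, int8_t v) {
    // fputc() converts its argument to unsigned char and returns that value as
    // an int, so writing int8_t(-30) returns 226 and `fputc(v, f) == v` fails.
    // Comparing against (uint8_t) v makes the implicit conversion explicit, and
    // EOF (a negative int) can never equal a uint8_t, so errors are still caught.
    return fputc(v, f) == (uint8_t) v;
}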

@Green-Sky Green-Sky merged commit a812838 into ggml-org:master Sep 5, 2025
48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…upport

* origin/master:
Thinking model disabled assistant prefill (ggml-org#15404)
Implement --log-colors with always/never/auto (ggml-org#15792)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
* gguf: split gguf writer into base and buf impl
* gguf: templated gguf write out
* gguf: file based writer (avoid writing everything to memory first!)
* examples(llama2c): fix log not being the same level and compiler nits
Labels: examples, ggml (changes relating to the ggml tensor library for machine learning)