
Conversation

Collaborator

@Green-Sky Green-Sky commented Aug 31, 2025

The goal is to write directly to disk, instead of writing a copy to memory first.

sd.cpp uses this, and models can get quite large now (40 GiB+), so writing an extra 40 GiB to RAM is meh.

Split commits for easier review.
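
Roughly, the shape of the refactor (judging from the commit list and the diff below; only gguf_writer_base and gguf_writer_file appear in the PR itself, everything else here is an assumed sketch, not the actual code):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// common interface: callers emit bytes without caring where they end up
struct gguf_writer_base {
    virtual ~gguf_writer_base() = default;
    virtual void write(const void * data, size_t size) = 0;
};

// buffer based writer: stages the whole file in RAM (the old behavior)
struct gguf_writer_buf final : public gguf_writer_base {
    std::vector<uint8_t> buf;
    void write(const void * data, size_t size) override {
        const uint8_t * p = static_cast<const uint8_t *>(data);
        buf.insert(buf.end(), p, p + size);
    }
};

// file based writer: streams directly to disk, no extra in-memory copy
struct gguf_writer_file final : public gguf_writer_base {
    FILE * file;
    explicit gguf_writer_file(FILE * f) : file(f) {}
    void write(const void * data, size_t size) override {
        fwrite(data, 1, size, file);
    }
};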

@Green-Sky Green-Sky changed the title from gguf: gguf_writer refactor, to gguf: gguf_writer refactor Aug 31, 2025
@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Aug 31, 2025
@Green-Sky Green-Sky force-pushed the gguf_write_refactor branch from 634ec9f to 992a16b Compare August 31, 2025 12:53
@Green-Sky Green-Sky marked this pull request as ready for review August 31, 2025 12:57
@Green-Sky Green-Sky requested a review from slaren September 1, 2025 12:17
Collaborator

@JohannesGaessler JohannesGaessler left a comment


The new code should already be covered by test-gguf.

}

// file based writer
struct gguf_writer_file final : public gguf_writer_base {
Collaborator


Please make it so that gguf_writer_file stops trying to write to the file once ok is false. Also please move it upwards so that the writer structs are all next to each other.

Collaborator Author


Please make it so that gguf_writer_file stops trying to write to the file once ok is false.

Not a perfect solution, since it does not actually break out of writing the file, but it effectively turns all subsequent writes into no-ops.
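
A minimal sketch of what that looks like (assumed shape and names, not the PR's actual code):

#include <cstddef>
#include <cstdio>

struct gguf_writer_file_sketch {
    FILE * file = nullptr;
    bool   ok   = true; // sticky error flag

    void write(const void * data, size_t size) {
        if (!ok) {
            return; // an earlier write failed, so every later write is a no-op
        }
        ok = fwrite(data, 1, size, file) == size;
    }
};

The writer still walks through the remaining write calls, but they cost nothing, and the final ok flag reports the failure.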

Also please move it upwards so that the writer structs are all next to each other.

done.

@JohannesGaessler
Collaborator

I forgot: please rebase onto the newest master commit to fix the CI.

@Green-Sky
Collaborator Author

I forgot: please rebase onto the newest master commit to fix the CI.

Yes, that was my intention. CI just takes forever so I waited.

curl: (35) schannel: next InitializeSecurityContext failed: CRYPT_E_REVOCATION_OFFLINE (0x80092013) - The revocation function was unable to check revocation because the revocation server was offline.

Aaand looks like I had bad luck with the Windows runners. Will rerun the failed CI once the still-running tasks have finished.

@Green-Sky Green-Sky force-pushed the gguf_write_refactor branch 3 times, most recently from 0596b1b to 90eb492 Compare September 3, 2025 12:54
@Green-Sky
Collaborator Author

Looks like the llama2c test is erroring in a way that the gguf test does not catch.

I am going to change the code again and use exceptions; they are a better fit for early exit on error, with what() carrying the error message.
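
Roughly, the exception-based shape (a hypothetical sketch, not the PR's code):

#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// a failed write throws, unwinding out of the whole write path at once;
// the caller catches a single std::runtime_error and reads what()
static void write_all(FILE * f, const void * data, size_t size) {
    if (fwrite(data, 1, size, f) != size) {
        throw std::runtime_error("unexpected error writing " + std::to_string(size) + " bytes to gguf file");
    }
}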

@Green-Sky Green-Sky marked this pull request as draft September 3, 2025 14:07
@Green-Sky Green-Sky marked this pull request as ready for review September 4, 2025 13:50
@Green-Sky
Collaborator Author

Green-Sky commented Sep 4, 2025

Ok, I added the exceptions, for nicer error handling internally.
Also, the issue convert-llama2c had was that fputc() returns the written character converted to unsigned char, so it returned 226 instead of the -30 value we put into it. Simply casting the result back to int8_t would have been bad, since fputc() can return EOF (an int) on error, which does not fit into 8 bits (making the conversion implementation-defined), so I instead cast the input value to uint8_t first, to make the implicit conversion explicit.
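
A minimal illustration of the pitfall and the fix (hypothetical code, not the actual convert-llama2c source):

#include <cstdint>
#include <cstdio>

bool write_byte(FILE * f, int8_t v) {
    // fputc() converts its argument to unsigned char and returns that value as
    // an int, so writing int8_t(-30) returns 226 and `fputc(v, f) == v` fails.
    // Comparing against (uint8_t) v makes the implicit conversion explicit, and
    // EOF (a negative int) can never equal a uint8_t, so errors are still caught.
    return fputc(v, f) == (uint8_t) v;
}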

@Green-Sky Green-Sky merged commit a812838 into ggml-org:master Sep 5, 2025
48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…upport

* origin/master:
Thinking model disabled assistant prefill (ggml-org#15404)
Implement --log-colors with always/never/auto (ggml-org#15792)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
* gguf: split gguf writer into base and buf impl
* gguf: templated gguf write out
* gguf: file based writer (avoid writing everything to memory first!)
* examples(llama2c): fix log not being the same level and compiler nits
Labels: examples, ggml (changes relating to the ggml tensor library for machine learning)