gguf: gguf_writer refactor #15691
Conversation
Force-pushed from 634ec9f to 992a16b.
The new code should already be covered by test-gguf.
ggml/src/gguf.cpp (outdated)

```cpp
}

// file based writer
struct gguf_writer_file final : public gguf_writer_base {
```
Please make it so that `gguf_writer_file` stops trying to write to the file once `ok` is false. Also please move it upwards so that the writer structs are all next to each other.
> Please make it so that gguf_writer_file stops trying to write to the file once ok is false.

Not a perfect solution of breaking out of writing the file, but this turns writes effectively into nops.

> Also please move it upwards so that the writer structs are all next to each other.

Done.
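For reference, a minimal sketch of the requested guard (member names other than `gguf_writer_file` and `ok` are assumptions, not the actual gguf.cpp code): once `ok` flips to false, every later write becomes a no-op.

```cpp
#include <cstdio>

// Illustrative sketch only; the real writer has more members and methods.
struct gguf_writer_file {
    FILE * file;
    bool   ok = true; // sticky error flag

    void write(const void * data, size_t size) {
        if (!ok) {
            return; // a previous write failed: turn further writes into no-ops
        }
        ok = fwrite(data, 1, size, file) == size;
    }
};
```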
I forgot: please rebase onto the newest master commit to fix the CI.
Force-pushed from 992a16b to 409c80f.
Yes, that was my intention. CI just takes forever, so I waited.

Aaand looks like I had bad luck with the Windows runners. Will rerun the failed CI jobs once the still-running tasks have finished.
Force-pushed from 0596b1b to 90eb492.
Looks like the llama2c test is erroring in a way that the gguf test does not catch. I am going to change the code again and use exceptions; they are a better fit for early exit on error.
Force-pushed from 1aa8f9a to f7f5b84.
Ok, I added the exceptions for nicer error handling internally.
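Roughly what exception-based early exit can look like (a sketch under assumed names, not the actual gguf.cpp implementation): the write path throws on the first failure, and the public entry point converts that into a single error return instead of checking an `ok` flag at every call site.

```cpp
#include <cstdio>
#include <stdexcept>

// Assumed shape; the real writer differs.
struct gguf_writer_file {
    FILE * file;

    void write(const void * data, size_t size) {
        if (fwrite(data, 1, size, file) != size) {
            // early exit: unwinds straight to the caller, no flag bookkeeping
            throw std::runtime_error("failed to write to file");
        }
    }
};

// Hypothetical top-level wrapper that keeps a bool-returning C-style API.
bool gguf_write_all(gguf_writer_file & writer, const void * data, size_t size) {
    try {
        writer.write(data, size);
        return true;
    } catch (const std::exception & e) {
        fprintf(stderr, "%s: %s\n", __func__, e.what());
        return false;
    }
}
```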
also fix fputc by first casting to uint8_t
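One plausible reading of that fix (an assumption, not taken from the diff): `fputc` returns the byte written as an unsigned char converted to int, so a success check against a signed value can fail spuriously for negative bytes; casting to `uint8_t` first makes both sides of the comparison agree.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    FILE * f = fopen("out.bin", "wb"); // illustrative file
    if (!f) {
        return 1;
    }

    const int8_t v = -1;

    // Naive check: fputc(v, f) returns 255 while v compares as -1,
    // so `fputc(v, f) == v` reports a spurious error.

    // Casting to uint8_t first keeps the check sound: 255 == 255.
    const uint8_t b = (uint8_t) v;
    const bool ok = fputc(b, f) == b;

    fclose(f);
    return ok ? 0 : 1;
}
```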
Force-pushed from f7f5b84 to 75bc8fa.
* gguf: split gguf writer into base and buf impl
* gguf: templated gguf write out
* gguf: file based writer (avoid writing everything to memory first!)
* examples(llama2c): fix log not being the same level and compiler nits
The goal is to write directly to disk instead of writing a copy to memory first.
sd.cpp uses this, and models can get quite large now (40+ GiB), so writing an extra 40 GiB to RAM is wasteful.
Commits are split for easier review.
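A rough sketch of the shape described above (drawn from the commit titles; everything except the `gguf_writer_base`/`gguf_writer_file` names is an assumption): a common base with one implementation that buffers to memory and one that streams straight to disk.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Sketch only; the actual interface in ggml/src/gguf.cpp may differ.
struct gguf_writer_base {
    virtual ~gguf_writer_base() = default;
    virtual void write(const void * data, size_t size) = 0;
};

// buffer-based writer: accumulates the whole file in memory
struct gguf_writer_buf final : public gguf_writer_base {
    std::vector<int8_t> buf;

    void write(const void * data, size_t size) override {
        const int8_t * p = (const int8_t *) data;
        buf.insert(buf.end(), p, p + size);
    }
};

// file-based writer: streams directly to disk, so a 40+ GiB model
// does not need a second 40+ GiB copy in RAM
struct gguf_writer_file final : public gguf_writer_base {
    FILE * file;

    explicit gguf_writer_file(FILE * f) : file(f) {}

    void write(const void * data, size_t size) override {
        if (fwrite(data, 1, size, file) != size) {
            throw std::runtime_error("failed to write to file");
        }
    }
};
```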