convert : parse safetensors directly #15667
Draft
Should fix #15623
(targets #14810, but will be rebased)
This replaces the approach from #8482 to avoid using `get_slice`, because it turns out that it eagerly memmaps tensors. On Windows this uses a lot of memory, and on Linux it inflates the resident set size.

Safetensors files are now parsed directly, since the format is simple enough. This will also eventually allow tracking the file ranges of tensors, so that `os.copy_file_range` could maybe be used when possible to make conversion on COW filesystems very fast (in a future PR).

On Linux, when using `memray` (a memory profiler), this change reduces the peak heap memory usage by quite a lot, and, as measured with GNU `time`, it also reduces the peak resident set size.

The previous behavior when observed with `memray` seems to be that `safe_open` puts all of the model into the heap (likely memmapped, though, since the resident set size is smaller than that and grows progressively). The new behavior when observed with `memray` is closer to what I expected in the first place: memory usage bumps up for each processed tensor, then goes back down before the next one.

Here's a table of the "Maximum resident set size (kbytes)" from `time -v` (when using GNU `time`) on a few models:

```
$ $(which time) -v python3 convert_hf_to_gguf.py /path/to/model_dir --outfile /path/to/model.gguf --outtype f16
```
`master` (kbytes)

Safetensors files are already parsed directly for remote models since #12820. This is similar, but for local models.
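For illustration, here is a minimal stdlib-only sketch of what parsing a safetensors file directly can look like, assuming the published layout: an 8-byte little-endian length, a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the end of the header, with an optional `__metadata__` entry), then the raw tensor data. The names `TensorInfo`, `read_safetensors_header` and `read_tensor_bytes` are hypothetical helpers, not the code from this PR or from `convert_hf_to_gguf.py`. Each tensor is read with a plain `seek` + `read` rather than by memory-mapping the whole file.

```python
from __future__ import annotations

import json
import math
import struct
from dataclasses import dataclass

# Byte sizes of common safetensors dtype strings (used only for a sanity check).
# Dtypes missing here are not validated; the data is returned as raw bytes anyway.
_DTYPE_SIZE = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1,
    "U8": 1, "BOOL": 1, "F8_E4M3": 1, "F8_E5M2": 1,
}


@dataclass(frozen=True)
class TensorInfo:
    name: str
    dtype: str              # safetensors dtype string, e.g. "F16", "BF16", "F32"
    shape: tuple[int, ...]
    start: int              # absolute byte offset of the tensor data in the file
    end: int                # absolute end offset (exclusive)

    @property
    def nbytes(self) -> int:
        return self.end - self.start


def read_safetensors_header(path: str) -> tuple[dict[str, TensorInfo], dict[str, str]]:
    """Parse only the header of a .safetensors file; no tensor data is read."""
    with open(path, "rb") as f:
        # The first 8 bytes are a little-endian u64: the JSON header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len  # data_offsets are relative to the end of the header
    metadata = header.pop("__metadata__", {})
    tensors: dict[str, TensorInfo] = {}
    for name, meta in header.items():
        begin, end = meta["data_offsets"]
        info = TensorInfo(name, meta["dtype"], tuple(meta["shape"]),
                          data_start + begin, data_start + end)
        size = _DTYPE_SIZE.get(info.dtype)
        if size is not None and math.prod(info.shape) * size != info.nbytes:
            raise ValueError(f"size mismatch for tensor {name!r}")
        tensors[name] = info
    return tensors, metadata


def read_tensor_bytes(path: str, info: TensorInfo) -> bytes:
    """Read one tensor with seek + read instead of memory-mapping the file, so a
    caller processing tensors one at a time only holds one tensor's bytes."""
    with open(path, "rb") as f:
        f.seek(info.start)
        data = f.read(info.nbytes)
    if len(data) != info.nbytes:
        raise ValueError(f"truncated data for tensor {info.name!r}")
    return data
```

A converter built this way would loop over the parsed tensors, read and convert one at a time, and drop each buffer before the next, which matches the per-tensor bumps described above. The absolute `(start, end)` ranges are also the kind of information a file-range copy such as `os.copy_file_range` would need.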
TODO: