@compilade commented Aug 29, 2025

Should fix #15623
(targets #14810, but will be rebased)

This replaces the approach from #8482 and avoids get_slice, which turns out to eagerly memory-map tensors. On Windows this uses a lot of memory, and on Linux it inflates the resident set size.

Safetensors files are now parsed directly, since the format is simple enough. This will also eventually allow tracking the file ranges of tensors, so that os.copy_file_range can be used when possible to make conversion on COW filesystems very fast (in a future PR).
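To illustrate why the format is simple enough to parse by hand, here is a minimal sketch using only the standard library. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes). The function names `read_safetensors_header` and `read_tensor_bytes` are hypothetical and are not taken from this PR's code.

```python
import json
import struct


def read_safetensors_header(path):
    """Return (header dict, absolute file offset where tensor data begins)."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header, 8 + header_len


def read_tensor_bytes(path, header, data_start, name):
    """Read only the bytes of one tensor, without mapping the whole file."""
    # "data_offsets" are [begin, end) relative to the start of the data section.
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)
```

Because each entry carries its own byte range, the same header also gives the per-tensor file ranges that a later range-copy optimization would need.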

On Linux, memray (a memory profiler) shows that this change substantially reduces peak heap memory usage, and GNU time shows that it also reduces the peak resident set size.

When observed with memray, the previous behavior seems to be that safe_open puts the whole model on the heap (likely memory-mapped, since the resident set size is smaller and grows over time). The new behavior is closer to what I thought happened in the first place: memory usage bumps up at each processed tensor and goes back down between tensors.

Here's a table of the "Maximum resident set size (kbytes)" from time -v (when using GNU time) on a few models:

$ $(which time) -v python3 convert_hf_to_gguf.py /path/to/model_dir --outfile /path/to/model.gguf --outtype f16
| Model | Target type | master (kbytes) | This PR (kbytes) |
| --- | --- | --- | --- |
| https://huggingface.co/mistralai/Mistral-7B-v0.1 | F16 | 10 334 248 | 1 129 248 |
| https://huggingface.co/meta-llama/Llama-3.2-1B | F16 | 3 023 112 | 2 104 256 |
| https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct | F16 | 9 165 048 | 2 680 124 |
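For anyone who wants to collect the same kind of number from a script instead of reading the `time -v` output, the following is a rough sketch using Python's `resource` module (Unix only). It assumes Linux, where `ru_maxrss` is reported in kilobytes. The helper name `peak_rss_of_command` is made up for this example.

```python
import resource
import subprocess
import sys


def peak_rss_of_command(argv):
    """Run argv to completion and return the peak RSS of waited-for children.

    RUSAGE_CHILDREN reports the maximum over every child waited for so far,
    so use a fresh Python process for each measurement.
    On Linux the value is in kilobytes; other platforms may use other units.
    """
    subprocess.run(argv, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Example with a placeholder command; substitute the conversion command line:
# peak_rss_of_command([sys.executable, "-c", "x = bytearray(50_000_000)"])
```

This should be close to what GNU time reports, but it is not guaranteed to match it exactly.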

Safetensors files have already been parsed directly for remote models since #12820. This PR does the same for local models.


TODO:

  • Handle byteswapping on big-endian platforms?
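As an illustration of what the byteswapping item could involve (not a decision made in this PR): safetensors stores tensor data in little-endian order, so a big-endian host would need to swap the bytes of each multi-byte element after reading. The sketch below uses only the standard library and handles 16-bit elements, such as the raw bit patterns of F16 values. The function name `le_u16_to_native` is hypothetical.

```python
import sys
from array import array


def le_u16_to_native(raw):
    """Interpret little-endian 16-bit words from raw bytes in native byte order."""
    words = array("H")  # unsigned 16-bit on common platforms
    if words.itemsize != 2:
        raise RuntimeError("array('H') is not 16 bits wide on this platform")
    words.frombytes(raw)  # interprets the bytes in native byte order
    if sys.byteorder == "big":
        # The file is little-endian, so swap only on big-endian hosts.
        words.byteswap()
    return words
```

On little-endian hosts this is a plain reinterpretation with no swap, so it adds no extra cost there.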

@compilade force-pushed the compilade/convert-safetensors-parse branch from 85edafe to 786b32d on September 1, 2025 14:18
Labels: python (python script changes)