gguf-py : add Numpy MXFP4 de/quantization support #15111

Merged
merged 2 commits into from
Aug 8, 2025

Conversation

compilade
Collaborator

MXFP4 support was added in #15091, but Python-based (de)quantization is missing, which this should fix.

Other change: I've removed the FMA workaround from the Q4_0 and Q5_0 quantization functions, since the C versions stopped using FMA a while ago (not sure exactly when).
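
For context on why FMA matters when results have to match exactly: a fused multiply-add rounds a*b + c once, while a separate multiply and add round twice, so the two can differ in the last bits. A minimal sketch in float64 (the quantization code works in float32; `fma_emulated` is a hypothetical helper built on exact rational arithmetic, not the workaround that was removed):

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest float64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c            # a*b is rounded to a float first, then c is added
fused = fma_emulated(a, b, c)  # single rounding at the very end
```

Here the exact product is 1 - 2**-60, which rounds to 1.0 before the add in the unfused path, so the two results differ. Two implementations that must agree bit-for-bit therefore need to make the same choice.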

All correctness tests pass.

$ python3 gguf-py/tests/test_quants.py --type MXFP4
INFO:test-quants:Testing MXFP4
DEBUG:test-quants:Quantizing to MXFP4 with Python
DEBUG:test-quants:Quantizing to MXFP4 with C
INFO:test-quants:Quantization to MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing from MXFP4 with Python
DEBUG:test-quants:Dequantizing from MXFP4 with C
INFO:test-quants:Dequantization from MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with Python
gguf-py/gguf/quants.py:704: RuntimeWarning: overflow encountered in multiply
  return (d * qs.astype(np.float32))
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with C
INFO:test-quants:Dequantization from random f16 data as MXFP4 matches exactly ✅

Note that there's a RuntimeWarning from Numpy above, but the equality check still passes. The warning is likely caused by invalid E8M0 values from the random data which is dequantized in that test.
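
A rough sketch of how such a warning can arise, assuming the block scale is 2**(e - 127) and the elements reach magnitude 6 (the largest E2M1 value). The exact scale convention in gguf-py/gguf/quants.py may differ, and `e8m0_to_scale` is a hypothetical helper:

```python
import numpy as np

def e8m0_to_scale(e: np.ndarray) -> np.ndarray:
    """Assumed convention: scale = 2**(e - 127), computed exactly in float32."""
    ones = np.ones(e.shape, dtype=np.float32)
    return np.ldexp(ones, e.astype(np.int32) - 127)

e = np.array([127, 254], dtype=np.uint8)     # 254: a very large exponent byte
qs = np.array([6.0, 6.0], dtype=np.float32)  # largest E2M1 magnitude

with np.errstate(over="ignore"):             # silence the RuntimeWarning for the demo
    out = e8m0_to_scale(e) * qs              # second product exceeds the float32 range
```

Under these assumptions the second element becomes inf. If the C side produces inf for the same input, that would be consistent with the equality check still passing, though that is an inference rather than something verified here.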

Full test results
$ python3 gguf-py/tests/test_quants.py
INFO:test-quants:Testing F16
DEBUG:test-quants:Quantizing to F16 with Python
DEBUG:test-quants:Quantizing to F16 with C
INFO:test-quants:Quantization to F16 matches exactly ✅
DEBUG:test-quants:Dequantizing from F16 with Python
DEBUG:test-quants:Dequantizing from F16 with C
INFO:test-quants:Dequantization from F16 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as F16 with Python
DEBUG:test-quants:Dequantizing random f16 data as F16 with C
INFO:test-quants:Dequantization from random f16 data as F16 matches exactly ✅
INFO:test-quants:Testing BF16
DEBUG:test-quants:Quantizing to BF16 with Python
DEBUG:test-quants:Quantizing to BF16 with C
INFO:test-quants:Quantization to BF16 matches exactly ✅
DEBUG:test-quants:Dequantizing from BF16 with Python
DEBUG:test-quants:Dequantizing from BF16 with C
INFO:test-quants:Dequantization from BF16 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as BF16 with Python
DEBUG:test-quants:Dequantizing random f16 data as BF16 with C
INFO:test-quants:Dequantization from random f16 data as BF16 matches exactly ✅
INFO:test-quants:Testing Q4_0
DEBUG:test-quants:Quantizing to Q4_0 with Python
DEBUG:test-quants:Quantizing to Q4_0 with C
INFO:test-quants:Quantization to Q4_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q4_0 with Python
DEBUG:test-quants:Dequantizing from Q4_0 with C
INFO:test-quants:Dequantization from Q4_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_0 with C
INFO:test-quants:Dequantization from random f16 data as Q4_0 matches exactly ✅
INFO:test-quants:Testing Q4_1
DEBUG:test-quants:Quantizing to Q4_1 with Python
DEBUG:test-quants:Quantizing to Q4_1 with C
INFO:test-quants:Quantization to Q4_1 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q4_1 with Python
DEBUG:test-quants:Dequantizing from Q4_1 with C
INFO:test-quants:Dequantization from Q4_1 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_1 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_1 with C
INFO:test-quants:Dequantization from random f16 data as Q4_1 matches exactly ✅
INFO:test-quants:Testing Q5_0
DEBUG:test-quants:Quantizing to Q5_0 with Python
DEBUG:test-quants:Quantizing to Q5_0 with C
INFO:test-quants:Quantization to Q5_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q5_0 with Python
DEBUG:test-quants:Dequantizing from Q5_0 with C
INFO:test-quants:Dequantization from Q5_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_0 with C
INFO:test-quants:Dequantization from random f16 data as Q5_0 matches exactly ✅
INFO:test-quants:Testing Q5_1
DEBUG:test-quants:Quantizing to Q5_1 with Python
DEBUG:test-quants:Quantizing to Q5_1 with C
INFO:test-quants:Quantization to Q5_1 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q5_1 with Python
DEBUG:test-quants:Dequantizing from Q5_1 with C
INFO:test-quants:Dequantization from Q5_1 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_1 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_1 with C
INFO:test-quants:Dequantization from random f16 data as Q5_1 matches exactly ✅
INFO:test-quants:Testing Q8_0
DEBUG:test-quants:Quantizing to Q8_0 with Python
DEBUG:test-quants:Quantizing to Q8_0 with C
INFO:test-quants:Quantization to Q8_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q8_0 with Python
DEBUG:test-quants:Dequantizing from Q8_0 with C
INFO:test-quants:Dequantization from Q8_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q8_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q8_0 with C
INFO:test-quants:Dequantization from random f16 data as Q8_0 matches exactly ✅
INFO:test-quants:Testing Q2_K
DEBUG:test-quants:Quantizing to Q2_K with C
DEBUG:test-quants:Dequantizing from Q2_K with Python
DEBUG:test-quants:Dequantizing from Q2_K with C
INFO:test-quants:Dequantization from Q2_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q2_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q2_K with C
INFO:test-quants:Dequantization from random f16 data as Q2_K matches exactly ✅
INFO:test-quants:Testing Q3_K
DEBUG:test-quants:Quantizing to Q3_K with C
DEBUG:test-quants:Dequantizing from Q3_K with Python
DEBUG:test-quants:Dequantizing from Q3_K with C
INFO:test-quants:Dequantization from Q3_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q3_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q3_K with C
INFO:test-quants:Dequantization from random f16 data as Q3_K matches exactly ✅
INFO:test-quants:Testing Q4_K
DEBUG:test-quants:Quantizing to Q4_K with C
DEBUG:test-quants:Dequantizing from Q4_K with Python
DEBUG:test-quants:Dequantizing from Q4_K with C
INFO:test-quants:Dequantization from Q4_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_K with C
INFO:test-quants:Dequantization from random f16 data as Q4_K matches exactly ✅
INFO:test-quants:Testing Q5_K
DEBUG:test-quants:Quantizing to Q5_K with C
DEBUG:test-quants:Dequantizing from Q5_K with Python
DEBUG:test-quants:Dequantizing from Q5_K with C
INFO:test-quants:Dequantization from Q5_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_K with C
INFO:test-quants:Dequantization from random f16 data as Q5_K matches exactly ✅
INFO:test-quants:Testing Q6_K
DEBUG:test-quants:Quantizing to Q6_K with C
DEBUG:test-quants:Dequantizing from Q6_K with Python
DEBUG:test-quants:Dequantizing from Q6_K with C
INFO:test-quants:Dequantization from Q6_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q6_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q6_K with C
INFO:test-quants:Dequantization from random f16 data as Q6_K matches exactly ✅
INFO:test-quants:Testing TQ1_0
DEBUG:test-quants:Quantizing to TQ1_0 with Python
DEBUG:test-quants:Quantizing to TQ1_0 with C
INFO:test-quants:Quantization to TQ1_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from TQ1_0 with Python
DEBUG:test-quants:Dequantizing from TQ1_0 with C
INFO:test-quants:Dequantization from TQ1_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as TQ1_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as TQ1_0 with C
INFO:test-quants:Dequantization from random f16 data as TQ1_0 matches exactly ✅
INFO:test-quants:Testing TQ2_0
DEBUG:test-quants:Quantizing to TQ2_0 with Python
DEBUG:test-quants:Quantizing to TQ2_0 with C
INFO:test-quants:Quantization to TQ2_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from TQ2_0 with Python
DEBUG:test-quants:Dequantizing from TQ2_0 with C
INFO:test-quants:Dequantization from TQ2_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as TQ2_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as TQ2_0 with C
INFO:test-quants:Dequantization from random f16 data as TQ2_0 matches exactly ✅
INFO:test-quants:Testing MXFP4
DEBUG:test-quants:Quantizing to MXFP4 with Python
DEBUG:test-quants:Quantizing to MXFP4 with C
INFO:test-quants:Quantization to MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing from MXFP4 with Python
DEBUG:test-quants:Dequantizing from MXFP4 with C
INFO:test-quants:Dequantization from MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with Python
gguf-py/gguf/quants.py:704: RuntimeWarning: overflow encountered in multiply
  return (d * qs.astype(np.float32))
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with C
INFO:test-quants:Dequantization from random f16 data as MXFP4 matches exactly ✅
INFO:test-quants:Testing IQ2_XXS
DEBUG:test-quants:Quantizing to IQ2_XXS with C
DEBUG:test-quants:Dequantizing from IQ2_XXS with Python
DEBUG:test-quants:Dequantizing from IQ2_XXS with C
INFO:test-quants:Dequantization from IQ2_XXS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XXS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XXS with C
INFO:test-quants:Dequantization from random f16 data as IQ2_XXS matches exactly ✅
INFO:test-quants:Testing IQ2_XS
DEBUG:test-quants:Quantizing to IQ2_XS with C
DEBUG:test-quants:Dequantizing from IQ2_XS with Python
DEBUG:test-quants:Dequantizing from IQ2_XS with C
INFO:test-quants:Dequantization from IQ2_XS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XS with C
INFO:test-quants:Dequantization from random f16 data as IQ2_XS matches exactly ✅
INFO:test-quants:Testing IQ2_S
DEBUG:test-quants:Quantizing to IQ2_S with C
DEBUG:test-quants:Dequantizing from IQ2_S with Python
DEBUG:test-quants:Dequantizing from IQ2_S with C
INFO:test-quants:Dequantization from IQ2_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_S with C
INFO:test-quants:Dequantization from random f16 data as IQ2_S matches exactly ✅
INFO:test-quants:Testing IQ3_XXS
DEBUG:test-quants:Quantizing to IQ3_XXS with C
DEBUG:test-quants:Dequantizing from IQ3_XXS with Python
DEBUG:test-quants:Dequantizing from IQ3_XXS with C
INFO:test-quants:Dequantization from IQ3_XXS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ3_XXS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ3_XXS with C
INFO:test-quants:Dequantization from random f16 data as IQ3_XXS matches exactly ✅
INFO:test-quants:Testing IQ3_S
DEBUG:test-quants:Quantizing to IQ3_S with C
DEBUG:test-quants:Dequantizing from IQ3_S with Python
DEBUG:test-quants:Dequantizing from IQ3_S with C
INFO:test-quants:Dequantization from IQ3_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ3_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ3_S with C
INFO:test-quants:Dequantization from random f16 data as IQ3_S matches exactly ✅
INFO:test-quants:Testing IQ1_S
DEBUG:test-quants:Quantizing to IQ1_S with C
DEBUG:test-quants:Dequantizing from IQ1_S with Python
DEBUG:test-quants:Dequantizing from IQ1_S with C
INFO:test-quants:Dequantization from IQ1_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ1_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ1_S with C
INFO:test-quants:Dequantization from random f16 data as IQ1_S matches exactly ✅
INFO:test-quants:Testing IQ1_M
DEBUG:test-quants:Quantizing to IQ1_M with C
DEBUG:test-quants:Dequantizing from IQ1_M with Python
DEBUG:test-quants:Dequantizing from IQ1_M with C
INFO:test-quants:Dequantization from IQ1_M matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ1_M with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ1_M with C
INFO:test-quants:Dequantization from random f16 data as IQ1_M matches exactly ✅
INFO:test-quants:Testing IQ4_NL
DEBUG:test-quants:Quantizing to IQ4_NL with C
DEBUG:test-quants:Dequantizing from IQ4_NL with Python
DEBUG:test-quants:Dequantizing from IQ4_NL with C
INFO:test-quants:Dequantization from IQ4_NL matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ4_NL with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ4_NL with C
INFO:test-quants:Dequantization from random f16 data as IQ4_NL matches exactly ✅
INFO:test-quants:Testing IQ4_XS
DEBUG:test-quants:Quantizing to IQ4_XS with C
DEBUG:test-quants:Dequantizing from IQ4_XS with Python
DEBUG:test-quants:Dequantizing from IQ4_XS with C
INFO:test-quants:Dequantization from IQ4_XS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ4_XS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ4_XS with C
INFO:test-quants:Dequantization from random f16 data as IQ4_XS matches exactly ✅

Make sure to read the contributing guidelines before submitting a PR

@compilade compilade added python python script changes Tensor Encoding Scheme https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes labels Aug 6, 2025
Comment on lines +673 to +674
with np.errstate(divide="ignore"):
e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0).astype(np.uint8)
Collaborator Author

@compilade compilade Aug 6, 2025

It's surprising that the C implementation

const uint8_t e = (uint8_t) (floorf(log2f(amax)) - 2 + 127);

which doesn't check for zero before calling log2f (!) still results in the same number (which is e = 0).

Apparently, that works (checked by ensuring there are some zeroed input blocks in the tests).
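
A self-contained sketch of the guarded NumPy version from the diff above, run on a few block maxima including a zero block. The name `shared_exponent` is made up for illustration, and reading the -2 as the largest E2M1 exponent (6 = 1.5 * 2**2) is an assumption, not something stated in the diff:

```python
import numpy as np

def shared_exponent(d: np.ndarray) -> np.ndarray:
    """Biased E8M0 exponent for each block, given the block's max magnitude d >= 0."""
    with np.errstate(divide="ignore"):
        # np.log2(0) is -inf (hence the errstate); np.where picks 0 for those
        # blocks, so the -inf is never cast to uint8.
        e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0)
    return e.astype(np.uint8)

d = np.array([0.0, 0.75, 1.0, 6.0], dtype=np.float32)
e = shared_exponent(d)
```

The sketch only covers maxima whose exponent fits in the uint8 range; very small or very large inputs are not handled.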

Member

log2f(0) returns -inf, which turns into zero when converted to an integer. However, I don't think this behavior is guaranteed by the C/C++ standard; it may be undefined or implementation-defined behavior, so it would be better to add a check for zero. Do you want to add it here?

Collaborator Author

Do you want to add it here?

Sure. Done in 2763dc8

@ngxson
Collaborator

ngxson commented Aug 6, 2025

Not exactly related to this PR, but mentioning this for visibility: I left out the possibility to dequantize MXFP4 in convert_hf_to_gguf because users may use it to requantize MXFP4 to other types, like Qx_K or Qx_0. This would lead to significant quality degradation, which could flood us with issues.

Here is the commit where I removed the code (it was copied from the oai --> HF conversion code): 04cfb6d
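
To illustrate the requantization concern, here is a small sketch that takes values already exactly representable as FP4 E2M1 and round-trips them through a symmetric uniform 16-level grid. The uniform quantizer is a simplified stand-in and not the actual Q4_0 or Qx_K code; `uniform_4bit_roundtrip` is a hypothetical helper:

```python
import numpy as np

# Magnitudes an E2M1 element can represent (before the block scale is applied).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def uniform_4bit_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize to a symmetric uniform 16-level grid and back (illustrative only)."""
    amax = np.abs(x).max()
    if amax == 0:
        return x.copy()
    step = amax / 8                          # evenly spaced levels
    q = np.clip(np.round(x / step), -8, 7)   # signed 4-bit range
    return (q * step).astype(np.float32)

block = np.concatenate([E2M1, -E2M1])        # one block holding every E2M1 value
back = uniform_4bit_roundtrip(block)
err = np.abs(back - block).max()             # nonzero: the two grids do not line up
```

The E2M1 grid is denser near zero and sparser near the maximum, so an evenly spaced grid cannot reproduce it, and error is added on top of whatever the original MXFP4 quantization already introduced.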

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Aug 6, 2025
@compilade
Collaborator Author

I left out the possibility to dequantize MXFP4 in convert_hf_to_gguf because users may use it to requantize MXFP4 to other types, like Qx_K or Qx_0. This would lead to significant quality degradation, which could flood us with issues.

@ngxson
Makes sense. I'm considering making repacking and type detection a bit more general in #14810, so that the auto outtype would keep types closer to the original. Otherwise, adding MXFP4 to that PR would conflict with how it's handled for gpt-oss. I might need to add an API for repacking in gguf-py/gguf/quants.py to avoid recalculating scales. Not there yet, but that's what I'm planning.

@compilade compilade merged commit e54d41b into master Aug 8, 2025
49 checks passed
Labels
ggml changes relating to the ggml tensor library for machine learning python python script changes Tensor Encoding Scheme https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
4 participants