gguf-py : add Numpy MXFP4 de/quantization support #15111

compilade · 2025-08-06T03:26:50Z

MXFP4 support was added in #15091, but Python-based (de)quantization is missing, which this should fix.

Other change: I've removed the FMA workaround for the Q4_0 and Q5_0 quantization functions, since the C versions stopped using FMA a while ago (not sure exactly when).

All correctness tests pass.

$ python3 gguf-py/tests/test_quants.py --type MXFP4
INFO:test-quants:Testing MXFP4
DEBUG:test-quants:Quantizing to MXFP4 with Python
DEBUG:test-quants:Quantizing to MXFP4 with C
INFO:test-quants:Quantization to MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing from MXFP4 with Python
DEBUG:test-quants:Dequantizing from MXFP4 with C
INFO:test-quants:Dequantization from MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with Python
gguf-py/gguf/quants.py:704: RuntimeWarning: overflow encountered in multiply
  return (d * qs.astype(np.float32))
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with C
INFO:test-quants:Dequantization from random f16 data as MXFP4 matches exactly ✅

Note that there's a RuntimeWarning from Numpy above, but the equality check still passes. The warning is likely caused by invalid E8M0 values from the random data which is dequantized in that test.
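
For illustration, here is a minimal sketch of how this kind of warning can arise, assuming only what the traceback shows: a float32 scale multiplied by a small code value. The scale and code below are made-up values, not taken from the test data or from the actual E8M0 decoding in gguf-py.

```python
import warnings

import numpy as np

# Hypothetical inputs: a float32 scale near the top of the float32 range
# (the kind of value an extreme exponent byte could decode to) and one
# large code value. Neither is taken from the real test data.
d = np.array([[2.0 ** 127]], dtype=np.float32)
qs = np.array([[12]], dtype=np.int8)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same form as the expression named in the warning above.
    out = d * qs.astype(np.float32)

# True when NumPy reported a RuntimeWarning during the multiply.
overflowed = any(issubclass(w.category, RuntimeWarning) for w in caught)
print(overflowed, out)
```

The product does not fit in float32, so NumPy warns and stores inf. Since inf compares equal to inf, an exact comparison could still pass if the C side overflows to the same value, which would be consistent with the result above.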

Full test results

$ python3 gguf-py/tests/test_quants.py
INFO:test-quants:Testing F16
DEBUG:test-quants:Quantizing to F16 with Python
DEBUG:test-quants:Quantizing to F16 with C
INFO:test-quants:Quantization to F16 matches exactly ✅
DEBUG:test-quants:Dequantizing from F16 with Python
DEBUG:test-quants:Dequantizing from F16 with C
INFO:test-quants:Dequantization from F16 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as F16 with Python
DEBUG:test-quants:Dequantizing random f16 data as F16 with C
INFO:test-quants:Dequantization from random f16 data as F16 matches exactly ✅
INFO:test-quants:Testing BF16
DEBUG:test-quants:Quantizing to BF16 with Python
DEBUG:test-quants:Quantizing to BF16 with C
INFO:test-quants:Quantization to BF16 matches exactly ✅
DEBUG:test-quants:Dequantizing from BF16 with Python
DEBUG:test-quants:Dequantizing from BF16 with C
INFO:test-quants:Dequantization from BF16 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as BF16 with Python
DEBUG:test-quants:Dequantizing random f16 data as BF16 with C
INFO:test-quants:Dequantization from random f16 data as BF16 matches exactly ✅
INFO:test-quants:Testing Q4_0
DEBUG:test-quants:Quantizing to Q4_0 with Python
DEBUG:test-quants:Quantizing to Q4_0 with C
INFO:test-quants:Quantization to Q4_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q4_0 with Python
DEBUG:test-quants:Dequantizing from Q4_0 with C
INFO:test-quants:Dequantization from Q4_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_0 with C
INFO:test-quants:Dequantization from random f16 data as Q4_0 matches exactly ✅
INFO:test-quants:Testing Q4_1
DEBUG:test-quants:Quantizing to Q4_1 with Python
DEBUG:test-quants:Quantizing to Q4_1 with C
INFO:test-quants:Quantization to Q4_1 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q4_1 with Python
DEBUG:test-quants:Dequantizing from Q4_1 with C
INFO:test-quants:Dequantization from Q4_1 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_1 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_1 with C
INFO:test-quants:Dequantization from random f16 data as Q4_1 matches exactly ✅
INFO:test-quants:Testing Q5_0
DEBUG:test-quants:Quantizing to Q5_0 with Python
DEBUG:test-quants:Quantizing to Q5_0 with C
INFO:test-quants:Quantization to Q5_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q5_0 with Python
DEBUG:test-quants:Dequantizing from Q5_0 with C
INFO:test-quants:Dequantization from Q5_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_0 with C
INFO:test-quants:Dequantization from random f16 data as Q5_0 matches exactly ✅
INFO:test-quants:Testing Q5_1
DEBUG:test-quants:Quantizing to Q5_1 with Python
DEBUG:test-quants:Quantizing to Q5_1 with C
INFO:test-quants:Quantization to Q5_1 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q5_1 with Python
DEBUG:test-quants:Dequantizing from Q5_1 with C
INFO:test-quants:Dequantization from Q5_1 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_1 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_1 with C
INFO:test-quants:Dequantization from random f16 data as Q5_1 matches exactly ✅
INFO:test-quants:Testing Q8_0
DEBUG:test-quants:Quantizing to Q8_0 with Python
DEBUG:test-quants:Quantizing to Q8_0 with C
INFO:test-quants:Quantization to Q8_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from Q8_0 with Python
DEBUG:test-quants:Dequantizing from Q8_0 with C
INFO:test-quants:Dequantization from Q8_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q8_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as Q8_0 with C
INFO:test-quants:Dequantization from random f16 data as Q8_0 matches exactly ✅
INFO:test-quants:Testing Q2_K
DEBUG:test-quants:Quantizing to Q2_K with C
DEBUG:test-quants:Dequantizing from Q2_K with Python
DEBUG:test-quants:Dequantizing from Q2_K with C
INFO:test-quants:Dequantization from Q2_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q2_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q2_K with C
INFO:test-quants:Dequantization from random f16 data as Q2_K matches exactly ✅
INFO:test-quants:Testing Q3_K
DEBUG:test-quants:Quantizing to Q3_K with C
DEBUG:test-quants:Dequantizing from Q3_K with Python
DEBUG:test-quants:Dequantizing from Q3_K with C
INFO:test-quants:Dequantization from Q3_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q3_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q3_K with C
INFO:test-quants:Dequantization from random f16 data as Q3_K matches exactly ✅
INFO:test-quants:Testing Q4_K
DEBUG:test-quants:Quantizing to Q4_K with C
DEBUG:test-quants:Dequantizing from Q4_K with Python
DEBUG:test-quants:Dequantizing from Q4_K with C
INFO:test-quants:Dequantization from Q4_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q4_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q4_K with C
INFO:test-quants:Dequantization from random f16 data as Q4_K matches exactly ✅
INFO:test-quants:Testing Q5_K
DEBUG:test-quants:Quantizing to Q5_K with C
DEBUG:test-quants:Dequantizing from Q5_K with Python
DEBUG:test-quants:Dequantizing from Q5_K with C
INFO:test-quants:Dequantization from Q5_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q5_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q5_K with C
INFO:test-quants:Dequantization from random f16 data as Q5_K matches exactly ✅
INFO:test-quants:Testing Q6_K
DEBUG:test-quants:Quantizing to Q6_K with C
DEBUG:test-quants:Dequantizing from Q6_K with Python
DEBUG:test-quants:Dequantizing from Q6_K with C
INFO:test-quants:Dequantization from Q6_K matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as Q6_K with Python
DEBUG:test-quants:Dequantizing random f16 data as Q6_K with C
INFO:test-quants:Dequantization from random f16 data as Q6_K matches exactly ✅
INFO:test-quants:Testing TQ1_0
DEBUG:test-quants:Quantizing to TQ1_0 with Python
DEBUG:test-quants:Quantizing to TQ1_0 with C
INFO:test-quants:Quantization to TQ1_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from TQ1_0 with Python
DEBUG:test-quants:Dequantizing from TQ1_0 with C
INFO:test-quants:Dequantization from TQ1_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as TQ1_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as TQ1_0 with C
INFO:test-quants:Dequantization from random f16 data as TQ1_0 matches exactly ✅
INFO:test-quants:Testing TQ2_0
DEBUG:test-quants:Quantizing to TQ2_0 with Python
DEBUG:test-quants:Quantizing to TQ2_0 with C
INFO:test-quants:Quantization to TQ2_0 matches exactly ✅
DEBUG:test-quants:Dequantizing from TQ2_0 with Python
DEBUG:test-quants:Dequantizing from TQ2_0 with C
INFO:test-quants:Dequantization from TQ2_0 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as TQ2_0 with Python
DEBUG:test-quants:Dequantizing random f16 data as TQ2_0 with C
INFO:test-quants:Dequantization from random f16 data as TQ2_0 matches exactly ✅
INFO:test-quants:Testing MXFP4
DEBUG:test-quants:Quantizing to MXFP4 with Python
DEBUG:test-quants:Quantizing to MXFP4 with C
INFO:test-quants:Quantization to MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing from MXFP4 with Python
DEBUG:test-quants:Dequantizing from MXFP4 with C
INFO:test-quants:Dequantization from MXFP4 matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with Python
gguf-py/gguf/quants.py:704: RuntimeWarning: overflow encountered in multiply
  return (d * qs.astype(np.float32))
DEBUG:test-quants:Dequantizing random f16 data as MXFP4 with C
INFO:test-quants:Dequantization from random f16 data as MXFP4 matches exactly ✅
INFO:test-quants:Testing IQ2_XXS
DEBUG:test-quants:Quantizing to IQ2_XXS with C
DEBUG:test-quants:Dequantizing from IQ2_XXS with Python
DEBUG:test-quants:Dequantizing from IQ2_XXS with C
INFO:test-quants:Dequantization from IQ2_XXS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XXS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XXS with C
INFO:test-quants:Dequantization from random f16 data as IQ2_XXS matches exactly ✅
INFO:test-quants:Testing IQ2_XS
DEBUG:test-quants:Quantizing to IQ2_XS with C
DEBUG:test-quants:Dequantizing from IQ2_XS with Python
DEBUG:test-quants:Dequantizing from IQ2_XS with C
INFO:test-quants:Dequantization from IQ2_XS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_XS with C
INFO:test-quants:Dequantization from random f16 data as IQ2_XS matches exactly ✅
INFO:test-quants:Testing IQ2_S
DEBUG:test-quants:Quantizing to IQ2_S with C
DEBUG:test-quants:Dequantizing from IQ2_S with Python
DEBUG:test-quants:Dequantizing from IQ2_S with C
INFO:test-quants:Dequantization from IQ2_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ2_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ2_S with C
INFO:test-quants:Dequantization from random f16 data as IQ2_S matches exactly ✅
INFO:test-quants:Testing IQ3_XXS
DEBUG:test-quants:Quantizing to IQ3_XXS with C
DEBUG:test-quants:Dequantizing from IQ3_XXS with Python
DEBUG:test-quants:Dequantizing from IQ3_XXS with C
INFO:test-quants:Dequantization from IQ3_XXS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ3_XXS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ3_XXS with C
INFO:test-quants:Dequantization from random f16 data as IQ3_XXS matches exactly ✅
INFO:test-quants:Testing IQ3_S
DEBUG:test-quants:Quantizing to IQ3_S with C
DEBUG:test-quants:Dequantizing from IQ3_S with Python
DEBUG:test-quants:Dequantizing from IQ3_S with C
INFO:test-quants:Dequantization from IQ3_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ3_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ3_S with C
INFO:test-quants:Dequantization from random f16 data as IQ3_S matches exactly ✅
INFO:test-quants:Testing IQ1_S
DEBUG:test-quants:Quantizing to IQ1_S with C
DEBUG:test-quants:Dequantizing from IQ1_S with Python
DEBUG:test-quants:Dequantizing from IQ1_S with C
INFO:test-quants:Dequantization from IQ1_S matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ1_S with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ1_S with C
INFO:test-quants:Dequantization from random f16 data as IQ1_S matches exactly ✅
INFO:test-quants:Testing IQ1_M
DEBUG:test-quants:Quantizing to IQ1_M with C
DEBUG:test-quants:Dequantizing from IQ1_M with Python
DEBUG:test-quants:Dequantizing from IQ1_M with C
INFO:test-quants:Dequantization from IQ1_M matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ1_M with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ1_M with C
INFO:test-quants:Dequantization from random f16 data as IQ1_M matches exactly ✅
INFO:test-quants:Testing IQ4_NL
DEBUG:test-quants:Quantizing to IQ4_NL with C
DEBUG:test-quants:Dequantizing from IQ4_NL with Python
DEBUG:test-quants:Dequantizing from IQ4_NL with C
INFO:test-quants:Dequantization from IQ4_NL matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ4_NL with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ4_NL with C
INFO:test-quants:Dequantization from random f16 data as IQ4_NL matches exactly ✅
INFO:test-quants:Testing IQ4_XS
DEBUG:test-quants:Quantizing to IQ4_XS with C
DEBUG:test-quants:Dequantizing from IQ4_XS with Python
DEBUG:test-quants:Dequantizing from IQ4_XS with C
INFO:test-quants:Dequantization from IQ4_XS matches exactly ✅
DEBUG:test-quants:Dequantizing random f16 data as IQ4_XS with Python
DEBUG:test-quants:Dequantizing random f16 data as IQ4_XS with C
INFO:test-quants:Dequantization from random f16 data as IQ4_XS matches exactly ✅

Make sure to read the contributing guidelines before submitting a PR

compilade · 2025-08-06T03:31:24Z

gguf-py/gguf/quants.py

+        with np.errstate(divide="ignore"):
+            e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0).astype(np.uint8)


It's surprising that the C implementation

llama.cpp/ggml/src/ggml-quants.c

Line 291 in 9515c61

const uint8_t e = (uint8_t) (floorf(log2f(amax)) - 2 + 127);

which doesn't check for zero before calling log2f (!) still results in the same number (which is e = 0).

Apparently, that works (checked by ensuring there are some zeroed input blocks in the tests).

log2f(0) returns -inf, which turns into zero when converted to int. However, I don't think this behavior is guaranteed by the C/C++ standard; it may be either undefined or implementation-defined behavior, so it would be better to add a check for zero. Do you want to add it here?

Do you want to add it here?

Sure. Done in 2763dc8
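
For reference, a small sketch of what the Numpy line from this thread does with a zeroed block. The input values are hypothetical per-block maxima chosen for illustration; only the `np.where` expression comes from the diff.

```python
import numpy as np

# Hypothetical per-block maximum magnitudes; the first block is all zeros.
d = np.array([0.0, 0.5, 1.0, 6.0], dtype=np.float32)

# np.where evaluates both branches, so log2(0) = -inf is still computed for
# the zero block; errstate silences the resulting divide-by-zero warning,
# and the d > 0 mask makes sure that -inf is never selected.
with np.errstate(divide="ignore"):
    e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0).astype(np.uint8)

print(e.tolist())  # [0, 124, 125, 127]
```

Because the zero case is selected explicitly, the cast to uint8 never sees -inf, so the result does not depend on how an out-of-range float-to-integer conversion happens to behave.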

ngxson · 2025-08-06T20:01:13Z

Not exactly related to this PR, but mentioning this for visibility: I left out the ability to dequantize MXFP4 in convert_hf_to_gguf because users may use it to requantize MXFP4 to other types, like Qx_K or Qx_0. This would lead to significant quality degradation, which could flood us with issues.

Here is the commit where I removed the code (it was copied from the oai --> HF conversion code): 04cfb6d

compilade · 2025-08-06T20:46:39Z

I left out the ability to dequantize MXFP4 in convert_hf_to_gguf because users may use it to requantize MXFP4 to other types, like Qx_K or Qx_0. This would lead to significant quality degradation, which could flood us with issues.

@ngxson
Makes sense. I'm considering making repacking and type detection a bit more general in #14810, so that the auto outtype would keep types closer to the original. Otherwise, adding MXFP4 to that PR would conflict with how it's handled for gpt-oss. It might need a repacking API in gguf-py/gguf/quants.py to avoid recalculating scales. Not there yet, but that's what I'm planning.

gguf-py : add MXFP4 de/quantization support

141cab1

compilade added the python (python script changes) and Tensor Encoding Scheme (https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes) labels Aug 6, 2025

compilade commented Aug 6, 2025

CISC approved these changes Aug 6, 2025

CISC mentioned this pull request Aug 6, 2025

[question/bug] quantized sizes. #15117

Open

ggml-quants : handle zero amax for MXFP4

2763dc8

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) label Aug 6, 2025

slaren approved these changes Aug 7, 2025

compilade merged commit e54d41b into master Aug 8, 2025
49 checks passed

compilade mentioned this pull request Aug 9, 2025

convert : handle pre-quantized models #14810

Open

2 tasks
