gguf-py : add Numpy MXFP4 de/quantization support #15111
Conversation
with np.errstate(divide="ignore"):
    e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0).astype(np.uint8)
It's surprising that the C implementation
llama.cpp/ggml/src/ggml-quants.c
Line 291 in 9515c61
const uint8_t e = (uint8_t) (floorf(log2f(amax)) - 2 + 127);
which doesn't check for zero before calling log2f (!), still results in the same number (e = 0).
Apparently that works (checked by ensuring there are some zeroed input blocks in the tests).
log2f(0) returns -inf, which turns into zero when converted to an integer. However, I don't think this behavior is guaranteed by the C/C++ standard; it may be either undefined or implementation-defined, so it would be better to add a check for zero. Do you want to add it here?
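To illustrate the point, here is a minimal NumPy sketch of the zero-guarded variant being discussed (the function name is illustrative, not the actual gguf-py code): the E8M0 byte is a biased exponent (bias 127), and guarding `d > 0` explicitly avoids relying on how a -inf converts to an integer.

```python
import numpy as np

def e8m0_exponent(d: np.ndarray) -> np.ndarray:
    # d: per-block max absolute values (float32).
    # The MXFP4 block scale is 2^(e - 127); "- 2" places the largest
    # value near the top of the FP4 range.
    with np.errstate(divide="ignore"):
        # np.log2(0) yields -inf (hence the silenced divide warning);
        # np.where replaces those entries with 0 before the uint8 cast,
        # instead of depending on (int)(-inf) behavior as the C code does.
        e = np.where(d > 0, np.floor(np.log2(d)) - 2 + 127, 0)
    return e.astype(np.uint8)

e8m0_exponent(np.array([0.0, 1.0, 8.0], dtype=np.float32))
# zeroed blocks map to e = 0, matching what the C code happens to produce
```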
> Do you want to add it here?

Sure. Done in 2763dc8
Not exactly related to this PR, but mentioning this for visibility: I left out the possibility to dequantize MXFP4 there. Here is the commit where I removed the code (it was copied from the oai --> HF conversion code): 04cfb6d
@ngxson
MXFP4 support was added in #15091, but Python-based (de)quantization is missing, which this should fix.
Other change: I've removed the FMA workaround for the Q4_0 and Q5_0 quantization functions, since the C versions haven't used FMA for a while now (not sure exactly when that changed).
All correctness tests pass.
Note that there's a RuntimeWarning from Numpy above, but the equality check still passes. The warning is likely caused by invalid E8M0 values from the random data which is dequantized in that test.
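For context on where such a warning can come from, here is a hedged sketch (illustrative names, not the actual gguf-py code) of decoding an E8M0 scale byte: random bytes can carry exponents like 255, and 2^(255-127) = 2^128 overflows float32, which is the kind of RuntimeWarning NumPy emits.

```python
import numpy as np

def e8m0_to_scale(e: np.ndarray) -> np.ndarray:
    # e: raw E8M0 bytes (uint8); the decoded block scale is 2^(e - 127).
    # For e = 255 the result exceeds the float32 maximum (~3.4e38),
    # so np.exp2 overflows to inf and raises a RuntimeWarning.
    return np.exp2(e.astype(np.float32) - 127)

e8m0_to_scale(np.array([127, 130], dtype=np.uint8))
# e = 127 decodes to scale 1.0; e = 130 decodes to scale 8.0
```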
Full test results