Float8Tensor per row quantization pass bias to fbgemm kernel #2884
Conversation
Summary: Previously, the bias was not passed to the fbgemm kernel for float8 per-row quantization; this PR adds it. The result should be a faster float8 per-row quantized kernel, without changing numerics or anything else.

Test Plan:
```
python test/dtypes/test_affine_quantized_float.py -k test_expected_kernels_on_gpu
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py
```

stack-info: PR: #2884, branch: jerryzh168/stack/60
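In rough terms, the change looks like the sketch below (the helper name is hypothetical; the real call site is Float8Tensor's linear dispatch in torchao, and the `bias` keyword follows fbgemm's rowwise gemm schema as I understand it):

```
import torch

def _float8_rowwise_linear(xq, wq, x_scale, w_scale, bias=None):
    # Sketch only. fbgemm's f8f8bf16_rowwise takes fp8 activation/weight
    # plus per-row scales and returns bf16; it also accepts an optional bias
    # (the kernel may require a specific bias dtype, so check the op schema).
    if bias is not None:
        # After this PR: bias is forwarded into the fused kernel epilogue,
        # so inductor no longer has to emit a separate elementwise add.
        return torch.ops.fbgemm.f8f8bf16_rowwise(
            xq, wq, x_scale, w_scale, bias=bias
        )
    # Before this PR, bias (when present) was added after the gemm instead:
    #   out = torch.ops.fbgemm.f8f8bf16_rowwise(xq, wq, x_scale, w_scale)
    #   out = out + bias
    return torch.ops.fbgemm.f8f8bf16_rowwise(xq, wq, x_scale, w_scale)
```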
looks reasonable, can we add a test which covers the new path?
drisspg left a comment:
Should we have a test? This seems like it was a bug that wasn't caught.
@drisspg yeah, the test is this one:

but it's probably better to have a more exhaustive one for all options (auto and torch) as well. It is a performance "bug" I think; the effect is that we were not using the most efficient path when there is a bias.
| "torch.ops.triton.quantize_fp8_row.default", 1 | ||
| ).check_count("torch.ops.fbgemm.f8f8bf16_rowwise.default", 1).run(code[0]) | ||
| ).check_count("torch.ops.fbgemm.f8f8bf16_rowwise.default", 1).check_not( | ||
| "triton_poi_fused_add_0" |
cc @drisspg: we explicitly test that triton_poi_fused_add_0 is not generated, to make sure there is no additional res + bias add in this code path.
It's not safe to depend on inductor-generated kernel names such as triton_poi_fused_add_0. I think it would be better to ensure that no additional kernels are called after torch.ops.fbgemm.f8f8bf16_rowwise.default.
OK, updated
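For context, a minimal sketch of that kind of check (the helper name and the `.check_not(".run(")` heuristic are illustrative assumptions, not the PR's actual test; it relies on inductor's wrapper launching its generated Triton kernels via `.run(`):

```
import torch
from torch._inductor.utils import run_and_get_code
from torch.testing import FileCheck

def check_bias_is_fused(model, x):
    # Compile the model and capture the inductor-generated wrapper code.
    compiled = torch.compile(model)
    _, code = run_and_get_code(compiled, x)
    # The quantize and rowwise-gemm ops should each appear exactly once, and
    # no inductor-generated Triton kernel (launched via `.run(`) should appear
    # after the gemm -- i.e. no leftover elementwise bias-add kernel.
    FileCheck().check_count(
        "torch.ops.triton.quantize_fp8_row.default", 1, exactly=True
    ).check_count(
        "torch.ops.fbgemm.f8f8bf16_rowwise.default", 1, exactly=True
    ).check_not(".run(").run(code[0])
```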
Stacked PRs:

#2884: Float8Tensor per row quantization pass bias to fbgemm kernel (this PR)

Summary:
Previously, the bias was not passed to the fbgemm kernel for float8 per-row quantization; this PR adds it. The result should be a faster float8 per-row quantized kernel, without changing numerics or anything else.

Test Plan:
```
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_kernel_preference_numerical_equivalence
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_expected_gpu_kernel_fbgemm
```
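For illustration, a sketch of the kind of equivalence the first test is after (hypothetical helper; fused and unfused bias handling should agree up to small floating-point differences, since the fused epilogue may add the bias at a different precision):

```
import torch

def assert_bias_fusion_matches(xq, wq, x_scale, w_scale, bias):
    # Fused path: bias handled inside the fbgemm rowwise kernel's epilogue.
    fused = torch.ops.fbgemm.f8f8bf16_rowwise(
        xq, wq, x_scale, w_scale, bias=bias
    )
    # Unfused path: same gemm, bias added as a separate elementwise op.
    unfused = torch.ops.fbgemm.f8f8bf16_rowwise(xq, wq, x_scale, w_scale) + bias
    # Loose tolerances: accumulation order/precision may differ slightly.
    torch.testing.assert_close(fused, unfused, atol=1e-2, rtol=1e-2)
```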