[SYCL][HIP][libclc] Wire up AMD half support #11626
Thank you for your extensions and support!
Looks good, I'll wait for the CI testing to complete though.
Our current squash policy is going to mangle the authorship of this patchset. What are we going to do about it?
You can see from e.g. the log from the unexpected passing test, reduction_nd_ext_half.cpp, that the CI run was skipping all the tests requiring |
@@ -6,9 +6,6 @@
// work group size not bigger than 1`.
// XFAIL: hip_nvidia

// Incorrect result on AMD.
I don't think it makes sense to remove the XFAIL: the test was only unexpectedly passing because it was being skipped, since the CI device didn't have the fp16 aspect. It will have the aspect once you merge the unified-runtime patch.
Ah yes, I managed to confuse myself there.
We still need to assume, though, that a user might use an older version of the UR repo after this PR is merged, so it's better to make the test always require fp16 support (i.e. via the comment "// REQUIRES: aspect-fp16") and also keep the "XFAIL", since the result is always wrong on AMD GPUs.
Yeah that makes sense
#define __CLC_BUILTIN_F __CLC_XCONCAT(__CLC_BUILTIN, _f32)

#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
Probably a nit, but I don't understand this: why enable an OpenCL extension on a non-OpenCL backend? Presumably cl_khr_fp64 was already working before this patch, so I don't imagine this changes any behaviour.
This has already been resolved in internal discussions, but I'd still like to post it here as it might be interesting for others as well.
The fact that AMDGPU is a non-OpenCL target doesn't matter here.
We enable the OpenCL fp64 extension because libclc is compiled by an OpenCL C compiler, i.e. the sources need to comply with the OpenCL C language specification.
Without it, an fp64 literal declared inside this .cl file could be compiled as a float.
This is where vanilla OpenCL C deviates from the C standard.
That could lead to unexpected behaviour in the future, so it's safer to enable the extension.
The fp16 extension, which this patch also enables in several .cl files, is a different story.
Not enabling it (i.e. removing the pragmas) leads to compile-time errors saying that the half type is not allowed.
The extension is needed to actually make half a usable type:
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#reinterpreting-types-using-as_type-and-as_typen
(footnote 15)
This is also why @jinz2014 added these pragmas and wrapped them in the ifdef macro checks.
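To make the fp64 point concrete, here is a minimal sketch in plain C (not OpenCL C, and not code from this patch). It only emulates what happens to a value when a double literal ends up being treated as a float; the helper name `round_trips_through_float` is hypothetical and exists purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only: reports whether a double
 * value survives being narrowed to float and widened back to double.
 * A double literal that the compiler silently treats as a float
 * effectively goes through this narrowing step. */
static bool round_trips_through_float(double x) {
    /* volatile forces a real store to float, discarding any excess
     * precision the compiler might otherwise keep in a wider register. */
    volatile float narrowed = (float)x;
    return (double)narrowed == x;
}
```

Under these assumptions, `round_trips_through_float(0.1)` is false because 0.1 has no exact float representation, while `round_trips_through_float(0.5)` is true. This is the kind of silent precision change the cl_khr_fp64 pragma is meant to guard against.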
LGTM
ping @intel/llvm-gatekeepers to get this merged
I think this needs a review from @intel/dpcpp-l0-pi-reviewers first
I don't see any changes to the files owned by @intel/dpcpp-l0-pi-reviewers, but I gave my provisional approval (without a review)
LGTM
@smaslov-intel I had to change the UR vars in |
Enables fp16 support for AMD GPUs.
Based on Zheming's previous work: 8488
Some test cases (e.g. for images) were disabled since they aren't supported.
Draft PR for enabling fp16 support in the unified-runtime: oneapi-src/unified-runtime#988