Skip to content

Conversation

MartinWehking
Copy link
Contributor

@MartinWehking MartinWehking commented Oct 23, 2023

Enables fp16 support for AMD GPUs.
Based on Zheming's previous work: 8488
Some test cases for e.g. images were disabled since they aren't supported

Draft PR for enabling fp16 support in the unified-runtime: oneapi-src/unified-runtime#988

@jinz2014
Copy link
Contributor

Thank you for your extensions and support !

@JackAKirk
Copy link
Contributor

JackAKirk commented Oct 23, 2023

Looks good, I'll wait for the CI testing to complete though.

@ldrumm
Copy link
Contributor

ldrumm commented Oct 23, 2023

Our current squash policy is going to mangle authorship of this patchset. What are we going to do about it

@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 23, 2023 16:22 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 23, 2023 17:30 — with GitHub Actions Inactive
@JackAKirk
Copy link
Contributor

You can see from e.g. the log from the unexpected passing test, reduction_nd_ext_half.cpp, that the CI run was skipping all the tests requiring aspect::fp16. I think that you either need to get oneapi-src/unified-runtime#988 merged first, or if there is a way of pointing the CI to that unified-runtime PR branch do that. Otherwise we can't see that any of the affected tests are passing on the CI.

@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 10:11 — with GitHub Actions Inactive
@@ -6,9 +6,6 @@
// work group size not bigger than 1`.
// XFAIL: hip_nvidia

// Incorrect result on AMD.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to remove XFAIL when it was only unexpectedly passing because it was being skipped because the CI device didn't have the fp16 aspect: it will have the aspect once you merge the unified-runtime patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I managed to confused myself there.
We still need to assume though that the user might use an older version of the ur repo after the merge of this PR, so it's better to simply make the test always require fp16 support (i.e. by the comment ("// REQUIRES: aspect-fp16")) and also leave the "XFAIL" since the result is always wrong on AMD GPUs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense

@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 10:32 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 10:46 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 11:05 — with GitHub Actions Inactive
@MartinWehking MartinWehking requested a review from a team as a code owner October 24, 2023 15:09
@MartinWehking MartinWehking marked this pull request as draft October 24, 2023 15:15
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 15:42 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 24, 2023 16:16 — with GitHub Actions Inactive
#define __CLC_BUILTIN_F __CLC_XCONCAT(__CLC_BUILTIN, _f32)

#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a nit but I don't understand this: why enable an OPENCL EXTENSION on a non opencl backend? Presumably the cl_khr_fp64 was already working before this patch so I don't imagine this changes any behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been resolved in internal discussions already, but I'd still like to post it here as it might be interesting for others as well.
The fact that the AMDGPU target is a non-opencl doesn't matter here.
We basically enable the OpenCL fp64 extension because libclc is compiled by an OpenCL C compiler (i.e. it needs to be compliant with the openCL language specification).
Not enabling it would lead to the fact that a fp64 literal declared inside this cl file could be compiled as a float.
This is basically where Vanilla OpenCL C deviates from the C standard.
Potentially, this could lead to unexpected behaviour in the future and therefore, it's safer to enable the extension.

For the fp16 extension that is also enabled several times in cl files by this patch, it's a different story.
Not enabling it (i.e. removing the pragmas) leads to errors at compile time with the error message that the half type is not allowed.
It's needed to actually make it a type:
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#reinterpreting-types-using-as_type-and-as_typen
(footnote 15)

This is also why @jinz2014 added these pragmas and wrapped them inside the ifdef macro checks

Copy link
Contributor

@JackAKirk JackAKirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 25, 2023 16:08 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to WindowsCILock October 25, 2023 16:46 — with GitHub Actions Inactive
@MartinWehking
Copy link
Contributor Author

ping @intel/llvm-gatekeepers to get this merged

@MartinWehking MartinWehking marked this pull request as ready for review October 26, 2023 15:12
@JackAKirk
Copy link
Contributor

ping @intel/llvm-gatekeepers to get this merged

I think this needs a review from @intel/dpcpp-l0-pi-reviewers first

@smaslov-intel
Copy link
Contributor

I don't see any changes to the files owned by @intel/dpcpp-l0-pi-reviewers , but I gave my provisional approval (without a review)

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MartinWehking
Copy link
Contributor Author

MartinWehking commented Oct 26, 2023

@smaslov-intel I had to change the UR vars in sycl/plugins/unified_runtime/CMakeLists.txt at some point to make the CI use my UR patch. This automatically requested a review from you.
I reverted those changed again in 568ea41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants