[CUDA/HIP] fix propagate -cuid to a host-only compilation. #111650
|
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang
Author: Pankaj Dwivedi (PankajDwivedi-25)
Changes
build failure is observed in the hip test after patch #107483, which complains about a linking error.
"/usr/bin/ld: /opt/rocm/share/hip/samples/2_Cookbook/16_assembly_to_executable/build/square_asm.out: hidden symbol `__hip_gpubin_handle_b21320dde8d193a' isn't defined
Full diff: https://github.com/llvm/llvm-project/pull/111650.diff
1 Files Affected:
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index a5d43bdac23735..d6cdc40b0a292e 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3073,7 +3073,6 @@ class OffloadingActionBuilder final {
CUID = llvm::utohexstr(Hash.low(), /*LowerCase=*/true);
}
}
- IA->setId(CUID);
if (CompileHostOnly)
return ABRT_Success;
@@ -3081,6 +3080,8 @@ class OffloadingActionBuilder final {
// Replicate inputs for each GPU architecture.
auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE
: types::TY_CUDA_DEVICE;
+ IA->setId(CUID);
+
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
CudaDeviceActions.push_back(
C.MakeAction<InputAction>(IA->getInputArg(), Ty, IA->getId()));
|
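To make the effect of the diff above easier to see, here is a minimal toy model of the control flow. It is only a sketch: `ToyInputAction` and `idSeenByInput` are hypothetical names invented for illustration, not part of clang, and the model ignores everything except the ordering of `setId` relative to the host-only early return.

```cpp
#include <string>

// Toy stand-in for clang's InputAction; only the Id field is modeled.
struct ToyInputAction {
  std::string Id;
  void setId(const std::string &NewId) { Id = NewId; }
};

// Hypothetical model of the code path touched by the diff.
//   SetIdBeforeHostOnlyReturn == true  models the code before this patch.
//   SetIdBeforeHostOnlyReturn == false models the code with this patch applied.
// Returns the Id the input action ends up with.
std::string idSeenByInput(bool CompileHostOnly, bool SetIdBeforeHostOnlyReturn,
                          const std::string &CUID) {
  ToyInputAction IA;
  if (SetIdBeforeHostOnlyReturn)
    IA.setId(CUID);
  if (CompileHostOnly)
    return IA.Id; // corresponds to the early `return ABRT_Success;`
  if (!SetIdBeforeHostOnlyReturn)
    IA.setId(CUID);
  return IA.Id;
}
```

In this model, a host-only compilation gets the CUID only when `setId` runs before the early return; with the patch applied, the Id stays empty on that path, while the device path is unaffected.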
You can test this locally with the following command:
git-clang-format --diff 1312369afbeb2083094b3d34a88c346b22e86971 7fab0002d9febf440545043d8782d7243a03f17b --extensions cpp -- clang/lib/Driver/Driver.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index d6cdc40b0a..d9e1ff34f5 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3081,7 +3081,7 @@ class OffloadingActionBuilder final {
auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE
: types::TY_CUDA_DEVICE;
IA->setId(CUID);
-
+
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
CudaDeviceActions.push_back(
C.MakeAction<InputAction>(IA->getInputArg(), Ty, IA->getId()));
|
|
This does not seem to be the right fix. I tend to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs a fix. Since it does not expect host-only compilation to use CUID, it should add |
I agree. The reason I made the change is that we have builds where we do host and per-GPU sub-compilations separately, but they all need to be in sync. Another way to look at it is that sub-compilations of the same TU file, with the same options, should produce the same results whether they were done as part of a combined compilation or a partial compilation done with |
Yes, it is the same test case that is failing. So the changes you are suggesting only need to be made in the CMake file? |
Yes |
Thank you, it's working. |
|
Looks like this issue is still reproducible in the latest build. I'm wondering why it disappeared previously.
|
In both cases there's some sort of inconsistency in your build. Find the compilation which creates the object file which refers to the missing symbol, and then we can try figuring out how we ended up not setting CUID (or setting it when we should not have). |
|
Both are separate builds here. By 'inconsistency', do you mean that both cases cannot be present in the same build?
|
What I'm saying is that whatever refers to the fatbin handle has to have the same idea about the name of that handle as the object file that provides that handle. For that, both have to be compiled with the same |
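The point about the handle name can be illustrated with a small sketch. It assumes, based only on the symbol in the linker error (`__hip_gpubin_handle_b21320dde8d193a`), that the handle name is a fixed prefix followed by the CUID. `handleSymbol` and `linkResolves` are hypothetical helpers written for this illustration and are not clang or linker APIs.

```cpp
#include <set>
#include <string>

// Hypothetical helper: builds the fatbin handle symbol name from a CUID,
// assuming the "<prefix><cuid>" shape seen in the linker error.
std::string handleSymbol(const std::string &CUID) {
  return "__hip_gpubin_handle_" + CUID;
}

// Toy "link" check: succeeds only if every referenced symbol is defined.
bool linkResolves(const std::set<std::string> &Defined,
                  const std::set<std::string> &Referenced) {
  for (const std::string &Sym : Referenced)
    if (Defined.count(Sym) == 0)
      return false; // undefined symbol, like the error reported by ld
  return true;
}
```

If the object that defines the handle and the object that refers to it were compiled with different CUIDs (or one with a CUID and one without), the two names differ and the toy link check fails, which mirrors the "hidden symbol ... isn't defined" error.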
|
Great, it looks like this is part of another bug that has propagated. Closing this PR now. Thank you for your feedback. |
A build failure is observed in the HIP test after patch #107483; the linker reports the following error:
"/usr/bin/ld: /opt/rocm/share/hip/samples/2_Cookbook/16_assembly_to_executable/build/square_asm.out: hidden symbol `__hip_gpubin_handle_b21320dde8d193a' isn't defined
/usr/bin/ld: final link failed: bad value"