Skip to content

Conversation

@PankajDwivedi-25
Copy link
Contributor

build failure is observed in the hip test after patch #107483, which complains about a linking error.

"/usr/bin/ld: /opt/rocm/share/hip/samples/2_Cookbook/16_assembly_to_executable/build/square_asm.out: hidden symbol `__hip_gpubin_handle_b21320dde8d193a' isn't defined
/usr/bin/ld: final link failed: bad value"

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Oct 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

build failure is observed in the hip test after patch #107483, which complains about a linking error.

"/usr/bin/ld: /opt/rocm/share/hip/samples/2_Cookbook/16_assembly_to_executable/build/square_asm.out: hidden symbol `__hip_gpubin_handle_b21320dde8d193a' isn't defined
/usr/bin/ld: final link failed: bad value"


Full diff: https://github.com/llvm/llvm-project/pull/111650.diff

1 Files Affected:

  • (modified) clang/lib/Driver/Driver.cpp (+2-1)
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index a5d43bdac23735..d6cdc40b0a292e 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3073,7 +3073,6 @@ class OffloadingActionBuilder final {
             CUID = llvm::utohexstr(Hash.low(), /*LowerCase=*/true);
           }
         }
-        IA->setId(CUID);
 
         if (CompileHostOnly)
           return ABRT_Success;
@@ -3081,6 +3080,8 @@ class OffloadingActionBuilder final {
         // Replicate inputs for each GPU architecture.
         auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE
                                                  : types::TY_CUDA_DEVICE;
+        IA->setId(CUID);
+        
         for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
           CudaDeviceActions.push_back(
               C.MakeAction<InputAction>(IA->getInputArg(), Ty, IA->getId()));

@github-actions
Copy link

github-actions bot commented Oct 9, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 1312369afbeb2083094b3d34a88c346b22e86971 7fab0002d9febf440545043d8782d7243a03f17b --extensions cpp -- clang/lib/Driver/Driver.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index d6cdc40b0a..d9e1ff34f5 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3081,7 +3081,7 @@ class OffloadingActionBuilder final {
         auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE
                                                  : types::TY_CUDA_DEVICE;
         IA->setId(CUID);
-        
+
         for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
           CudaDeviceActions.push_back(
               C.MakeAction<InputAction>(IA->getInputArg(), Ty, IA->getId()));

@yxsamliu
Copy link
Collaborator

yxsamliu commented Oct 9, 2024

This does not seem to be the right fix. I tends to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs fix. Since it does not expect host-only compilation to use CUID, it should add -fuse-cuid=none to the host-only compilation.

@Artem-B
Copy link
Member

Artem-B commented Oct 9, 2024

This does not seem to be the right fix. I tends to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs fix. Since it does not expect host-only compilation to use CUID, it should add -fuse-cuid=none to the host-only compilation.

I agree. The reason I did the change is that we have builds where we do host and per-GPU sub-compilations separately, but they all need to be in sync.

Another way to look at it is that sub-compilation of the same TU file, with the same options should produce the same results if they were done as part of a combined compilation or partial compilation done with --cuda-host-only/--cuda-device-only.
Not setting/propagating CUID for the host-only compilation makes the host build inconsistent.

@PankajDwivedi-25
Copy link
Contributor Author

This does not seem to be the right fix. I tends to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs fix. Since it does not expect host-only compilation to use CUID, it should add -fuse-cuid=none to the host-only compilation.

Yes, It is the same test case that is failing. As you are suggesting these changes I have to make in CMake file only ?

@yxsamliu
Copy link
Collaborator

This does not seem to be the right fix. I tends to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs fix. Since it does not expect host-only compilation to use CUID, it should add -fuse-cuid=none to the host-only compilation.

Yes, It is the same test case that is failing. As you are suggesting these changes I have to make in CMake file only ?

Yes

@PankajDwivedi-25
Copy link
Contributor Author

This does not seem to be the right fix. I tends to think the test https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/16_assembly_to_executable needs fix. Since it does not expect host-only compilation to use CUID, it should add -fuse-cuid=none to the host-only compilation.

Yes, It is the same test case that is failing. As you are suggesting these changes I have to make in CMake file only ?

Yes

Thank you, it's working.
There is another test case that required a similar fix https://github.com/ROCm/hip-tests/tree/amd-staging/samples/2_Cookbook/17_llvm_ir_to_executable.

@PankajDwivedi-25
Copy link
Contributor Author

PankajDwivedi-25 commented Nov 1, 2024

Looks like this issue is still reproducible in latest build. I'm wondering why then it disappeared the previously.
I'm seeing a minor difference now it is complaining about __hip_fatbin earlier it was __hip_gpubin_handle_2ba9067058fbe93a.

ld.lld: error: undefined symbol: __hip_fatbin

referenced by square.cpp
/work/downstream/hip-tests/samples/2_Cookbook/16_assembly_to_executable/build/square_host.o:(__hip_fatbin_wrapper)
did you mean: _hip_fatbin
defined in: /work/downstream/hip-tests/samples/2_Cookbook/16_assembly_to_executable/build/square_device.o
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
failed to execute:/opt/rocm-6.3.0/lib/llvm/bin/clang++ --driver-mode=g++ -O3 --hip-link /work/downstream/hip-tests/samples/2_Cookbook/16_assembly_to_executable/build/square_host.o /work/downstream/hip-tests/samples/2_Cookbook/16_assembly_to_executable/build/square_device.o -o "/work/downstream/hip-tests/samples/2_Cookbook/16_assembly_to_executable/build/square_asm.out" -O3 -parallel-jobs=4

@Artem-B
Copy link
Member

Artem-B commented Nov 1, 2024

now it is complaining about __hip_fatbin earlier it was __hip_gpubin_handle_2ba9067058fbe93a.

In both cases there's some sort of inconsistency in your build. Find the compilation which creates the object file which refers to the missing symbol, and then we can try figuring out how we ended up not setting CUID (or setting it when we should not have).
Without these details, it's hard to tell what's going on in your case.

@PankajDwivedi-25
Copy link
Contributor Author

Both are separate builds here. 'By Inconsistency' you mean both cases can not be present in same build?

@Artem-B
Copy link
Member

Artem-B commented Nov 4, 2024

I'm saying is that whatever refers to the fatbin handle has to have the same idea about the name of that handle as the object file that provides that handle. For that both have to be compiled with the same cuid. Normally, it's clang driver that does that all under the hood. If the build does host and device compilation separately, it's up to the build owner to get those parts built with the same cuid.

@PankajDwivedi-25
Copy link
Contributor Author

Great, looks it is part of another bug propagated.
Including fix path resolved the issue.

Closing this PR now.

Thank you for your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants