Skip to content

Conversation

@jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Aug 12, 2024

Summary;
Now that we use the linker to do LTO / device linking, we need to inform
the clang invocation to use -flto so it forwards arguments like
-On correctly.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Aug 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-offload
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary;
Now that we use the linker to do LTO / device linking, we need to inform
the clang invocation to use -flto so it forwards arguments like
-On correctly.


Full diff: https://github.com/llvm/llvm-project/pull/102972.diff

2 Files Affected:

  • (modified) clang/test/Driver/linker-wrapper.c (+10-10)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+1)
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index e70715d2a9bd7e..068ea2d7d3c663 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -21,7 +21,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=NVPTX-LINK
 
-// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 {{.*}}.o {{.*}}.o
+// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 -flto {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
@@ -30,7 +30,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --device-debug -O0 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=NVPTX-LINK-DEBUG
 
-// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 {{.*}}.o {{.*}}.o -g 
+// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 -flto {{.*}}.o {{.*}}.o -g 
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \
@@ -39,7 +39,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LINK
 
-// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.amdgpu.bc,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx1030 \
@@ -48,7 +48,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --save-temps -O2 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LTO-TEMPS
 
-// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.o -save-temps
+// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -flto -Wl,--no-undefined {{.*}}.o -save-temps
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
@@ -59,7 +59,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN:   --linker-path=/usr/bin/ld.lld --whole-archive %t.a --no-whole-archive \
 // RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CPU-LINK
 
-// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu -march=native -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared -Wl,--whole-archive {{.*}}.a -Wl,--no-whole-archive
+// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu -march=native -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared -Wl,--whole-archive {{.*}}.a -Wl,--no-whole-archive
 
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o
 // RUN: clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu -mllvm -openmp-opt-disable \
@@ -148,7 +148,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --clang-backend \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND
 
-// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o
+// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
@@ -171,8 +171,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t-on.o %t-off.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=AMD-TARGET-ID
 
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t-lib.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=generic
@@ -187,8 +187,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t1.o %t2.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=ARCH-ALL
 
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 52e6809a122706..9fea1fdcd5fb46 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -527,6 +527,7 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
 
   // Forward all of the `--offload-opt` and similar options to the device.
   if (linkerSupportsLTO(Args)) {
+    CmdArgs.push_back("-flto");
     for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
       CmdArgs.append(
           {"-Xlinker",

@jdenny-ornl
Copy link
Collaborator

Seems like it does what it intends to do. Thanks for working on it.

However, there's a side effect. Now that -O1 gets passed along, sometimes it triggers an assert fail for AMD GPU:

ld.lld: /tmp/llvm/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:151: virtual bool llvm::AMDGPUResourceUsageAnalysis::runOnModule(llvm::Module&): Assertion `MF && "function must have been generated already"' failed.

On my AMD GPU test system, I see new test fails. They all use -O1 and fail that assert:

Failed Tests (4):
  libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_amdgpu.c
  libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_mixed_amdgpu.c
  libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
  libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

I'm very familiar with that error message, it's #64863. I guess I should just manually pass the optimization level at O2 for now.

@jdenny-ornl
Copy link
Collaborator

I guess I should just manually pass the optimization level at O2 for now.

Would -O3 still pass through?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

I guess I should just manually pass the optimization level at O2 for now.

Would -O3 still pass through?

Should, the behavior that was apparently passing was using -O2.

Summary;
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.
@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

@github-actions
Copy link

github-actions bot commented Aug 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

We really need to turn off clang-format for the tests, ignoring it.

@jdenny-ornl
Copy link
Collaborator

Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

(Rewriting history with force pushes make it harder to quickly spot what's changed since I looked before. Can you not push new commits instead?)

// RUN: env LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=%t.pre.ll \
// RUN: LIBOMPTARGET_JIT_SKIP_OPT=true \
// RUN: %libomptarget-run-generic
// TODO:
Copy link
Collaborator

@jdenny-ornl jdenny-ornl Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can remove it if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave the TODOs please.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

I wouldn't consider something "supported" if it only worked because we overruled the user's -O1 flag and replaced it with -O2.

(Rewriting history with force pushes make it harder to quickly spot what's changed since I looked before. Can you not push new commits instead?)

Dealing with multiple commits is a pain, I miss Phabricator's diff based review system. Only the tests changed.

@jdenny-ornl
Copy link
Collaborator

Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

I wouldn't consider something "supported" if it only worked because we overruled the user's -O1 flag and replaced it with -O2.

Cases that used to compile successfully (even if more optimized than expected) will no longer. Sounds like a regression.

Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, but keep the TODOs.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 13, 2024

Cases that used to compile successfully (even if more optimized than expected) will no longer. Sounds like a regression.

It kind of is, but I don't think we should work around backend bugs in the clang driver.

@jhuber6 jhuber6 merged commit dcc27ea into llvm:main Aug 13, 2024
@ajarmusch
Copy link
Contributor

This caused a nvlink error with clang. This was found by a OpenMP CI https://gitlab.e4s.io/uo-public/llvm-openmp-offloading/-/jobs/301520

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 14, 2024

This caused a nvlink error with clang. This was found by a OpenMP CI https://gitlab.e4s.io/uo-public/llvm-openmp-offloading/-/jobs/301520

Probably an LTO flag I forgot to handle correctly, can you compile it with -v to show what it's passing?

@jhuber6 jhuber6 deleted the lto branch September 23, 2024 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category offload

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants