[LinkerWrapper] Always pass `-flto` if the linker supports it #102972

jhuber6 · 2024-08-12T20:12:35Z

Summary:
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.
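
As a rough illustration of the intent, the following standalone sketch appends `-flto` to a device link command line only when the linker is known to support LTO. It is not the wrapper's code: `buildDeviceLinkArgs`, the `SupportsLTO` parameter, and the `a.img` output name are hypothetical stand-ins, and the real tool works on its parsed argument list rather than on plain strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds an illustrative device link command line.
// `SupportsLTO` stands in for whatever check the real wrapper performs.
std::vector<std::string> buildDeviceLinkArgs(const std::string &Triple,
                                             const std::string &OptLevel,
                                             bool SupportsLTO) {
  std::vector<std::string> CmdArgs = {"clang", "-o", "a.img",
                                      "--target=" + Triple, "-" + OptLevel};

  // Only request LTO when the linker can take part in it; with `-flto`
  // present, the optimization level applies to the link step as well.
  if (SupportsLTO)
    CmdArgs.push_back("-flto");

  return CmdArgs;
}
```

Calling `buildDeviceLinkArgs("amdgcn-amd-amdhsa", "O2", true)` yields a list ending in `-O2 -flto`, the same ordering the updated `FileCheck` lines in the test expect.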

llvmbot · 2024-08-12T20:13:05Z

@llvm/pr-subscribers-offload
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary:
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.

Full diff: https://github.com/llvm/llvm-project/pull/102972.diff

2 Files Affected:

(modified) clang/test/Driver/linker-wrapper.c (+10-10)
(modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+1)

diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index e70715d2a9bd7e..068ea2d7d3c663 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -21,7 +21,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=NVPTX-LINK
 
-// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 {{.*}}.o {{.*}}.o
+// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 -flto {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
@@ -30,7 +30,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --device-debug -O0 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=NVPTX-LINK-DEBUG
 
-// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 {{.*}}.o {{.*}}.o -g 
+// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 -O2 -flto {{.*}}.o {{.*}}.o -g 
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \
@@ -39,7 +39,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LINK
 
-// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.amdgpu.bc,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx1030 \
@@ -48,7 +48,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --save-temps -O2 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LTO-TEMPS
 
-// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.o -save-temps
+// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -flto -Wl,--no-undefined {{.*}}.o -save-temps
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
@@ -59,7 +59,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN:   --linker-path=/usr/bin/ld.lld --whole-archive %t.a --no-whole-archive \
 // RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CPU-LINK
 
-// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu -march=native -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared -Wl,--whole-archive {{.*}}.a -Wl,--no-whole-archive
+// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu -march=native -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared -Wl,--whole-archive {{.*}}.a -Wl,--no-whole-archive
 
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o
 // RUN: clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu -mllvm -openmp-opt-disable \
@@ -148,7 +148,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --clang-backend \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND
 
-// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o
+// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
@@ -171,8 +171,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t-on.o %t-off.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=AMD-TARGET-ID
 
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t-lib.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=generic
@@ -187,8 +187,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t1.o %t2.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=ARCH-ALL
 
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 52e6809a122706..9fea1fdcd5fb46 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -527,6 +527,7 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
 
   // Forward all of the `--offload-opt` and similar options to the device.
   if (linkerSupportsLTO(Args)) {
+    CmdArgs.push_back("-flto");
     for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
       CmdArgs.append(
           {"-Xlinker",

jdenny-ornl · 2024-08-13T14:49:11Z

Seems like it does what it intends to do. Thanks for working on it.

However, there's a side effect. Now that -O1 gets passed along, it sometimes triggers an assertion failure for AMD GPU:

ld.lld: /tmp/llvm/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:151: virtual bool llvm::AMDGPUResourceUsageAnalysis::runOnModule(llvm::Module&): Assertion `MF && "function must have been generated already"' failed.

On my AMD GPU test system, I see new test failures. They all use -O1 and hit that assertion:

Failed Tests (4):
  libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_amdgpu.c
  libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_mixed_amdgpu.c
  libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
  libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c

jhuber6 · 2024-08-13T15:13:14Z

I'm very familiar with that error message, it's #64863. I guess I should just manually pass the optimization level at O2 for now.

jdenny-ornl · 2024-08-13T15:24:38Z

> I guess I should just manually pass the optimization level at O2 for now.

Would -O3 still pass through?

jhuber6 · 2024-08-13T15:25:14Z

> > I guess I should just manually pass the optimization level at O2 for now.

> Would -O3 still pass through?

It should; the behavior that was apparently passing was using -O2.

jhuber6 · 2024-08-13T15:37:01Z

Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

github-actions · 2024-08-13T15:40:23Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jhuber6 · 2024-08-13T15:41:31Z

We really need to turn off clang-format for the tests, ignoring it.

jdenny-ornl · 2024-08-13T15:50:25Z

> Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

(Rewriting history with force pushes makes it harder to quickly spot what's changed since I last looked. Could you push new commits instead?)

jdenny-ornl · 2024-08-13T15:50:46Z

offload/test/jit/empty_kernel_lvl2.c

 // RUN: env LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=%t.pre.ll     \
 // RUN:     LIBOMPTARGET_JIT_SKIP_OPT=true                   \
 // RUN:     %libomptarget-run-generic
-// TODO:


jdenny-ornl: Unrelated change?

jhuber6: Yeah, can remove it if needed.

jdoerfert: Leave the TODOs please.

jhuber6 · 2024-08-13T15:53:17Z

> > Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

> So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

I wouldn't consider something "supported" if it only worked because we overruled the user's -O1 flag and replaced it with -O2.

> (Rewriting history with force pushes makes it harder to quickly spot what's changed since I last looked. Could you push new commits instead?)

Dealing with multiple commits is a pain; I miss Phabricator's diff-based review system. Only the tests changed.

jdenny-ornl · 2024-08-13T16:00:25Z

> > > Updated to pass -O2 instead, this was the actual behavior that passed, so it's mostly not a functional change.

> > So you just changed the tests not to use -O1? Doesn't this patch then represent a regression in what's supported?

> I wouldn't consider something "supported" if it only worked because we overruled the user's -O1 flag and replaced it with -O2.

Cases that used to compile successfully (even if more optimized than expected) will no longer. Sounds like a regression.

jdoerfert

LG, but keep the TODOs.

jhuber6 · 2024-08-13T16:23:42Z

> Cases that used to compile successfully (even if more optimized than expected) will no longer. Sounds like a regression.

It kind of is, but I don't think we should work around backend bugs in the clang driver.

ajarmusch · 2024-08-14T01:28:28Z

This caused an nvlink error with clang. It was found by an OpenMP CI job: https://gitlab.e4s.io/uo-public/llvm-openmp-offloading/-/jobs/301520

jhuber6 · 2024-08-14T01:45:11Z

> This caused an nvlink error with clang. It was found by an OpenMP CI job: https://gitlab.e4s.io/uo-public/llvm-openmp-offloading/-/jobs/301520

Probably an LTO flag I forgot to handle correctly, can you compile it with -v to show what it's passing?

jhuber6 requested review from doru1004, jdenny-ornl, jdoerfert, jplehr and saiislam August 12, 2024 20:12

llvmbot added the clang and clang:driver labels Aug 12, 2024

[LinkerWrapper] Always pass -flto if the linker supports it

4e2834e

Summary: Now that we use the linker to do LTO / device linking, we need to inform the `clang` invocation to use `-flto` so it forwards arguments like `-On` correctly.

jhuber6 force-pushed the lto branch from b89e843 to 4e2834e Compare August 13, 2024 15:36

llvmbot added the backend:AMDGPU and offload labels Aug 13, 2024

jdenny-ornl reviewed Aug 13, 2024

update

b0cbe64

jdoerfert approved these changes Aug 13, 2024

jhuber6 merged commit dcc27ea into llvm:main Aug 13, 2024

jhuber6 deleted the lto branch September 23, 2024 13:26
