Skip to content

Conversation

mstorsjo
Copy link
Member

@mstorsjo mstorsjo commented Nov 5, 2023

Normally, this shouldn't happen. It can happen in exceptional circumstances, if the compiled output of a bitcode object file references symbols that weren't listed as undefined in the bitcode object file itself.

This can at least happen in the following cases:

  • A custom SEH personality is set via asm()
  • Compiler generated calls to builtin helper functions, such as __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after compiling to a regular object file, that aren't visible on the level of the IR object file.

This is only an issue if the referenced symbols are provided as LTO objects themselves; loading regular object files after the LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in mingw-w64 does this. The referenced pesonality function is usually provided via an import library, but for WinStore targets, a local dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't usually provided as LTO objects (compiler-rt's builtins explicitly pass -fno-lto when building), and many of the builtins are provided as raw .S assembly files, which don't get built as LTO objects anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly with a clear message rather than crashing.

…n LTO compilation

Normally, this shouldn't happen. It can happen in exceptional
circumstances, if the compiled output of a bitcode object file
references symbols that weren't listed as undefined in the bitcode
object file itself.

This can at least happen in the following cases:
- A custom SEH personality is set via asm()
- Compiler generated calls to builtin helper functions, such as
  __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after
compiling to a regular object file, that aren't visible on the
level of the IR object file.

This is only an issue if the referenced symbols are provided as
LTO objects themselves; loading regular object files after the
LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in
mingw-w64 does this. The referenced pesonality function is usually
provided via an import library, but for WinStore targets, a local
dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't
usually provided as LTO objects (compiler-rt's builtins explicitly
pass -fno-lto when building), and many of the builtins are provided
as raw .S assembly files, which don't get built as LTO objects
anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly
with a clear message rather than crashing.
@mstorsjo mstorsjo added lld:COFF platform:windows LTO Link time optimization (regular/full LTO or ThinLTO) labels Nov 5, 2023
@mstorsjo mstorsjo requested review from rnk and MaskRay November 5, 2023 22:40
@llvmbot llvmbot added the lld label Nov 5, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2023

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lto

Author: Martin Storsjö (mstorsjo)

Changes

Normally, this shouldn't happen. It can happen in exceptional circumstances, if the compiled output of a bitcode object file references symbols that weren't listed as undefined in the bitcode object file itself.

This can at least happen in the following cases:

  • A custom SEH personality is set via asm()
  • Compiler generated calls to builtin helper functions, such as __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after compiling to a regular object file, that aren't visible on the level of the IR object file.

This is only an issue if the referenced symbols are provided as LTO objects themselves; loading regular object files after the LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in mingw-w64 does this. The referenced pesonality function is usually provided via an import library, but for WinStore targets, a local dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't usually provided as LTO objects (compiler-rt's builtins explicitly pass -fno-lto when building), and many of the builtins are provided as raw .S assembly files, which don't get built as LTO objects anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly with a clear message rather than crashing.


Full diff: https://github.com/llvm/llvm-project/pull/71337.diff

4 Files Affected:

  • (modified) lld/COFF/SymbolTable.cpp (+5)
  • (modified) lld/COFF/SymbolTable.h (+1)
  • (added) lld/test/COFF/lto-late-arm.ll (+39)
  • (added) lld/test/COFF/lto-late-personality.ll (+48)
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index d9673e159769fbd..15c76461a84a92c 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -61,6 +61,10 @@ void SymbolTable::addFile(InputFile *file) {
     if (auto *f = dyn_cast<ObjFile>(file)) {
       ctx.objFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<BitcodeFile>(file)) {
+      if (ltoCompilationDone) {
+        error("LTO object file " + toString(file) + " linked in after "
+              "doing LTO compilation.");
+      }
       ctx.bitcodeFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<ImportFile>(file)) {
       ctx.importFileInstances.push_back(f);
@@ -876,6 +880,7 @@ Symbol *SymbolTable::addUndefined(StringRef name) {
 }
 
 void SymbolTable::compileBitcodeFiles() {
+  ltoCompilationDone = true;
   if (ctx.bitcodeFileInstances.empty())
     return;
 
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index 511e60d1e3a0873..fc623c2840d401b 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -133,6 +133,7 @@ class SymbolTable {
 
   llvm::DenseMap<llvm::CachedHashStringRef, Symbol *> symMap;
   std::unique_ptr<BitcodeCompiler> lto;
+  bool ltoCompilationDone = false;
 
   COFFLinkerContext &ctx;
 };
diff --git a/lld/test/COFF/lto-late-arm.ll b/lld/test/COFF/lto-late-arm.ll
new file mode 100644
index 000000000000000..0e2f148ef74c6c8
--- /dev/null
+++ b/lld/test/COFF/lto-late-arm.ll
@@ -0,0 +1,39 @@
+; REQUIRES: arm
+
+;; A bitcode file can generate undefined references to symbols that weren't
+;; listed as undefined on the bitcode file itself, when lowering produces
+;; calls to e.g. builtin helper functions. If these functions are provided
+;; as LTO bitcode, the linker would hit an unhandled state. (In practice,
+;; compiler-rt builtins are always compiled with -fno-lto, so this shouldn't
+;; happen.)
+
+; RUN: rm -rf %t.dir
+; RUN: split-file %s %t.dir
+; RUN: llvm-as %t.dir/main.ll -o %t.main.obj
+; RUN: llvm-as %t.dir/sdiv.ll -o %t.sdiv.obj
+; RUN: llvm-ar rcs %t.sdiv.lib %t.sdiv.obj
+
+; RUN: env LLD_IN_TEST=1 not lld-link /entry:entry %t.main.obj %t.sdiv.lib /out:%t.exe /subsystem:console 2>&1 | FileCheck %s
+
+; CHECK: error: LTO object file lto-late-arm.ll.tmp.sdiv.lib(lto-late-arm.ll.tmp.sdiv.obj) linked in after doing LTO compilation.
+
+;--- main.ll
+target datalayout = "e-m:w-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv7-w64-windows-gnu"
+
+@num = dso_local global i32 100
+
+define dso_local arm_aapcs_vfpcc i32 @entry(i32 %param) {
+entry:
+  %0 = load i32, ptr @num
+  %div = sdiv i32 %0, %param
+  ret i32 %div
+}
+;--- sdiv.ll
+target datalayout = "e-m:w-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv7-w64-windows-gnu"
+
+define dso_local arm_aapcs_vfpcc void @__rt_sdiv() {
+entry:
+  ret void
+}
diff --git a/lld/test/COFF/lto-late-personality.ll b/lld/test/COFF/lto-late-personality.ll
new file mode 100644
index 000000000000000..f768f65f4521e40
--- /dev/null
+++ b/lld/test/COFF/lto-late-personality.ll
@@ -0,0 +1,48 @@
+; REQUIRES: x86
+
+;; A bitcode file can generate undefined references to symbols that weren't
+;; listed as undefined on the bitcode file itself, if there's a reference to
+;; an unexpected personality routine via asm(). If the personality function
+;; is provided as LTO bitcode, the linker would hit an unhandled state.
+
+; RUN: rm -rf %t.dir
+; RUN: split-file %s %t.dir
+; RUN: llvm-as %t.dir/main.ll -o %t.main.obj
+; RUN: llvm-as %t.dir/other.ll -o %t.other.obj
+; RUN: llvm-as %t.dir/personality.ll -o %t.personality.obj
+; RUN: llvm-ar rcs %t.personality.lib %t.personality.obj
+
+; RUN: env LLD_IN_TEST=1 not lld-link /entry:entry %t.main.obj %t.other.obj %t.personality.lib /out:%t.exe /subsystem:console /opt:lldlto=0 /debug:symtab 2>&1 | FileCheck %s
+
+; CHECK: error: LTO object file lto-late-personality.ll.tmp.personality.lib(lto-late-personality.ll.tmp.personality.obj) linked in after doing LTO compilation.
+
+;--- main.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define i32 @entry() {
+entry:
+  tail call void @other()
+  tail call void asm sideeffect ".seh_handler __C_specific_handler, @except\0A", "~{dirflag},~{fpsr},~{flags}"()
+  ret i32 0
+}
+
+declare dso_local void @other()
+
+;--- other.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define dso_local void @other() {
+entry:
+  ret void
+}
+
+;--- personality.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define void @__C_specific_handler() {
+entry:
+  ret void
+}

Copy link

github-actions bot commented Nov 5, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 1d95a071d6fc43fb413a0d3f5a9d1e52a18abab0 373f2c7d9e8043f50d6be1849d3feebde44ce9d1 -- lld/COFF/SymbolTable.cpp lld/COFF/SymbolTable.h
View the diff from clang-format here.
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index 15c76461a..27e1bc75d 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -62,7 +62,8 @@ void SymbolTable::addFile(InputFile *file) {
       ctx.objFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<BitcodeFile>(file)) {
       if (ltoCompilationDone) {
-        error("LTO object file " + toString(file) + " linked in after "
+        error("LTO object file " + toString(file) +
+              " linked in after "
               "doing LTO compilation.");
       }
       ctx.bitcodeFileInstances.push_back(f);

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@mstorsjo mstorsjo merged commit 7f9a004 into llvm:main Nov 7, 2023
@mstorsjo mstorsjo deleted the lld-lto-late-lto branch November 7, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lld:COFF lld LTO Link time optimization (regular/full LTO or ThinLTO) platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants