Skip to content

Conversation

@fhahn
Copy link
Contributor

@fhahn fhahn commented Aug 15, 2025

Add an extra check to the compare handling to make sure the other
compared pointer cannot be based on a different object.

For example, consider the IR below, which may write @B via a pointer constructed
from @A using PtrToInt and a compare against @B.

  %cmp = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr) , @b
  br i1 %cmp, label %then, label %else

then:
  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr)
  br label %exit

else:
  br label %exit

exit:
  %l = load i64, ptr @b
  ret i64 %l
}

@llvmbot
Copy link
Member

llvmbot commented Aug 15, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Add an extra check to the compare handling to make sure the other
compared pointer cannot be based on a different object.

For example, consider the IR below, which may write @<!-- -->B via a pointer constructed
from @<!-- -->A using PtrToInt and a compare against @<!-- -->B.

  %cmp = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @<!-- -->A to i64), i64 8) to ptr) , @<!-- -->b
  br i1 %cmp, label %then, label %else

then:
  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @<!-- -->A to i64), i64 8) to ptr)
  br label %exit

else:
  br label %exit

exit:
  %l = load i64, ptr @<!-- -->b
  ret i64 %l
}

Full diff: https://github.com/llvm/llvm-project/pull/153789.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/GlobalStatus.cpp (+42)
  • (added) llvm/test/Transforms/GlobalOpt/globals-compares-with-ptrtoint.ll (+148)
diff --git a/llvm/lib/Transforms/Utils/GlobalStatus.cpp b/llvm/lib/Transforms/Utils/GlobalStatus.cpp
index 0b3016a86e287..2452737e97278 100644
--- a/llvm/lib/Transforms/Utils/GlobalStatus.cpp
+++ b/llvm/lib/Transforms/Utils/GlobalStatus.cpp
@@ -61,6 +61,42 @@ bool llvm::isSafeToDestroyConstant(const Constant *C) {
   return true;
 }
 
+static bool maybePointerToDifferentObjectRecursive(Value *V, SmallPtrSetImpl<Value *> &Visited) {
+  if (!Visited.insert(V).second)
+    return false;
+
+  if (Visited.size() > 32)
+    return true;
+
+  // PtrToInt may be used to construct a pointer to a different object. Loads and calls may return a pointer for a different object.
+  if (isa<PtrToIntInst, LoadInst, CallInst>(V))
+    return true;
+
+  if (auto *CE = dyn_cast<ConstantExpr>(V)) {
+    if (CE->getOpcode() == Instruction::PtrToInt)
+      return true;
+
+    for (auto &Op : CE->operands()) {
+      if (maybePointerToDifferentObjectRecursive(Op.get(), Visited))
+        return true;
+    }
+    return false;
+  }
+
+  if (auto *U = dyn_cast<User>(V)) {
+    for (auto &Op : U->operands()) {
+      if (maybePointerToDifferentObjectRecursive(Op.get(), Visited))
+        return true;
+    }
+  }
+  return false;
+}
+
+bool maybePointerToDifferentObject(Value *V) {
+  SmallPtrSet<Value *, 32> Visited;
+  return maybePointerToDifferentObjectRecursive(V, Visited);
+}
+
 static bool analyzeGlobalAux(const Value *V, GlobalStatus &GS,
                              SmallPtrSetImpl<const Value *> &VisitedUsers) {
   if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(V))
@@ -158,6 +194,12 @@ static bool analyzeGlobalAux(const Value *V, GlobalStatus &GS,
             return true;
       } else if (isa<CmpInst>(I)) {
         GS.IsCompared = true;
+
+        if (VisitedUsers.insert(I).second) {
+          for (Value *Op : I->operands())
+            if ( Op != V && maybePointerToDifferentObject(Op))
+              return true;
+        }
       } else if (const MemTransferInst *MTI = dyn_cast<MemTransferInst>(I)) {
         if (MTI->isVolatile())
           return true;
diff --git a/llvm/test/Transforms/GlobalOpt/globals-compares-with-ptrtoint.ll b/llvm/test/Transforms/GlobalOpt/globals-compares-with-ptrtoint.ll
new file mode 100644
index 0000000000000..25495aa6de559
--- /dev/null
+++ b/llvm/test/Transforms/GlobalOpt/globals-compares-with-ptrtoint.ll
@@ -0,0 +1,148 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 5
+; RUN: opt -p globalopt -S %s | FileCheck %s
+
+@A = internal global i64 zeroinitializer
+@B = internal global i64 zeroinitializer
+
+@C = internal global i64 zeroinitializer
+@D = internal global i64 zeroinitializer
+
+@E = internal global i64 zeroinitializer
+@F = internal global i64 zeroinitializer
+
+@G = internal global i64 zeroinitializer
+@H = internal global i64 zeroinitializer
+
+
+
+;.
+; CHECK: @A = internal global i64 0
+; CHECK: @B = internal global i64 0
+; CHECK: @C = internal global i64 0
+; CHECK: @D = internal global i64 0
+; CHECK: @G = internal global i64 0
+; CHECK: @H = internal global i64 0
+;.
+define i64 @A_and_B_cmp_ptrtoint_constant_expr() {
+; CHECK-LABEL: define i64 @A_and_B_cmp_ptrtoint_constant_expr() local_unnamed_addr {
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr), @B
+; CHECK-NEXT:    br i1 [[CMP]], label %[[THEN:.*]], label %[[ELSE:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr), align 4
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[ELSE]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[L:%.*]] = load i64, ptr @B, align 4
+; CHECK-NEXT:    ret i64 [[L]]
+;
+  %cmp = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr) , @B
+  br i1 %cmp, label %then, label %else
+
+then:
+  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr)
+  br label %exit
+
+else:
+  br label %exit
+
+exit:
+  %l = load i64, ptr @B
+  ret i64 %l
+}
+
+define i64 @G_and_H_cmp_ptrtoint_constant_expr_cmp_ops_swapped() {
+; CHECK-LABEL: define i64 @G_and_H_cmp_ptrtoint_constant_expr_cmp_ops_swapped() local_unnamed_addr {
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @G to i64), i64 8) to ptr), @H
+; CHECK-NEXT:    br i1 [[CMP]], label %[[THEN:.*]], label %[[ELSE:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @G to i64), i64 8) to ptr), align 4
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[ELSE]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[L:%.*]] = load i64, ptr @H, align 4
+; CHECK-NEXT:    ret i64 [[L]]
+;
+  %cmp = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @G to i64), i64 8) to ptr) , @H
+  br i1 %cmp, label %then, label %else
+
+then:
+  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @G to i64), i64 8) to ptr)
+  br label %exit
+
+else:
+  br label %exit
+
+exit:
+  %l = load i64, ptr @H
+  ret i64 %l
+}
+
+define i64 @C_and_D_cmp_ptr_load() {
+; CHECK-LABEL: define i64 @C_and_D_cmp_ptr_load() local_unnamed_addr {
+; CHECK-NEXT:    [[P:%.*]] = alloca ptr, align 8
+; CHECK-NEXT:    store ptr inttoptr (i64 add (i64 ptrtoint (ptr @C to i64), i64 8) to ptr), ptr [[P]], align 8
+; CHECK-NEXT:    [[L_P:%.*]] = load ptr, ptr [[P]], align 8
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr [[L_P]], @D
+; CHECK-NEXT:    br i1 [[CMP]], label %[[THEN:.*]], label %[[ELSE:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @C to i64), i64 8) to ptr), align 4
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[ELSE]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[L:%.*]] = load i64, ptr @D, align 4
+; CHECK-NEXT:    ret i64 [[L]]
+;
+  %p = alloca ptr
+  store ptr inttoptr (i64 add (i64 ptrtoint (ptr @C to i64), i64 8) to ptr), ptr %p
+  %l.p = load ptr, ptr %p
+  %cmp = icmp eq ptr %l.p, @D
+  br i1 %cmp, label %then, label %else
+
+then:
+  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @C to i64), i64 8) to ptr)
+  br label %exit
+
+else:
+  br label %exit
+
+exit:
+  %l = load i64, ptr @D
+  ret i64 %l
+}
+
+define i64 @D_and_E_cmp_ptrtoint_constant_expr() {
+; CHECK-LABEL: define i64 @D_and_E_cmp_ptrtoint_constant_expr() local_unnamed_addr {
+; CHECK-NEXT:    [[PTR2INT:%.*]] = ptrtoint ptr @A to i64
+; CHECK-NEXT:    [[ADD:%.*]] = add i64 [[PTR2INT]], 8
+; CHECK-NEXT:    [[INT2PTR:%.*]] = inttoptr i64 [[ADD]] to ptr
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr [[INT2PTR]], @B
+; CHECK-NEXT:    br i1 [[CMP]], label %[[THEN:.*]], label %[[ELSE:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr), align 4
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[ELSE]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[L:%.*]] = load i64, ptr @B, align 4
+; CHECK-NEXT:    ret i64 [[L]]
+;
+  %ptr2int = ptrtoint ptr @A to i64
+  %add = add i64 %ptr2int, 8
+  %int2ptr = inttoptr i64 %add to ptr
+  %cmp = icmp eq ptr %int2ptr , @B
+  br i1 %cmp, label %then, label %else
+
+then:
+  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr)
+  br label %exit
+
+else:
+  br label %exit
+
+exit:
+  %l = load i64, ptr @B
+  ret i64 %l
+}

@github-actions
Copy link

github-actions bot commented Aug 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn fhahn force-pushed the global-opt-check-cmp branch from fceb7b1 to ce3c5d8 Compare August 15, 2025 11:22
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems like it's working the wrong way: in general, you have to assume the pointer points to a different object unless you can prove it points to the same object.

If you want to check if a pointer points to a certain object, you can just use getUnderlyingObject().

fhahn added 3 commits August 18, 2025 17:32
Add an extra check to the compare handling to make sure the other
compared pointer cannot be based on a different object.

For example, consider the IR below, which may write @b via a pointer constructed
from @A using PtrToInt and a compare against @b.

```
  %cmp = icmp eq ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr) , @b
  br i1 %cmp, label %then, label %else

then:
  store i64 10, ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr)
  br label %exit

else:
  br label %exit

exit:
  %l = load i64, ptr @b
  ret i64 %l
}
```
@fhahn fhahn force-pushed the global-opt-check-cmp branch from ce3c5d8 to 05c75e6 Compare August 18, 2025 16:38
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems like it's working the wrong way: in general, you have to assume the pointer points to a different object unless you can prove it points to the same object.

If you want to check if a pointer points to a certain object, you can just use getUnderlyingObject().

Hm, originally I was worried that restricting to same UOs may be too restrictive and cause a large number of regressions.

But I checked this against a large corpus of IR, and there are only changes in 4 files, so it's probably not too bad.

They all seem to come from compares with a constant and a function argument, like in @compare_arg

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


const Value *UO = nullptr;
for (Value *Op : I->operands()) {
if (match(Op, m_Zero()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work when the null pointer is defined. If the function attribute is not suitable, check if AS == 0.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this change. Of course this depends on specific provenance semantics, but assuming a typical provenance exposure model, you cannot in fact use ptr inttoptr (i64 add (i64 ptrtoint (ptr @A to i64), i64 8) to ptr) to write to ptr @b simply because you compared the two values. Comparisons do not expose provenance. This requires the presence of ptrtoint @b.

Of course this breaks down because we optimize away the ptrtoint. But this patch is not the right way to address it. I'd be willing to accept it as a temporary workaround if this is addressing a miscompile observed in real-world code -- is it?

@efriedma-quic
Copy link
Collaborator

Have we explicitly said somewhere that icmp never exposes provenance? I mean, we can make that call I guess, but it has some weird implications. Like, we currently combine ptrtoint(x) == ptrtoint(y) to x == y.

I think this came out of the discussion on #125205; I don't know of any real-world testcase.

@nikic
Copy link
Contributor

nikic commented Aug 18, 2025

Have we explicitly said somewhere that icmp never exposes provenance?

Not quite in these terms, but we do specify now that icmp does not capture the provenance (in the "Pointer Capture" section).

By the way, there has been some recent interest in actually specifying out provenance semantics, so I'm hoping we'll get some clarity around these issues soon(TM).

@fhahn
Copy link
Contributor Author

fhahn commented Aug 19, 2025

Yeah the main motivation was @efriedma-quic 's comments here #125205

@fhahn
Copy link
Contributor Author

fhahn commented Sep 11, 2025

Sounds good, I'll close this PR for now, as my motivating case also disappeared

@fhahn fhahn closed this Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants