Skip to content

Conversation

steakhal
Copy link
Contributor

@steakhal steakhal commented Sep 8, 2025

Tldr;
We can't unconditionally trivially copy empty classes because that would clobber the stored entries in the object that the optimized empty class overlaps with.

This regression was introduced by #115918, which introduced other clobbering issues, like the handling of [[no_unique_address]] fields in #137252.

Read issue #157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain.
In particular, that patch was motivated by unifying the handling of classes so that copy events would be triggered for a class no matter if it had data members or not.
So in hindsight, it was not worth it.

I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20.

PS: Also an interesting read D43714 in hindsight.

Fixes #157467
CPP-6574

… again

Tldr;
We can't unconditionally trivially copy empty classes because that would
clobber the stored entries in the object that the optimized empty class
overlaps with.

This regression was introduced by llvm#115918, which introduced other
clobbering issues, like the handling of `[[no_unique_address]]` fields
in llvm#137252.

Read issue llvm#157467 for the detailed explanation, but in short, I'd
propose reverting the original patch because these was a lot of problems
with it for arguably not much gain.
In particular, that patch was motivated by unifying the handling of
classes so that copy events would be triggered for a class no matter if
it had data members or not.
So in hindsight, it was not worth it.

I plan to backport this to clang-21 as well, and mention in the release
notes that this should fix the regression from clang-20.

Fixes llvm#157467
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 8, 2025
@steakhal
Copy link
Contributor Author

steakhal commented Sep 8, 2025

FYI @ziqingluo-90

@llvmbot
Copy link
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

Tldr;
We can't unconditionally trivially copy empty classes because that would clobber the stored entries in the object that the optimized empty class overlaps with.

This regression was introduced by #115918, which introduced other clobbering issues, like the handling of [[no_unique_address]] fields in #137252.

Read issue #157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain.
In particular, that patch was motivated by unifying the handling of classes so that copy events would be triggered for a class no matter if it had data members or not.
So in hindsight, it was not worth it.

I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20.

Fixes #157467


Full diff: https://github.com/llvm/llvm-project/pull/157480.diff

4 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp (+22-13)
  • (modified) clang/test/Analysis/ctor-trivial-copy.cpp (+3-14)
  • (added) clang/test/Analysis/issue-157467.cpp (+39)
  • (modified) clang/test/Analysis/taint-generic.cpp (+2-1)
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
index c0b28d2ebb212..dee34e3e9d6a5 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
@@ -71,21 +71,30 @@ void ExprEngine::performTrivialCopy(NodeBuilder &Bldr, ExplodedNode *Pred,
   Bldr.takeNodes(Pred);
 
   assert(ThisRD);
-  SVal V = Call.getArgSVal(0);
-  const Expr *VExpr = Call.getArgExpr(0);
 
-  // If the value being copied is not unknown, load from its location to get
-  // an aggregate rvalue.
-  if (std::optional<Loc> L = V.getAs<Loc>())
-    V = Pred->getState()->getSVal(*L);
-  else
-    assert(V.isUnknownOrUndef());
+  if (!ThisRD->isEmpty()) {
+    SVal V = Call.getArgSVal(0);
+    const Expr *VExpr = Call.getArgExpr(0);
 
-  ExplodedNodeSet Tmp;
-  evalLocation(Tmp, CallExpr, VExpr, Pred, Pred->getState(), V,
-               /*isLoad=*/true);
-  for (ExplodedNode *N : Tmp)
-    evalBind(Dst, CallExpr, N, ThisVal, V, !AlwaysReturnsLValue);
+    // If the value being copied is not unknown, load from its location to get
+    // an aggregate rvalue.
+    if (std::optional<Loc> L = V.getAs<Loc>())
+      V = Pred->getState()->getSVal(*L);
+    else
+      assert(V.isUnknownOrUndef());
+
+    ExplodedNodeSet Tmp;
+    evalLocation(Tmp, CallExpr, VExpr, Pred, Pred->getState(), V,
+                 /*isLoad=*/true);
+    for (ExplodedNode *N : Tmp)
+      evalBind(Dst, CallExpr, N, ThisVal, V, !AlwaysReturnsLValue);
+  } else {
+    // We can't copy empty classes because of empty base class optimization.
+    // In that case, copying the empty base class subobject would overwrite the
+    // object that it overlaps with - so let's not do that.
+    // See issue-157467.cpp for an example.
+    Dst.Add(Pred);
+  }
 
   PostStmt PS(CallExpr, LCtx);
   for (ExplodedNode *N : Dst) {
diff --git a/clang/test/Analysis/ctor-trivial-copy.cpp b/clang/test/Analysis/ctor-trivial-copy.cpp
index 45c8ca4c51776..940ff9ba3ed9c 100644
--- a/clang/test/Analysis/ctor-trivial-copy.cpp
+++ b/clang/test/Analysis/ctor-trivial-copy.cpp
@@ -46,15 +46,10 @@ void _01_empty_structs() {
   empty Empty = conjure<empty>();
   empty Empty2 = Empty;
   empty Empty3 = Empty2;
-  // All of these should refer to the exact same symbol, because all of
-  // these trivial copies refer to the original conjured value.
-  // There were Unknown before:
-  clang_analyzer_denote(Empty, "$Empty");
-  clang_analyzer_express(Empty);  // expected-warning {{$Empty}}
-  clang_analyzer_express(Empty2); // expected-warning {{$Empty}}
-  clang_analyzer_express(Empty3); // expected-warning {{$Empty}}
 
-  // We should have the same Conjured symbol for "Empty", "Empty2" and "Empty3".
+  // We only have binding for the original Empty object, because copying empty
+  // objects is a no-op in the performTrivialCopy. This is fine, because empty
+  // objects don't have any data members that could be accessed anyway.
   clang_analyzer_printState();
   // CHECK:       "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
   // CHECK-NEXT:    { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
@@ -65,12 +60,6 @@ void _01_empty_structs() {
   // CHECK-NEXT:    ]},
   // CHECK-NEXT:    { "cluster": "Empty", "pointer": "0x{{[0-9a-f]+}}", "items": [
   // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
-  // CHECK-NEXT:    ]},
-  // CHECK-NEXT:    { "cluster": "Empty2", "pointer": "0x{{[0-9a-f]+}}", "items": [
-  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
-  // CHECK-NEXT:    ]},
-  // CHECK-NEXT:    { "cluster": "Empty3", "pointer": "0x{{[0-9a-f]+}}", "items": [
-  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
   // CHECK-NEXT:    ]}
   // CHECK-NEXT:  ]},
 
diff --git a/clang/test/Analysis/issue-157467.cpp b/clang/test/Analysis/issue-157467.cpp
new file mode 100644
index 0000000000000..8281ea1ee1aed
--- /dev/null
+++ b/clang/test/Analysis/issue-157467.cpp
@@ -0,0 +1,39 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core -verify %s
+// expected-no-diagnostics
+
+template <class T, int Idx, bool CanBeEmptyBase = __is_empty(T) && (!__is_final(T))>
+struct compressed_pair_elem {
+  explicit compressed_pair_elem(T u) : value(u) {}
+  T value;
+};
+
+template <class T, int Idx>
+struct compressed_pair_elem<T, Idx, /*CanBeEmptyBase=*/true> : T {
+  explicit compressed_pair_elem(T u) : T(u) {}
+};
+
+template <class T1, class T2, class Base1 = compressed_pair_elem<T1, 0>, class Base2 = compressed_pair_elem<T2, 1>>
+struct compressed_pair : Base1, Base2 {
+  explicit compressed_pair(T1 t1, T2 t2) : Base1(t1), Base2(t2) {}
+};
+
+// empty deleter object
+template <class T>
+struct default_delete {
+  void operator()(T* p) {
+    delete p;
+  }
+};
+
+template <class T, class Deleter = default_delete<T> >
+struct some_unique_ptr {
+  // compressed_pair will employ the empty base class optimization, thus overlapping
+  // the `int*` and the empty `Deleter` object, clobbering the pointer.
+  compressed_pair<int*, Deleter> ptr;
+  some_unique_ptr(int* p, Deleter d) : ptr(p, d) {}
+  ~some_unique_ptr();
+};
+
+void entry_point() {
+  some_unique_ptr<int, default_delete<int> > u3(new int(12), default_delete<int>());
+}
diff --git a/clang/test/Analysis/taint-generic.cpp b/clang/test/Analysis/taint-generic.cpp
index c080313e4d172..fc7c37300d3fc 100644
--- a/clang/test/Analysis/taint-generic.cpp
+++ b/clang/test/Analysis/taint-generic.cpp
@@ -153,8 +153,9 @@ void top() {
   int Int = mySource1<int>();
   clang_analyzer_isTainted(Int); // expected-warning {{YES}}
 
+  // It's fine to not propagate taint to empty classes, since they don't have any data members.
   Empty E = mySource1<Empty>();
-  clang_analyzer_isTainted(E); // expected-warning {{YES}}
+  clang_analyzer_isTainted(E); // expected-warning {{NO}}
 
   Aggr A = mySource1<Aggr>();
   clang_analyzer_isTainted(A);      // expected-warning {{YES}}

@NagyDonat
Copy link
Contributor

Sounds reasonable.

I feel a bit guilty that I wasn't able to predict this regression when I reviewed the reverted commit as "LGTM, clean little patch", but I don't think that I could've done better without investing drastically more effort into the review.

By the way does this revert affect any of the commits that were related to the reverted one?

@steakhal
Copy link
Contributor Author

steakhal commented Sep 8, 2025

Sounds reasonable.

I feel a bit guilty that I wasn't able to predict this regression when I reviewed the reverted commit as "LGTM, clean little patch", but I don't think that I could've done better without investing drastically more effort into the review.

You actually challenged it in #114835 (comment).
I don't think it was strictly on us. It's a difficult subject, and a tempting little patch. We had lacking test coverage to surface this too. And I personally don't have the hardware/infra to do full scale A/B testing.
There is no easy way to test changes for real, pre or post merge.
I also raised this at Sonar, because we feel we observe more RT regressions or quality issues year over year, which adds burden to moving to a new llvm release; making it really costly.

Probably more costly than keeping some bots and contributors the right to verify changes pre-merge; but this is this definitely won't happen short/mid term (if ever).

By the way does this revert affect any of the commits that were related to the reverted one?

Great question. I was really tempted to revert the sibling patch of this one: b96c24b
In the end, my current opinion is to keep it because I didn't find evidence that it harms.

Another patch that comes in my mind is made by @ziqingluo-90 db38cc2. I didn't check if this revert would make that code unnecessary, but in theory, they could coexist AFAIU.
That said, #115917 also caused trouble from the same patch-stack, but I'd insist that the intention there is warranted. So if it still has some issues, we should just fix those instead reverting that.

@steakhal steakhal merged commit 38b948b into llvm:main Sep 11, 2025
12 checks passed
@steakhal steakhal deleted the bb/fix-empty-base-class-copy-modeling branch September 11, 2025 14:48
@steakhal
Copy link
Contributor Author

Backport requested in #160450 to clang-21.

dyung pushed a commit to steakhal/llvm-project that referenced this pull request Oct 1, 2025
… again (llvm#157480)

Tldr;
We can't unconditionally trivially copy empty classes because that would
clobber the stored entries in the object that the optimized empty class
overlaps with.

This regression was introduced by llvm#115918, which introduced other
clobbering issues, like the handling of `[[no_unique_address]]` fields
in llvm#137252.

Read issue llvm#157467 for the detailed explanation, but in short, I'd
propose reverting the original patch because these was a lot of problems
with it for arguably not much gain.
In particular, that patch was motivated by unifying the handling of
classes so that copy events would be triggered for a class no matter if
it had data members or not.
So in hindsight, it was not worth it.

I plan to backport this to clang-21 as well, and mention in the release
notes that this should fix the regression from clang-20.

PS: Also an interesting read [D43714](https://reviews.llvm.org/D43714)
in hindsight.

Fixes llvm#157467
CPP-6574

(cherry picked from commit 38b948b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:static analyzer clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[analyzer] New FP with libcxx unique_ptr which uses empty base class optimization

4 participants