-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[analyzer] Revert #115918, so empty base class optimization works again #157480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Revert #115918, so empty base class optimization works again #157480
Conversation
… again Tldr; We can't unconditionally trivially copy empty classes because that would clobber the stored entries in the object that the optimized empty class overlaps with. This regression was introduced by llvm#115918, which introduced other clobbering issues, like the handling of `[[no_unique_address]]` fields in llvm#137252. Read issue llvm#157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain. In particular, that patch was motivated by unifying the handling of classes so that copy events would be triggered for a class no matter if it had data members or not. So in hindsight, it was not worth it. I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20. Fixes llvm#157467
FYI @ziqingluo-90 |
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) ChangesTldr; This regression was introduced by #115918, which introduced other clobbering issues, like the handling of Read issue #157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain. I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20. Fixes #157467 Full diff: https://github.com/llvm/llvm-project/pull/157480.diff 4 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
index c0b28d2ebb212..dee34e3e9d6a5 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
@@ -71,21 +71,30 @@ void ExprEngine::performTrivialCopy(NodeBuilder &Bldr, ExplodedNode *Pred,
Bldr.takeNodes(Pred);
assert(ThisRD);
- SVal V = Call.getArgSVal(0);
- const Expr *VExpr = Call.getArgExpr(0);
- // If the value being copied is not unknown, load from its location to get
- // an aggregate rvalue.
- if (std::optional<Loc> L = V.getAs<Loc>())
- V = Pred->getState()->getSVal(*L);
- else
- assert(V.isUnknownOrUndef());
+ if (!ThisRD->isEmpty()) {
+ SVal V = Call.getArgSVal(0);
+ const Expr *VExpr = Call.getArgExpr(0);
- ExplodedNodeSet Tmp;
- evalLocation(Tmp, CallExpr, VExpr, Pred, Pred->getState(), V,
- /*isLoad=*/true);
- for (ExplodedNode *N : Tmp)
- evalBind(Dst, CallExpr, N, ThisVal, V, !AlwaysReturnsLValue);
+ // If the value being copied is not unknown, load from its location to get
+ // an aggregate rvalue.
+ if (std::optional<Loc> L = V.getAs<Loc>())
+ V = Pred->getState()->getSVal(*L);
+ else
+ assert(V.isUnknownOrUndef());
+
+ ExplodedNodeSet Tmp;
+ evalLocation(Tmp, CallExpr, VExpr, Pred, Pred->getState(), V,
+ /*isLoad=*/true);
+ for (ExplodedNode *N : Tmp)
+ evalBind(Dst, CallExpr, N, ThisVal, V, !AlwaysReturnsLValue);
+ } else {
+ // We can't copy empty classes because of empty base class optimization.
+ // In that case, copying the empty base class subobject would overwrite the
+ // object that it overlaps with - so let's not do that.
+ // See issue-157467.cpp for an example.
+ Dst.Add(Pred);
+ }
PostStmt PS(CallExpr, LCtx);
for (ExplodedNode *N : Dst) {
diff --git a/clang/test/Analysis/ctor-trivial-copy.cpp b/clang/test/Analysis/ctor-trivial-copy.cpp
index 45c8ca4c51776..940ff9ba3ed9c 100644
--- a/clang/test/Analysis/ctor-trivial-copy.cpp
+++ b/clang/test/Analysis/ctor-trivial-copy.cpp
@@ -46,15 +46,10 @@ void _01_empty_structs() {
empty Empty = conjure<empty>();
empty Empty2 = Empty;
empty Empty3 = Empty2;
- // All of these should refer to the exact same symbol, because all of
- // these trivial copies refer to the original conjured value.
- // There were Unknown before:
- clang_analyzer_denote(Empty, "$Empty");
- clang_analyzer_express(Empty); // expected-warning {{$Empty}}
- clang_analyzer_express(Empty2); // expected-warning {{$Empty}}
- clang_analyzer_express(Empty3); // expected-warning {{$Empty}}
- // We should have the same Conjured symbol for "Empty", "Empty2" and "Empty3".
+ // We only have binding for the original Empty object, because copying empty
+ // objects is a no-op in the performTrivialCopy. This is fine, because empty
+ // objects don't have any data members that could be accessed anyway.
clang_analyzer_printState();
// CHECK: "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
// CHECK-NEXT: { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
@@ -65,12 +60,6 @@ void _01_empty_structs() {
// CHECK-NEXT: ]},
// CHECK-NEXT: { "cluster": "Empty", "pointer": "0x{{[0-9a-f]+}}", "items": [
// CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
- // CHECK-NEXT: ]},
- // CHECK-NEXT: { "cluster": "Empty2", "pointer": "0x{{[0-9a-f]+}}", "items": [
- // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
- // CHECK-NEXT: ]},
- // CHECK-NEXT: { "cluster": "Empty3", "pointer": "0x{{[0-9a-f]+}}", "items": [
- // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
// CHECK-NEXT: ]}
// CHECK-NEXT: ]},
diff --git a/clang/test/Analysis/issue-157467.cpp b/clang/test/Analysis/issue-157467.cpp
new file mode 100644
index 0000000000000..8281ea1ee1aed
--- /dev/null
+++ b/clang/test/Analysis/issue-157467.cpp
@@ -0,0 +1,39 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core -verify %s
+// expected-no-diagnostics
+
+template <class T, int Idx, bool CanBeEmptyBase = __is_empty(T) && (!__is_final(T))>
+struct compressed_pair_elem {
+ explicit compressed_pair_elem(T u) : value(u) {}
+ T value;
+};
+
+template <class T, int Idx>
+struct compressed_pair_elem<T, Idx, /*CanBeEmptyBase=*/true> : T {
+ explicit compressed_pair_elem(T u) : T(u) {}
+};
+
+template <class T1, class T2, class Base1 = compressed_pair_elem<T1, 0>, class Base2 = compressed_pair_elem<T2, 1>>
+struct compressed_pair : Base1, Base2 {
+ explicit compressed_pair(T1 t1, T2 t2) : Base1(t1), Base2(t2) {}
+};
+
+// empty deleter object
+template <class T>
+struct default_delete {
+ void operator()(T* p) {
+ delete p;
+ }
+};
+
+template <class T, class Deleter = default_delete<T> >
+struct some_unique_ptr {
+ // compressed_pair will employ the empty base class optimization, thus overlapping
+ // the `int*` and the empty `Deleter` object, clobbering the pointer.
+ compressed_pair<int*, Deleter> ptr;
+ some_unique_ptr(int* p, Deleter d) : ptr(p, d) {}
+ ~some_unique_ptr();
+};
+
+void entry_point() {
+ some_unique_ptr<int, default_delete<int> > u3(new int(12), default_delete<int>());
+}
diff --git a/clang/test/Analysis/taint-generic.cpp b/clang/test/Analysis/taint-generic.cpp
index c080313e4d172..fc7c37300d3fc 100644
--- a/clang/test/Analysis/taint-generic.cpp
+++ b/clang/test/Analysis/taint-generic.cpp
@@ -153,8 +153,9 @@ void top() {
int Int = mySource1<int>();
clang_analyzer_isTainted(Int); // expected-warning {{YES}}
+ // It's fine to not propagate taint to empty classes, since they don't have any data members.
Empty E = mySource1<Empty>();
- clang_analyzer_isTainted(E); // expected-warning {{YES}}
+ clang_analyzer_isTainted(E); // expected-warning {{NO}}
Aggr A = mySource1<Aggr>();
clang_analyzer_isTainted(A); // expected-warning {{YES}}
|
Sounds reasonable. I feel a bit guilty that I wasn't able to predict this regression when I reviewed the reverted commit as "LGTM, clean little patch", but I don't think that I could've done better without investing drastically more effort into the review. By the way does this revert affect any of the commits that were related to the reverted one? |
You actually challenged it in #114835 (comment). Probably more costly than keeping some bots and contributors the right to verify changes pre-merge; but this is this definitely won't happen short/mid term (if ever).
Great question. I was really tempted to revert the sibling patch of this one: b96c24b Another patch that comes in my mind is made by @ziqingluo-90 db38cc2. I didn't check if this revert would make that code unnecessary, but in theory, they could coexist AFAIU. |
Backport requested in #160450 to clang-21. |
… again (llvm#157480) Tldr; We can't unconditionally trivially copy empty classes because that would clobber the stored entries in the object that the optimized empty class overlaps with. This regression was introduced by llvm#115918, which introduced other clobbering issues, like the handling of `[[no_unique_address]]` fields in llvm#137252. Read issue llvm#157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain. In particular, that patch was motivated by unifying the handling of classes so that copy events would be triggered for a class no matter if it had data members or not. So in hindsight, it was not worth it. I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20. PS: Also an interesting read [D43714](https://reviews.llvm.org/D43714) in hindsight. Fixes llvm#157467 CPP-6574 (cherry picked from commit 38b948b)
Tldr;
We can't unconditionally trivially copy empty classes because that would clobber the stored entries in the object that the optimized empty class overlaps with.
This regression was introduced by #115918, which introduced other clobbering issues, like the handling of
[[no_unique_address]]
fields in #137252.Read issue #157467 for the detailed explanation, but in short, I'd propose reverting the original patch because these was a lot of problems with it for arguably not much gain.
In particular, that patch was motivated by unifying the handling of classes so that copy events would be triggered for a class no matter if it had data members or not.
So in hindsight, it was not worth it.
I plan to backport this to clang-21 as well, and mention in the release notes that this should fix the regression from clang-20.
PS: Also an interesting read D43714 in hindsight.
Fixes #157467
CPP-6574