Skip to content

Commit 8058196

Browse files
committed
[InstCombine] Process newly inserted instructions in the correct order
InstCombine operates on the basic premise that the operands of the currently processed instruction have already been simplified. It achieves this by pushing instructions to the worklist in reverse program order, so that instructions are popped off in program order. The worklist management in the main combining loop also makes sure to uphold this invariant. However, the same is not true for all the code that is performing manual worklist management. The largest problem (addressed in this patch) are instructions inserted by InstCombine's IRBuilder. These will be pushed onto the worklist in order of insertion (generally matching program order), which means that a) the users of the original instruction will be visited first, as they are pushed later in the main loop and b) the newly inserted instructions will be visited in reverse program order. This causes a number of problems: First, folds operate on instructions that have not had their operands simplified, which may result in optimizations being missed (ran into this in https://reviews.llvm.org/D72048#1800424, which was the original motivation for this patch). Additionally, this increases the amount of folds InstCombine has to perform, both within one iteration, and by increasing the number of total iterations. This patch addresses the issue by adding a Worklist.AddDeferred() method, which is used for instructions inserted by IRBuilder. These will only be added to the real worklist after the combine finished, and in reverse order, so they will end up processed in program order. I should note that the same should also be done to nearly all other uses of Worklist.Add(), but I'm starting with just this occurrence, which has by far the largest test fallout. Most of the test changes are due to https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where we don't canonicalize something. These are neutral. One regression has been addressed in D73575 and D73647. The remaining regression in an shl+sdiv fold can't really be fixed without dropping another transform, but does not seem particularly problematic in the first place. Differential Revision: https://reviews.llvm.org/D73411
1 parent a03ec58 commit 8058196

21 files changed

+149
-131
lines changed

llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,10 @@ namespace llvm {
2626
class InstCombineWorklist {
2727
SmallVector<Instruction *, 256> Worklist;
2828
DenseMap<Instruction *, unsigned> WorklistMap;
29+
/// These instructions will be added in reverse order after the current
30+
/// combine has finished. This means that these instructions will be visited
31+
/// in the order they have been added.
32+
SmallSetVector<Instruction *, 16> Deferred;
2933

3034
public:
3135
InstCombineWorklist() = default;
@@ -52,6 +56,17 @@ class InstCombineWorklist {
5256
Add(I);
5357
}
5458

59+
void AddDeferred(Instruction *I) {
60+
if (Deferred.insert(I))
61+
LLVM_DEBUG(dbgs() << "IC: ADD DEFERRED: " << *I << '\n');
62+
}
63+
64+
void AddDeferredInstructions() {
65+
for (Instruction *I : reverse(Deferred))
66+
Add(I);
67+
Deferred.clear();
68+
}
69+
5570
/// AddInitialGroup - Add the specified batch of stuff in reverse order.
5671
/// which should only be done when the worklist is empty and when the group
5772
/// has no duplicates.
@@ -77,6 +92,7 @@ class InstCombineWorklist {
7792
Worklist[It->second] = nullptr;
7893

7994
WorklistMap.erase(It);
95+
Deferred.remove(I);
8096
}
8197

8298
Instruction *RemoveOne() {
@@ -99,6 +115,7 @@ class InstCombineWorklist {
99115
/// the map if it is large.
100116
void Zap() {
101117
assert(WorklistMap.empty() && "Worklist empty, but map not?");
118+
assert(Deferred.empty() && "Deferred instructions left over");
102119

103120
// Do an explicit clear, this shrinks the map if needed.
104121
WorklistMap.clear();

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3499,6 +3499,7 @@ bool InstCombiner::run() {
34993499
}
35003500
MadeIRChange = true;
35013501
}
3502+
Worklist.AddDeferredInstructions();
35023503
}
35033504

35043505
Worklist.Zap();
@@ -3664,7 +3665,7 @@ static bool combineInstructionsOverFunction(
36643665
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
36653666
F.getContext(), TargetFolder(DL),
36663667
IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {
3667-
Worklist.Add(I);
3668+
Worklist.AddDeferred(I);
36683669
if (match(I, m_Intrinsic<Intrinsic::assume>()))
36693670
AC.registerAssumption(cast<CallInst>(I));
36703671
}));

llvm/test/Transforms/InstCombine/canonicalize-clamp-like-pattern-between-negative-and-positive-thresholds.ll

Lines changed: 26 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -26,9 +26,9 @@
2626
define i32 @t0_ult_slt_128(i32 %x, i32 %replacement_low, i32 %replacement_high) {
2727
; CHECK-LABEL: @t0_ult_slt_128(
2828
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
29-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
29+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
3030
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
31-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
31+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
3232
; CHECK-NEXT: ret i32 [[R]]
3333
;
3434
%t0 = icmp slt i32 %x, 128
@@ -41,9 +41,9 @@ define i32 @t0_ult_slt_128(i32 %x, i32 %replacement_low, i32 %replacement_high)
4141
define i32 @t1_ult_slt_0(i32 %x, i32 %replacement_low, i32 %replacement_high) {
4242
; CHECK-LABEL: @t1_ult_slt_0(
4343
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
44-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
44+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
4545
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
46-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
46+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
4747
; CHECK-NEXT: ret i32 [[R]]
4848
;
4949
%t0 = icmp slt i32 %x, -16
@@ -57,9 +57,9 @@ define i32 @t1_ult_slt_0(i32 %x, i32 %replacement_low, i32 %replacement_high) {
5757
define i32 @t2_ult_sgt_128(i32 %x, i32 %replacement_low, i32 %replacement_high) {
5858
; CHECK-LABEL: @t2_ult_sgt_128(
5959
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
60-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
60+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
6161
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
62-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
62+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
6363
; CHECK-NEXT: ret i32 [[R]]
6464
;
6565
%t0 = icmp sgt i32 %x, 127
@@ -72,9 +72,9 @@ define i32 @t2_ult_sgt_128(i32 %x, i32 %replacement_low, i32 %replacement_high)
7272
define i32 @t3_ult_sgt_neg1(i32 %x, i32 %replacement_low, i32 %replacement_high) {
7373
; CHECK-LABEL: @t3_ult_sgt_neg1(
7474
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
75-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
75+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
7676
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
77-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
77+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
7878
; CHECK-NEXT: ret i32 [[R]]
7979
;
8080
%t0 = icmp sgt i32 %x, -17
@@ -88,9 +88,9 @@ define i32 @t3_ult_sgt_neg1(i32 %x, i32 %replacement_low, i32 %replacement_high)
8888
define i32 @t4_ugt_slt_128(i32 %x, i32 %replacement_low, i32 %replacement_high) {
8989
; CHECK-LABEL: @t4_ugt_slt_128(
9090
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
91-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
91+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
9292
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
93-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
93+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
9494
; CHECK-NEXT: ret i32 [[R]]
9595
;
9696
%t0 = icmp slt i32 %x, 128
@@ -103,9 +103,9 @@ define i32 @t4_ugt_slt_128(i32 %x, i32 %replacement_low, i32 %replacement_high)
103103
define i32 @t5_ugt_slt_0(i32 %x, i32 %replacement_low, i32 %replacement_high) {
104104
; CHECK-LABEL: @t5_ugt_slt_0(
105105
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
106-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
106+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
107107
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
108-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
108+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
109109
; CHECK-NEXT: ret i32 [[R]]
110110
;
111111
%t0 = icmp slt i32 %x, -16
@@ -119,9 +119,9 @@ define i32 @t5_ugt_slt_0(i32 %x, i32 %replacement_low, i32 %replacement_high) {
119119
define i32 @t6_ugt_sgt_128(i32 %x, i32 %replacement_low, i32 %replacement_high) {
120120
; CHECK-LABEL: @t6_ugt_sgt_128(
121121
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
122-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
122+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
123123
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
124-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
124+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
125125
; CHECK-NEXT: ret i32 [[R]]
126126
;
127127
%t0 = icmp sgt i32 %x, 127
@@ -134,9 +134,9 @@ define i32 @t6_ugt_sgt_128(i32 %x, i32 %replacement_low, i32 %replacement_high)
134134
define i32 @t7_ugt_sgt_neg1(i32 %x, i32 %replacement_low, i32 %replacement_high) {
135135
; CHECK-LABEL: @t7_ugt_sgt_neg1(
136136
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], -16
137-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
137+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
138138
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
139-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
139+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
140140
; CHECK-NEXT: ret i32 [[R]]
141141
;
142142
%t0 = icmp sgt i32 %x, -17
@@ -198,9 +198,9 @@ define i32 @t10_oneuse0(i32 %x, i32 %replacement_low, i32 %replacement_high) {
198198
; CHECK-NEXT: [[T0:%.*]] = icmp slt i32 [[X:%.*]], 64
199199
; CHECK-NEXT: call void @use1(i1 [[T0]])
200200
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X]], -16
201-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
201+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
202202
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
203-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
203+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
204204
; CHECK-NEXT: ret i32 [[R]]
205205
;
206206
%t0 = icmp slt i32 %x, 64
@@ -236,9 +236,9 @@ define i32 @t12_oneuse2(i32 %x, i32 %replacement_low, i32 %replacement_high) {
236236
; CHECK-NEXT: [[T2:%.*]] = add i32 [[X:%.*]], 16
237237
; CHECK-NEXT: call void @use32(i32 [[T2]])
238238
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X]], -16
239-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[X]], 128
239+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[X]], 127
240240
; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP1]], i32 [[REPLACEMENT_LOW:%.*]], i32 [[X]]
241-
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 [[REPLACEMENT_HIGH:%.*]]
241+
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP2]], i32 [[REPLACEMENT_HIGH:%.*]], i32 [[TMP3]]
242242
; CHECK-NEXT: ret i32 [[R]]
243243
;
244244
%t0 = icmp slt i32 %x, 64
@@ -407,9 +407,9 @@ define i32 @n19_oneuse9(i32 %x, i32 %replacement_low, i32 %replacement_high) {
407407
define <2 x i32> @t20_ult_slt_vec_splat(<2 x i32> %x, <2 x i32> %replacement_low, <2 x i32> %replacement_high) {
408408
; CHECK-LABEL: @t20_ult_slt_vec_splat(
409409
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <2 x i32> [[X:%.*]], <i32 -16, i32 -16>
410-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <2 x i32> [[X]], <i32 128, i32 128>
410+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <2 x i32> [[X]], <i32 127, i32 127>
411411
; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> [[REPLACEMENT_LOW:%.*]], <2 x i32> [[X]]
412-
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[REPLACEMENT_HIGH:%.*]]
412+
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[REPLACEMENT_HIGH:%.*]], <2 x i32> [[TMP3]]
413413
; CHECK-NEXT: ret <2 x i32> [[R]]
414414
;
415415
%t0 = icmp slt <2 x i32> %x, <i32 128, i32 128>
@@ -422,9 +422,9 @@ define <2 x i32> @t20_ult_slt_vec_splat(<2 x i32> %x, <2 x i32> %replacement_low
422422
define <2 x i32> @t21_ult_slt_vec_nonsplat(<2 x i32> %x, <2 x i32> %replacement_low, <2 x i32> %replacement_high) {
423423
; CHECK-LABEL: @t21_ult_slt_vec_nonsplat(
424424
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <2 x i32> [[X:%.*]], <i32 -16, i32 -8>
425-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <2 x i32> [[X]], <i32 128, i32 256>
425+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <2 x i32> [[X]], <i32 127, i32 255>
426426
; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> [[REPLACEMENT_LOW:%.*]], <2 x i32> [[X]]
427-
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[REPLACEMENT_HIGH:%.*]]
427+
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[REPLACEMENT_HIGH:%.*]], <2 x i32> [[TMP3]]
428428
; CHECK-NEXT: ret <2 x i32> [[R]]
429429
;
430430
%t0 = icmp slt <2 x i32> %x, <i32 128, i32 64>
@@ -464,9 +464,9 @@ define <2 x i32> @t23_ult_sge(<2 x i32> %x, <2 x i32> %replacement_low, <2 x i32
464464
; CHECK-NEXT: [[T0:%.*]] = icmp sge <2 x i32> [[X:%.*]], <i32 128, i32 -2147483648>
465465
; CHECK-NEXT: call void @use2xi1(<2 x i1> [[T0]])
466466
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <2 x i32> [[X]], <i32 -16, i32 -2147483648>
467-
; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <2 x i32> [[X]], <i32 128, i32 2147483647>
467+
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <2 x i32> [[X]], <i32 127, i32 2147483646>
468468
; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> [[REPLACEMENT_LOW:%.*]], <2 x i32> [[X]]
469-
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[REPLACEMENT_HIGH:%.*]]
469+
; CHECK-NEXT: [[R:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[REPLACEMENT_HIGH:%.*]], <2 x i32> [[TMP3]]
470470
; CHECK-NEXT: ret <2 x i32> [[R]]
471471
;
472472
%t0 = icmp sge <2 x i32> %x, <i32 128, i32 -2147483648>

0 commit comments

Comments
 (0)