[SLP]Do extra analysis in minbitwidth if some checks return false. #84363
Conversation
Created using spr 1.3.5
@llvm/pr-subscribers-llvm-transforms

Author: Alexey Bataev (alexey-bataev)

Changes

The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false.

Full diff: https://github.com/llvm/llvm-project/pull/84363.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 1889bc09e85028..c6502c9f77f001 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -13942,7 +13942,7 @@ bool BoUpSLP::collectValuesToDemote(
!collectValuesToDemote(I->getOperand(1), IsProfitableToDemoteRoot,
BitWidth, ToDemote, DemotedConsts, Visited,
Level2, IsProfitableToDemote))
- return false;
+ return IsProfitableToDemote && IsPotentiallyTruncated(I, BitWidth);
MaxDepthLevel = std::max(Level1, Level2);
break;
}
@@ -13958,7 +13958,7 @@ bool BoUpSLP::collectValuesToDemote(
!collectValuesToDemote(SI->getFalseValue(), IsProfitableToDemoteRoot,
BitWidth, ToDemote, DemotedConsts, Visited,
Level2, IsProfitableToDemote))
- return false;
+ return IsProfitableToDemote && IsPotentiallyTruncated(I, BitWidth);
MaxDepthLevel = std::max(Level1, Level2);
break;
}
@@ -13971,7 +13971,7 @@ bool BoUpSLP::collectValuesToDemote(
if (!collectValuesToDemote(IncValue, IsProfitableToDemoteRoot, BitWidth,
ToDemote, DemotedConsts, Visited,
MaxDepthLevel, IsProfitableToDemote))
- return false;
+ return IsProfitableToDemote && IsPotentiallyTruncated(I, BitWidth);
break;
}
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll
index 1986b51ec94828..02d1f9f60d0ca1 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll
@@ -228,7 +228,7 @@ for.end: ; preds = %for.end.loopexit, %
; YAML-NEXT: Function: test_unrolled_select
; YAML-NEXT: Args:
; YAML-NEXT: - String: 'Vectorized horizontal reduction with cost '
-; YAML-NEXT: - Cost: '-36'
+; YAML-NEXT: - Cost: '-40'
; YAML-NEXT: - String: ' and with tree size '
; YAML-NEXT: - TreeSize: '10'
@@ -246,15 +246,17 @@ define i32 @test_unrolled_select(ptr noalias nocapture readonly %blk1, ptr noali
; CHECK-NEXT: [[P2_045:%.*]] = phi ptr [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR88:%.*]], [[IF_END_86]] ]
; CHECK-NEXT: [[P1_044:%.*]] = phi ptr [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[IF_END_86]] ]
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[P1_044]], align 1
-; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i8> [[TMP0]] to <8 x i32>
+; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i8> [[TMP0]] to <8 x i16>
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, ptr [[P2_045]], align 1
-; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i32>
-; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <8 x i32> [[TMP1]], [[TMP3]]
-; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <8 x i32> [[TMP4]], zeroinitializer
-; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP4]]
-; CHECK-NEXT: [[TMP7:%.*]] = select <8 x i1> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> [[TMP4]]
-; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP7]])
-; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP8]], [[S_047]]
+; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>
+; CHECK-NEXT: [[TMP4:%.*]] = sub <8 x i16> [[TMP1]], [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc <8 x i16> [[TMP4]] to <8 x i1>
+; CHECK-NEXT: [[TMP6:%.*]] = icmp slt <8 x i1> [[TMP5]], zeroinitializer
+; CHECK-NEXT: [[TMP7:%.*]] = sub <8 x i16> zeroinitializer, [[TMP4]]
+; CHECK-NEXT: [[TMP8:%.*]] = select <8 x i1> [[TMP6]], <8 x i16> [[TMP7]], <8 x i16> [[TMP4]]
+; CHECK-NEXT: [[TMP9:%.*]] = zext <8 x i16> [[TMP8]] to <8 x i32>
+; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])
+; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP10]], [[S_047]]
; CHECK-NEXT: [[CMP83:%.*]] = icmp slt i32 [[OP_RDX]], [[LIM:%.*]]
; CHECK-NEXT: br i1 [[CMP83]], label [[IF_END_86]], label [[FOR_END_LOOPEXIT:%.*]]
; CHECK: if.end.86:
LGTM
The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: #84363
This caused a crash building libjpeg's jdsample.c. Here's a reduced testcase:
Compile with:
(I didn't try to reduce the flags more than removing preprocessing and warning flags) A debug build shows this:
Committed in da118c9
I found two new cases that trigger failed asserts, based on the new relanded commit, in openh264 and libschroedinger. Those cases are:

```c
typedef int a;
short *b;
short c;
void d() {
  a e, f;
  e = 0;
  for (; e < 16; e += 4) {
    b[e] = (f ^ (f ^ b[e] - f) * c >> 16) - f;
    f = b[e + 1] >> 1;
    b[e + 1] = f ^ ((f ^ b[e + 1] - f) * c >> 16) - f;
    b[e + 2] = f ^ ((f ^ b[e + 2] - f) * c >> 16) - f;
    b[e + 3] = f ^ ((f ^ b[e + 3] - f) * c >> 16) - f;
  }
}
```

and:

```c
typedef struct {
  short a;
} b;
int c, d;
b *e;
b f, g;
short h, i, j;
void k() {
  short l;
  d = 0;
  for (; d < 8; d++) {
    f = e[d];
    l = f.a < -1 ? -1 : f.a > 1 ? 1 : f.a;
    h = f.a < 0 ? 0 : f.a;
    i = h * c;
    j = i >> 2;
    g.a = j * l;
    e[d] = g;
  }
}
```

Both can be triggered with the following command - but they appear to trigger failed asserts on a couple of different source lines:
Thanks, reverting it for the proper fix.
Fixed in 81d9ed6
Thanks - from a quick rerun it seems like all these cases are fixed now, and large parts of my builds pass (knowing conclusively takes much longer though).
Hi @alexey-bataev
It fails like:
Thanks, will fix it ASAP.
Fixed in 34f0a8a
Yep, thanks!
This reverts commit da118c9 to fix crashes reported in llvm#84363.