[VectorCombine] Fold bitwise operations of bitcasts into bitcast of bitwise operation #137322

Merged
merged 3 commits into from
Jun 26, 2025

Contributor

@vortex73 vortex73 commented Apr 25, 2025

Currently, LLVM fails to convert certain pblendvb intrinsics into select instructions when the blend mask is derived from complex boolean logic operations. This occurs even when the mask is ultimately based on sign-extended comparison results, preventing further optimization opportunities.

For example:

auto tricky(__m128i a, __m128i b, __m128i c, __m128i src) {
    __m128i aValid = _mm_cmpgt_epi32(a, _mm_setzero_si128());
    __m128i bValid = _mm_cmpgt_epi32(b, _mm_setzero_si128());
    __m128i cValid = _mm_cmpgt_epi32(c, _mm_setzero_si128());
    __m128i bothValid = _mm_and_si128(aValid, bValid);
    __m128i allValid = _mm_xor_si128(bothValid, cValid);
    
    __m128i forceA = _mm_and_si128(allValid, aValid);
    __m128i forceB = _mm_and_si128(allValid, bValid);
    
    __m128i out = _mm_and_si128(src, bothValid);
    out = _mm_blendv_epi8(out, a, forceA);
    out = _mm_blendv_epi8(out, b, forceB);
    return out;
}
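
For reference, a minimal scalar model of why a byte-wise blend driven by a sign-extended compare mask behaves like a per-lane select. This is an illustrative sketch, not code from the patch: the helper names (`bitcastVec`, `blendvBytes`, `sextMask`, `selectLanes`, `blendViaBytes`) are made up for the example, and the blend is modelled on the pblendvb behaviour of taking the second source wherever the mask byte's top bit is set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V16i8 = std::array<std::uint8_t, 16>;
using V4i32 = std::array<std::int32_t, 4>;
using V4i1 = std::array<bool, 4>;

// Models an IR bitcast: same 128 bits, different lane type.
template <typename To, typename From> To bitcastVec(const From &Src) {
  static_assert(sizeof(To) == sizeof(From), "bitcast preserves total width");
  To Dst;
  std::memcpy(&Dst, &Src, sizeof(To));
  return Dst;
}

// Byte-wise blend modelled on pblendvb: take B where the mask byte's top bit
// is set, otherwise take A.
V16i8 blendvBytes(const V16i8 &A, const V16i8 &B, const V16i8 &Mask) {
  V16i8 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = (Mask[I] & 0x80) ? B[I] : A[I];
  return R;
}

// sext <4 x i1> to <4 x i32>: true becomes all-ones, false becomes zero.
V4i32 sextMask(const V4i1 &Cond) {
  V4i32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Cond[I] ? -1 : 0;
  return R;
}

// select <4 x i1> Cond, <4 x i32> TrueV, <4 x i32> FalseV
V4i32 selectLanes(const V4i1 &Cond, const V4i32 &TrueV, const V4i32 &FalseV) {
  V4i32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Cond[I] ? TrueV[I] : FalseV[I];
  return R;
}

// The intrinsic form: blend through <16 x i8> using the sign-extended mask.
V4i32 blendViaBytes(const V4i1 &Cond, const V4i32 &A, const V4i32 &B) {
  return bitcastVec<V4i32>(blendvBytes(bitcastVec<V16i8>(A),
                                       bitcastVec<V16i8>(B),
                                       bitcastVec<V16i8>(sextMask(Cond))));
}
```

Because every byte of a sign-extended lane carries the same top bit, `blendViaBytes(Cond, A, B)` and `selectLanes(Cond, B, A)` agree for any inputs in this model.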

Fixes #66513

@vortex73 vortex73 changed the title [InstCombine] Pre-Commit Tests [InstCombine] [X86] pblendvb intrinsics must be replaced by select when possible Apr 25, 2025

github-actions bot commented Apr 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vortex73
Contributor Author

@RKSimon This is still a WIP, but I'd like any feedback on the approach so I can correct it right away. I've done a minor bit of refactoring. The replacement still doesn't seem to happen; I'm figuring out why. And as per earlier discussions, I'm trying not to play around with OneUse checks.

@RKSimon RKSimon self-requested a review April 25, 2025 13:54
Collaborator

@RKSimon RKSimon left a comment

You don't need the new blend.ll file - we already have test coverage in PhaseOrdering/X86

Collaborator

RKSimon commented Apr 25, 2025

I'm not convinced all this is necessary - a small VectorCombine pass for bitops that matches:

  bitop(bitcast(x),bitcast(y)) -> bitcast(bitop(x,y))

along with cost checks should be enough.
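
The fold is sound because and/or/xor act on each bit independently, so regrouping the same bits into different lanes does not change the result. A minimal sketch of that identity in plain C++ (not LLVM API; `bitcastVec`, `andVec`, `beforeFold` and `afterFold` are names invented for illustration, with AND standing in for any of the three ops):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V4i32 = std::array<std::uint32_t, 4>;
using V2i64 = std::array<std::uint64_t, 2>;

// Models an IR bitcast: reinterpret the same 128 bits as another vector type.
template <typename To, typename From> To bitcastVec(const From &Src) {
  static_assert(sizeof(To) == sizeof(From), "bitcast preserves total width");
  To Dst;
  std::memcpy(&Dst, &Src, sizeof(To));
  return Dst;
}

// Lane-wise AND on any of the vector types above.
template <typename Vec> Vec andVec(const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] & B[I];
  return R;
}

// Before the fold: and(bitcast(x), bitcast(y)), two casts then the bitop.
V2i64 beforeFold(const V4i32 &X, const V4i32 &Y) {
  return andVec(bitcastVec<V2i64>(X), bitcastVec<V2i64>(Y));
}

// After the fold: bitcast(and(x, y)), the bitop on the source type, one cast.
V2i64 afterFold(const V4i32 &X, const V4i32 &Y) {
  return bitcastVec<V2i64>(andVec(X, Y));
}
```

Both functions return the same bits for any inputs; the folded form just leaves the bitop on the source type, where later passes can see through it.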

Contributor Author

vortex73 commented Apr 25, 2025

> You don't need the new blend.ll file - we already have test coverage in PhaseOrdering/X86

Oh alright I'll rectify this.

> I'm not convinced all this is necessary - a small VectorCombine pass for bitops that matches:
>
>   bitop(bitcast(x),bitcast(y)) -> bitcast(bitop(x,y))
>
> along with cost checks should be enough.

I had considered this, but my main motivation for the decomposition approach was that it could handle deeper logical expressions without the need for additional passes. Anyway, I'm open to switching back, but I'm guessing some sort of hybrid approach could be taken later to handle more complex patterns.

// Handle AND of sign-extended vectors: (sext A) & (sext B) -> sext(A & B)
Value *LHS, *RHS;
Value *LHSInner, *RHSInner;
if (match(Mask, m_And(m_Value(LHS), m_Value(RHS)))) {
Collaborator

If we do go with this route, you should be able to merge the and/or/xor matches with a single m_BitwiseLogic (and use Builder::CreateBinOp() instead).
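
The identity in the snippet's comment is not specific to AND; it holds for OR and XOR as well, which is what makes a single matcher for all three plausible. A small exhaustive check, written as plain C++ rather than against the LLVM API (`BitOp`, `sext`, `applyBool`, `applyInt` and `foldHolds` are names invented for this sketch):

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

enum class BitOp { And, Or, Xor };

// sext i1 to i32: true becomes all-ones (-1), false becomes zero.
std::int32_t sext(bool B) { return B ? -1 : 0; }

// The logic op applied to the narrow i1 values.
bool applyBool(BitOp Op, bool A, bool B) {
  switch (Op) {
  case BitOp::And:
    return A && B;
  case BitOp::Or:
    return A || B;
  case BitOp::Xor:
    return A != B;
  }
  return false;
}

// The same logic op applied to the sign-extended i32 values.
std::int32_t applyInt(BitOp Op, std::int32_t A, std::int32_t B) {
  switch (Op) {
  case BitOp::And:
    return A & B;
  case BitOp::Or:
    return A | B;
  case BitOp::Xor:
    return A ^ B;
  }
  return 0;
}

// Checks op(sext A, sext B) == sext(op(A, B)) for every pair of i1 inputs.
bool foldHolds(BitOp Op) {
  for (bool A : {false, true})
    for (bool B : {false, true})
      if (applyInt(Op, sext(A), sext(B)) != sext(applyBool(Op, A, B)))
        return false;
  return true;
}
```

Since each lane of a sign-extended mask is either all-ones or zero, checking the four i1 combinations per op covers every case for one lane.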

Contributor Author

Sure.

@vortex73
Contributor Author

@RKSimon To the best of my knowledge, I've written the logical pattern matching and it should be replacing the intrinsic with a select, but it just keeps giving me pblendvb. I'll sit with this again tomorrow (have exams + have no clue what I did wrong). If you can spot anything, please do tell!

Collaborator

RKSimon commented Jun 12, 2025

@vortex73 ping?

@vortex73
Contributor Author

> @vortex73 ping?

So sorry about the delay...
I spent a lot of time on my recursive approach and still haven't quite figured out why it didn't work (maybe it just doesn't identify complex patterns).

I've implemented it as a VectorCombine transform like you suggested initially, and it seems to be working. I shall clean up the tree and push the changes :)

Collaborator

@RKSimon RKSimon left a comment

Thanks!

Collaborator

RKSimon commented Jun 16, 2025

Please regenerate Transforms\VectorCombine\AArch64\shrink-types.ll to see if it fixes the CI failure

@RKSimon RKSimon requested a review from davemgreen June 16, 2025 17:55
@vortex73 vortex73 marked this pull request as ready for review June 16, 2025 18:09
Member

llvmbot commented Jun 16, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Narayan (vortex73)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/137322.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+71)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/blendv-select.ll (+11-19)
  • (modified) llvm/test/Transforms/VectorCombine/AArch64/shrink-types.ll (+12-17)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 04c084ffdda97..6b1419210f363 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -111,6 +111,7 @@ class VectorCombine {
   bool foldInsExtFNeg(Instruction &I);
   bool foldInsExtBinop(Instruction &I);
   bool foldInsExtVectorToShuffle(Instruction &I);
+  bool foldBitOpOfBitcasts(Instruction &I);
   bool foldBitcastShuffle(Instruction &I);
   bool scalarizeBinopOrCmp(Instruction &I);
   bool scalarizeVPIntrinsic(Instruction &I);
@@ -801,6 +802,71 @@ bool VectorCombine::foldInsExtBinop(Instruction &I) {
   return true;
 }
 
+bool VectorCombine::foldBitOpOfBitcasts(Instruction &I) {
+  // Match: bitop(bitcast(x), bitcast(y)) -> bitcast(bitop(x, y))
+  auto *BinOp = dyn_cast<BinaryOperator>(&I);
+  if (!BinOp || !BinOp->isBitwiseLogicOp())
+    return false;
+
+  Value *LHS = BinOp->getOperand(0);
+  Value *RHS = BinOp->getOperand(1);
+
+  // Both operands must be bitcasts
+  auto *LHSCast = dyn_cast<BitCastInst>(LHS);
+  auto *RHSCast = dyn_cast<BitCastInst>(RHS);
+  if (!LHSCast || !RHSCast)
+    return false;
+
+  Value *LHSSrc = LHSCast->getOperand(0);
+  Value *RHSSrc = RHSCast->getOperand(0);
+
+  // Source types must match
+  if (LHSSrc->getType() != RHSSrc->getType())
+    return false;
+
+  if (!LHSSrc->getType()->getScalarType()->isIntegerTy())
+    return false;
+
+  // Only handle vector types
+  auto *SrcVecTy = dyn_cast<FixedVectorType>(LHSSrc->getType());
+  auto *DstVecTy = dyn_cast<FixedVectorType>(I.getType());
+  if (!SrcVecTy || !DstVecTy)
+    return false;
+
+  // Same total bit width
+  assert(SrcVecTy->getPrimitiveSizeInBits() ==
+             DstVecTy->getPrimitiveSizeInBits() &&
+         "Bitcast should preserve total bit width");
+
+  // Cost Check :
+  // OldCost = bitlogic + 2*bitcasts
+  // NewCost = bitlogic + bitcast
+  auto OldCost =
+      TTI.getArithmeticInstrCost(BinOp->getOpcode(), DstVecTy) +
+      TTI.getCastInstrCost(Instruction::BitCast, DstVecTy, LHSSrc->getType(),
+                           TTI::CastContextHint::None) +
+      TTI.getCastInstrCost(Instruction::BitCast, DstVecTy, RHSSrc->getType(),
+                           TTI::CastContextHint::None);
+
+  auto NewCost = TTI.getArithmeticInstrCost(BinOp->getOpcode(), SrcVecTy) +
+                 TTI.getCastInstrCost(Instruction::BitCast, DstVecTy, SrcVecTy,
+                                      TTI::CastContextHint::None);
+
+  if (NewCost > OldCost)
+    return false;
+
+  // Create the operation on the source type
+  Value *NewOp = Builder.CreateBinOp(BinOp->getOpcode(), LHSSrc, RHSSrc,
+                                     BinOp->getName() + ".inner");
+  if (auto *NewBinOp = dyn_cast<BinaryOperator>(NewOp))
+    NewBinOp->copyIRFlags(BinOp);
+
+  // Bitcast the result back
+  Value *Result = Builder.CreateBitCast(NewOp, I.getType());
+  replaceValue(I, *Result);
+  return true;
+}
+
 /// If this is a bitcast of a shuffle, try to bitcast the source vector to the
 /// destination type followed by shuffle. This can enable further transforms by
 /// moving bitcasts or shuffles together.
@@ -3562,6 +3628,11 @@ bool VectorCombine::run() {
       case Instruction::BitCast:
         MadeChange |= foldBitcastShuffle(I);
         break;
+      case Instruction::And:
+      case Instruction::Or:
+      case Instruction::Xor:
+        MadeChange |= foldBitOpOfBitcasts(I);
+        break;
       default:
         MadeChange |= shrinkType(I);
         break;
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/blendv-select.ll b/llvm/test/Transforms/PhaseOrdering/X86/blendv-select.ll
index 22e4239009dd2..daf4a7b799dd4 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/blendv-select.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/blendv-select.ll
@@ -477,30 +477,22 @@ define <2 x i64> @PR66513(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c, <2 x i64> %s
 ; CHECK-LABEL: @PR66513(
 ; CHECK-NEXT:    [[I:%.*]] = bitcast <2 x i64> [[A:%.*]] to <4 x i32>
 ; CHECK-NEXT:    [[CMP_I23:%.*]] = icmp sgt <4 x i32> [[I]], zeroinitializer
-; CHECK-NEXT:    [[SEXT_I24:%.*]] = sext <4 x i1> [[CMP_I23]] to <4 x i32>
-; CHECK-NEXT:    [[I1:%.*]] = bitcast <4 x i32> [[SEXT_I24]] to <2 x i64>
 ; CHECK-NEXT:    [[I2:%.*]] = bitcast <2 x i64> [[B:%.*]] to <4 x i32>
 ; CHECK-NEXT:    [[CMP_I21:%.*]] = icmp sgt <4 x i32> [[I2]], zeroinitializer
-; CHECK-NEXT:    [[SEXT_I22:%.*]] = sext <4 x i1> [[CMP_I21]] to <4 x i32>
-; CHECK-NEXT:    [[I3:%.*]] = bitcast <4 x i32> [[SEXT_I22]] to <2 x i64>
 ; CHECK-NEXT:    [[I4:%.*]] = bitcast <2 x i64> [[C:%.*]] to <4 x i32>
 ; CHECK-NEXT:    [[CMP_I:%.*]] = icmp sgt <4 x i32> [[I4]], zeroinitializer
-; CHECK-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
+; CHECK-NEXT:    [[NARROW:%.*]] = select <4 x i1> [[CMP_I21]], <4 x i1> [[CMP_I23]], <4 x i1> zeroinitializer
+; CHECK-NEXT:    [[XOR_I_INNER1:%.*]] = xor <4 x i1> [[NARROW]], [[CMP_I]]
+; CHECK-NEXT:    [[NARROW3:%.*]] = select <4 x i1> [[CMP_I23]], <4 x i1> [[XOR_I_INNER1]], <4 x i1> zeroinitializer
+; CHECK-NEXT:    [[AND_I25_INNER2:%.*]] = and <4 x i1> [[XOR_I_INNER1]], [[CMP_I21]]
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x i64> [[SRC:%.*]] to <4 x i32>
+; CHECK-NEXT:    [[TMP2:%.*]] = select <4 x i1> [[NARROW]], <4 x i32> [[TMP1]], <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP3:%.*]] = bitcast <2 x i64> [[A]] to <4 x i32>
+; CHECK-NEXT:    [[TMP4:%.*]] = select <4 x i1> [[NARROW3]], <4 x i32> [[TMP3]], <4 x i32> [[TMP2]]
+; CHECK-NEXT:    [[TMP5:%.*]] = bitcast <2 x i64> [[B]] to <4 x i32>
+; CHECK-NEXT:    [[SEXT_I:%.*]] = select <4 x i1> [[AND_I25_INNER2]], <4 x i32> [[TMP5]], <4 x i32> [[TMP4]]
 ; CHECK-NEXT:    [[I5:%.*]] = bitcast <4 x i32> [[SEXT_I]] to <2 x i64>
-; CHECK-NEXT:    [[AND_I27:%.*]] = and <2 x i64> [[I3]], [[I1]]
-; CHECK-NEXT:    [[XOR_I:%.*]] = xor <2 x i64> [[AND_I27]], [[I5]]
-; CHECK-NEXT:    [[AND_I26:%.*]] = and <2 x i64> [[XOR_I]], [[I1]]
-; CHECK-NEXT:    [[AND_I25:%.*]] = and <2 x i64> [[XOR_I]], [[I3]]
-; CHECK-NEXT:    [[AND_I:%.*]] = and <2 x i64> [[AND_I27]], [[SRC:%.*]]
-; CHECK-NEXT:    [[I6:%.*]] = bitcast <2 x i64> [[AND_I]] to <16 x i8>
-; CHECK-NEXT:    [[I7:%.*]] = bitcast <2 x i64> [[A]] to <16 x i8>
-; CHECK-NEXT:    [[I8:%.*]] = bitcast <2 x i64> [[AND_I26]] to <16 x i8>
-; CHECK-NEXT:    [[I9:%.*]] = tail call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> [[I6]], <16 x i8> [[I7]], <16 x i8> [[I8]])
-; CHECK-NEXT:    [[I12:%.*]] = bitcast <2 x i64> [[B]] to <16 x i8>
-; CHECK-NEXT:    [[I13:%.*]] = bitcast <2 x i64> [[AND_I25]] to <16 x i8>
-; CHECK-NEXT:    [[I14:%.*]] = tail call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> [[I9]], <16 x i8> [[I12]], <16 x i8> [[I13]])
-; CHECK-NEXT:    [[I15:%.*]] = bitcast <16 x i8> [[I14]] to <2 x i64>
-; CHECK-NEXT:    ret <2 x i64> [[I15]]
+; CHECK-NEXT:    ret <2 x i64> [[I5]]
 ;
   %i = bitcast <2 x i64> %a to <4 x i32>
   %cmp.i23 = icmp sgt <4 x i32> %i, zeroinitializer
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shrink-types.ll b/llvm/test/Transforms/VectorCombine/AArch64/shrink-types.ll
index 3c672efbb5a07..761ad80d560e8 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shrink-types.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shrink-types.ll
@@ -7,9 +7,8 @@ define i32 @test_and(<16 x i32> %a, ptr %b) {
 ; CHECK-LABEL: @test_and(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[B:%.*]], align 1
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc <16 x i32> [[A:%.*]] to <16 x i8>
-; CHECK-NEXT:    [[TMP1:%.*]] = and <16 x i8> [[WIDE_LOAD]], [[TMP0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext <16 x i8> [[TMP1]] to <16 x i32>
+; CHECK-NEXT:    [[TMP0:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
+; CHECK-NEXT:    [[TMP2:%.*]] = and <16 x i32> [[TMP0]], [[A:%.*]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP2]])
 ; CHECK-NEXT:    ret i32 [[TMP3]]
 ;
@@ -26,9 +25,8 @@ define i32 @test_mask_or(<16 x i32> %a, ptr %b) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[B:%.*]], align 1
 ; CHECK-NEXT:    [[A_MASKED:%.*]] = and <16 x i32> [[A:%.*]], splat (i32 16)
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc <16 x i32> [[A_MASKED]] to <16 x i8>
-; CHECK-NEXT:    [[TMP1:%.*]] = or <16 x i8> [[WIDE_LOAD]], [[TMP0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext <16 x i8> [[TMP1]] to <16 x i32>
+; CHECK-NEXT:    [[TMP0:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
+; CHECK-NEXT:    [[TMP2:%.*]] = or <16 x i32> [[TMP0]], [[A_MASKED]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP2]])
 ; CHECK-NEXT:    ret i32 [[TMP3]]
 ;
@@ -47,15 +45,13 @@ define i32 @multiuse(<16 x i32> %u, <16 x i32> %v, ptr %b) {
 ; CHECK-NEXT:    [[U_MASKED:%.*]] = and <16 x i32> [[U:%.*]], splat (i32 255)
 ; CHECK-NEXT:    [[V_MASKED:%.*]] = and <16 x i32> [[V:%.*]], splat (i32 255)
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[B:%.*]], align 1
-; CHECK-NEXT:    [[TMP0:%.*]] = lshr <16 x i8> [[WIDE_LOAD]], splat (i8 4)
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc <16 x i32> [[V_MASKED]] to <16 x i8>
-; CHECK-NEXT:    [[TMP2:%.*]] = or <16 x i8> [[TMP0]], [[TMP1]]
-; CHECK-NEXT:    [[TMP3:%.*]] = zext <16 x i8> [[TMP2]] to <16 x i32>
-; CHECK-NEXT:    [[TMP4:%.*]] = and <16 x i8> [[WIDE_LOAD]], splat (i8 15)
-; CHECK-NEXT:    [[TMP5:%.*]] = trunc <16 x i32> [[U_MASKED]] to <16 x i8>
-; CHECK-NEXT:    [[TMP6:%.*]] = or <16 x i8> [[TMP4]], [[TMP5]]
+; CHECK-NEXT:    [[TMP0:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
+; CHECK-NEXT:    [[TMP6:%.*]] = lshr <16 x i8> [[WIDE_LOAD]], splat (i8 4)
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <16 x i8> [[TMP6]] to <16 x i32>
-; CHECK-NEXT:    [[TMP8:%.*]] = add nuw nsw <16 x i32> [[TMP3]], [[TMP7]]
+; CHECK-NEXT:    [[TMP3:%.*]] = or <16 x i32> [[TMP7]], [[V_MASKED]]
+; CHECK-NEXT:    [[TMP4:%.*]] = and <16 x i32> [[TMP0]], splat (i32 15)
+; CHECK-NEXT:    [[TMP5:%.*]] = or <16 x i32> [[TMP4]], [[U_MASKED]]
+; CHECK-NEXT:    [[TMP8:%.*]] = add nuw nsw <16 x i32> [[TMP3]], [[TMP5]]
 ; CHECK-NEXT:    [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP8]])
 ; CHECK-NEXT:    ret i32 [[TMP9]]
 ;
@@ -81,9 +77,8 @@ define i32 @phi_bug(<16 x i32> %a, ptr %b) {
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[A_PHI:%.*]] = phi <16 x i32> [ [[A:%.*]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    [[WIDE_LOAD_PHI:%.*]] = phi <16 x i8> [ [[WIDE_LOAD]], [[ENTRY]] ]
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc <16 x i32> [[A_PHI]] to <16 x i8>
-; CHECK-NEXT:    [[TMP1:%.*]] = and <16 x i8> [[WIDE_LOAD_PHI]], [[TMP0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext <16 x i8> [[TMP1]] to <16 x i32>
+; CHECK-NEXT:    [[TMP0:%.*]] = zext <16 x i8> [[WIDE_LOAD_PHI]] to <16 x i32>
+; CHECK-NEXT:    [[TMP2:%.*]] = and <16 x i32> [[TMP0]], [[A_PHI]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP2]])
 ; CHECK-NEXT:    ret i32 [[TMP3]]
 ;

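The cost gate in `foldBitOpOfBitcasts` compares one bitop plus two bitcasts against one bitop on the source type plus a single bitcast, and only bails out when the new form is strictly more expensive. A minimal sketch of that decision, with the costs passed in as plain numbers (the `Costs` struct and `shouldFold` are hypothetical stand-ins for the `TTI` queries in the patch, and the numbers used with them are made up):

```cpp
#include <cassert>

// Hypothetical per-instruction costs; in the patch these come from TTI.
struct Costs {
  unsigned BitOpOnDst;      // bitop on the destination (post-cast) type
  unsigned BitOpOnSrc;      // bitop on the source (pre-cast) type
  unsigned BitcastSrcToDst; // one bitcast from source type to destination type
};

// Mirrors the gate in the diff: fold unless the new sequence costs more.
bool shouldFold(const Costs &C) {
  unsigned OldCost = C.BitOpOnDst + 2 * C.BitcastSrcToDst; // bitop + 2 casts
  unsigned NewCost = C.BitOpOnSrc + C.BitcastSrcToDst;     // bitop + 1 cast
  return NewCost <= OldCost;
}
```

With free bitcasts and equal bitop costs the fold is taken on a tie, which matches the `NewCost > OldCost` bail-out in the patch.
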
@vortex73
Contributor Author

@RKSimon Anything else I need to do regarding this?

Collaborator

@RKSimon RKSimon left a comment

a few minors

Collaborator

RKSimon commented Jun 24, 2025

please can you update the title + summary now that it's a VectorCombine fold?

@vortex73 vortex73 changed the title [InstCombine] [X86] pblendvb intrinsics must be replaced by select when possible [VectorCombine] pblendvb intrinsics must be replaced by select when possible Jun 25, 2025
@vortex73 vortex73 changed the title [VectorCombine] pblendvb intrinsics must be replaced by select when possible [VectorCombine] Fold bitwise operations of bitcasts into bitcast of bitwise operation Jun 25, 2025
@vortex73
Contributor Author

> please can you update the title + summary now that it's a VectorCombine fold?

Done!

Collaborator

@RKSimon RKSimon left a comment

LGTM - cheers!

@RKSimon RKSimon merged commit d05634d into llvm:main Jun 26, 2025
7 checks passed
Collaborator

llvm-ci commented Jun 26, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-msan running on sanitizer-buildbot5 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/164/builds/11204

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) (timed out)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) All available features: add-latomic-workaround, buildhost=linux, c++26, can-create-symlinks, character-conversion-warnings, clang, clang-21, clang-21.0, clang-21.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-has-no-availability-markup, libcpp-has-no-experimental-syncstream, libcpp-has-no-experimental-tzdb, libcpp-has-no-incomplete-pstl, libcpp-has-thread-api-pthread, linux, locale.en_US.UTF-8, long_tests, msan, objective-c++, optimization=none, sanitizer-new-delete, std-at-least-c++03, std-at-least-c++11, std-at-least-c++14, std-at-least-c++17, std-at-least-c++20, std-at-least-c++23, std-at-least-c++26, stdlib=libc++, stdlib=llvm-libc++, target=x86_64-unknown-linux-gnu, thread-safety, verify-support
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{cxx} substitution: '/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build0/bin/clang++'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{flags} substitution: '-pthread --target=x86_64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=memory'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{target-include-dir} -I %{include-dir} -I %{libcxx-dir}/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wshift-negative-value -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -Wno-nullability-completeness -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{link_flags} substitution: '-lc++experimental -nostdlib++ -L %{lib-dir} -Wl,-rpath,%{lib-dir} -lc++ -latomic'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{benchmark_flags} substitution: '-I /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/include -L /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/lib -L /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/lib64 -l benchmark'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{exec} substitution: '%{executor} --execdir %T -- '
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) All available features: add-latomic-workaround, buildhost=linux, c++26, c++experimental, can-create-symlinks, character-conversion-warnings, clang, clang-21, clang-21.0, clang-21.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-hardening-mode=none, libcpp-has-no-availability-markup, libcpp-has-thread-api-pthread, linux, locale.en_US.UTF-8, msan, objective-c++, optimization=none, sanitizer-new-delete, std-at-least-c++03, std-at-least-c++11, std-at-least-c++14, std-at-least-c++17, std-at-least-c++20, std-at-least-c++23, std-at-least-c++26, stdlib=libc++, stdlib=llvm-libc++, target=x86_64-unknown-linux-gnu, thread-safety, verify-support
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/main.py:73: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1500 seconds was requested on the command line. Forcing timeout to be 1500 seconds.
-- Testing: 10819 of 10833 tests, 88 workers --
command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2544.479359
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
Step 10 (stage2/msan check-cxx) failure: stage2/msan check-cxx (failure)
...
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cfloat.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cinttypes.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/climits.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/clocale.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cmath.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/csetjmp.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/csignal.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstdarg.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstddef.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstdint.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstdio.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstdlib.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cstring.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/ctime.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cuchar.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cwchar.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat/cwctype.inc
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.cppm
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/share/libc++/v1/std.compat.cppm
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.modules.json
-- Install configuration: "Release"
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.so.1.0
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.so.1
-- Set non-toolchain portion of runtime path of "/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.so.1.0" to ""
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.so
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++.a
-- Installing: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxxabi/test-suite-install/lib/libc++experimental.a
[4/5] Running runtimes regression tests
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{cxx} substitution: '/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build0/bin/clang++'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{flags} substitution: ' --target=x86_64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=memory'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{include} -I %{cxx-include} -I %{cxx-target-include} %{maybe-include-libunwind} -I %{libcxx}/test/support -I %{libcxx}/src -D_LIBCPP_ENABLE_CXX17_REMOVED_UNEXPECTED_FUNCTIONS -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wshift-negative-value -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -Wno-nullability-completeness -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Werror=thread-safety -Wuser-defined-warnings'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{link_flags} substitution: '-nostdlib++ -L %{lib} -Wl,-rpath,%{lib} -lc++ -lc++abi -pthread -latomic'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{benchmark_flags} substitution: ''
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) Using %{exec} substitution: '%{executor} --execdir %T -- '
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++abi-shared.cfg.in) All available features: add-latomic-workaround, buildhost=linux, c++26, can-create-symlinks, character-conversion-warnings, clang, clang-21, clang-21.0, clang-21.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-has-no-availability-markup, libcpp-has-no-experimental-syncstream, libcpp-has-no-experimental-tzdb, libcpp-has-no-incomplete-pstl, libcpp-has-thread-api-pthread, linux, locale.en_US.UTF-8, long_tests, msan, objective-c++, optimization=none, sanitizer-new-delete, std-at-least-c++03, std-at-least-c++11, std-at-least-c++14, std-at-least-c++17, std-at-least-c++20, std-at-least-c++23, std-at-least-c++26, stdlib=libc++, stdlib=llvm-libc++, target=x86_64-unknown-linux-gnu, thread-safety, verify-support
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{cxx} substitution: '/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build0/bin/clang++'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{flags} substitution: '-pthread --target=x86_64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=memory'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{target-include-dir} -I %{include-dir} -I %{libcxx-dir}/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wshift-negative-value -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -Wno-nullability-completeness -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{link_flags} substitution: '-lc++experimental -nostdlib++ -L %{lib-dir} -Wl,-rpath,%{lib-dir} -lc++ -latomic'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{benchmark_flags} substitution: '-I /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/include -L /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/lib -L /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/libcxx/test/benchmarks/google-benchmark/lib64 -l benchmark'
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{exec} substitution: '%{executor} --execdir %T -- '
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) All available features: add-latomic-workaround, buildhost=linux, c++26, c++experimental, can-create-symlinks, character-conversion-warnings, clang, clang-21, clang-21.0, clang-21.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-hardening-mode=none, libcpp-has-no-availability-markup, libcpp-has-thread-api-pthread, linux, locale.en_US.UTF-8, msan, objective-c++, optimization=none, sanitizer-new-delete, std-at-least-c++03, std-at-least-c++11, std-at-least-c++14, std-at-least-c++17, std-at-least-c++20, std-at-least-c++23, std-at-least-c++26, stdlib=libc++, stdlib=llvm-libc++, target=x86_64-unknown-linux-gnu, thread-safety, verify-support
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/main.py:73: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1500 seconds was requested on the command line. Forcing timeout to be 1500 seconds.
-- Testing: 10819 of 10833 tests, 88 workers --

command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2544.479359
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…itwise operation (llvm#137322)

Currently, LLVM fails to convert certain pblendvb intrinsics into select
instructions when the blend mask is derived from complex boolean logic
operations. This occurs even when the mask is ultimately based on
sign-extended comparison results, preventing further optimization
opportunities.

Fixes llvm#66513

---------

Co-authored-by: Simon Pilgrim
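
For context on the fold named in the PR title: AND, OR and XOR act on each bit independently, so applying one of them before or after a same-width bitcast should give identical bits. Below is a minimal, portable C++ sketch of that identity. It models the vectors with `std::array` instead of calling into LLVM, and the helper names (`bitcast_to_bytes`, `cmpgt_zero`, `and_of_bitcasts`, `bitcast_of_and`, `check_fold_on_sample`) are hypothetical, chosen for illustration and not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V4i32 = std::array<std::int32_t, 4>;  // stands in for <4 x i32>
using V16i8 = std::array<std::uint8_t, 16>; // stands in for <16 x i8>
static_assert(sizeof(V4i32) == sizeof(V16i8), "bitcast needs equal widths");

// Models a same-width bitcast: the bits are copied unchanged into byte lanes.
V16i8 bitcast_to_bytes(const V4i32 &v) {
  V16i8 out{};
  std::memcpy(out.data(), v.data(), sizeof(out));
  return out;
}

// Lane-wise "greater than zero" compare yielding all-ones or all-zero lanes,
// i.e. a sign-extended comparison result.
V4i32 cmpgt_zero(const V4i32 &v) {
  V4i32 mask{};
  for (std::size_t i = 0; i < v.size(); ++i)
    mask[i] = v[i] > 0 ? -1 : 0;
  return mask;
}

// Shape before the fold: bitcast each operand, then AND the byte lanes.
V16i8 and_of_bitcasts(const V4i32 &a, const V4i32 &b) {
  const V16i8 x = bitcast_to_bytes(a);
  const V16i8 y = bitcast_to_bytes(b);
  V16i8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(x[i] & y[i]);
  return r;
}

// Shape after the fold: AND the original lanes, then bitcast once.
V16i8 bitcast_of_and(const V4i32 &a, const V4i32 &b) {
  V4i32 t{};
  for (std::size_t i = 0; i < t.size(); ++i)
    t[i] = a[i] & b[i];
  return bitcast_to_bytes(t);
}

// Checks the identity on one pair of comparison masks (illustrative inputs).
void check_fold_on_sample() {
  const V4i32 a{5, -3, 7, 0};
  const V4i32 b{1, 2, -9, 4};
  const V4i32 a_valid = cmpgt_zero(a);
  const V4i32 b_valid = cmpgt_zero(b);
  assert(and_of_bitcasts(a_valid, b_valid) == bitcast_of_and(a_valid, b_valid));
}
```

In this model the second shape keeps the mask in its original lane type until the single final bitcast, which is the form the commit message associates with recognising a sign-extended comparison behind the blend mask.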
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…itwise operation (llvm#137322)
Successfully merging this pull request may close these issues.

[VectorCombine][X86] Failure to replace @llvm.x86.sse41.pblendvb with select