Conversation

@XChy
Member

@XChy XChy commented Aug 26, 2025

Resolves #155387.
The X86ISD::VPMADD52L/VPMADD52H nodes demand only the lower 52 bits of operands 0 and 1.
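
For reference, the per-lane semantics can be modeled by a short scalar sketch (illustrative only; the helper names are made up and this is not LLVM code): each instruction multiplies the low 52 bits of the two multiplicands into a 104-bit product and adds the low (L) or high (H) 52 bits of that product to the 64-bit accumulator, so bits 63:52 of the multiplicands never affect the result.

#include <cstdint>

// Illustrative scalar model of one 64-bit lane of VPMADD52L/VPMADD52H
// (hypothetical helper names, not part of LLVM).
static uint64_t madd52lo(uint64_t Acc, uint64_t A, uint64_t B) {
  const uint64_t Mask52 = (1ULL << 52) - 1;
  unsigned __int128 Prod =
      (unsigned __int128)(A & Mask52) * (unsigned __int128)(B & Mask52);
  return Acc + (uint64_t)(Prod & Mask52); // low 52 bits of the 104-bit product
}

static uint64_t madd52hi(uint64_t Acc, uint64_t A, uint64_t B) {
  const uint64_t Mask52 = (1ULL << 52) - 1;
  unsigned __int128 Prod =
      (unsigned __int128)(A & Mask52) * (unsigned __int128)(B & Mask52);
  return Acc + (uint64_t)(Prod >> 52); // bits 103:52 of the 104-bit product
}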

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-backend-x86

Author: XChy (XChy)

Changes

Resolves #155387.
The X86ISD::VPMADD52L/VPMADD52H nodes demand only the lower 52 bits of operands 0 and 1.


Full diff: https://github.com/llvm/llvm-project/pull/155494.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+30)
  • (added) llvm/test/CodeGen/X86/vpmadd.ll (+54)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 19131fbd4102b..35f9256bb454d 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -44957,6 +44957,22 @@ bool X86TargetLowering::SimplifyDemandedBitsForTargetNode(
     Known.Zero.setLowBits(Known2.countMinTrailingZeros());
     return false;
   }
+  case X86ISD::VPMADD52L:
+  case X86ISD::VPMADD52H: {
+    KnownBits KnownOp0, KnownOp1;
+    SDValue Op0 = Op.getOperand(0);
+    SDValue Op1 = Op.getOperand(1);
+    // Only demand the lower 52 bits of operands 0 / 1 (and all 64 bits of
+    // operand 2).
+    APInt Low52Bits = APInt::getLowBitsSet(BitWidth, 52);
+    if (SimplifyDemandedBits(Op0, Low52Bits, OriginalDemandedElts, KnownOp0, TLO,
+                             Depth + 1))
+      return true;
+
+    if (SimplifyDemandedBits(Op1, Low52Bits, OriginalDemandedElts, KnownOp1, TLO,
+                             Depth + 1))
+      return true;
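+    // All 64 bits of the addend (operand 2) stay demanded, so any further
+    // simplification of it is deferred to the generic handling below.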
+    break;
+  }
   }
 
   return TargetLowering::SimplifyDemandedBitsForTargetNode(
@@ -60068,6 +60084,18 @@ static SDValue combineVPMADD(SDNode *N, SelectionDAG &DAG,
   return SDValue();
 }
 
+// Simplify VPMADD52L/VPMADD52H operations.
+static SDValue combineVPMADD52LH(SDNode *N, SelectionDAG &DAG,
+                                 TargetLowering::DAGCombinerInfo &DCI) {
+  MVT VT = N->getSimpleValueType(0);
+  unsigned NumEltBits = VT.getScalarSizeInBits();
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
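+  // Demanding all result bits lets the SimplifyDemandedBitsForTargetNode
+  // hook above narrow the demanded bits of operands 0 / 1 to the low 52 bits.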
+  if (TLI.SimplifyDemandedBits(SDValue(N, 0), APInt::getAllOnes(NumEltBits),
+                               DCI))
+    return SDValue(N, 0);
+
+  return SDValue();
+}
+
 static SDValue combineEXTEND_VECTOR_INREG(SDNode *N, SelectionDAG &DAG,
                                           TargetLowering::DAGCombinerInfo &DCI,
                                           const X86Subtarget &Subtarget) {
@@ -60705,6 +60733,8 @@ SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
   case X86ISD::PMULUDQ:     return combinePMULDQ(N, DAG, DCI, Subtarget);
   case X86ISD::VPMADDUBSW:
   case X86ISD::VPMADDWD:    return combineVPMADD(N, DAG, DCI);
+  case X86ISD::VPMADD52L:
+  case X86ISD::VPMADD52H:   return combineVPMADD52LH(N, DAG, DCI);
   case X86ISD::KSHIFTL:
   case X86ISD::KSHIFTR:     return combineKSHIFT(N, DAG, DCI);
   case ISD::FP16_TO_FP:     return combineFP16_TO_FP(N, DAG, Subtarget);
diff --git a/llvm/test/CodeGen/X86/vpmadd.ll b/llvm/test/CodeGen/X86/vpmadd.ll
new file mode 100644
index 0000000000000..21027190ef318
--- /dev/null
+++ b/llvm/test/CodeGen/X86/vpmadd.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512ifma,+avx512vl | FileCheck %s --check-prefixes=CHECK,X86
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512ifma,+avx512vl | FileCheck %s --check-prefixes=CHECK,X64
+
+define <2 x i64> @test_vpmadd52l(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2) {
+; CHECK-LABEL: test_vpmadd52l:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vpmadd52luq %xmm1, %xmm1, %xmm0
+; CHECK-NEXT:    ret{{[l|q]}}
+  %shl1 = shl <2 x i64> %x1, <i64 52, i64 52>
+  %shl2 = shl <2 x i64> %x2, <i64 52, i64 52>
+  %1 = call <2 x i64> @llvm.x86.avx512.vpmadd52l.uq.128(<2 x i64> %x0, <2 x i64> %shl1, <2 x i64> %shl2)
+  ret <2 x i64> %1
+}
+
+define <2 x i64> @test_vpmadd52l_wrong_shift(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2) {
+; CHECK-LABEL: test_vpmadd52l_wrong_shift:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpsllq $51, %xmm1, %xmm1
+; CHECK-NEXT:    vpsllq $51, %xmm2, %xmm2
+; CHECK-NEXT:    vpmadd52luq %xmm2, %xmm1, %xmm0
+; CHECK-NEXT:    ret{{[l|q]}}
+  %shl1 = shl <2 x i64> %x1, <i64 51, i64 51>
+  %shl2 = shl <2 x i64> %x2, <i64 51, i64 51>
+  %1 = call <2 x i64> @llvm.x86.avx512.vpmadd52l.uq.128(<2 x i64> %x0, <2 x i64> %shl1, <2 x i64> %shl2)
+  ret <2 x i64> %1
+}
+
+define <2 x i64> @test_vpmadd52l_wrong_op(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2) {
+; CHECK-LABEL: test_vpmadd52l_wrong_op:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpsllq $52, %xmm0, %xmm0
+; CHECK-NEXT:    vpmadd52luq %xmm2, %xmm1, %xmm0
+; CHECK-NEXT:    ret{{[l|q]}}
+  %shl0 = shl <2 x i64> %x0, <i64 52, i64 52>
+  %1 = call <2 x i64> @llvm.x86.avx512.vpmadd52l.uq.128(<2 x i64> %shl0, <2 x i64> %x1, <2 x i64> %x2)
+  ret <2 x i64> %1
+}
+
+define <2 x i64> @test_vpmadd52h(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2) {
+; CHECK-LABEL: test_vpmadd52h:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vpmadd52huq %xmm1, %xmm1, %xmm0
+; CHECK-NEXT:    ret{{[l|q]}}
+  %shl1 = shl <2 x i64> %x1, <i64 52, i64 52>
+  %shl2 = shl <2 x i64> %x2, <i64 52, i64 52>
+  %1 = call <2 x i64> @llvm.x86.avx512.vpmadd52h.uq.128(<2 x i64> %x0, <2 x i64> %shl1, <2 x i64> %shl2)
+  ret <2 x i64> %1
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; X64: {{.*}}
+; X86: {{.*}}

@github-actions

github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Collaborator

@RKSimon RKSimon left a comment


A few minors

@XChy XChy force-pushed the perf/simplify-VPMADD52 branch from 8fbcb6b to f57cb12 on August 27, 2025 05:04
@XChy XChy force-pushed the perf/simplify-VPMADD52 branch from f57cb12 to b258fd6 on August 27, 2025 10:13
Collaborator

@RKSimon RKSimon left a comment


LGTM - cheers

@XChy
Member Author

XChy commented Aug 27, 2025

Thanks for your review. I would like to work on constant folding of VPMADD52L/VPMADD52H. Can you assign it to me, or should I post an issue myself?

@RKSimon RKSimon merged commit 8687ef7 into llvm:main Aug 27, 2025
9 checks passed
@RKSimon
Collaborator

RKSimon commented Aug 27, 2025

@houngkoungting is currently working on #155386

Building on this - please can you handle the vpmadd52(x,0,y) -> y fold in SimplifyDemandedBitsForTargetNode, for the case where the lower 52 bits of either multiplicand are known zero?
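
A minimal sketch of such a fold, reusing the KnownOp0/KnownOp1 values already computed in the VPMADD52 case above (illustrative only, not the eventual patch):

// Sketch: if the low 52 bits of either multiplicand are known zero, the
// product is zero and the node reduces to its addend (operand 2).
if (KnownOp0.countMinTrailingZeros() >= 52 ||
    KnownOp1.countMinTrailingZeros() >= 52)
  return TLO.CombineTo(Op, Op.getOperand(2));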

@XChy
Member Author

XChy commented Aug 27, 2025

Building on this - please can you handle the vpmadd52(x,0,y) -> y fold in SimplifyDemandedBitsForTargetNode, for the case where the lower 52 bits of either multiplicand are known zero?

Sure, I will work on it after merging the knownbits patch.


Development

Successfully merging this pull request may close these issues:

  • [X86] X86TargetLowering::SimplifyDemandedBitsForTargetNode - add handling for VPMADD52L/VPMADD52H nodes
