@tonykuttai
Contributor

@tonykuttai tonykuttai commented Sep 9, 2025

Optimize BUILD_VECTOR having special quadword patterns

This change optimizes BUILD_VECTOR operations by using the lxvkq instruction, or an xxspltib + vsrq sequence, to inline constants matching specific 128-bit patterns:

  • MSB set pattern: 0x8000_0000_0000_0000_0000_0000_0000_0000
  • LSB set pattern: 0x0000_0000_0000_0000_0000_0000_0000_0001

Implementation Details

The lxvkq instruction loads special quadword values into VSX registers:

lxvkq XT, UIM
# When UIM=16: loads 0x8000_0000_0000_0000_0000_0000_0000_0000

The optimization reconstructs the 128-bit register pattern from BUILD_VECTOR operands, accounting for target endianness. For example, the MSB pattern can be represented as:

  • Big-Endian: <i64 -9223372036854775808, i64 0>
  • Little-Endian: <i64 0, i64 -9223372036854775808>

Both produce the same register value: 0x8000_0000_0000_0000_0000_0000_0000_0000
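The reconstruction step described above can be sketched outside of LLVM as follows. This is a minimal illustration, not the patch's code: the `U128` struct and the `assembleRegisterValue` function are hypothetical names standing in for `APInt` and the loop inside the lowering code, and the element layout rule (element 0 in the low bits on little-endian, in the high bits on big-endian) is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 128-bit register image:
// Hi holds bits 127..64, Lo holds bits 63..0.
struct U128 {
  uint64_t Hi;
  uint64_t Lo;
};

inline bool operator==(const U128 &A, const U128 &B) {
  return A.Hi == B.Hi && A.Lo == B.Lo;
}

// Assemble vector elements into the register image. Element 0 lands in the
// least significant bits on little-endian targets and in the most significant
// bits on big-endian targets.
U128 assembleRegisterValue(const std::vector<uint64_t> &Elems,
                           unsigned ElemBits, bool IsLittleEndian) {
  assert(ElemBits >= 8 && ElemBits <= 64 && Elems.size() * ElemBits == 128);
  U128 Full{0, 0};
  for (unsigned Index = 0; Index < Elems.size(); ++Index) {
    uint64_t Value = Elems[Index];
    if (ElemBits < 64)
      Value &= (1ULL << ElemBits) - 1; // keep only the element's own bits
    unsigned BitPos =
        IsLittleEndian ? Index * ElemBits : 128 - (Index + 1) * ElemBits;
    // Elements are 8, 16, 32, or 64 bits wide and aligned to their size,
    // so an element never straddles the two 64-bit halves.
    if (BitPos >= 64)
      Full.Hi |= Value << (BitPos - 64);
    else
      Full.Lo |= Value << BitPos;
  }
  return Full;
}
```

With this sketch, the big-endian vector `<i64 -9223372036854775808, i64 0>` and the little-endian vector `<i64 0, i64 -9223372036854775808>` both assemble to `Hi = 0x8000000000000000, Lo = 0`, which is why a single register pattern covers both targets.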

MSB Pattern (0x8000...0000)

All vector types (v2i64, v4i32, v8i16, v16i8) generate:

lxvkq v2, 16

LSB Pattern (0x0000...0001)

All vector types generate:

xxspltib v2, 255
vsrq v2, v2, v2
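Why this two-instruction sequence yields the LSB pattern can be checked with a small software model. The sketch below is an assumption-laden illustration, not a description of the hardware: `splatByte`, `shiftCount`, and `shiftRightQuad` are hypothetical helpers, and the shift count is read from the low 7 bits of the least significant byte, following the patch comment that vsrq takes it from byte 15. Because every byte of the splatted register is 0xFF, the count is 127 whichever byte is sampled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit register image: Hi = bits 127..64, Lo = bits 63..0.
struct U128 {
  uint64_t Hi;
  uint64_t Lo;
};

// Model of xxspltib: replicate one byte into all 16 byte lanes.
U128 splatByte(uint8_t Imm) {
  uint64_t Half = 0x0101010101010101ULL * Imm;
  return U128{Half, Half};
}

// Model of the shift-count extraction (assumed: low 7 bits of the least
// significant byte, per the patch comment).
unsigned shiftCount(U128 V) { return static_cast<unsigned>(V.Lo & 0x7F); }

// Logical right shift of a 128-bit value by Sh bits, 0 <= Sh < 128.
U128 shiftRightQuad(U128 V, unsigned Sh) {
  assert(Sh < 128);
  if (Sh == 0)
    return V;
  if (Sh >= 64)
    return U128{0, V.Hi >> (Sh - 64)};
  return U128{V.Hi >> Sh, (V.Lo >> Sh) | (V.Hi << (64 - Sh))};
}

// xxspltib v2, 255 ; vsrq v2, v2, v2
U128 buildLsbPattern() {
  U128 Ones = splatByte(255);                    // all 128 bits set
  return shiftRightQuad(Ones, shiftCount(Ones)); // shift right by 127
}
```

Using the same register as both the value and the shift-count operand is what keeps the sequence to two instructions: the all-ones value supplies the count 127 and is itself the value being shifted, leaving only bit 0 set.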

@tonykuttai tonykuttai force-pushed the tvarghese/vectorconst branch 5 times, most recently from ee47d25 to 85a8234 Compare September 9, 2025 18:48
@tonykuttai tonykuttai marked this pull request as ready for review September 10, 2025 03:03
@llvmbot
Member

llvmbot commented Sep 10, 2025

@llvm/pr-subscribers-backend-powerpc

Author: Tony Varghese (tonykuttai)

Changes

This change uses the lxvkq instruction (or, for the LSB pattern, an xxspltib + vsrq sequence) to inline BUILD_VECTOR constants that match the following patterns.

0x8000_0000_0000_0000_0000_0000_0000_0000 (MSB set pattern)
0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern)

lxvkq loads special quadword values into VSX registers.

Load VSX Vector Special Value Quadword X-form
lxvkq XT, UIM
if UIM=0b10000 then VSR[32×TX+T] ← 0x8000_0000_0000_0000_0000_0000_0000_0000 /* QP -0.0 */

Note:

  • lxvkq with UIM=16 always produces 0x8000_0000_0000_0000_0000_0000_0000_0000 in the VSX register.
  • When we see a BUILD_VECTOR, we need to determine what 128-bit register pattern would produce that vector under the current target endianness. For example, the following build vectors give rise to the same MSB register pattern:
      • Big-Endian: <i64 -9223372036854775808, i64 0> → 0x8000_0000_0000_0000_0000_0000_0000_0000
      • Little-Endian: <i64 0, i64 -9223372036854775808> → 0x8000_0000_0000_0000_0000_0000_0000_0000
  • This 128-bit value represents what is in the VSX register, not the memory layout.

For emitting the pattern 0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern), a combination of xxspltib and vsrq is used.

The following tables show the code generated for the MSB and LSB patterns on each endianness:

For the MSB Pattern: 0x8000_0000_0000_0000_0000_0000_0000_0000

| Vector Type | Big-Endian Vector | Little-Endian Vector | Code Generated |
|---|---|---|---|
| <2 x i64> | <-9223372036854775808, 0> | <0, -9223372036854775808> | lxvkq v2, 16 |
| <4 x i32> | <-2147483648, 0, 0, 0> | <0, 0, 0, -2147483648> | lxvkq v2, 16 |
| <8 x i16> | <-32768, 0, 0, 0, 0, 0, 0, 0> | <0, 0, 0, 0, 0, 0, 0, -32768> | lxvkq v2, 16 |
| <16 x i8> | <-128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> | <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -128> | lxvkq v2, 16 |

For the LSB Pattern: 0x0000_0000_0000_0000_0000_0000_0000_0001

| Vector Type | Big-Endian Vector | Little-Endian Vector | Code Generated |
|---|---|---|---|
| <2 x i64> | <0, 1> | <1, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <4 x i32> | <0, 0, 0, 1> | <1, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <8 x i16> | <0, 0, 0, 0, 0, 0, 0, 1> | <1, 0, 0, 0, 0, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <16 x i8> | <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1> | <1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |

Patch is 23.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157625.diff

4 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+134)
  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.h (+3)
  • (added) llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll (+307)
  • (modified) llvm/test/CodeGen/PowerPC/vector-reduce-add.ll (+8-9)
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index fa104e4f69d7f..c76347d48bc62 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -9679,6 +9679,13 @@ SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
   BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(Op.getNode());
   assert(BVN && "Expected a BuildVectorSDNode in LowerBUILD_VECTOR");
 
+  // Recognize build vector patterns to emit VSX vector instructions
+  // instead of loading value from memory.
+  if (Subtarget.isISA3_1() && Subtarget.hasVSX()) {
+    if (SDValue VecPat = combineBVLoadsSpecialValue(Op, DAG))
+      return VecPat;
+  }
+
   if (Subtarget.hasP10Vector()) {
     APInt BitMask(32, 0);
     // If the value of the vector is all zeros or all ones,
@@ -15657,6 +15664,133 @@ combineElementTruncationToVectorTruncation(SDNode *N,
   return SDValue();
 }
 
+// The LXVKQ instruction loads a VSX vector with a special quadword value
+// based on an immediate value. This helper method returns the details of the
+// match as a tuple of {LXVKQ unsigned IMM Value, right_shift_amount}
+// to help generate the LXVKQ instruction and the subsequent shift instruction
+// required to match the original build vector pattern.
+
+// LXVKQPattern: {LXVKQ unsigned IMM Value, right_shift_amount}
+using LXVKQPattern = std::tuple<uint32_t, uint8_t>;
+
+static std::optional<LXVKQPattern> getPatternInfo(const APInt &FullVal) {
+
+  static const auto BaseLXVKQPatterns = []() {
+    // LXVKQ instruction loads the Quadword value:
+    // 0x8000_0000_0000_0000_0000_0000_0000_0000 when imm = 0b10000
+    return std::array<std::pair<APInt, uint32_t>, 1>{
+        {{APInt(128, 0x8000000000000000ULL) << 64, 16}}};
+  }();
+
+  // Check for direct LXVKQ match (no shift needed)
+  for (const auto &[BasePattern, Uim] : BaseLXVKQPatterns) {
+    if (FullVal == BasePattern)
+      return std::make_tuple(Uim, uint8_t{0});
+  }
+
+  // Check if FullValue can be generated by (right) shifting a base pattern
+  for (const auto &[BasePattern, Uim] : BaseLXVKQPatterns) {
+    if (BasePattern.lshr(127) == FullVal)
+      return std::make_tuple(Uim, uint8_t{127});
+  }
+
+  return std::nullopt;
+}
+
+/// Combine vector loads to a single load by recognising patterns in the Build
+/// Vector. The LXVKQ instruction loads a VSX vector with a special quadword
+/// value based on an immediate value.
+SDValue PPCTargetLowering::combineBVLoadsSpecialValue(SDValue Op,
+                                                      SelectionDAG &DAG) const {
+
+  assert((Op.getNode() && Op.getOpcode() == ISD::BUILD_VECTOR) &&
+         "Expected a BuildVectorSDNode in combineBVLoadsSpecialValue");
+
+  // This transformation is only supported if we are loading either a byte,
+  // halfword, word, or doubleword.
+  EVT VT = Op.getValueType();
+  if (!(VT == MVT::v8i16 || VT == MVT::v16i8 || VT == MVT::v4i32 ||
+        VT == MVT::v2i64))
+    return SDValue();
+
+  LLVM_DEBUG(llvm::dbgs() << "\ncombineBVLoadsSpecialValue: Build vector ("
+                          << VT.getEVTString() << "): ";
+             Op->dump());
+
+  unsigned NumElems = VT.getVectorNumElements();
+  unsigned ElemBits = VT.getScalarSizeInBits();
+
+  bool IsLittleEndian = DAG.getDataLayout().isLittleEndian();
+
+  // Check for Non-constant operand in the build vector.
+  for (const SDValue &Operand : Op.getNode()->op_values()) {
+    if (!isa<ConstantSDNode>(Operand))
+      return SDValue();
+  }
+
+  // Assemble build vector operands as a 128-bit register value
+  // We need to reconstruct what the 128-bit register pattern would be
+  // that produces this vector when interpreted with the current endianness
+  APInt FullVal = APInt::getZero(128);
+
+  for (unsigned Index = 0; Index < NumElems; ++Index) {
+    auto *C = cast<ConstantSDNode>(Op.getOperand(Index));
+
+    // Get element value as raw bits (zero-extended)
+    uint64_t ElemValue = C->getZExtValue();
+
+    // Mask to element size to ensure we only get the relevant bits
+    if (ElemBits < 64)
+      ElemValue &= ((1ULL << ElemBits) - 1);
+
+    // Calculate bit position for this element in the 128-bit register
+    unsigned BitPos =
+        (IsLittleEndian) ? (Index * ElemBits) : (128 - (Index + 1) * ElemBits);
+
+    // Create APInt for the element value and shift it to correct position
+    APInt ElemAPInt(128, ElemValue);
+    ElemAPInt <<= BitPos;
+
+    // Place the element value at the correct bit position
+    FullVal |= ElemAPInt;
+  }
+
+  if (auto UIMOpt = getPatternInfo(FullVal)) {
+    const auto &[Uim, ShiftAmount] = *UIMOpt;
+    SDLoc Dl(Op);
+
+    // Generate LXVKQ instruction if the shift amount is zero.
+    if (ShiftAmount == 0) {
+      SDValue UimVal = DAG.getTargetConstant(Uim, Dl, MVT::i32);
+      SDValue LxvkqInstr =
+          SDValue(DAG.getMachineNode(PPC::LXVKQ, Dl, VT, UimVal), 0);
+      LLVM_DEBUG(llvm::dbgs()
+                     << "combineBVLoadsSpecialValue: Instruction Emitted ";
+                 LxvkqInstr.dump());
+      return LxvkqInstr;
+    }
+
+    // The right shifted pattern can be constructed using a combination of
+    // XXSPLTIB and VSRQ instructions. VSRQ uses the shift amount from the lower
+    // 7 bits of byte 15. This can be specified using XXSPLTIB with immediate
+    // value 255.
+    SDValue ShiftAmountVec =
+        SDValue(DAG.getMachineNode(PPC::XXSPLTIB, Dl, MVT::v4i32,
+                                   DAG.getTargetConstant(255, Dl, MVT::i32)),
+                0);
+    // Generate appropriate right shift instruction
+    SDValue ShiftVec = SDValue(
+        DAG.getMachineNode(PPC::VSRQ, Dl, VT, ShiftAmountVec, ShiftAmountVec),
+        0);
+    LLVM_DEBUG(llvm::dbgs()
+                   << "\n combineBVLoadsSpecialValue: Instruction Emitted ";
+               ShiftVec.dump());
+    return ShiftVec;
+  }
+  // No patterns matched for build vectors.
+  return SDValue();
+}
+
 /// Reduce the number of loads when building a vector.
 ///
 /// Building a vector out of multiple loads can be converted to a load
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 669430550f4e6..97382cd8f613c 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1471,6 +1471,9 @@ namespace llvm {
     combineElementTruncationToVectorTruncation(SDNode *N,
                                                DAGCombinerInfo &DCI) const;
 
+    SDValue combineBVLoadsSpecialValue(SDValue Operand,
+                                       SelectionDAG &DAG) const;
+
     /// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be
     /// handled by the VINSERTH instruction introduced in ISA 3.0. This is
     /// essentially any shuffle of v8i16 vectors that just inserts one element
diff --git a/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll b/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll
new file mode 100644
index 0000000000000..0ee4524a6c68a
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll
@@ -0,0 +1,307 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr10 -mtriple=powerpc64le-unknown-unknown \
+; RUN:   -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC64-LE-10
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr10 -mtriple=powerpc64-unknown-unknown \
+; RUN:   -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC64-BE-10
+
+; Test LXVKQ instruction generation for special vector constants matching 128 bit patterns:
+; 0x8000_0000_0000_0000_0000_0000_0000_0000 (MSB set pattern)
+; 0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern)
+
+; =============================================================================
+; v2i64 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-9223372036854775808, 0>
+define dso_local noundef <2 x i64> @test_v2i64_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_msb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI0_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_msb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <2 x i64> <i64 -9223372036854775808, i64 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, -9223372036854775808>
+define dso_local noundef <2 x i64> @test_v2i64_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_msb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_msb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI1_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI1_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <2 x i64> <i64 0, i64 -9223372036854775808>
+}
+
+; =============================================================================
+; v4i32 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-2147483648, 0, 0, 0>
+define dso_local noundef <4 x i32> @test_v4i32_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_msb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI2_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_msb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <4 x i32> <i32 -2147483648, i32 0, i32 0, i32 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, -2147483648>
+define dso_local noundef <4 x i32> @test_v4i32_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_msb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_msb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI3_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <4 x i32> <i32 0, i32 0, i32 0, i32 -2147483648>
+}
+
+; =============================================================================
+; v8i16 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-32768, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <8 x i16> @test_v8i16_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_msb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI4_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_msb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <8 x i16> <i16 -32768, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, 0, 0, 0, 0, -32768>
+define dso_local noundef <8 x i16> @test_v8i16_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_msb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_msb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI5_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI5_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <8 x i16> <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 -32768>
+}
+
+; =============================================================================
+; v16i8 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <16 x i8> @test_v16i8_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_msb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI6_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_msb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <16 x i8> <i8 -128, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -128>
+define dso_local noundef <16 x i8> @test_v16i8_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_msb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    lxvkq v2, 16
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_msb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI7_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 -128>
+}
+
+; =============================================================================
+; v2i64 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 1>
+define dso_local noundef <2 x i64> @test_v2i64_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_lsb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI8_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_lsb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-BE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <2 x i64> <i64 0, i64 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0>
+define dso_local noundef <2 x i64> @test_v2i64_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_lsb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-LE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_lsb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI9_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI9_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <2 x i64> <i64 1, i64 0>
+}
+
+; =============================================================================
+; v4i32 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 1>
+define dso_local noundef <4 x i32> @test_v4i32_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_lsb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI10_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_lsb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-BE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <4 x i32> <i32 0, i32 0, i32 0, i32 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0>
+define dso_local noundef <4 x i32> @test_v4i32_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_lsb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-LE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_lsb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI11_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI11_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <4 x i32> <i32 1, i32 0, i32 0, i32 0>
+}
+
+; =============================================================================
+; v8i16 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 0, 0, 0, 0, 1>
+define dso_local noundef <8 x i16> @test_v8i16_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_lsb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI12_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_lsb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-BE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <8 x i16> <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <8 x i16> @test_v8i16_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_lsb_set_littleendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-LE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_lsb_set_littleendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    addis r3, r2, .LCPI13_0@toc@ha
+; POWERPC64-BE-10-NEXT:    addi r3, r3, .LCPI13_0@toc@l
+; POWERPC64-BE-10-NEXT:    lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <8 x i16> <i16 1, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
+}
+
+; =============================================================================
+; v16i8 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1>
+define dso_local noundef <16 x i8> @test_v16i8_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_lsb_set_bigendian:
+; POWERPC64-LE-10:       # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT:    plxv v2, .LCPI14_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT:    blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_lsb_set_bigendian:
+; POWERPC64-BE-10:       # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT:    xxspltib v2, 255
+; POWERPC64-BE-10-NEXT:    vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT:    blr
+entry:
+  ret <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
[truncated]
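The matching step performed by getPatternInfo in the patch can be summarized with a standalone sketch. It is a simplified illustration under stated assumptions: `U128` and `classifyQuadword` are hypothetical names, the only base pattern is the UIM=16 value from the patch, and the only shifted form considered is the 127-bit right shift that produces the LSB pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical 128-bit register image: Hi = bits 127..64, Lo = bits 63..0.
struct U128 {
  uint64_t Hi;
  uint64_t Lo;
};

// Returns {UIM, right-shift amount} when the value is the MSB-set base
// pattern (shift 0) or that pattern shifted right by 127 bits (the LSB-set
// pattern). Returns std::nullopt for anything else.
std::optional<std::pair<uint32_t, uint8_t>> classifyQuadword(U128 V) {
  constexpr uint32_t UimMsb = 16; // lxvkq immediate for 0x8000...0000
  if (V.Hi == 0x8000000000000000ULL && V.Lo == 0)
    return std::make_pair(UimMsb, uint8_t{0});
  if (V.Hi == 0 && V.Lo == 1)
    return std::make_pair(UimMsb, uint8_t{127});
  return std::nullopt;
}
```

A shift amount of 0 corresponds to emitting lxvkq alone, while 127 corresponds to the xxspltib + vsrq sequence; any other value falls through to the existing lowering paths, such as a constant-pool load.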

@tonykuttai tonykuttai closed this Sep 19, 2025
@tonykuttai tonykuttai deleted the tvarghese/vectorconst branch September 19, 2025 07:12
@tonykuttai tonykuttai restored the tvarghese/vectorconst branch September 19, 2025 07:12
@tonykuttai tonykuttai reopened this Sep 19, 2025
@lei137
Contributor

lei137 commented Sep 19, 2025

The commit message for this PR needs to be updated to summarize what's done, with the details added as a comment to this PR instead. I also don't think the charts will show up nicely in the git log.

@tonykuttai
Contributor Author

This change uses the lxvkq instruction (or, for the LSB pattern, an xxspltib + vsrq sequence) to inline BUILD_VECTOR constants that match the following patterns.

0x8000_0000_0000_0000_0000_0000_0000_0000 (MSB set pattern)
0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern)

lxvkq loads special quadword values into VSX registers.

Load VSX Vector Special Value Quadword X-form
lxvkq XT, UIM
if UIM=0b10000 then VSR[32×TX+T] ← 0x8000_0000_0000_0000_0000_0000_0000_0000 /* QP -0.0 */

Note:

  • lxvkq with UIM=16 always produces 0x8000_0000_0000_0000_0000_0000_0000_0000 in the VSX register.
  • When we see a BUILD_VECTOR, we need to determine what 128-bit register pattern would produce that vector under the current target endianness. For example, the following build vectors give rise to the same MSB register pattern:
      • Big-Endian: <i64 -9223372036854775808, i64 0> → 0x8000_0000_0000_0000_0000_0000_0000_0000
      • Little-Endian: <i64 0, i64 -9223372036854775808> → 0x8000_0000_0000_0000_0000_0000_0000_0000
  • This 128-bit value represents what is in the VSX register, not the memory layout.

For emitting the pattern 0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern), a combination of xxspltib and vsrq is used.

The following tables show the code generated for the MSB and LSB patterns on each endianness:

For the MSB Pattern: 0x8000_0000_0000_0000_0000_0000_0000_0000

| Vector Type | Big-Endian Vector | Little-Endian Vector | Code Generated |
|---|---|---|---|
| <2 x i64> | <-9223372036854775808, 0> | <0, -9223372036854775808> | lxvkq v2, 16 |
| <4 x i32> | <-2147483648, 0, 0, 0> | <0, 0, 0, -2147483648> | lxvkq v2, 16 |
| <8 x i16> | <-32768, 0, 0, 0, 0, 0, 0, 0> | <0, 0, 0, 0, 0, 0, 0, -32768> | lxvkq v2, 16 |
| <16 x i8> | <-128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> | <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -128> | lxvkq v2, 16 |

For the LSB Pattern: 0x0000_0000_0000_0000_0000_0000_0000_0001

| Vector Type | Big-Endian Vector | Little-Endian Vector | Code Generated |
|---|---|---|---|
| <2 x i64> | <0, 1> | <1, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <4 x i32> | <0, 0, 0, 1> | <1, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <8 x i16> | <0, 0, 0, 0, 0, 0, 0, 1> | <1, 0, 0, 0, 0, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |
| <16 x i8> | <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1> | <1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> | xxspltib v2, 255; vsrq v2, v2, v2 |

@tonykuttai tonykuttai force-pushed the tvarghese/vectorconst branch from 1e1290d to 7e187da Compare October 14, 2025 03:04
@tonykuttai tonykuttai requested a review from RolandF77 October 14, 2025 03:04
Collaborator

@RolandF77 RolandF77 left a comment

LGTM

@tonykuttai tonykuttai merged commit 60ee515 into llvm:main Oct 15, 2025
10 checks passed
@tonykuttai tonykuttai deleted the tvarghese/vectorconst branch November 14, 2025 14:20