[SDAG] Only apply sincos/frexp stack slot folding to IR pointers #115346

MacDue · 2024-11-07T17:20:02Z

If the store is to a pointer that does not exist in the IR (i.e. it was materialized for ABI reasons), it may be part of the setup for a call (e.g. setting up arguments on the stack). In this case, folding the store into the library call as part of the expansion could result in nested call sequences (and incorrect codegen).

Fixes #115323
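The guard added by the patch can be modelled outside LLVM as a plain predicate. This is a minimal sketch that assumes a hypothetical `StoreFacts` struct standing in for what the real code reads off the `StoreSDNode`. It models only the rejection conditions visible in the diff hunk (the real condition continues past the lines shown), and it assumes `CallRetResNo` is a `std::optional<unsigned>`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for facts the real code reads off a
// StoreSDNode; this is not an LLVM type.
struct StoreFacts {
  bool HasIRPointerValue;     // ST->getPointerInfo().V holds an IR Value
  unsigned ResNo;             // which result of the FP node is being stored
  bool IsSimple;              // ST->isSimple(): neither volatile nor atomic
  unsigned AddressSpace;      // ST->getAddressSpace()
  std::uint64_t Align;        // ST->getAlign()
  std::uint64_t ABITypeAlign; // ABI alignment of the stored scalar type
};

// Mirrors the rejection conditions visible in the patch hunk only.
bool mayFoldStoreIntoLibcall(const StoreFacts &S,
                             std::optional<unsigned> CallRetResNo) {
  // New in this patch: pointers materialized for ABI reasons (no IR Value)
  // may be outgoing-argument slots of another call, so never fold them.
  if (!S.HasIRPointerValue)
    return false;
  // This result is the call's return value, not written through a pointer.
  if (CallRetResNo && *CallRetResNo == S.ResNo)
    return false;
  if (!S.IsSimple || S.AddressSpace != 0)
    return false;
  return S.Align >= S.ABITypeAlign;
}
```

For example, an otherwise identical store whose pointer has no IR value is rejected regardless of its alignment.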

zmodem · 2024-11-07T19:38:55Z

Confirmed that this fixes the problem we hit in #115323. I'm not familiar enough with this part of SelectionDAG to say whether it's the Right Fix(tm) though.


llvmbot · 2024-11-07T20:17:53Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Benjamin Maxwell (MacDue)

Changes

If the pointer passed to the library call does not exist in the IR (i.e. it was materialized for ABI reasons), passing it to the library call could result in unexpected aliasing/clobbering, as the area of the stack for callee arguments could alias the result stack slots in the parent call frame.

Fixes #115323

Full diff: https://github.com/llvm/llvm-project/pull/115346.diff

4 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+6-1)
(modified) llvm/test/CodeGen/RISCV/llvm.frexp.ll (+112-80)
(modified) llvm/test/CodeGen/X86/llvm.frexp.ll (+33-12)
(added) llvm/test/CodeGen/X86/sincos-stack-args.ll (+35)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 203e14f6cde3e3..7db226b24e5f2e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2518,7 +2518,12 @@ bool SelectionDAG::expandMultipleResultFPLibCall(
     SDValue StoreValue = ST->getValue();
     unsigned ResNo = StoreValue.getResNo();
     Type *StoreType = StoreValue.getValueType().getTypeForEVT(Ctx);
-    if (CallRetResNo == ResNo || !ST->isSimple() ||
+    // If the pointer value does not come from the IR, it could come from ABI
+    // lowering and may alias with the arguments of the library call if they are
+    // passed via the stack.
+    const Value *PointerValue =
+        dyn_cast_or_null<const Value *>(ST->getPointerInfo().V);
+    if (!PointerValue || CallRetResNo == ResNo || !ST->isSimple() ||
         ST->getAddressSpace() != 0 ||
         ST->getAlign() <
             getDataLayout().getABITypeAlign(StoreType->getScalarType()) ||
diff --git a/llvm/test/CodeGen/RISCV/llvm.frexp.ll b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
index 3f615d23d3eaf6..e85a7118f5ff83 100644
--- a/llvm/test/CodeGen/RISCV/llvm.frexp.ll
+++ b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
@@ -543,42 +543,50 @@ define i32 @test_frexp_f32_i32_only_use_exp(float %a) nounwind {
 define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwind {
 ; RV32IFD-LABEL: test_frexp_v4f32_v4i32:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -48
-; RV32IFD-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs0, 32(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs1, 24(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs2, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs3, 8(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    addi sp, sp, -64
+; RV32IFD-NEXT:    sw ra, 60(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    sw s0, 56(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs0, 48(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs1, 40(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs2, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs3, 24(sp) # 8-byte Folded Spill
 ; RV32IFD-NEXT:    fmv.s fs0, fa3
 ; RV32IFD-NEXT:    fmv.s fs1, fa2
 ; RV32IFD-NEXT:    fmv.s fs2, fa1
 ; RV32IFD-NEXT:    mv s0, a0
-; RV32IFD-NEXT:    addi a0, a0, 16
+; RV32IFD-NEXT:    addi a0, sp, 8
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs3, fa0
-; RV32IFD-NEXT:    addi a0, s0, 20
+; RV32IFD-NEXT:    addi a0, sp, 12
 ; RV32IFD-NEXT:    fmv.s fa0, fs2
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs2, fa0
-; RV32IFD-NEXT:    addi a0, s0, 24
+; RV32IFD-NEXT:    addi a0, sp, 16
 ; RV32IFD-NEXT:    fmv.s fa0, fs1
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs1, fa0
-; RV32IFD-NEXT:    addi a0, s0, 28
+; RV32IFD-NEXT:    addi a0, sp, 20
 ; RV32IFD-NEXT:    fmv.s fa0, fs0
 ; RV32IFD-NEXT:    call frexpf
+; RV32IFD-NEXT:    lw a0, 8(sp)
+; RV32IFD-NEXT:    lw a1, 12(sp)
+; RV32IFD-NEXT:    lw a2, 16(sp)
+; RV32IFD-NEXT:    lw a3, 20(sp)
+; RV32IFD-NEXT:    sw a0, 16(s0)
+; RV32IFD-NEXT:    sw a1, 20(s0)
+; RV32IFD-NEXT:    sw a2, 24(s0)
+; RV32IFD-NEXT:    sw a3, 28(s0)
 ; RV32IFD-NEXT:    fsw fs3, 0(s0)
 ; RV32IFD-NEXT:    fsw fs2, 4(s0)
 ; RV32IFD-NEXT:    fsw fs1, 8(s0)
 ; RV32IFD-NEXT:    fsw fa0, 12(s0)
-; RV32IFD-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    fld fs0, 32(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs1, 24(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs2, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs3, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 48
+; RV32IFD-NEXT:    lw ra, 60(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    lw s0, 56(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    fld fs0, 48(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs1, 40(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs2, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs3, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    addi sp, sp, 64
 ; RV32IFD-NEXT:    ret
 ;
 ; RV64IFD-LABEL: test_frexp_v4f32_v4i32:
@@ -631,44 +639,52 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwi
 ;
 ; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s4, 8(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s4, 24(sp) # 4-byte Folded Spill
 ; RV32IZFINXZDINX-NEXT:    mv s0, a4
 ; RV32IZFINXZDINX-NEXT:    mv s1, a3
 ; RV32IZFINXZDINX-NEXT:    mv s2, a2
 ; RV32IZFINXZDINX-NEXT:    mv a2, a1
 ; RV32IZFINXZDINX-NEXT:    mv s3, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, a0, 16
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 8
 ; RV32IZFINXZDINX-NEXT:    mv a0, a2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s4, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 20
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 12
 ; RV32IZFINXZDINX-NEXT:    mv a0, s2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s2, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 24
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 16
 ; RV32IZFINXZDINX-NEXT:    mv a0, s1
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s1, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 28
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 20
 ; RV32IZFINXZDINX-NEXT:    mv a0, s0
 ; RV32IZFINXZDINX-NEXT:    call frexpf
+; RV32IZFINXZDINX-NEXT:    lw a1, 8(sp)
+; RV32IZFINXZDINX-NEXT:    lw a2, 12(sp)
+; RV32IZFINXZDINX-NEXT:    lw a3, 16(sp)
+; RV32IZFINXZDINX-NEXT:    lw a4, 20(sp)
+; RV32IZFINXZDINX-NEXT:    sw a1, 16(s3)
+; RV32IZFINXZDINX-NEXT:    sw a2, 20(s3)
+; RV32IZFINXZDINX-NEXT:    sw a3, 24(s3)
+; RV32IZFINXZDINX-NEXT:    sw a4, 28(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s4, 0(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s2, 4(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s1, 8(s3)
 ; RV32IZFINXZDINX-NEXT:    sw a0, 12(s3)
-; RV32IZFINXZDINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s4, 8(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s4, 24(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, 48
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
@@ -1080,34 +1096,41 @@ define <4 x float> @test_frexp_v4f32_v4i32_only_use_fract(<4 x float> %a) nounwi
 define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
 ; RV32IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -32
-; RV32IFD-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs0, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs1, 8(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs2, 0(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fmv.s fs0, fa2
-; RV32IFD-NEXT:    fmv.s fs1, fa1
-; RV32IFD-NEXT:    fmv.s fs2, fa0
+; RV32IFD-NEXT:    addi sp, sp, -48
+; RV32IFD-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs0, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs1, 24(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs2, 16(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fmv.s fs0, fa3
+; RV32IFD-NEXT:    fmv.s fs1, fa2
+; RV32IFD-NEXT:    fmv.s fs2, fa1
 ; RV32IFD-NEXT:    mv s0, a0
-; RV32IFD-NEXT:    addi a0, a0, 12
-; RV32IFD-NEXT:    fmv.s fa0, fa3
+; RV32IFD-NEXT:    mv a0, sp
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    addi a0, s0, 8
-; RV32IFD-NEXT:    fmv.s fa0, fs0
+; RV32IFD-NEXT:    addi a0, sp, 4
+; RV32IFD-NEXT:    fmv.s fa0, fs2
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    addi a0, s0, 4
+; RV32IFD-NEXT:    addi a0, sp, 8
 ; RV32IFD-NEXT:    fmv.s fa0, fs1
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    fmv.s fa0, fs2
-; RV32IFD-NEXT:    mv a0, s0
+; RV32IFD-NEXT:    addi a0, sp, 12
+; RV32IFD-NEXT:    fmv.s fa0, fs0
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    fld fs0, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs1, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs2, 0(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 32
+; RV32IFD-NEXT:    lw a0, 0(sp)
+; RV32IFD-NEXT:    lw a1, 4(sp)
+; RV32IFD-NEXT:    lw a2, 8(sp)
+; RV32IFD-NEXT:    lw a3, 12(sp)
+; RV32IFD-NEXT:    sw a0, 0(s0)
+; RV32IFD-NEXT:    sw a1, 4(s0)
+; RV32IFD-NEXT:    sw a2, 8(s0)
+; RV32IFD-NEXT:    sw a3, 12(s0)
+; RV32IFD-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    fld fs0, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs1, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs2, 16(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    addi sp, sp, 48
 ; RV32IFD-NEXT:    ret
 ;
 ; RV64IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
@@ -1151,34 +1174,43 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
 ;
 ; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    mv s0, a3
-; RV32IZFINXZDINX-NEXT:    mv s1, a2
-; RV32IZFINXZDINX-NEXT:    mv s2, a1
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    mv s0, a4
+; RV32IZFINXZDINX-NEXT:    mv s1, a3
+; RV32IZFINXZDINX-NEXT:    mv s2, a2
+; RV32IZFINXZDINX-NEXT:    mv a2, a1
 ; RV32IZFINXZDINX-NEXT:    mv s3, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, a0, 12
-; RV32IZFINXZDINX-NEXT:    mv a0, a4
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 12
+; RV32IZFINXZDINX-NEXT:    mv a0, a2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 8
-; RV32IZFINXZDINX-NEXT:    mv a0, s0
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 16
+; RV32IZFINXZDINX-NEXT:    mv a0, s2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 4
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 20
 ; RV32IZFINXZDINX-NEXT:    mv a0, s1
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    mv a0, s2
-; RV32IZFINXZDINX-NEXT:    mv a1, s3
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 24
+; RV32IZFINXZDINX-NEXT:    mv a0, s0
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT:    lw a0, 12(sp)
+; RV32IZFINXZDINX-NEXT:    lw a1, 16(sp)
+; RV32IZFINXZDINX-NEXT:    lw a2, 20(sp)
+; RV32IZFINXZDINX-NEXT:    lw a3, 24(sp)
+; RV32IZFINXZDINX-NEXT:    sw a0, 0(s3)
+; RV32IZFINXZDINX-NEXT:    sw a1, 4(s3)
+; RV32IZFINXZDINX-NEXT:    sw a2, 8(s3)
+; RV32IZFINXZDINX-NEXT:    sw a3, 12(s3)
+; RV32IZFINXZDINX-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, 48
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
diff --git a/llvm/test/CodeGen/X86/llvm.frexp.ll b/llvm/test/CodeGen/X86/llvm.frexp.ll
index 96de34519556d0..cd560ad627de4c 100644
--- a/llvm/test/CodeGen/X86/llvm.frexp.ll
+++ b/llvm/test/CodeGen/X86/llvm.frexp.ll
@@ -325,27 +325,28 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32:
 ; WIN32:       # %bb.0:
+; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $44, %esp
+; WIN32-NEXT:    subl $60, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal 24(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal 20(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal 16(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
-; WIN32-NEXT:    leal 28(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
@@ -360,13 +361,22 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT:    movl %edi, 28(%esi)
+; WIN32-NEXT:    movl %edx, 24(%esi)
+; WIN32-NEXT:    movl %ecx, 20(%esi)
+; WIN32-NEXT:    movl %eax, 16(%esi)
 ; WIN32-NEXT:    fstps 12(%esi)
 ; WIN32-NEXT:    fstps 8(%esi)
 ; WIN32-NEXT:    fstps 4(%esi)
 ; WIN32-NEXT:    fstps (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $44, %esp
+; WIN32-NEXT:    addl $60, %esp
 ; WIN32-NEXT:    popl %esi
+; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   ret { <4 x float>, <4 x i32> } %result
@@ -489,35 +499,46 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; WIN32:       # %bb.0:
+; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $12, %esp
+; WIN32-NEXT:    subl $28, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal 8(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal 4(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal 12(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    movl %esi, {{[0-9]+}}(%esp)
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT:    movl %edi, 12(%esi)
+; WIN32-NEXT:    movl %edx, 8(%esi)
+; WIN32-NEXT:    movl %ecx, 4(%esi)
+; WIN32-NEXT:    movl %eax, (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $12, %esp
+; WIN32-NEXT:    addl $28, %esp
 ; WIN32-NEXT:    popl %esi
+; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   %result.1 = extractvalue { <4 x float>, <4 x i32> } %result, 1
diff --git a/llvm/test/CodeGen/X86/sincos-stack-args.ll b/llvm/test/CodeGen/X86/sincos-stack-args.ll
new file mode 100644
index 00000000000000..9fb3a6769fda11
--- /dev/null
+++ b/llvm/test/CodeGen/X86/sincos-stack-args.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_sp --version 5
+; RUN: llc < %s -mtriple=i386-unknown-linux-gnu  | FileCheck %s
+; Test for issue https://github.com/llvm/llvm-project/issues/115323
+
+declare double @g(double, double)
+
+define double @f(double %a) {
+; CHECK-LABEL: f:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    subl $44, %esp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    fldl 48(%esp)
+; CHECK-NEXT:    leal 24(%esp), %eax
+; CHECK-NEXT:    movl %eax, 12(%esp)
+; CHECK-NEXT:    leal 32(%esp), %eax
+; CHECK-NEXT:    movl %eax, 8(%esp)
+; CHECK-NEXT:    fstpl (%esp)
+; CHECK-NEXT:    calll sincos
+; CHECK-NEXT:    fldl 32(%esp)
+; CHECK-NEXT:    fldl 24(%esp)
+; CHECK-NEXT:    faddl {{\.?LCPI[0-9]+_[0-9]+}}
+; CHECK-NEXT:    fxch %st(1)
+; CHECK-NEXT:    fstpl 8(%esp)
+; CHECK-NEXT:    fstpl (%esp)
+; CHECK-NEXT:    calll g@PLT
+; CHECK-NEXT:    addl $44, %esp
+; CHECK-NEXT:    .cfi_def_cfa_offset 4
+; CHECK-NEXT:    retl
+entry:
+  %0 = tail call double @llvm.sin.f64(double %a)
+  %1 = tail call double @llvm.cos.f64(double %a)
+  %add = fadd double %1, 3.140000e+00
+  %call = tail call double @g(double %add, double %0)
+  ret double %call
+}

MacDue · 2024-11-07T20:18:15Z

> Confirmed that this fixes the problem we hit in #115323. I'm not familiar enough with this part of SelectionDAG to say whether it's the Right Fix(tm) though.

No worries 👍 Thanks for checking!

efriedma-quic · 2024-11-07T22:26:12Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

-    if (CallRetResNo == ResNo || !ST->isSimple() ||
+    // If the pointer value does not come from the IR, it could come from ABI
+    // lowering and may alias with the arguments of the library call if they are
+    // passed via the stack.


If I'm understanding correctly, the issue here is that the store destination could be part of the argument list of another call? The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store. And those nodes should prevent the transform because they "alias" the store.

Blanket ignoring all non-IR values may work for this particular testcase, but I don't think it solves aliasing in general.

My understanding (possibly wrong) is you have two calls:

sincos(x, ptr, ptr)
g(double, double)

All the arguments are passed on the stack, and the call to sincos is rewritten so that it is passed the stack locations of g's arguments directly. The issue is that the stack locations of g's arguments alias the stack locations of sincos's own arguments, so when sincos writes through one of the pointers it clobbers its own arguments.

The current code already checks that the two pointers for sin and cos don't alias each other, but it does not expect that the pointers could alias the call's own arguments.
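The aliasing can be checked by hand against the new i386 test. This minimal worked example assumes the offsets read off the CHECK lines of `sincos-stack-args.ll` (both calls place their arguments starting at `(%esp)`) and uses a hypothetical `ByteRange` helper: if the sin result were written straight into g's second-argument slot at `8(%esp)`, that slot would overlap the bytes holding sincos's own pointer arguments.

```cpp
#include <cstdint>

// Half-open byte range [Begin, End) relative to %esp at the call site.
struct ByteRange {
  std::uint32_t Begin;
  std::uint32_t End;
};

constexpr bool overlaps(ByteRange A, ByteRange B) {
  return A.Begin < B.End && B.Begin < A.End;
}

// From the test's CHECK lines: sincos(double x, double *sin, double *cos)
// takes x at 0(%esp), the sin pointer at 8(%esp), the cos pointer at 12(%esp).
constexpr ByteRange SincosArgs{0, 16};
// g(double, double) takes its second argument (the sin result) at 8(%esp).
constexpr ByteRange GSecondArg{8, 16};

static_assert(overlaps(SincosArgs, GSecondArg),
              "g's argument slot overlaps sincos's own stack arguments");
```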

> The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store.

It looks like this is the case. Would a potential fix be to check that a CALLSEQ_START does not occur in the chain of the store, or have I misunderstood?

You shouldn't have to special-case CALLSEQ_START. This restriction should be implicitly present in the chain.

Could you clarify that? Before this expansion you have an frexp or fsincos whose users are stores within the call sequence of another function (g in the previous example). There are no aliasing issues at this point. The issue occurs when the node is replaced with a library call. It takes its input chains from the previous stores, so when expanded you have a call sequence for the node nested inside the call sequence for g 😬. This is where the issues come from, as both calls use the same area for stack arguments.
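The nesting can be illustrated with a toy model. This minimal sketch assumes a hypothetical, flattened `Op` sequence in place of the real chain graph: after the fold, the libcall's CALLSEQ_START opens while g's call sequence is still open, which a simple depth counter detects.

```cpp
#include <vector>

// Hypothetical toy encoding of call-related nodes in chain order;
// real SelectionDAG chains are graphs, not flat lists.
enum class Op { CallSeqStart, Store, Call, CallSeqEnd };

// Returns true if any CALLSEQ_START opens while another sequence is open.
bool hasNestedCallSeq(const std::vector<Op> &Seq) {
  int Depth = 0;
  for (Op O : Seq) {
    if (O == Op::CallSeqStart && ++Depth > 1)
      return true;
    if (O == Op::CallSeqEnd)
      --Depth;
  }
  return false;
}
```

With the fold, the sequence looks like START(g), store, START(sincos), call, END(sincos), call g, END(g), which is nested; without it, the sincos sequence completes before g's begins.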

Having thought about the issue with folding stores into sincos while expanding it to a library call, the possible solutions I came up with are:

1) Check that the stores are not within a CALLSEQ_START-CALLSEQ_END pair.
   As far as I can tell from grepping the codebase, nobody has done this before (and it's possibly a costly graph search).
2) Disallow stores to pointers that don't exist in the IR.
   Much simpler, and I think it could solve the issue, but it assumes that a store to an IR pointer won't occur within a CALLSEQ_START-CALLSEQ_END pair. Maybe this is a valid assumption (the stores I've seen have always been to special stack slots for the call), but it could be broken.
3) Be very conservative and restrict this to trivial cases (e.g. the case where the store input chains are the entry node).

The issue this is trying to avoid is that if the stores to be folded into sincos are within a CALLSEQ_START-CALLSEQ_END pair, the expansion will result in nested call sequences, which is where the issues come from.
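A rough sketch of the first option (checking for an enclosing CALLSEQ_START/CALLSEQ_END pair) and of the conservative option that only folds stores chained directly off the entry node. It assumes a hypothetical flat `ChainOp` list running from the entry token up to the store's input chain. Real SelectionDAG chains are graphs joined by TokenFactor nodes, so a real implementation would need a graph search, which is the cost concern raised above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy chain node kinds; not LLVM's ISD opcodes.
enum class ChainOp { EntryToken, CallSeqStart, CallSeqEnd, Other };

// Option 1 (toy): walk back from the store's input chain (Chain.back())
// towards the entry token (Chain[0]) and report whether the store sits
// inside a CALLSEQ_START that has not yet been closed.
bool insideCallSeq(const std::vector<ChainOp> &Chain) {
  int Unmatched = 0; // CALLSEQ_ENDs seen that still need their START
  for (std::size_t I = Chain.size(); I-- > 0;) {
    if (Chain[I] == ChainOp::CallSeqEnd) {
      ++Unmatched;
    } else if (Chain[I] == ChainOp::CallSeqStart) {
      if (Unmatched == 0)
        return true; // a START with no END after it: still open
      --Unmatched;
    }
  }
  return false;
}

// Conservative option (toy): fold only when the store's input chain is the
// entry node itself.
bool chainIsEntry(const std::vector<ChainOp> &Chain) {
  return Chain.size() == 1 && Chain[0] == ChainOp::EntryToken;
}
```

The backward walk counts CALLSEQ_END nodes so that an earlier, already completed call sequence is not mistaken for an enclosing one.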

MacDue · 2024-11-12T11:02:35Z

Any thoughts on this? I think it'd be good to find a solution here 🙂

zmodem · 2024-11-12T16:04:15Z

> Any thoughts on this? I think it'd be good to find a solution here 🙂

Until we find the proper solution, maybe we should consider rolling this back, or doing the conservative suggestion 3) above?

MacDue · 2024-11-12T17:07:39Z

> > Any thoughts on this? I think it'd be good to find a solution here 🙂

> Until we find the proper solution, maybe we should consider rolling this back, or doing the conservative suggestion 3) above?

I've created a PR for the conservative check here: #115906

MacDue mentioned this pull request Nov 7, 2024

sincos args may get clobbered after #108401 when passed on the stack #115323

Closed

MacDue force-pushed the sincos_issue branch from 79b50e9 to 86cc854 November 7, 2024 17:44

MacDue changed the title ~~[SDAG] Only apply sincos stack slot folding to IR pointers~~ [SDAG] Only apply sincos/frexp stack slot folding to IR pointers Nov 7, 2024

MacDue requested a review from zmodem November 7, 2024 17:55

MacDue force-pushed the sincos_issue branch from 86cc854 to dc92067 November 7, 2024 18:44

[SDAG] Only apply sincos stack slot folding to IR pointers

89153cf

Fixes llvm#115323

MacDue force-pushed the sincos_issue branch from dc92067 to 89153cf November 7, 2024 20:16

MacDue marked this pull request as ready for review November 7, 2024 20:17

llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Nov 7, 2024

MacDue requested review from arsenm, efriedma-quic and sdesmalen-arm November 7, 2024 20:18

efriedma-quic reviewed Nov 7, 2024


Update comment

7b86938

MacDue closed this Nov 14, 2024

5 participants

MacDue commented Nov 7, 2024 •

edited

Loading

llvmbot commented Nov 7, 2024 •

edited

Loading

efriedma-quic Nov 7, 2024 •

edited

Loading

MacDue Nov 7, 2024 •

edited

Loading

MacDue Nov 7, 2024 •

edited

Loading