Skip to content

Conversation

@MacDue
Copy link
Member

@MacDue MacDue commented Nov 7, 2024

If the store is to a pointer that does not exist in the IR (i.e. it was materialized for ABI reasons), it may be part of the setup for call (e.g. setting up arguments on the stack). In this case, folding the store into the library call as part of the expansion could result in nested call sequences (and incorrect codegen).

Fixes #115323

@MacDue MacDue changed the title [SDAG] Only apply sincos stack slot folding to IR pointers [SDAG] Only apply sincos/frexp stack slot folding to IR pointers Nov 7, 2024
@MacDue MacDue requested a review from zmodem November 7, 2024 17:55
@zmodem
Copy link
Collaborator

zmodem commented Nov 7, 2024

Confirmed that this fixes the problem we hit in #115323. I'm not familiar enough with this part of SelectionDAG to say whether it's the Right Fix(tm) though.

@MacDue MacDue marked this pull request as ready for review November 7, 2024 20:17
@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Nov 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2024

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Benjamin Maxwell (MacDue)

Changes

If the pointer passed to the library call does not exist in the IR (i.e. it was materialized for ABI reasons), passing it to the library call could result in unexpected aliasing/clobbering, as the area of the stack for callee arguments could alias the result stack slots in the parent call frame.

Fixes #115323


Full diff: https://github.com/llvm/llvm-project/pull/115346.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+6-1)
  • (modified) llvm/test/CodeGen/RISCV/llvm.frexp.ll (+112-80)
  • (modified) llvm/test/CodeGen/X86/llvm.frexp.ll (+33-12)
  • (added) llvm/test/CodeGen/X86/sincos-stack-args.ll (+35)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 203e14f6cde3e3..7db226b24e5f2e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2518,7 +2518,12 @@ bool SelectionDAG::expandMultipleResultFPLibCall(
     SDValue StoreValue = ST->getValue();
     unsigned ResNo = StoreValue.getResNo();
     Type *StoreType = StoreValue.getValueType().getTypeForEVT(Ctx);
-    if (CallRetResNo == ResNo || !ST->isSimple() ||
+    // If the pointer value does not come from the IR, it could come from ABI
+    // lowering and may alias with the arguments of the library call if they are
+    // passed via the stack.
+    const Value *PointerValue =
+        dyn_cast_or_null<const Value *>(ST->getPointerInfo().V);
+    if (!PointerValue || CallRetResNo == ResNo || !ST->isSimple() ||
         ST->getAddressSpace() != 0 ||
         ST->getAlign() <
             getDataLayout().getABITypeAlign(StoreType->getScalarType()) ||
diff --git a/llvm/test/CodeGen/RISCV/llvm.frexp.ll b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
index 3f615d23d3eaf6..e85a7118f5ff83 100644
--- a/llvm/test/CodeGen/RISCV/llvm.frexp.ll
+++ b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
@@ -543,42 +543,50 @@ define i32 @test_frexp_f32_i32_only_use_exp(float %a) nounwind {
 define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwind {
 ; RV32IFD-LABEL: test_frexp_v4f32_v4i32:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -48
-; RV32IFD-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs0, 32(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs1, 24(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs2, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs3, 8(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    addi sp, sp, -64
+; RV32IFD-NEXT:    sw ra, 60(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    sw s0, 56(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs0, 48(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs1, 40(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs2, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs3, 24(sp) # 8-byte Folded Spill
 ; RV32IFD-NEXT:    fmv.s fs0, fa3
 ; RV32IFD-NEXT:    fmv.s fs1, fa2
 ; RV32IFD-NEXT:    fmv.s fs2, fa1
 ; RV32IFD-NEXT:    mv s0, a0
-; RV32IFD-NEXT:    addi a0, a0, 16
+; RV32IFD-NEXT:    addi a0, sp, 8
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs3, fa0
-; RV32IFD-NEXT:    addi a0, s0, 20
+; RV32IFD-NEXT:    addi a0, sp, 12
 ; RV32IFD-NEXT:    fmv.s fa0, fs2
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs2, fa0
-; RV32IFD-NEXT:    addi a0, s0, 24
+; RV32IFD-NEXT:    addi a0, sp, 16
 ; RV32IFD-NEXT:    fmv.s fa0, fs1
 ; RV32IFD-NEXT:    call frexpf
 ; RV32IFD-NEXT:    fmv.s fs1, fa0
-; RV32IFD-NEXT:    addi a0, s0, 28
+; RV32IFD-NEXT:    addi a0, sp, 20
 ; RV32IFD-NEXT:    fmv.s fa0, fs0
 ; RV32IFD-NEXT:    call frexpf
+; RV32IFD-NEXT:    lw a0, 8(sp)
+; RV32IFD-NEXT:    lw a1, 12(sp)
+; RV32IFD-NEXT:    lw a2, 16(sp)
+; RV32IFD-NEXT:    lw a3, 20(sp)
+; RV32IFD-NEXT:    sw a0, 16(s0)
+; RV32IFD-NEXT:    sw a1, 20(s0)
+; RV32IFD-NEXT:    sw a2, 24(s0)
+; RV32IFD-NEXT:    sw a3, 28(s0)
 ; RV32IFD-NEXT:    fsw fs3, 0(s0)
 ; RV32IFD-NEXT:    fsw fs2, 4(s0)
 ; RV32IFD-NEXT:    fsw fs1, 8(s0)
 ; RV32IFD-NEXT:    fsw fa0, 12(s0)
-; RV32IFD-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    fld fs0, 32(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs1, 24(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs2, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs3, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 48
+; RV32IFD-NEXT:    lw ra, 60(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    lw s0, 56(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    fld fs0, 48(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs1, 40(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs2, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs3, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    addi sp, sp, 64
 ; RV32IFD-NEXT:    ret
 ;
 ; RV64IFD-LABEL: test_frexp_v4f32_v4i32:
@@ -631,44 +639,52 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwi
 ;
 ; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s4, 8(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s4, 24(sp) # 4-byte Folded Spill
 ; RV32IZFINXZDINX-NEXT:    mv s0, a4
 ; RV32IZFINXZDINX-NEXT:    mv s1, a3
 ; RV32IZFINXZDINX-NEXT:    mv s2, a2
 ; RV32IZFINXZDINX-NEXT:    mv a2, a1
 ; RV32IZFINXZDINX-NEXT:    mv s3, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, a0, 16
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 8
 ; RV32IZFINXZDINX-NEXT:    mv a0, a2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s4, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 20
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 12
 ; RV32IZFINXZDINX-NEXT:    mv a0, s2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s2, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 24
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 16
 ; RV32IZFINXZDINX-NEXT:    mv a0, s1
 ; RV32IZFINXZDINX-NEXT:    call frexpf
 ; RV32IZFINXZDINX-NEXT:    mv s1, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 28
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 20
 ; RV32IZFINXZDINX-NEXT:    mv a0, s0
 ; RV32IZFINXZDINX-NEXT:    call frexpf
+; RV32IZFINXZDINX-NEXT:    lw a1, 8(sp)
+; RV32IZFINXZDINX-NEXT:    lw a2, 12(sp)
+; RV32IZFINXZDINX-NEXT:    lw a3, 16(sp)
+; RV32IZFINXZDINX-NEXT:    lw a4, 20(sp)
+; RV32IZFINXZDINX-NEXT:    sw a1, 16(s3)
+; RV32IZFINXZDINX-NEXT:    sw a2, 20(s3)
+; RV32IZFINXZDINX-NEXT:    sw a3, 24(s3)
+; RV32IZFINXZDINX-NEXT:    sw a4, 28(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s4, 0(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s2, 4(s3)
 ; RV32IZFINXZDINX-NEXT:    sw s1, 8(s3)
 ; RV32IZFINXZDINX-NEXT:    sw a0, 12(s3)
-; RV32IZFINXZDINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s4, 8(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s4, 24(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, 48
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
@@ -1080,34 +1096,41 @@ define <4 x float> @test_frexp_v4f32_v4i32_only_use_fract(<4 x float> %a) nounwi
 define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
 ; RV32IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -32
-; RV32IFD-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs0, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs1, 8(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fsd fs2, 0(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT:    fmv.s fs0, fa2
-; RV32IFD-NEXT:    fmv.s fs1, fa1
-; RV32IFD-NEXT:    fmv.s fs2, fa0
+; RV32IFD-NEXT:    addi sp, sp, -48
+; RV32IFD-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs0, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs1, 24(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fsd fs2, 16(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT:    fmv.s fs0, fa3
+; RV32IFD-NEXT:    fmv.s fs1, fa2
+; RV32IFD-NEXT:    fmv.s fs2, fa1
 ; RV32IFD-NEXT:    mv s0, a0
-; RV32IFD-NEXT:    addi a0, a0, 12
-; RV32IFD-NEXT:    fmv.s fa0, fa3
+; RV32IFD-NEXT:    mv a0, sp
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    addi a0, s0, 8
-; RV32IFD-NEXT:    fmv.s fa0, fs0
+; RV32IFD-NEXT:    addi a0, sp, 4
+; RV32IFD-NEXT:    fmv.s fa0, fs2
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    addi a0, s0, 4
+; RV32IFD-NEXT:    addi a0, sp, 8
 ; RV32IFD-NEXT:    fmv.s fa0, fs1
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    fmv.s fa0, fs2
-; RV32IFD-NEXT:    mv a0, s0
+; RV32IFD-NEXT:    addi a0, sp, 12
+; RV32IFD-NEXT:    fmv.s fa0, fs0
 ; RV32IFD-NEXT:    call frexpf
-; RV32IFD-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    fld fs0, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs1, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    fld fs2, 0(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 32
+; RV32IFD-NEXT:    lw a0, 0(sp)
+; RV32IFD-NEXT:    lw a1, 4(sp)
+; RV32IFD-NEXT:    lw a2, 8(sp)
+; RV32IFD-NEXT:    lw a3, 12(sp)
+; RV32IFD-NEXT:    sw a0, 0(s0)
+; RV32IFD-NEXT:    sw a1, 4(s0)
+; RV32IFD-NEXT:    sw a2, 8(s0)
+; RV32IFD-NEXT:    sw a3, 12(s0)
+; RV32IFD-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT:    fld fs0, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs1, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    fld fs2, 16(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT:    addi sp, sp, 48
 ; RV32IFD-NEXT:    ret
 ;
 ; RV64IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
@@ -1151,34 +1174,43 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
 ;
 ; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    mv s0, a3
-; RV32IZFINXZDINX-NEXT:    mv s1, a2
-; RV32IZFINXZDINX-NEXT:    mv s2, a1
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT:    mv s0, a4
+; RV32IZFINXZDINX-NEXT:    mv s1, a3
+; RV32IZFINXZDINX-NEXT:    mv s2, a2
+; RV32IZFINXZDINX-NEXT:    mv a2, a1
 ; RV32IZFINXZDINX-NEXT:    mv s3, a0
-; RV32IZFINXZDINX-NEXT:    addi a1, a0, 12
-; RV32IZFINXZDINX-NEXT:    mv a0, a4
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 12
+; RV32IZFINXZDINX-NEXT:    mv a0, a2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 8
-; RV32IZFINXZDINX-NEXT:    mv a0, s0
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 16
+; RV32IZFINXZDINX-NEXT:    mv a0, s2
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    addi a1, s3, 4
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 20
 ; RV32IZFINXZDINX-NEXT:    mv a0, s1
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    mv a0, s2
-; RV32IZFINXZDINX-NEXT:    mv a1, s3
+; RV32IZFINXZDINX-NEXT:    addi a1, sp, 24
+; RV32IZFINXZDINX-NEXT:    mv a0, s0
 ; RV32IZFINXZDINX-NEXT:    call frexpf
-; RV32IZFINXZDINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT:    lw a0, 12(sp)
+; RV32IZFINXZDINX-NEXT:    lw a1, 16(sp)
+; RV32IZFINXZDINX-NEXT:    lw a2, 20(sp)
+; RV32IZFINXZDINX-NEXT:    lw a3, 24(sp)
+; RV32IZFINXZDINX-NEXT:    sw a0, 0(s3)
+; RV32IZFINXZDINX-NEXT:    sw a1, 4(s3)
+; RV32IZFINXZDINX-NEXT:    sw a2, 8(s3)
+; RV32IZFINXZDINX-NEXT:    sw a3, 12(s3)
+; RV32IZFINXZDINX-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT:    addi sp, sp, 48
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
diff --git a/llvm/test/CodeGen/X86/llvm.frexp.ll b/llvm/test/CodeGen/X86/llvm.frexp.ll
index 96de34519556d0..cd560ad627de4c 100644
--- a/llvm/test/CodeGen/X86/llvm.frexp.ll
+++ b/llvm/test/CodeGen/X86/llvm.frexp.ll
@@ -325,27 +325,28 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32:
 ; WIN32:       # %bb.0:
+; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $44, %esp
+; WIN32-NEXT:    subl $60, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal 24(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal 20(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal 16(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
-; WIN32-NEXT:    leal 28(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
@@ -360,13 +361,22 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT:    movl %edi, 28(%esi)
+; WIN32-NEXT:    movl %edx, 24(%esi)
+; WIN32-NEXT:    movl %ecx, 20(%esi)
+; WIN32-NEXT:    movl %eax, 16(%esi)
 ; WIN32-NEXT:    fstps 12(%esi)
 ; WIN32-NEXT:    fstps 8(%esi)
 ; WIN32-NEXT:    fstps 4(%esi)
 ; WIN32-NEXT:    fstps (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $44, %esp
+; WIN32-NEXT:    addl $60, %esp
 ; WIN32-NEXT:    popl %esi
+; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   ret { <4 x float>, <4 x i32> } %result
@@ -489,35 +499,46 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; WIN32:       # %bb.0:
+; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $12, %esp
+; WIN32-NEXT:    subl $28, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal 8(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal 4(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal 12(%esi), %eax
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    movl %esi, {{[0-9]+}}(%esp)
+; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT:    movl %edi, 12(%esi)
+; WIN32-NEXT:    movl %edx, 8(%esi)
+; WIN32-NEXT:    movl %ecx, 4(%esi)
+; WIN32-NEXT:    movl %eax, (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $12, %esp
+; WIN32-NEXT:    addl $28, %esp
 ; WIN32-NEXT:    popl %esi
+; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   %result.1 = extractvalue { <4 x float>, <4 x i32> } %result, 1
diff --git a/llvm/test/CodeGen/X86/sincos-stack-args.ll b/llvm/test/CodeGen/X86/sincos-stack-args.ll
new file mode 100644
index 00000000000000..9fb3a6769fda11
--- /dev/null
+++ b/llvm/test/CodeGen/X86/sincos-stack-args.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_sp --version 5
+; RUN: llc < %s -mtriple=i386-unknown-linux-gnu  | FileCheck %s
+; Test for issue https://github.com/llvm/llvm-project/issues/115323
+
+declare double @g(double, double)
+
+define double @f(double %a) {
+; CHECK-LABEL: f:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    subl $44, %esp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    fldl 48(%esp)
+; CHECK-NEXT:    leal 24(%esp), %eax
+; CHECK-NEXT:    movl %eax, 12(%esp)
+; CHECK-NEXT:    leal 32(%esp), %eax
+; CHECK-NEXT:    movl %eax, 8(%esp)
+; CHECK-NEXT:    fstpl (%esp)
+; CHECK-NEXT:    calll sincos
+; CHECK-NEXT:    fldl 32(%esp)
+; CHECK-NEXT:    fldl 24(%esp)
+; CHECK-NEXT:    faddl {{\.?LCPI[0-9]+_[0-9]+}}
+; CHECK-NEXT:    fxch %st(1)
+; CHECK-NEXT:    fstpl 8(%esp)
+; CHECK-NEXT:    fstpl (%esp)
+; CHECK-NEXT:    calll g@PLT
+; CHECK-NEXT:    addl $44, %esp
+; CHECK-NEXT:    .cfi_def_cfa_offset 4
+; CHECK-NEXT:    retl
+entry:
+  %0 = tail call double @llvm.sin.f64(double %a)
+  %1 = tail call double @llvm.cos.f64(double %a)
+  %add = fadd double %1, 3.140000e+00
+  %call = tail call double @g(double %add, double %0)
+  ret double %call
+}

@MacDue
Copy link
Member Author

MacDue commented Nov 7, 2024

Confirmed that this fixes the problem we hit in #115323. I'm not familiar enough with this part of SelectionDAG to say whether it's the Right Fix(tm) though.

No worries 👍 Thanks for checking!

if (CallRetResNo == ResNo || !ST->isSimple() ||
// If the pointer value does not come from the IR, it could come from ABI
// lowering and may alias with the arguments of the library call if they are
// passed via the stack.
Copy link
Collaborator

@efriedma-quic efriedma-quic Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, the issue here is that the store destination could be part of the argument list of another call? The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store. And those nodes should prevent the transform because they "alias" the store.

Blanket ignoring all non-IR values maybe works for this particular testcase, but I don't think it solves aliasing in general.

Copy link
Member Author

@MacDue MacDue Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding (possibly wrong) is you have two calls:

sincos(x, ptr, ptr)
g(double, double)

All the arguments are passed on the stack, and the call to sincos is rewritten to be passed the stack locations of the arguments to g directly. The issue is the slack locations of g's arguments alias the stack locations of sincos's arguments, so when sincos writes to one of the pointers it clobbers its own arguments.

The current code is already checking the two pointers for sin and cos don't alias, but it does not expect the pointers could alias its own arguments.

Copy link
Member Author

@MacDue MacDue Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store.

It looks like this is the case. Would a potential fix be to check that a CALLSEQ_START does not occur in the chain of the store, or have I misunderstood?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't have to special case CALLSEQ_START. This restriction should be implicitly present in the chain

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify that? Before this expansion you have an frexp or fsincos where the users are stores within the call sequence of another function (g in the previous example). There are no aliasing issues at this point. The issue occurs when the node is replaced with a library call. It takes its input chains from the previous stores, so when expanded you have a call sequence for the node nested inside a call sequence for g 😬. This is where the issues come from as both calls used the same area for stack arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about the issue with folding stores into sincos while expanding it to a library call, the possible solutions I came up with are:

  1. Check the stores are not within a CALLSEQ_START-CALLSEQ_END pair
    • As far as I can tell nobody has done this before -- grepping the codebase (and it's possibly a costly graph search)
  2. Disallow stores to pointers that don't exist in the IR
    • Much simpler -- and I think could solve the issue, but it's assuming a store to an IR pointer won't occur within a CALLSEQ_START - CALLSEQ_END pair
    • Maybe this is a valid assumption (the stores I've seen have always been to special stack slots for the call), but maybe this could be broken.
  3. Be very conservative and restrict this to trivial cases (e.g. the case where the store input chains are the entry node)

The issue this is trying to avoid is if the stores that are to be folded into sincos are within a CALLSEQ_START-CALLSEQ_END pair, then the expansion will result in nested call sequences, which is where the issues come from.

@MacDue
Copy link
Member Author

MacDue commented Nov 12, 2024

Any thoughts on this? I think it'd be good to find a solution here 🙂

@zmodem
Copy link
Collaborator

zmodem commented Nov 12, 2024

Any thoughts on this? I think it'd be good to find a solution here 🙂

Until we find the proper solution, maybe we should consider rolling this back, or doing the conservative suggestion 3) above?

@MacDue
Copy link
Member Author

MacDue commented Nov 12, 2024

Any thoughts on this? I think it'd be good to find a solution here 🙂

Until we find the proper solution, maybe we should consider rolling this back, or doing the conservative suggestion 3) above?

I've created a PR for the conservative check here: #115906

@MacDue MacDue closed this Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:X86 llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sincos args may get clobbered after #108401 when passed on the stack

5 participants