[SDAG] Only apply sincos/frexp stack slot folding to IR pointers #115346
Conversation
Confirmed that this fixes the problem we hit in #115323. I'm not familiar enough with this part of SelectionDAG to say whether it's the Right Fix(tm) though.
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86

Author: Benjamin Maxwell (MacDue)

Changes

If the pointer passed to the library call does not exist in the IR (i.e. it was materialized for ABI reasons), passing it to the library call could result in unexpected aliasing/clobbering, as the area of the stack used for callee arguments could alias the result stack slots in the parent call frame.

Fixes #115323

Full diff: https://github.com/llvm/llvm-project/pull/115346.diff

4 Files Affected:
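For readers unfamiliar with this corner of SelectionDAG, here is a minimal sketch of the distinction the fix relies on. It is not part of the patch; it only mirrors the dyn_cast_or_null pattern used in the change below, and the helper name is made up for illustration.

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// MachinePointerInfo::V is a PointerUnion<const Value *, const
// PseudoSourceValue *>. A store to a pointer that exists in the IR carries a
// Value; a store materialized during ABI lowering (e.g. into the
// outgoing-argument area of a call) does not, so the cast yields null.
static bool storesThroughIRPointer(const StoreSDNode *ST) {
  const Value *PtrVal =
      dyn_cast_or_null<const Value *>(ST->getPointerInfo().V);
  return PtrVal != nullptr;
}
```

With a check like this, stores whose destination was invented by call lowering are simply not folded into the sincos/frexp library call.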
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 203e14f6cde3e3..7db226b24e5f2e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2518,7 +2518,12 @@ bool SelectionDAG::expandMultipleResultFPLibCall(
SDValue StoreValue = ST->getValue();
unsigned ResNo = StoreValue.getResNo();
Type *StoreType = StoreValue.getValueType().getTypeForEVT(Ctx);
- if (CallRetResNo == ResNo || !ST->isSimple() ||
+ // If the pointer value does not come from the IR, it could come from ABI
+ // lowering and may alias with the arguments of the library call if they are
+ // passed via the stack.
+ const Value *PointerValue =
+ dyn_cast_or_null<const Value *>(ST->getPointerInfo().V);
+ if (!PointerValue || CallRetResNo == ResNo || !ST->isSimple() ||
ST->getAddressSpace() != 0 ||
ST->getAlign() <
getDataLayout().getABITypeAlign(StoreType->getScalarType()) ||
diff --git a/llvm/test/CodeGen/RISCV/llvm.frexp.ll b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
index 3f615d23d3eaf6..e85a7118f5ff83 100644
--- a/llvm/test/CodeGen/RISCV/llvm.frexp.ll
+++ b/llvm/test/CodeGen/RISCV/llvm.frexp.ll
@@ -543,42 +543,50 @@ define i32 @test_frexp_f32_i32_only_use_exp(float %a) nounwind {
define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwind {
; RV32IFD-LABEL: test_frexp_v4f32_v4i32:
; RV32IFD: # %bb.0:
-; RV32IFD-NEXT: addi sp, sp, -48
-; RV32IFD-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT: fsd fs0, 32(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fsd fs1, 24(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fsd fs2, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fsd fs3, 8(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: addi sp, sp, -64
+; RV32IFD-NEXT: sw ra, 60(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT: sw s0, 56(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT: fsd fs0, 48(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fsd fs1, 40(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fsd fs2, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fsd fs3, 24(sp) # 8-byte Folded Spill
; RV32IFD-NEXT: fmv.s fs0, fa3
; RV32IFD-NEXT: fmv.s fs1, fa2
; RV32IFD-NEXT: fmv.s fs2, fa1
; RV32IFD-NEXT: mv s0, a0
-; RV32IFD-NEXT: addi a0, a0, 16
+; RV32IFD-NEXT: addi a0, sp, 8
; RV32IFD-NEXT: call frexpf
; RV32IFD-NEXT: fmv.s fs3, fa0
-; RV32IFD-NEXT: addi a0, s0, 20
+; RV32IFD-NEXT: addi a0, sp, 12
; RV32IFD-NEXT: fmv.s fa0, fs2
; RV32IFD-NEXT: call frexpf
; RV32IFD-NEXT: fmv.s fs2, fa0
-; RV32IFD-NEXT: addi a0, s0, 24
+; RV32IFD-NEXT: addi a0, sp, 16
; RV32IFD-NEXT: fmv.s fa0, fs1
; RV32IFD-NEXT: call frexpf
; RV32IFD-NEXT: fmv.s fs1, fa0
-; RV32IFD-NEXT: addi a0, s0, 28
+; RV32IFD-NEXT: addi a0, sp, 20
; RV32IFD-NEXT: fmv.s fa0, fs0
; RV32IFD-NEXT: call frexpf
+; RV32IFD-NEXT: lw a0, 8(sp)
+; RV32IFD-NEXT: lw a1, 12(sp)
+; RV32IFD-NEXT: lw a2, 16(sp)
+; RV32IFD-NEXT: lw a3, 20(sp)
+; RV32IFD-NEXT: sw a0, 16(s0)
+; RV32IFD-NEXT: sw a1, 20(s0)
+; RV32IFD-NEXT: sw a2, 24(s0)
+; RV32IFD-NEXT: sw a3, 28(s0)
; RV32IFD-NEXT: fsw fs3, 0(s0)
; RV32IFD-NEXT: fsw fs2, 4(s0)
; RV32IFD-NEXT: fsw fs1, 8(s0)
; RV32IFD-NEXT: fsw fa0, 12(s0)
-; RV32IFD-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT: fld fs0, 32(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: fld fs1, 24(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: fld fs2, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: fld fs3, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: addi sp, sp, 48
+; RV32IFD-NEXT: lw ra, 60(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT: lw s0, 56(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT: fld fs0, 48(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: fld fs1, 40(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: fld fs2, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: fld fs3, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: addi sp, sp, 64
; RV32IFD-NEXT: ret
;
; RV64IFD-LABEL: test_frexp_v4f32_v4i32:
@@ -631,44 +639,52 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) nounwi
;
; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
; RV32IZFINXZDINX: # %bb.0:
-; RV32IZFINXZDINX-NEXT: addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s4, 24(sp) # 4-byte Folded Spill
; RV32IZFINXZDINX-NEXT: mv s0, a4
; RV32IZFINXZDINX-NEXT: mv s1, a3
; RV32IZFINXZDINX-NEXT: mv s2, a2
; RV32IZFINXZDINX-NEXT: mv a2, a1
; RV32IZFINXZDINX-NEXT: mv s3, a0
-; RV32IZFINXZDINX-NEXT: addi a1, a0, 16
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 8
; RV32IZFINXZDINX-NEXT: mv a0, a2
; RV32IZFINXZDINX-NEXT: call frexpf
; RV32IZFINXZDINX-NEXT: mv s4, a0
-; RV32IZFINXZDINX-NEXT: addi a1, s3, 20
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 12
; RV32IZFINXZDINX-NEXT: mv a0, s2
; RV32IZFINXZDINX-NEXT: call frexpf
; RV32IZFINXZDINX-NEXT: mv s2, a0
-; RV32IZFINXZDINX-NEXT: addi a1, s3, 24
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 16
; RV32IZFINXZDINX-NEXT: mv a0, s1
; RV32IZFINXZDINX-NEXT: call frexpf
; RV32IZFINXZDINX-NEXT: mv s1, a0
-; RV32IZFINXZDINX-NEXT: addi a1, s3, 28
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 20
; RV32IZFINXZDINX-NEXT: mv a0, s0
; RV32IZFINXZDINX-NEXT: call frexpf
+; RV32IZFINXZDINX-NEXT: lw a1, 8(sp)
+; RV32IZFINXZDINX-NEXT: lw a2, 12(sp)
+; RV32IZFINXZDINX-NEXT: lw a3, 16(sp)
+; RV32IZFINXZDINX-NEXT: lw a4, 20(sp)
+; RV32IZFINXZDINX-NEXT: sw a1, 16(s3)
+; RV32IZFINXZDINX-NEXT: sw a2, 20(s3)
+; RV32IZFINXZDINX-NEXT: sw a3, 24(s3)
+; RV32IZFINXZDINX-NEXT: sw a4, 28(s3)
; RV32IZFINXZDINX-NEXT: sw s4, 0(s3)
; RV32IZFINXZDINX-NEXT: sw s2, 4(s3)
; RV32IZFINXZDINX-NEXT: sw s1, 8(s3)
; RV32IZFINXZDINX-NEXT: sw a0, 12(s3)
-; RV32IZFINXZDINX-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s4, 8(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s4, 24(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: addi sp, sp, 48
; RV32IZFINXZDINX-NEXT: ret
;
; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32:
@@ -1080,34 +1096,41 @@ define <4 x float> @test_frexp_v4f32_v4i32_only_use_fract(<4 x float> %a) nounwi
define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
; RV32IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
; RV32IFD: # %bb.0:
-; RV32IFD-NEXT: addi sp, sp, -32
-; RV32IFD-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT: fsd fs0, 16(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fsd fs1, 8(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fsd fs2, 0(sp) # 8-byte Folded Spill
-; RV32IFD-NEXT: fmv.s fs0, fa2
-; RV32IFD-NEXT: fmv.s fs1, fa1
-; RV32IFD-NEXT: fmv.s fs2, fa0
+; RV32IFD-NEXT: addi sp, sp, -48
+; RV32IFD-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IFD-NEXT: fsd fs0, 32(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fsd fs1, 24(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fsd fs2, 16(sp) # 8-byte Folded Spill
+; RV32IFD-NEXT: fmv.s fs0, fa3
+; RV32IFD-NEXT: fmv.s fs1, fa2
+; RV32IFD-NEXT: fmv.s fs2, fa1
; RV32IFD-NEXT: mv s0, a0
-; RV32IFD-NEXT: addi a0, a0, 12
-; RV32IFD-NEXT: fmv.s fa0, fa3
+; RV32IFD-NEXT: mv a0, sp
; RV32IFD-NEXT: call frexpf
-; RV32IFD-NEXT: addi a0, s0, 8
-; RV32IFD-NEXT: fmv.s fa0, fs0
+; RV32IFD-NEXT: addi a0, sp, 4
+; RV32IFD-NEXT: fmv.s fa0, fs2
; RV32IFD-NEXT: call frexpf
-; RV32IFD-NEXT: addi a0, s0, 4
+; RV32IFD-NEXT: addi a0, sp, 8
; RV32IFD-NEXT: fmv.s fa0, fs1
; RV32IFD-NEXT: call frexpf
-; RV32IFD-NEXT: fmv.s fa0, fs2
-; RV32IFD-NEXT: mv a0, s0
+; RV32IFD-NEXT: addi a0, sp, 12
+; RV32IFD-NEXT: fmv.s fa0, fs0
; RV32IFD-NEXT: call frexpf
-; RV32IFD-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT: fld fs0, 16(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: fld fs1, 8(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: fld fs2, 0(sp) # 8-byte Folded Reload
-; RV32IFD-NEXT: addi sp, sp, 32
+; RV32IFD-NEXT: lw a0, 0(sp)
+; RV32IFD-NEXT: lw a1, 4(sp)
+; RV32IFD-NEXT: lw a2, 8(sp)
+; RV32IFD-NEXT: lw a3, 12(sp)
+; RV32IFD-NEXT: sw a0, 0(s0)
+; RV32IFD-NEXT: sw a1, 4(s0)
+; RV32IFD-NEXT: sw a2, 8(s0)
+; RV32IFD-NEXT: sw a3, 12(s0)
+; RV32IFD-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IFD-NEXT: fld fs0, 32(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: fld fs1, 24(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: fld fs2, 16(sp) # 8-byte Folded Reload
+; RV32IFD-NEXT: addi sp, sp, 48
; RV32IFD-NEXT: ret
;
; RV64IFD-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
@@ -1151,34 +1174,43 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) nounwind {
;
; RV32IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
; RV32IZFINXZDINX: # %bb.0:
-; RV32IZFINXZDINX-NEXT: addi sp, sp, -32
-; RV32IZFINXZDINX-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s1, 20(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT: mv s0, a3
-; RV32IZFINXZDINX-NEXT: mv s1, a2
-; RV32IZFINXZDINX-NEXT: mv s2, a1
+; RV32IZFINXZDINX-NEXT: addi sp, sp, -48
+; RV32IZFINXZDINX-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s1, 36(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s2, 32(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: sw s3, 28(sp) # 4-byte Folded Spill
+; RV32IZFINXZDINX-NEXT: mv s0, a4
+; RV32IZFINXZDINX-NEXT: mv s1, a3
+; RV32IZFINXZDINX-NEXT: mv s2, a2
+; RV32IZFINXZDINX-NEXT: mv a2, a1
; RV32IZFINXZDINX-NEXT: mv s3, a0
-; RV32IZFINXZDINX-NEXT: addi a1, a0, 12
-; RV32IZFINXZDINX-NEXT: mv a0, a4
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 12
+; RV32IZFINXZDINX-NEXT: mv a0, a2
; RV32IZFINXZDINX-NEXT: call frexpf
-; RV32IZFINXZDINX-NEXT: addi a1, s3, 8
-; RV32IZFINXZDINX-NEXT: mv a0, s0
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 16
+; RV32IZFINXZDINX-NEXT: mv a0, s2
; RV32IZFINXZDINX-NEXT: call frexpf
-; RV32IZFINXZDINX-NEXT: addi a1, s3, 4
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 20
; RV32IZFINXZDINX-NEXT: mv a0, s1
; RV32IZFINXZDINX-NEXT: call frexpf
-; RV32IZFINXZDINX-NEXT: mv a0, s2
-; RV32IZFINXZDINX-NEXT: mv a1, s3
+; RV32IZFINXZDINX-NEXT: addi a1, sp, 24
+; RV32IZFINXZDINX-NEXT: mv a0, s0
; RV32IZFINXZDINX-NEXT: call frexpf
-; RV32IZFINXZDINX-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT: addi sp, sp, 32
+; RV32IZFINXZDINX-NEXT: lw a0, 12(sp)
+; RV32IZFINXZDINX-NEXT: lw a1, 16(sp)
+; RV32IZFINXZDINX-NEXT: lw a2, 20(sp)
+; RV32IZFINXZDINX-NEXT: lw a3, 24(sp)
+; RV32IZFINXZDINX-NEXT: sw a0, 0(s3)
+; RV32IZFINXZDINX-NEXT: sw a1, 4(s3)
+; RV32IZFINXZDINX-NEXT: sw a2, 8(s3)
+; RV32IZFINXZDINX-NEXT: sw a3, 12(s3)
+; RV32IZFINXZDINX-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s1, 36(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s2, 32(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: lw s3, 28(sp) # 4-byte Folded Reload
+; RV32IZFINXZDINX-NEXT: addi sp, sp, 48
; RV32IZFINXZDINX-NEXT: ret
;
; RV64IZFINXZDINX-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
diff --git a/llvm/test/CodeGen/X86/llvm.frexp.ll b/llvm/test/CodeGen/X86/llvm.frexp.ll
index 96de34519556d0..cd560ad627de4c 100644
--- a/llvm/test/CodeGen/X86/llvm.frexp.ll
+++ b/llvm/test/CodeGen/X86/llvm.frexp.ll
@@ -325,27 +325,28 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
;
; WIN32-LABEL: test_frexp_v4f32_v4i32:
; WIN32: # %bb.0:
+; WIN32-NEXT: pushl %edi
; WIN32-NEXT: pushl %esi
-; WIN32-NEXT: subl $44, %esp
+; WIN32-NEXT: subl $60, %esp
; WIN32-NEXT: movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT: leal 24(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT: leal 20(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT: leal 16(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
-; WIN32-NEXT: leal 28(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
@@ -360,13 +361,22 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT: movl %edi, 28(%esi)
+; WIN32-NEXT: movl %edx, 24(%esi)
+; WIN32-NEXT: movl %ecx, 20(%esi)
+; WIN32-NEXT: movl %eax, 16(%esi)
; WIN32-NEXT: fstps 12(%esi)
; WIN32-NEXT: fstps 8(%esi)
; WIN32-NEXT: fstps 4(%esi)
; WIN32-NEXT: fstps (%esi)
; WIN32-NEXT: movl %esi, %eax
-; WIN32-NEXT: addl $44, %esp
+; WIN32-NEXT: addl $60, %esp
; WIN32-NEXT: popl %esi
+; WIN32-NEXT: popl %edi
; WIN32-NEXT: retl
%result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
ret { <4 x float>, <4 x i32> } %result
@@ -489,35 +499,46 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) {
;
; WIN32-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
; WIN32: # %bb.0:
+; WIN32-NEXT: pushl %edi
; WIN32-NEXT: pushl %esi
-; WIN32-NEXT: subl $12, %esp
+; WIN32-NEXT: subl $28, %esp
; WIN32-NEXT: movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT: leal 8(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: leal 4(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: leal 12(%esi), %eax
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: movl %esi, {{[0-9]+}}(%esp)
+; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %ecx
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edx
+; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edi
+; WIN32-NEXT: movl %edi, 12(%esi)
+; WIN32-NEXT: movl %edx, 8(%esi)
+; WIN32-NEXT: movl %ecx, 4(%esi)
+; WIN32-NEXT: movl %eax, (%esi)
; WIN32-NEXT: movl %esi, %eax
-; WIN32-NEXT: addl $12, %esp
+; WIN32-NEXT: addl $28, %esp
; WIN32-NEXT: popl %esi
+; WIN32-NEXT: popl %edi
; WIN32-NEXT: retl
%result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
%result.1 = extractvalue { <4 x float>, <4 x i32> } %result, 1
diff --git a/llvm/test/CodeGen/X86/sincos-stack-args.ll b/llvm/test/CodeGen/X86/sincos-stack-args.ll
new file mode 100644
index 00000000000000..9fb3a6769fda11
--- /dev/null
+++ b/llvm/test/CodeGen/X86/sincos-stack-args.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_sp --version 5
+; RUN: llc < %s -mtriple=i386-unknown-linux-gnu | FileCheck %s
+; Test for issue https://github.com/llvm/llvm-project/issues/115323
+
+declare double @g(double, double)
+
+define double @f(double %a) {
+; CHECK-LABEL: f:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: subl $44, %esp
+; CHECK-NEXT: .cfi_def_cfa_offset 48
+; CHECK-NEXT: fldl 48(%esp)
+; CHECK-NEXT: leal 24(%esp), %eax
+; CHECK-NEXT: movl %eax, 12(%esp)
+; CHECK-NEXT: leal 32(%esp), %eax
+; CHECK-NEXT: movl %eax, 8(%esp)
+; CHECK-NEXT: fstpl (%esp)
+; CHECK-NEXT: calll sincos
+; CHECK-NEXT: fldl 32(%esp)
+; CHECK-NEXT: fldl 24(%esp)
+; CHECK-NEXT: faddl {{\.?LCPI[0-9]+_[0-9]+}}
+; CHECK-NEXT: fxch %st(1)
+; CHECK-NEXT: fstpl 8(%esp)
+; CHECK-NEXT: fstpl (%esp)
+; CHECK-NEXT: calll g@PLT
+; CHECK-NEXT: addl $44, %esp
+; CHECK-NEXT: .cfi_def_cfa_offset 4
+; CHECK-NEXT: retl
+entry:
+ %0 = tail call double @llvm.sin.f64(double %a)
+ %1 = tail call double @llvm.cos.f64(double %a)
+ %add = fadd double %1, 3.140000e+00
+ %call = tail call double @g(double %add, double %0)
+ ret double %call
+}
No worries 👍 Thanks for checking!
if (CallRetResNo == ResNo || !ST->isSimple() ||
// If the pointer value does not come from the IR, it could come from ABI
// lowering and may alias with the arguments of the library call if they are
// passed via the stack.
If I'm understanding correctly, the issue here is that the store destination could be part of the argument list of another call? The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store. And those nodes should prevent the transform because they "alias" the store.
Blanket ignoring all non-IR values maybe works for this particular testcase, but I don't think it solves aliasing in general.
My understanding (possibly wrong) is you have two calls:
sincos(x, ptr, ptr)
g(double, double)
All the arguments are passed on the stack, and the call to sincos is rewritten to be passed the stack locations of the arguments to g directly. The issue is the stack locations of g's arguments alias the stack locations of sincos's arguments, so when sincos writes to one of the pointers it clobbers its own arguments.
The current code already checks that the two pointers for sin and cos don't alias, but it does not expect that those pointers could alias sincos's own arguments.
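For illustration, a rough source-level version of that situation; this is a hedged sketch (the actual reproducer is the IR test added by this PR), and g is just a placeholder for a callee whose arguments are passed on the stack, as on i386:

```cpp
#include <cmath>

extern "C" double g(double, double);

double f(double a) {
  // When the optimizer is allowed to use sincos, the sin and cos of the same
  // value are combined into a single sincos(a, &sin_slot, &cos_slot) libcall.
  double s = std::sin(a);
  double c = std::cos(a);
  // Both sincos and g pass their arguments through the same outgoing-argument
  // area of f's frame. If sincos's result pointers are folded to point
  // straight into g's argument slots, they overlap the slots sincos itself
  // uses for its own arguments, so its writes can clobber them.
  return g(c + 3.14, s);
}
```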
> The correct way to handle that case should involve callseq_start/callseq_end. Assuming the libcall isn't nested inside another callseq_start/end, there should be a callseq_end and a callseq_start between the libcall and the store.
It looks like this is the case. Would a potential fix be to check that a CALLSEQ_START does not occur in the chain of the store, or have I misunderstood?
You shouldn't have to special case CALLSEQ_START. This restriction should be implicitly present in the chain
Could you clarify that? Before this expansion you have an frexp or fsincos node whose users are stores within the call sequence of another function (g in the previous example). There are no aliasing issues at this point. The issue occurs when the node is replaced with a library call: it takes its input chains from the previous stores, so when expanded you have a call sequence for the node nested inside a call sequence for g 😬. This is where the issues come from, as both calls use the same area for stack arguments.
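A rough picture of the nesting being described, hand-drawn from the comments above (not actual SelectionDAG output; chain edges are simplified):

```
Before expansion:                      After expansion:

CALLSEQ_START (g)                      CALLSEQ_START (g)
  store fsincos result 0 -> arg slot     CALLSEQ_START (sincos)   <- nested
  store fsincos result 1 -> arg slot       call sincos, writing its results
  call g                                   directly into g's argument slots
CALLSEQ_END (g)                          CALLSEQ_END (sincos)
                                         call g
                                       CALLSEQ_END (g)
```

Since both call sequences use the same outgoing-argument area of the frame, the pointers passed to sincos overlap the slots holding sincos's own stack-passed arguments.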
Having thought about the issue with folding stores into sincos while expanding it to a library call, the possible solutions I came up with are:
1) Check the stores are not within a CALLSEQ_START-CALLSEQ_END pair
   - As far as I can tell nobody has done this before -- grepping the codebase (and it's possibly a costly graph search)
2) Disallow stores to pointers that don't exist in the IR
   - Much simpler -- and I think could solve the issue, but it's assuming a store to an IR pointer won't occur within a CALLSEQ_START-CALLSEQ_END pair
   - Maybe this is a valid assumption (the stores I've seen have always been to special stack slots for the call), but maybe this could be broken.
3) Be very conservative and restrict this to trivial cases (e.g. the case where the store input chains are the entry node)

The issue this is trying to avoid is if the stores that are to be folded into sincos are within a CALLSEQ_START-CALLSEQ_END pair, then the expansion will result in nested call sequences, which is where the issues come from.
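For concreteness, a minimal sketch of what option 3 could look like; the helper name and placement are assumptions for illustration, and this is not necessarily the exact check that #115906 ended up using:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Only consider folding a store whose input chain is the entry token: such a
// store cannot sit inside another CALLSEQ_START/CALLSEQ_END pair, so the
// expanded library call cannot end up nested in another call sequence.
static bool isTriviallySafeToFold(const StoreSDNode *ST, SelectionDAG &DAG) {
  return ST->getChain() == DAG.getEntryNode();
}
```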
Any thoughts on this? I think it'd be good to find a solution here 🙂

Until we find the proper solution, maybe we should consider rolling this back, or doing the conservative suggestion 3) above?

I've created a PR for the conservative check here: #115906
If the store is to a pointer that does not exist in the IR (i.e. it was materialized for ABI reasons), it may be part of the setup for a call (e.g. setting up arguments on the stack). In this case, folding the store into the library call as part of the expansion could result in nested call sequences (and incorrect codegen).
Fixes #115323