@ro-i (Contributor) commented Sep 29, 2025

If there is a dynamic alloca, or an alloca that is not in the entry block, cs.chain functions do not set up an FP but are reported to need one. This results in a failed assertion in `SIFrameLowering::emitPrologue()` (Assertion `(!HasFP || FPSaved) && "Needed to save FP but didn't save it anywhere"` failed.). This commit changes `hasFPImpl` so that the need for an SP in a cs.chain function no longer directly implies the need for an FP.

This LLVM defect was identified via the AMD Fuzzing project.


Re-opens #132711

@llvmbot (Member) commented Sep 29, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Robert Imschweiler (ro-i)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/161194.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+3-1)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll (+360)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 7c5d4fc2dacf6..7c2ce2737f7be 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -2166,7 +2166,9 @@ bool SIFrameLowering::hasFPImpl(const MachineFunction &MF) const {
     return MFI.getStackSize() != 0;
   }
 
-  return frameTriviallyRequiresSP(MFI) || MFI.isFrameAddressTaken() ||
+  return (frameTriviallyRequiresSP(MFI) &&
+          !MF.getInfo<SIMachineFunctionInfo>()->isChainFunction()) ||
+         MFI.isFrameAddressTaken() ||
          MF.getSubtarget<GCNSubtarget>().getRegisterInfo()->hasStackRealignment(
              MF) ||
          mayReserveScratchForCWSR(MF) ||
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
new file mode 100644
index 0000000000000..a2696fe160067
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
@@ -0,0 +1,360 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -o - < %s 2>&1 | FileCheck %s
+
+; These situations are "special" in that they have an alloca not in the entry
+; block, which affects prolog/epilog generation.
+
+declare amdgpu_gfx void @foo()
+
+define amdgpu_cs_chain void @test_alloca() {
+; CHECK-LABEL: test_alloca:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_mov_b32 s0, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s0, 0x200
+; CHECK-NEXT:    scratch_store_b32 off, v0, s0
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 1, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}
+
+define amdgpu_cs_chain void @test_alloca_var_uniform(i32 inreg %count) {
+; CHECK-LABEL: test_alloca_var_uniform:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 2
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s0, s0, 15
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_and_b32 s0, s0, -16
+; CHECK-NEXT:    s_mov_b32 s1, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 5
+; CHECK-NEXT:    scratch_store_b32 off, v0, s1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s1, s0
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}
+
+define amdgpu_cs_chain void @test_alloca_var(i32 %count) {
+; CHECK-LABEL: test_alloca_var:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    v_lshl_add_u32 v0, v8, 2, 15
+; CHECK-NEXT:    s_mov_b32 s1, exec_lo
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_and_b32 v1, -16, v0
+; CHECK-NEXT:  .LBB2_1: ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_ctz_i32_b32 s2, s1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_readlane_b32 s3, v1, s2
+; CHECK-NEXT:    s_bitset0_b32 s1, s2
+; CHECK-NEXT:    s_max_u32 s0, s0, s3
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_cmp_lg_u32 s1, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB2_1
+; CHECK-NEXT:  ; %bb.2:
+; CHECK-NEXT:    s_mov_b32 s1, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    v_lshl_add_u32 v1, s0, 5, s1
+; CHECK-NEXT:    scratch_store_b32 off, v0, s1
+; CHECK-NEXT:    v_readfirstlane_b32 s32, v1
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}
+
+define amdgpu_cs_chain void @test_alloca_and_call() {
+; CHECK-LABEL: test_alloca_and_call:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_getpc_b64 s[0:1]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s1, s1
+; CHECK-NEXT:    s_add_co_u32 s0, s0, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s1, s1, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    s_load_b64 s[0:1], s[0:1], 0x0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_mov_b32 s2, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s2, 0x200
+; CHECK-NEXT:    scratch_store_b32 off, v0, s2
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[0:1]
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 1, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  call amdgpu_gfx void @foo()
+  ret void
+}
+
+define amdgpu_cs_chain void @test_alloca_and_call_var_uniform(i32 inreg %count) {
+; CHECK-LABEL: test_alloca_and_call_var_uniform:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_getpc_b64 s[2:3]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s3, s3
+; CHECK-NEXT:    s_add_co_u32 s2, s2, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s3, s3, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 2
+; CHECK-NEXT:    s_load_b64 s[2:3], s[2:3], 0x0
+; CHECK-NEXT:    s_add_co_i32 s0, s0, 15
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_and_b32 s0, s0, -16
+; CHECK-NEXT:    s_mov_b32 s1, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 5
+; CHECK-NEXT:    scratch_store_b32 off, v0, s1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s1, s0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[2:3]
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  call amdgpu_gfx void @foo()
+  ret void
+}
+
+define amdgpu_cs_chain void @test_alloca_and_call_var(i32 %count) {
+; CHECK-LABEL: test_alloca_and_call_var:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    v_lshl_add_u32 v0, v8, 2, 15
+; CHECK-NEXT:    s_mov_b32 s1, exec_lo
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_and_b32 v1, -16, v0
+; CHECK-NEXT:  .LBB5_1: ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_ctz_i32_b32 s2, s1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_readlane_b32 s3, v1, s2
+; CHECK-NEXT:    s_bitset0_b32 s1, s2
+; CHECK-NEXT:    s_max_u32 s0, s0, s3
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_cmp_lg_u32 s1, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB5_1
+; CHECK-NEXT:  ; %bb.2:
+; CHECK-NEXT:    s_getpc_b64 s[2:3]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s3, s3
+; CHECK-NEXT:    s_add_co_u32 s2, s2, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s3, s3, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    s_mov_b32 s1, s32
+; CHECK-NEXT:    s_load_b64 s[2:3], s[2:3], 0x0
+; CHECK-NEXT:    v_lshl_add_u32 v1, s0, 5, s1
+; CHECK-NEXT:    scratch_store_b32 off, v0, s1
+; CHECK-NEXT:    v_readfirstlane_b32 s32, v1
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xf1ff
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[2:3]
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  store i32 0, ptr addrspace(5) %v, align 4
+  call amdgpu_gfx void @foo()
+  ret void
+}
+
+define amdgpu_cs_chain void @test_call_and_alloca() {
+; CHECK-LABEL: test_call_and_alloca:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_getpc_b64 s[0:1]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s1, s1
+; CHECK-NEXT:    s_add_co_u32 s0, s0, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s1, s1, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_load_b64 s[0:1], s[0:1], 0x0
+; CHECK-NEXT:    s_mov_b32 s4, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s4, 0x200
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[0:1]
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    scratch_store_b32 off, v0, s4
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 1, align 4, addrspace(5)
+  call amdgpu_gfx void @foo()
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}
+
+define amdgpu_cs_chain void @test_call_and_alloca_var_uniform(i32 inreg %count) {
+; CHECK-LABEL: test_call_and_alloca_var_uniform:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_getpc_b64 s[2:3]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s3, s3
+; CHECK-NEXT:    s_add_co_u32 s2, s2, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s3, s3, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 2
+; CHECK-NEXT:    s_load_b64 s[2:3], s[2:3], 0x0
+; CHECK-NEXT:    s_add_co_i32 s0, s0, 15
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_and_b32 s0, s0, -16
+; CHECK-NEXT:    s_mov_b32 s4, s32
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_lshl_b32 s0, s0, 5
+; CHECK-NEXT:    v_mov_b32_e32 v40, 0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_i32 s32, s4, s0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[2:3]
+; CHECK-NEXT:    scratch_store_b32 off, v40, s4
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  call amdgpu_gfx void @foo()
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}
+
+define amdgpu_cs_chain void @test_call_and_alloca_var(i32 %count) {
+; CHECK-LABEL: test_call_and_alloca_var:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_wait_loadcnt_dscnt 0x0
+; CHECK-NEXT:    s_wait_expcnt 0x0
+; CHECK-NEXT:    s_wait_samplecnt 0x0
+; CHECK-NEXT:    s_wait_bvhcnt 0x0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    v_lshl_add_u32 v0, v8, 2, 15
+; CHECK-NEXT:    v_mov_b32_e32 v40, 0
+; CHECK-NEXT:    s_mov_b32 s1, exec_lo
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    v_and_b32_e32 v0, -16, v0
+; CHECK-NEXT:  .LBB8_1: ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_ctz_i32_b32 s2, s1
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_readlane_b32 s3, v0, s2
+; CHECK-NEXT:    s_bitset0_b32 s1, s2
+; CHECK-NEXT:    s_max_u32 s0, s0, s3
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_cmp_lg_u32 s1, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB8_1
+; CHECK-NEXT:  ; %bb.2:
+; CHECK-NEXT:    s_getpc_b64 s[2:3]
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_sext_i32_i16 s3, s3
+; CHECK-NEXT:    s_add_co_u32 s2, s2, foo@gotpcrel32@lo+12
+; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_add_co_ci_u32 s3, s3, foo@gotpcrel32@hi+24
+; CHECK-NEXT:    s_mov_b32 s4, s32
+; CHECK-NEXT:    s_load_b64 s[2:3], s[2:3], 0x0
+; CHECK-NEXT:    v_lshl_add_u32 v0, s0, 5, s4
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_readfirstlane_b32 s32, v0
+; CHECK-NEXT:    s_wait_kmcnt 0x0
+; CHECK-NEXT:    s_wait_alu 0xf1ff
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[2:3]
+; CHECK-NEXT:    scratch_store_b32 off, v40, s4
+; CHECK-NEXT:    s_endpgm
+.entry:
+  br label %SW_C
+
+SW_C:                                             ; preds = %.entry
+  %v = alloca i32, i32 %count, align 4, addrspace(5)
+  call amdgpu_gfx void @foo()
+  store i32 0, ptr addrspace(5) %v, align 4
+  ret void
+}

Inline review comment on llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll:

    br label %SW_C

    SW_C: ; preds = %.entry
    %v = alloca i32, i32 %count, align 4, addrspace(5)

Contributor commented:

Does the out-of-blockness actually matter? Will any dynamic alloca do? Statically sized allocas outside the entry block are treated the same.

@ro-i (Contributor Author) commented Oct 13, 2025:

> Does the out-of-blockness actually matter?

Yes. Have a look at the definition of AllocaInst::isStaticAlloca():

    /// isStaticAlloca - Return true if this alloca is in the entry block of the
    /// function and is a constant size. If so, the code generator will fold it
    /// into the prolog/epilog code, so it is basically free.
    bool AllocaInst::isStaticAlloca() const {
      // Must be constant size.
      if (!isa<ConstantInt>(getArraySize())) return false;
      // Must be in the entry block.
      const BasicBlock *Parent = getParent();
      return Parent->isEntryBlock() && !isUsedWithInAlloca();
    }

tl;dr: if it's not in the entry block, it's not static.
That makes MFI.hasVarSizedObjects() true, which makes frameTriviallyRequiresSP (in SIFrameLowering.cpp) true, which makes SIFrameLowering::hasFPImpl true, which in turn makes hasFP true in SIFrameLowering::emitPrologue.
And then, this becomes an issue:

    bool FPSaved = FuncInfo->hasPrologEpilogSGPRSpillEntry(FramePtrReg);
    (void)FPSaved;
    assert((!HasFP || FPSaved) &&
           "Needed to save FP but didn't save it anywhere");

Contributor Author commented:

AFAIU, cs_chain functions are not supposed to return, btw. So, in general, there is no point in saving the FP/SP.

Contributor commented:

I think the point is that you can keep the variable-sized allocas in the entry block.

Contributor Author commented:

Ah, I misunderstood the comment. Thanks, done.

@rovka (Collaborator) left a comment:

LGTM, but wait a couple of days in case @arsenm has anything to add.

@ro-i (Contributor Author) commented Oct 23, 2025

@arsenm, do you have any further comments?
I think the fix is correct, since we don't need the FP in a chain function just because we "trivially require" the SP.
Modifying frameTriviallyRequiresSP instead doesn't seem right, since it is also called in SIFrameLowering::requiresStackPointerReference, which probably should not change. That's why I think the additional check is in the right place.

@ro-i ro-i merged commit c02bdd4 into main Nov 4, 2025
10 checks passed
@ro-i ro-i deleted the users/ro-i/cs-chain-fp branch November 4, 2025 09:22
@llvm-ci (Collaborator) commented Nov 4, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/17815

Here is the relevant piece of the build log for reference:
Step 6 (test-openmp) failure: 1200 seconds without output running [b'ninja', b'-j 4', b'check-openmp'], attempting to kill
...
PASS: ompd-test :: openmp_examples/example_4.c (452 of 462)
PASS: ompd-test :: openmp_examples/example_5.c (453 of 462)
PASS: ompd-test :: openmp_examples/fibonacci.c (454 of 462)
PASS: ompd-test :: openmp_examples/example_3.c (455 of 462)
UNSUPPORTED: ompd-test :: openmp_examples/ompd_bt.c (456 of 462)
PASS: ompd-test :: openmp_examples/example_task.c (457 of 462)
UNSUPPORTED: ompd-test :: openmp_examples/ompd_parallel.c (458 of 462)
PASS: ompd-test :: openmp_examples/parallel.c (459 of 462)
PASS: ompd-test :: openmp_examples/nested.c (460 of 462)
PASS: ompd-test :: openmp_examples/ompd_icvs.c (461 of 462)
command timed out: 1200 seconds without output running [b'ninja', b'-j 4', b'check-openmp'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1418.026713

@llvm-ci (Collaborator) commented Nov 4, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/22460

Here is the relevant piece of the build log for reference:
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/exception/cpp/TestDAP_exception_cpp.py (1227 of 2354)
PASS: lldb-api :: tools/lldb-dap/invalidated-event/TestDAP_invalidatedEvent.py (1228 of 2354)
PASS: lldb-api :: tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py (1229 of 2354)
PASS: lldb-api :: tools/lldb-dap/io/TestDAP_io.py (1230 of 2354)
PASS: lldb-api :: tools/lldb-dap/locations/TestDAP_locations.py (1231 of 2354)
PASS: lldb-api :: tools/lldb-dap/memory/TestDAP_memory.py (1232 of 2354)
PASS: lldb-api :: tools/lldb-dap/evaluate/TestDAP_evaluate.py (1233 of 2354)
UNSUPPORTED: lldb-api :: tools/lldb-dap/module/TestDAP_module.py (1234 of 2354)
PASS: lldb-api :: tools/lldb-dap/module-event/TestDAP_module_event.py (1235 of 2354)
PASS: lldb-api :: tools/lldb-dap/moduleSymbols/TestDAP_moduleSymbols.py (1236 of 2354)
FAIL: lldb-api :: tools/lldb-dap/output/TestDAP_output.py (1237 of 2354)
******************** TEST 'lldb-api :: tools/lldb-dap/output/TestDAP_output.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/output -p TestDAP_output.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision c02bdd466a1c22221bc6de3b6817945c90979351)
  clang revision c02bdd466a1c22221bc6de3b6817945c90979351
  llvm revision c02bdd466a1c22221bc6de3b6817945c90979351
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py:388: UserWarning: received a malformed packet, expected 'seq != 0' for {'body': {'$__lldb_version': 'lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision c02bdd466a1c22221bc6de3b6817945c90979351)\n  clang revision c02bdd466a1c22221bc6de3b6817945c90979351\n  llvm revision c02bdd466a1c22221bc6de3b6817945c90979351', 'completionTriggerCharacters': ['.', ' ', '\t'], 'exceptionBreakpointFilters': [{'description': 'C++ Catch', 'filter': 'cpp_catch', 'label': 'C++ Catch', 'supportsCondition': True}, {'description': 'C++ Throw', 'filter': 'cpp_throw', 'label': 'C++ Throw', 'supportsCondition': True}, {'description': 'Objective-C Catch', 'filter': 'objc_catch', 'label': 'Objective-C Catch', 'supportsCondition': True}, {'description': 'Objective-C Throw', 'filter': 'objc_throw', 'label': 'Objective-C Throw', 'supportsCondition': True}], 'supportTerminateDebuggee': True, 'supportsBreakpointLocationsRequest': True, 'supportsCancelRequest': True, 'supportsCompletionsRequest': True, 'supportsConditionalBreakpoints': True, 'supportsConfigurationDoneRequest': True, 'supportsDataBreakpoints': True, 'supportsDelayedStackTraceLoading': True, 'supportsDisassembleRequest': True, 'supportsEvaluateForHovers': True, 'supportsExceptionFilterOptions': True, 'supportsExceptionInfoRequest': True, 'supportsFunctionBreakpoints': True, 'supportsHitConditionalBreakpoints': True, 'supportsInstructionBreakpoints': True, 'supportsLogPoints': True, 'supportsModuleSymbolsRequest': True, 'supportsModulesRequest': True, 'supportsReadMemoryRequest': True, 'supportsSetVariable': True, 'supportsSteppingGranularity': True, 'supportsValueFormattingOptions': True, 'supportsWriteMemoryRequest': True}, 'command': 'initialize', 'request_seq': 1, 'seq': 0, 'success': True, 'type': 'response'}
  warnings.warn(
========= DEBUG ADAPTER PROTOCOL LOGS =========
1762252524.625605822 (stdio) --> {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"supportsInvalidatedEvent":true,"supportsMemoryEvent":true,"$__lldb_sourceInitFile":false},"seq":1}
1762252524.625746250 (stdio) queued (command=initialize seq=1)
1762252524.631979942 (stdio) <-- {"body":{"$__lldb_version":"lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision c02bdd466a1c22221bc6de3b6817945c90979351)\n  clang revision c02bdd466a1c22221bc6de3b6817945c90979351\n  llvm revision c02bdd466a1c22221bc6de3b6817945c90979351","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"description":"C++ Catch","filter":"cpp_catch","label":"C++ Catch","supportsCondition":true},{"description":"C++ Throw","filter":"cpp_throw","label":"C++ Throw","supportsCondition":true},{"description":"Objective-C Catch","filter":"objc_catch","label":"Objective-C Catch","supportsCondition":true},{"description":"Objective-C Throw","filter":"objc_throw","label":"Objective-C Throw","supportsCondition":true}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionFilterOptions":true,"supportsExceptionInfoRequest":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModuleSymbolsRequest":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsSetVariable":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true,"supportsWriteMemoryRequest":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1762252524.635359764 (stdio) --> {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/output/TestDAP_output.test_output/a.out","initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"exitCommands":["?script print('out\\0\\0', end='\\r\\n', file=sys.stdout)","?script print('err\\0\\0', end='\\r\\n', file=sys.stderr)"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":2}
1762252524.635428905 (stdio) queued (command=launch seq=2)
1762252524.635837078 (stdio) <-- {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":1,"type":"event"}
1762252524.635870218 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":2,"type":"event"}
1762252524.635884762 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":3,"type":"event"}
1762252524.635899305 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":4,"type":"event"}
1762252524.635912657 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":5,"type":"event"}
1762252524.635926008 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":6,"type":"event"}
1762252524.635954380 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":7,"type":"event"}
1762252524.635967970 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":8,"type":"event"}
1762252524.635984182 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":9,"type":"event"}
1762252524.636009216 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":10,"type":"event"}
1762252524.636022806 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":11,"type":"event"}
1762252525.179133415 (stdio) <-- {"body":{"module":{"addressRange":"0xf765f000","debugInfoSize":"983.3KB","id":"C8C14F49-271D-D5EC-D97C-F1A70B997423-FCAC2BC1","name":"ld-linux-armhf.so.3","path":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolFilePath":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":12,"type":"event"}
1762252525.179744244 (stdio) <-- {"body":{"module":{"addressRange":"0xa3a0000","debugInfoSize":"1.2KB","id":"DFA4390A","name":"a.out","path":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/output/TestDAP_output.test_output/a.out","symbolFilePath":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/output/TestDAP_output.test_output/a.out","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":13,"type":"event"}

@ro-i (Contributor Author) commented Nov 4, 2025

(Both failures are unrelated; the buildbots are already green again.)
