Skip to content

Commit 07a5b84

Browse files
committed
SelectionDAG: Fix bug in ClusterNeighboringLoads
Summary: The method attempts to find loads that can be legally clustered by looking for loads consuming the same chain glue token. However, the old code looks at _all_ users of values produced by the chain node -- including uses of the loaded/returned value of volatile loads or atomics. This could lead to circular dependencies which then failed during scheduling. With this change, we filter out users by getResNo, i.e. by which SDValue value they use, to ensure that we only look at users of the chain glue token. This appears to be a rather old bug, which is perhaps surprising. However, the test case is actually quite fragile (i.e., it is hidden by fairly small changes), and the test _must_ use volatile loads for the bug to manifest. Reviewers: arsenm, bogner, craig.topper, foad Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74253
1 parent 572fc89 commit 07a5b84

File tree

2 files changed

+32
-2
lines changed

2 files changed

+32
-2
lines changed

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -198,10 +198,10 @@ static void RemoveUnusedGlue(SDNode *N, SelectionDAG *DAG) {
198198
/// outputs to ensure they are scheduled together and in order. This
199199
/// optimization may benefit some targets by improving cache locality.
200200
void ScheduleDAGSDNodes::ClusterNeighboringLoads(SDNode *Node) {
201-
SDNode *Chain = nullptr;
201+
SDValue Chain;
202202
unsigned NumOps = Node->getNumOperands();
203203
if (Node->getOperand(NumOps-1).getValueType() == MVT::Other)
204-
Chain = Node->getOperand(NumOps-1).getNode();
204+
Chain = Node->getOperand(NumOps-1);
205205
if (!Chain)
206206
return;
207207

@@ -234,6 +234,9 @@ void ScheduleDAGSDNodes::ClusterNeighboringLoads(SDNode *Node) {
234234
unsigned UseCount = 0;
235235
for (SDNode::use_iterator I = Chain->use_begin(), E = Chain->use_end();
236236
I != E && UseCount < 100; ++I, ++UseCount) {
237+
if (I.getUse().getResNo() != Chain.getResNo())
238+
continue;
239+
237240
SDNode *User = *I;
238241
if (User == Node || !Visited.insert(User).second)
239242
continue;
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK
2+
3+
; This used to cause a circular chain dependency during
4+
; SelectionDAG instruction scheduling.
5+
6+
; CHECK-LABEL: {{^}}_amdgpu_gs_main:
7+
; CHECK: ds_read_b32
8+
; CHECK: ds_read_b32
9+
; CHECK: ds_read_b32
10+
; CHECK: ds_read_b32
11+
define amdgpu_gs float @_amdgpu_gs_main(i8 addrspace(3)* %arg0, i8 addrspace(3)* %arg1, i8 addrspace(3)* %arg2) #0 {
12+
%tmp0 = bitcast i8 addrspace(3)* %arg0 to i32 addrspace(3)* addrspace(3)*
13+
%tmp = load volatile i32 addrspace(3)*, i32 addrspace(3)* addrspace(3)* %tmp0, align 4
14+
15+
%tmp3 = load volatile i32, i32 addrspace(3)* %tmp, align 4
16+
17+
%tmp4a = bitcast i8 addrspace(3)* %arg1 to i32 addrspace(3)*
18+
%tmp4 = load volatile i32, i32 addrspace(3)* %tmp4a, align 4
19+
20+
%tmp7a = getelementptr i32, i32 addrspace(3)* %tmp, i32 8
21+
%tmp8 = load volatile i32, i32 addrspace(3)* %tmp7a, align 4
22+
23+
%tmp9 = add i32 %tmp3, %tmp8
24+
%tmp10 = add i32 %tmp9, %tmp4
25+
%tmp14 = bitcast i32 %tmp10 to float
26+
ret float %tmp14
27+
}

0 commit comments

Comments
 (0)