Miscompile with opt -loop-vectorize

|  |  |
| --- | --- |
| Bugzilla Link | [44488](https://llvm.org/bz44488) |
| Resolution | FIXED |
| Resolved on | Feb 06, 2020 08:50 |
| Version | trunk |
| OS | Linux |
| Blocks | llvm/llvm-project#43900  |
| Attachments | [bbi-37172.ll reproducer](https://user-images.githubusercontent.com/55733087/143760244-51d14013-fac4-4057-888f-19386d1d1389.gz) |
| CC | @bjope,@fhahn,@zmodem,@RKSimon,@rotateright |
| Fixed by commit(s) | f14f2a856802e086662d919e2ead641718b27555 |

## Extended Description 


I think I've found a case where -loop-vectorize produces wrong code.

Reproduce with
 opt -loop-vectorize -S -o - bbi-37172.ll

The input looks like

----------------------
@&#8203;v_38 = global i16 12061, align 1
@&#8203;v_39 = global i16 11333, align 1

define i16 @&#8203;main() {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %cond.end5
  %0 = load i16, i16* @&#8203;v_39, align 1
  ret i16 %0

for.body:                                         ; preds = %entry, %cond.end5
  %i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
  %1 = load i16, i16* @&#8203;v_38, align 1
  %cmp1 = icmp eq i16 %1, 32767
  br i1 %cmp1, label %cond.end, label %cond.end

cond.end:                                         ; preds = %for.body, %for.body
  %cmp2 = icmp eq i16 %1, 0
  br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4:                                      ; preds = %cond.end
  %rem = srem i16 5786, %1
  br label %cond.end5

cond.end5:                                        ; preds = %cond.end, %cond.false4
  %cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
  store i16 %cond6, i16* @&#8203;v_39, align 1
  %inc7 = add nsw i16 %i.07, 1
  %cmp = icmp slt i16 %inc7, 111
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !&#8203;26
}
----------------------

If we just focus at the beginning of the loop we have

----------------------
for.body:                                         ; preds = %entry, %cond.end5
  %i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
  %1 = load i16, i16* @&#8203;v_38, align 1
  %cmp1 = icmp eq i16 %1, 32767
  br i1 %cmp1, label %cond.end, label %cond.end

cond.end:                                         ; preds = %for.body, %for.body
  %cmp2 = icmp eq i16 %1, 0
  br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4:                                      ; preds = %cond.end
  %rem = srem i16 5786, %1
  br label %cond.end5
----------------------

v_38 is 12061 and we will always branch from for.body to cond.end.
Then we do


  %cmp2 = icmp eq i16 %1, 0

which will be false, so

  br i1 %cmp2, label %cond.end5, label %cond.false4

will branch to cond.false4 where we will execute the srem

  %rem = srem i16 5786, %1

v_38 doesn't change throughout the program so this is the path we will use
every loop round.

After vectorization, we instead get the following:

----------------------
vector.body:                                      ; preds = %pred.srem.continue2, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.srem.continue2 ]
  %0 = trunc i32 %index to i16
  %offset.idx = add i16 99, %0
  %broadcast.splatinsert = insertelement <2 x i16> undef, i16 %offset.idx, i32 0
  %broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16> undef, <2 x i32> zeroinitializer
  %induction = add <2 x i16> %broadcast.splat, <i16 0, i16 1>
  %1 = add i16 %offset.idx, 0
  %2 = load i16, i16* @&#8203;v_38, align 1
  %3 = load i16, i16* @&#8203;v_38, align 1
  %4 = insertelement <2 x i16> undef, i16 %2, i32 0
  %5 = insertelement <2 x i16> %4, i16 %3, i32 1
  %6 = icmp eq <2 x i16> %5, <i16 32767, i16 32767>
  %7 = icmp eq <2 x i16> %5, zeroinitializer
  %8 = or <2 x i1> %6, %6
  %9 = xor <2 x i1> %7, <i1 true, i1 true>
  %10 = and <2 x i1> %9, %8
  %11 = extractelement <2 x i1> %10, i32 0
  br i1 %11, label %pred.srem.if, label %pred.srem.continue

pred.srem.if:                                     ; preds = %vector.body
  %12 = srem i16 5786, %2
  %13 = insertelement <2 x i16> undef, i16 %12, i32 0
  br label %pred.srem.continue
----------------------

Since v_38 is 12061, %5 will be <12061, 12061>, and thus both %6 and %7 will
be <0, 0>.

From this we get that %8 is <0, 0>, %9 is <1, 1> and finally %10 is then <0, 0>.
%11 will be 0, and we will thus branch to pred.srem.continue and not execute
the srem.

In pred.srem.continue we then have

----------------------
pred.srem.continue:                               ; preds = %pred.srem.if, %vector.body
  %14 = phi <2 x i16> [ undef, %vector.body ], [ %13, %pred.srem.if ]
  %15 = extractelement <2 x i1> %10, i32 1
  br i1 %15, label %pred.srem.if1, label %pred.srem.continue2

pred.srem.if1:                                    ; preds = %pred.srem.continue
  %16 = srem i16 5786, %3
  %17 = insertelement <2 x i16> %14, i16 %16, i32 1
  br label %pred.srem.continue2
----------------------

%10 is still <0, 0> so also %15 will be 0, and we will branch to
pred.srem.continue2 and also skip the srem in pred.srem.if1.

Phew.

So, before vectorization we would execute the srem, but after vectorization
we don't anymore. This looks wrong to me.

I don't know really what is happening inside the vectorizer, but I have to say
I'm suspicious towards the generated

  %8 = or <2 x i1> %6, %6
  %9 = xor <2 x i1> %7, <i1 true, i1 true>
  %10 = and <2 x i1> %9, %8

Especially the

  %8 = or <2 x i1> %6, %6

looks pointless. I suppose in some way it's connected to the

  br i1 %cmp1, label %cond.end, label %cond.end

part in the input, but while that branch always takes us to cond.end,
the %8 will always be <0, 0> and then prevent us from reaching any of the
"srem" instructions since %8 is used in the and to calculate %10.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Miscompile with opt -loop-vectorize #43833

Extended Description

cond.end5: ; preds = %cond.end, %cond.false4
%cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
store i16 %cond6, i16* @v_39, align 1
%inc7 = add nsw i16 %i.07, 1
%cmp = icmp slt i16 %inc7, 111
br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !26
}

cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5

pred.srem.if: ; preds = %vector.body
%12 = srem i16 5786, %2
%13 = insertelement <2 x i16> undef, i16 %12, i32 0
br label %pred.srem.continue

pred.srem.if1: ; preds = %pred.srem.continue
%16 = srem i16 5786, %3
%17 = insertelement <2 x i16> %14, i16 %16, i32 1
br label %pred.srem.continue2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development


Bugzilla Link	44488
Resolution	FIXED
Resolved on	Feb 06, 2020 08:50
Version	trunk
OS	Linux
Blocks	#43900
Attachments	bbi-37172.ll reproducer
CC	@bjope,@fhahn,@zmodem,@RKSimon,@rotateright
Fixed by commit(s)	`f14f2a8`

Miscompile with opt -loop-vectorize #43833

Description

Extended Description

cond.end5: ; preds = %cond.end, %cond.false4 %cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ] store i16 %cond6, i16* @​v_39, align 1 %inc7 = add nsw i16 %i.07, 1 %cmp = icmp slt i16 %inc7, 111 br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !​26 }

cond.false4: ; preds = %cond.end %rem = srem i16 5786, %1 br label %cond.end5

pred.srem.if: ; preds = %vector.body %12 = srem i16 5786, %2 %13 = insertelement <2 x i16> undef, i16 %12, i32 0 br label %pred.srem.continue

pred.srem.if1: ; preds = %pred.srem.continue %16 = srem i16 5786, %3 %17 = insertelement <2 x i16> %14, i16 %16, i32 1 br label %pred.srem.continue2

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

cond.end5: ; preds = %cond.end, %cond.false4
%cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
store i16 %cond6, i16* @v_39, align 1
%inc7 = add nsw i16 %i.07, 1
%cmp = icmp slt i16 %inc7, 111
br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !26
}

cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5

pred.srem.if: ; preds = %vector.body
%12 = srem i16 5786, %2
%13 = insertelement <2 x i16> undef, i16 %12, i32 0
br label %pred.srem.continue

pred.srem.if1: ; preds = %pred.srem.continue
%16 = srem i16 5786, %3
%17 = insertelement <2 x i16> %14, i16 %16, i32 1
br label %pred.srem.continue2