Skip to content

Miscompile with opt -loop-vectorize #43833

@mikaelholmen

Description

@mikaelholmen
Bugzilla Link 44488
Resolution FIXED
Resolved on Feb 06, 2020 08:50
Version trunk
OS Linux
Blocks #43900
Attachments bbi-37172.ll reproducer
CC @bjope,@fhahn,@zmodem,@RKSimon,@rotateright
Fixed by commit(s) f14f2a8

Extended Description

I think I've found a case where -loop-vectorize produces wrong code.

Reproduce with
opt -loop-vectorize -S -o - bbi-37172.ll

The input looks like


@​v_38 = global i16 12061, align 1
@​v_39 = global i16 11333, align 1

define i16 @​main() {
entry:
br label %for.body

for.cond.cleanup: ; preds = %cond.end5
%0 = load i16, i16* @​v_39, align 1
ret i16 %0

for.body: ; preds = %entry, %cond.end5
%i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
%1 = load i16, i16* @​v_38, align 1
%cmp1 = icmp eq i16 %1, 32767
br i1 %cmp1, label %cond.end, label %cond.end

cond.end: ; preds = %for.body, %for.body
%cmp2 = icmp eq i16 %1, 0
br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5

cond.end5: ; preds = %cond.end, %cond.false4
%cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
store i16 %cond6, i16* @​v_39, align 1
%inc7 = add nsw i16 %i.07, 1
%cmp = icmp slt i16 %inc7, 111
br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !​26
}

If we just focus at the beginning of the loop we have


for.body: ; preds = %entry, %cond.end5
%i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
%1 = load i16, i16* @​v_38, align 1
%cmp1 = icmp eq i16 %1, 32767
br i1 %cmp1, label %cond.end, label %cond.end

cond.end: ; preds = %for.body, %for.body
%cmp2 = icmp eq i16 %1, 0
br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5

v_38 is 12061 and we will always branch from for.body to cond.end.
Then we do

%cmp2 = icmp eq i16 %1, 0

which will be false, so

br i1 %cmp2, label %cond.end5, label %cond.false4

will branch to cond.false4 where we will execute the srem

%rem = srem i16 5786, %1

v_38 doesn't change throughout the program so this is the path we will use
every loop round.

After vectorization, we instead get the following:


vector.body: ; preds = %pred.srem.continue2, %vector.ph
%index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.srem.continue2 ]
%0 = trunc i32 %index to i16
%offset.idx = add i16 99, %0
%broadcast.splatinsert = insertelement <2 x i16> undef, i16 %offset.idx, i32 0
%broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16> undef, <2 x i32> zeroinitializer
%induction = add <2 x i16> %broadcast.splat, <i16 0, i16 1>
%1 = add i16 %offset.idx, 0
%2 = load i16, i16* @​v_38, align 1
%3 = load i16, i16* @​v_38, align 1
%4 = insertelement <2 x i16> undef, i16 %2, i32 0
%5 = insertelement <2 x i16> %4, i16 %3, i32 1
%6 = icmp eq <2 x i16> %5, <i16 32767, i16 32767>
%7 = icmp eq <2 x i16> %5, zeroinitializer
%8 = or <2 x i1> %6, %6
%9 = xor <2 x i1> %7, <i1 true, i1 true>
%10 = and <2 x i1> %9, %8
%11 = extractelement <2 x i1> %10, i32 0
br i1 %11, label %pred.srem.if, label %pred.srem.continue

pred.srem.if: ; preds = %vector.body
%12 = srem i16 5786, %2
%13 = insertelement <2 x i16> undef, i16 %12, i32 0
br label %pred.srem.continue

Since v_38 is 12061, %5 will be <12061, 12061>, and thus both %6 and %7 will
be <0, 0>.

From this we get that %8 is <0, 0>, %9 is <1, 1> and finally %10 is then <0, 0>.
%11 will be 0, and we will thus branch to pred.srem.continue and not execute
the srem.

In pred.srem.continue we then have


pred.srem.continue: ; preds = %pred.srem.if, %vector.body
%14 = phi <2 x i16> [ undef, %vector.body ], [ %13, %pred.srem.if ]
%15 = extractelement <2 x i1> %10, i32 1
br i1 %15, label %pred.srem.if1, label %pred.srem.continue2

pred.srem.if1: ; preds = %pred.srem.continue
%16 = srem i16 5786, %3
%17 = insertelement <2 x i16> %14, i16 %16, i32 1
br label %pred.srem.continue2

%10 is still <0, 0> so also %15 will be 0, and we will branch to
pred.srem.continue2 and also skip the srem in pred.srem.if1.

Phew.

So, before vectorization we would execute the srem, but after vectorization
we don't anymore. This looks wrong to me.

I don't know really what is happening inside the vectorizer, but I have to say
I'm suspicious towards the generated

%8 = or <2 x i1> %6, %6
%9 = xor <2 x i1> %7, <i1 true, i1 true>
%10 = and <2 x i1> %9, %8

Especially the

%8 = or <2 x i1> %6, %6

looks pointless. I suppose in some way it's connected to the

br i1 %cmp1, label %cond.end, label %cond.end

part in the input, but while that branch always takes us to cond.end,
the %8 will always be <0, 0> and then prevent us from reaching any of the
"srem" instructions since %8 is used in the and to calculate %10.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions