[X86] Incorrect shuffle of shift optimization on haswell #42575

@nikic

Description

Bugzilla Link 43230
Resolution FIXED
Resolved on Sep 09, 2019 02:37
Version 9.0
OS Linux
Blocks #41819
CC @topperc,@zmodem,@RKSimon,@rotateright
Fixed by commit(s) 371305,371307

Extended Description

define <16 x i16> @test(<16 x i16> %a, <16 x i16> %b) {
  %shr = lshr <16 x i16> %a, %b
  %shuf = shufflevector <16 x i16> zeroinitializer, <16 x i16> %shr, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 30, i32 15>
  ret <16 x i16> %shuf
}
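
For reference, the intended result of this IR can be modeled in a few lines of Python. This is only a sketch of the semantics, not anything LLVM itself runs; the helper names (`lshr_v16i16`, `shufflevector`, `reference_test`) are made up for illustration. Mask index 30 selects element 14 of the second operand, so every lane is zero except lane 14, which holds `a[14] >> b[14]`.

```python
def lshr_v16i16(a, b):
    # LLVM's lshr is poison when the shift amount is >= the bit width (16);
    # this sketch assumes in-range shift amounts and asserts on them.
    assert all(0 <= s < 16 for s in b)
    return [(x & 0xFFFF) >> s for x, s in zip(a, b)]

def shufflevector(v1, v2, mask):
    # Mask indices address the concatenation of v1 and v2.
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]

def reference_test(a, b):
    # Hypothetical model of @test from the IR above.
    shr = lshr_v16i16(a, b)
    mask = list(range(14)) + [30, 15]
    return shufflevector([0] * 16, shr, mask)

a = [0] * 16
b = [0] * 16
a[14], b[14] = 0x8000, 3
print(reference_test(a, b))
```

Only lane 14 of the printed vector is non-zero.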

Compiling this with -mcpu=haswell results in:

vpxor	%xmm2, %xmm2, %xmm2
vpunpckhwd	%ymm2, %ymm1, %ymm1 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
vpunpckhwd	%ymm0, %ymm2, %ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
vpsrlvd	%ymm1, %ymm0, %ymm0
vpand	.LCPI0_0(%rip), %ymm0, %ymm0

While in LLVM 7 it was:

    vpxor   xmm2, xmm2, xmm2
    vpunpckhwd      ymm1, ymm1, ymm2 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
    vpunpckhwd      ymm0, ymm2, ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
    vpsrlvd ymm0, ymm0, ymm1
    vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]

Godbolt: https://godbolt.org/z/4SyEhQ

I think this transformation is not correct, though maybe my vector-fu is too weak.
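
To see where lane 14 ends up, the unpack and shift steps from the listing can be modeled in Python. This is a rough sketch based only on the element comments in the assembly above; the function names are invented for the example, and the final `vpand` is left out because the contents of `.LCPI0_0` are not shown.

```python
def vpunpckhwd(x, y):
    # Per 128-bit lane (8 words): interleave the high 4 words of x and y,
    # giving x[4],y[4],x[5],y[5],x[6],y[6],x[7],y[7] as in the asm comments.
    out = []
    for lane in (0, 8):
        for i in range(4, 8):
            out += [x[lane + i], y[lane + i]]
    return out

def words_to_dwords(w):
    # Little-endian: the even word is the low half of each dword.
    return [w[2 * i] | (w[2 * i + 1] << 16) for i in range(8)]

def dwords_to_words(d):
    out = []
    for v in d:
        out += [v & 0xFFFF, (v >> 16) & 0xFFFF]
    return out

def vpsrlvd(vals, counts):
    # Per-dword logical right shift; counts above 31 produce zero.
    return [(v >> c) & 0xFFFFFFFF if c < 32 else 0 for v, c in zip(vals, counts)]

def shift_stage(a, b):
    # Hypothetical model of the two vpunpckhwd instructions plus vpsrlvd.
    zero = [0] * 16
    counts = words_to_dwords(vpunpckhwd(b, zero))  # ymm1 = ymm1[4],ymm2[4],...
    vals = words_to_dwords(vpunpckhwd(zero, a))    # ymm0 = ymm2[4],ymm0[4],...
    return dwords_to_words(vpsrlvd(vals, counts))

a = [0] * 16
b = [0] * 16
a[14], b[14] = 0x8000, 3
words = shift_stage(a, b)
print(hex(words[13]), hex(words[14]))
```

Under this model the shifted value for source lane 14 sits in word 13 (the upper half of dword 6), while the IR wants it in word 14. The listing contains no instruction after `vpsrlvd` that would move it, which is consistent with a shuffle having been dropped.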

The debug log has:

With: t53: v32i8 = BUILD_VECTOR undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, Constant:i8<26>, Constant:i8<27>, undef:i8, undef:i8

Combining: t50: v32i8 = X86ISD::PSHUFB t48, t53
Creating new node: t54: v8i32 = undef
Creating new node: t55: v16i16 = bitcast t23
Creating constant: t56: i8 = Constant<-28>
Creating new node: t57: v16i16 = X86ISD::PSHUFLW t55, Constant:i8<-28>
Creating new node: t58: v32i8 = bitcast t57
... into: t58: v32i8 = bitcast t57

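This looks like a non-identity pshufb being replaced with an identity pshuflw: the pshufb control moves bytes 26 and 27 into positions 28 and 29, while a pshuflw immediate of -28 (0xE4) selects words 0, 1, 2, 3 in order and so changes nothing.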

This happens via matchUnaryPermuteShuffle(), though I haven't looked further.
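
The two shuffles in the debug log can be decoded with a short Python sketch. The helper names are hypothetical and this only mirrors the documented instruction encodings, not LLVM's own mask decoding code: PSHUFLW packs four 2-bit word selectors into its immediate, and PSHUFB indexes within each 128-bit lane using the low 4 bits of each control byte.

```python
def decode_pshuflw_imm(imm):
    # Four 2-bit selectors for the low four words of each 128-bit lane;
    # the high four words of each lane pass through unchanged.
    imm &= 0xFF
    return [(imm >> (2 * i)) & 0b11 for i in range(4)]

def pshufb_sources(control):
    # Map each defined destination byte to its source byte.
    # None stands for undef; a control byte with bit 7 set writes zero.
    sources = {}
    for pos, c in enumerate(control):
        if c is None:
            continue
        if c & 0x80:
            sources[pos] = "zero"
        else:
            sources[pos] = (pos // 16) * 16 + (c & 0x0F)
    return sources

# Control vector t53 from the log: undef except positions 28 and 29.
T53_CONTROL = [None] * 28 + [26, 27, None, None]

print(decode_pshuflw_imm(-28))      # [0, 1, 2, 3]
print(pshufb_sources(T53_CONTROL))  # {28: 26, 29: 27}
```

The immediate decodes to the identity permutation, whereas the only defined bytes of the PSHUFB control move word 13 into word 14, so the replacement does not preserve the shuffle.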