Description
Here is a link to a snippet for which LLVM generates suboptimal code on ARMv7 NEON, AArch64 SIMD, LoongArch LSX, and RISC-V V:
https://alive2.llvm.org/ce/z/V82FtF
Alive2 verifies that the transformation of (a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1)) to (b == 0) ? a : ((a >> (b - 1)) - (a >> b)) is correct.
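For reference, here is a minimal scalar sketch of the two forms (the linked snippet presumably operates on <4 x i32> vectors; the function names are hypothetical). Both compute a right shift that rounds to nearest by adding back the most significant bit shifted out:

#include <stdint.h>

/* src form: plain shift plus the rounding bit (the last bit shifted out). */
uint32_t rshr_round_src(uint32_t a, uint32_t b) {
    return (a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1));
}

/* tgt form: a >> (b - 1) equals 2 * (a >> b) plus the rounding bit, so
   subtracting a >> b leaves (a >> b) plus the rounding bit. */
uint32_t rshr_round_tgt(uint32_t a, uint32_t b) {
    return (b == 0) ? a : ((a >> (b - 1)) - (a >> b));
}

The signed pair (src2/tgt2 in the listings below) presumably uses the same expressions with int32_t lanes and arithmetic shifts.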
The linked snippet can be further optimized to the following on ARMv7 NEON (arm-linux-gnueabihf):
src1: @ @src1
        vneg.s32 q9, q1
        vrshl.u32 q0, q0, q9
        mov pc, lr
tgt1: @ @tgt1
        vneg.s32 q9, q1
        vrshl.u32 q0, q0, q9
        mov pc, lr
src2: @ @src2
        vneg.s32 q9, q1
        vrshl.s32 q0, q0, q9
        mov pc, lr
tgt2: @ @tgt2
        vneg.s32 q9, q1
        vrshl.s32 q0, q0, q9
        mov pc, lr
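In C, this optimal form corresponds to the rounding-shift intrinsics in arm_neon.h. A sketch (function names are hypothetical; vrshlq shifts left by a signed per-lane amount, so the shift is negated to get a rounding right shift):

#include <arm_neon.h>

/* Unsigned rounding shift right by a per-lane variable amount. */
uint32x4_t urshr_var(uint32x4_t a, int32x4_t b) {
    return vrshlq_u32(a, vnegq_s32(b));
}

/* Signed variant, matching src2/tgt2. */
int32x4_t srshr_var(int32x4_t a, int32x4_t b) {
    return vrshlq_s32(a, vnegq_s32(b));
}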
The linked snippet can be further optimized to the following on AArch64:
src1: // @src1
        neg v1.4s, v1.4s
        urshl v0.4s, v0.4s, v1.4s
        ret
tgt1: // @tgt1
        neg v1.4s, v1.4s
        urshl v0.4s, v0.4s, v1.4s
        ret
src2: // @src2
        neg v1.4s, v1.4s
        srshl v0.4s, v0.4s, v1.4s
        ret
tgt2: // @tgt2
        neg v1.4s, v1.4s
        srshl v0.4s, v0.4s, v1.4s
        ret
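The same arm_neon.h sketch shown above lowers to these neg + urshl / neg + srshl sequences when compiled for AArch64.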
The linked snippet can be further optimized to the following on LoongArch64 with LSX:
src1: # @src1
        vsrlr.w $vr0, $vr0, $vr1
        ret
tgt1: # @tgt1
        vsrlr.w $vr0, $vr0, $vr1
        ret
src2: # @src2
        vsrar.w $vr0, $vr0, $vr1
        ret
tgt2: # @tgt2
        vsrar.w $vr0, $vr0, $vr1
        ret
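On LoongArch, a sketch of the same forms via the LSX intrinsics in lsxintrin.h (assuming the usual __lsx_* naming; function names are hypothetical):

#include <lsxintrin.h>

/* vsrlr.w: per-lane logical shift right with rounding. */
__m128i urshr_var(__m128i a, __m128i b) {
    return __lsx_vsrlr_w(a, b);
}

/* vsrar.w: per-lane arithmetic shift right with rounding. */
__m128i srshr_var(__m128i a, __m128i b) {
    return __lsx_vsrar_w(a, b);
}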
The linked snippet can be further optimized to the following on 64-bit RISC-V with the "V" extension:
src1: # @src1
        csrwi vxrm, 0
        vsetivli zero, 4, e32, m1, ta, ma
        vssrl.vv v8, v8, v9
        ret
tgt1: # @tgt1
        csrwi vxrm, 0
        vsetivli zero, 4, e32, m1, ta, ma
        vssrl.vv v8, v8, v9
        ret
src2: # @src2
        csrwi vxrm, 0
        vsetivli zero, 4, e32, m1, ta, ma
        vssra.vv v8, v8, v9
        ret
tgt2: # @tgt2
        csrwi vxrm, 0
        vsetivli zero, 4, e32, m1, ta, ma
        vssra.vv v8, v8, v9
        ret
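On RISC-V, a rough sketch of the same forms via the RVV intrinsics in riscv_vector.h (assuming the intrinsics variant that takes an explicit rounding-mode argument; function names are hypothetical). __RISCV_VXRM_RNU selects round-to-nearest-up, matching the csrwi vxrm, 0 in the listing above:

#include <riscv_vector.h>

/* Unsigned rounding (scaling) shift right. */
vuint32m1_t urshr_var(vuint32m1_t a, vuint32m1_t b, size_t vl) {
    return __riscv_vssrl_vv_u32m1(a, b, __RISCV_VXRM_RNU, vl);
}

/* Signed variant via vssra. */
vint32m1_t srshr_var(vint32m1_t a, vuint32m1_t b, size_t vl) {
    return __riscv_vssra_vv_i32m1(a, b, __RISCV_VXRM_RNU, vl);
}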