While looking into some GCC auto-vectorization code generation (https://gcc.gnu.org/PR116075), I noticed that neither GCC nor LLVM optimizes this testcase:
https://godbolt.org/z/7KTbhMxcs
#include <arm_sve.h>
svint8_t f(void)
{
  svint8_t tt;
  tt = svdup_s8 (0);
  tt = svinsr (tt, 0);
  return tt;
}
The fix I have in mind for the GCC auto-vectorization issue will also fix the above testcase in GCC, so I thought I would file it here as well. Note that the value 0 does not need to be a constant; the only requirement is that the same value is passed to both svdup and svinsr. In that case the svinsr is redundant and the function can be reduced to just the svdup.
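
To illustrate why the fold is valid, here is a minimal portable C sketch that models the two intrinsics on a fixed-size array. This is only a scalar model, not SVE code: `MODEL_LANES`, `model_vec`, `model_dup`, `model_insr` and `insr_of_dup_is_dup` are hypothetical names made up for this example, and the fixed lane count is an assumption (real SVE vectors are scalable). It assumes svinsr shifts every element up by one lane, drops the top element, and writes the scalar into lane 0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed lane count for the model; real SVE vectors are scalable. */
#define MODEL_LANES 16

typedef struct { int8_t lane[MODEL_LANES]; } model_vec;

/* Models svdup_s8: broadcast x to every lane. */
static model_vec model_dup(int8_t x)
{
  model_vec v;
  for (int i = 0; i < MODEL_LANES; i++)
    v.lane[i] = x;
  return v;
}

/* Models svinsr: shift lanes up by one, drop the top lane, put x in lane 0. */
static model_vec model_insr(model_vec v, int8_t x)
{
  memmove(&v.lane[1], &v.lane[0], (MODEL_LANES - 1) * sizeof v.lane[0]);
  v.lane[0] = x;
  return v;
}

/* Returns 1 when insr(dup(x), x) equals dup(x) lane for lane. */
static int insr_of_dup_is_dup(int8_t x)
{
  model_vec a = model_insr(model_dup(x), x);
  model_vec b = model_dup(x);
  return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

In this model, shifting a vector whose lanes are all x and inserting x at lane 0 leaves every lane equal to x, which is why the svinsr can be dropped whenever both calls receive the same value, constant or not. If the two values differ, lane 0 differs from the rest and the fold does not apply.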