Consider an interleaved store with only a tail gap:
for (int i = 0; i < n; i++) {
  a[3*i + 0] = i;
  a[3*i + 1] = i;
  // No access to a[3*i + 2]
}
Currently, the vectorizer can convert this into a wide masked store. When the gap mask has only a single true-to-false transition across the lanes, we can try to convert it into a vssseg instruction in the InterleavedAccessPass. I believe this could serve as a good starting point for supporting strided segment instructions.
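To make the mask condition concrete, here is a minimal standalone sketch. It is not the code in this patch and uses no LLVM APIs. The helper names `activePrefixFields` and `stridedSegmentStore` are hypothetical, and the mask is assumed to hold one flag per interleave member (true = stored, false = gap). The first function accepts a mask only if it is a run of trues followed by falses. The second is a scalar model of what a strided segment store with that many fields would write.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API). Returns the number of leading
// active fields if the mask is a non-empty run of trues followed only by
// falses, i.e. it has at most one true-to-false transition.
// An all-true mask returns the full factor (no gap at all).
std::optional<unsigned> activePrefixFields(const std::vector<bool> &GapMask) {
  unsigned Fields = 0;
  while (Fields < GapMask.size() && GapMask[Fields])
    ++Fields;
  // A true lane after the first false lane means the gap is not a tail gap.
  for (std::size_t I = Fields; I < GapMask.size(); ++I)
    if (GapMask[I])
      return std::nullopt;
  if (Fields == 0)
    return std::nullopt; // Nothing is stored.
  return Fields;
}

// Scalar model of a strided segment store: each of the N segments writes
// Fields consecutive elements, and consecutive segments start Factor
// elements apart. The stored value mirrors the loop in the description.
void stridedSegmentStore(int *A, int N, unsigned Factor, unsigned Fields) {
  assert(Fields <= Factor && "cannot store more fields than the factor");
  for (int I = 0; I < N; ++I)
    for (unsigned F = 0; F < Fields; ++F)
      A[static_cast<std::size_t>(Factor) * I + F] = I;
}
```

For the loop above, the per-member mask would be {true, true, false}, so the helper reports two active fields and the model writes a[3*i + 0] and a[3*i + 1] while leaving a[3*i + 2] untouched. A mask such as {true, false, true} is rejected because the gap is not at the tail.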
Note: Masked load/store instructions can only represent strided segment accesses with a compile-time constant stride. For runtime strides, alternative IR constructs are still required.
@mshockwave