See the reproducer at https://godbolt.org/z/v7fqorb1h.
a2 = a1 + 256;
a2 is accessed by an interleaved (stride-2) store, while a1 has a simple linear access.
With VF = 4 there are no actual dependencies between the two accesses, so it looks safe for the loop to be vectorized.
BTW, changing a2[2*i] to a2[i] makes vectorization safe.
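
For readers who cannot open the link, here is a minimal C sketch of the access pattern described above. It is not the godbolt reproducer. The `int` element type, the trip count `N`, the `+ 1`, the assumption that a1 is loaded while a2 is stored, and the helper names are all hypothetical. `kernel_vf4` only emulates VF = 4 by hand (all four loads of a chunk, then all four stores); it does not claim to match what the vectorizer would emit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: only the offset 256 comes from the report. */
enum { N = 512, OFFSET = 256, LEN = OFFSET + 2 * N };

/* Scalar reference: linear load from a1, stride-2 store through a2. */
static void kernel_scalar(int *a1) {
    int *a2 = a1 + OFFSET;
    for (int i = 0; i < N; ++i)
        a2[2 * i] = a1[i] + 1;
}

/* Hand-written emulation of VF = 4: within each chunk of four
 * iterations, every load is performed before any store. */
static void kernel_vf4(int *a1) {
    int *a2 = a1 + OFFSET;
    for (int i = 0; i < N; i += 4) {
        int tmp[4];
        for (int l = 0; l < 4; ++l)
            tmp[l] = a1[i + l] + 1;
        for (int l = 0; l < 4; ++l)
            a2[2 * (i + l)] = tmp[l];
    }
}

/* Returns nonzero when both orderings leave the buffer identical. */
static int results_match(void) {
    int x[LEN], y[LEN];
    for (int i = 0; i < LEN; ++i)
        x[i] = y[i] = i;
    kernel_scalar(x);
    kernel_vf4(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Under these assumptions the element a1[j] is stored in iteration (j - 256) / 2 and loaded in iteration j, so the dependence distance is (j + 256) / 2, which is at least 256 iterations for every j that is actually stored. That is far larger than a VF of 4, which is why reordering loads and stores within a chunk of four does not change the result and `results_match()` returns nonzero.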