We have a race: mapping::getBlockSize() reads the IsSPMD global, and a thread can do so before kernel initialization is done and the value is available to all threads.
Similar errors existed before; this one was introduced when we started to adjust getBlockSize for the extra warp/wave in generic mode.
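
To make the failure mode concrete, here is a minimal host-side sketch, not the runtime's actual code. The constants, the static flag, initKernel(), and getBlockSizeFromGlobal() are hypothetical stand-ins, and the "subtract one warp in generic mode" adjustment is an assumption about what the block-size computation looks like. It only shows that a result derived from the global depends on whether initialization has already stored the flag, while a result derived from an explicitly passed mode does not. Passing the mode explicitly is one possible way to avoid the dependency, not necessarily what this change does.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; not the runtime's real definitions.
constexpr uint32_t WarpSize = 32;
constexpr uint32_t HardwareThreads = 128;

// Team-wide flag, written by one thread during kernel initialization.
// Until that store is visible, other threads see the default value.
static bool IsSPMD = true;

// Assumed adjustment: generic mode reserves one extra warp/wave for the
// main thread, so the usable block size is one warp smaller.
uint32_t getBlockSize(bool IsSPMDMode) {
  return IsSPMDMode ? HardwareThreads : HardwareThreads - WarpSize;
}

// Racy variant: the result depends on whether initialization has already
// stored the flag by the time this thread reads it.
uint32_t getBlockSizeFromGlobal() { return getBlockSize(IsSPMD); }

// Stand-in for the part of kernel initialization that publishes the mode.
void initKernel(bool IsSPMDMode) { IsSPMD = IsSPMDMode; }
```

In a generic-mode kernel, a thread calling getBlockSizeFromGlobal() before initKernel(false) has run gets the SPMD-sized answer, whereas getBlockSize(false) gives the adjusted size regardless of ordering.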