[SYCL][CUDA][HIP] Fix enable-global-offset flag (#11674)
- Modify the SYCL globaloffset pass to remove calls to the
`llvm.nvvm.implicit.offset` and `llvm.amdgcn.implicit.offset` intrinsics
from the IR when `-enable-global-offset=false`.
Also remove their uses, i.e. GEPs and Loads, and replace further
uses of the Loads with 0 constants.
This ensures that these intrinsics no longer occur during target
lowering.
Previously, compilation failed in some cases because the
intrinsic could not be selected for the AMDGPU and NVPTX targets.
Based on inspection of the IR, any calls to the intrinsic were
probably expected to be fully removed after the globaloffset pass.
- Replace Loads from the intrinsic with known constants, which enables
further optimization of the IR to remove dead code.
In the cases we observed, useless stores to the stack were not removed
from several kernels with an implicit global offset.
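
The rewrite described above can be sketched on a toy IR. This is a minimal illustration, not the code of the actual pass and not the LLVM API: `Inst`, `Op`, `isOffsetPointer`, `removeImplicitOffset` and `makeExampleKernel` are hypothetical names made up for this sketch, and GEPs are assumed to keep their base pointer in operand 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for IR instructions. Hypothetical types, NOT the LLVM API.
enum class Op { ImplicitOffset, GEP, Load, Add, Const };

struct Inst {
    Op op;
    std::vector<std::size_t> operands; // indices of operand instructions
    long value = 0;                    // only meaningful for Op::Const
    bool dead = false;                 // set when the instruction is removed
};

// True if instruction `i` is the implicit-offset call itself or a GEP
// whose base pointer (operand 0, by assumption) traces back to it.
static bool isOffsetPointer(const std::vector<Inst>& ir, std::size_t i) {
    const Inst& inst = ir[i];
    if (inst.op == Op::ImplicitOffset) return true;
    if (inst.op == Op::GEP) return isOffsetPointer(ir, inst.operands[0]);
    return false;
}

// Models the behaviour with the global offset disabled: every load of the
// offset becomes the constant 0, then the intrinsic calls and the GEPs on
// them are removed, so nothing is left for instruction selection to choke on.
void removeImplicitOffset(std::vector<Inst>& ir) {
    // Step 1: rewrite loads in place so that their users keep valid operands.
    for (Inst& inst : ir) {
        if (inst.op == Op::Load && isOffsetPointer(ir, inst.operands[0])) {
            inst.op = Op::Const;
            inst.operands.clear();
            inst.value = 0;
        }
    }
    // Step 2: the intrinsic calls and their GEPs now have no remaining users.
    for (std::size_t i = 0; i < ir.size(); ++i) {
        if (isOffsetPointer(ir, i)) ir[i].dead = true;
    }
}

// Hypothetical example kernel body:
//   %0 = call implicit.offset
//   %1 = gep %0
//   %2 = load %1
//   %3 = const 5
//   %4 = add %2, %3
std::vector<Inst> makeExampleKernel() {
    std::vector<Inst> ir;
    ir.push_back({Op::ImplicitOffset, {}});
    ir.push_back({Op::GEP, {0}});
    ir.push_back({Op::Load, {1}});
    ir.push_back({Op::Const, {}, 5});
    ir.push_back({Op::Add, {2, 3}});
    return ir;
}
```

After `removeImplicitOffset` runs on the example, `%2` is a constant 0, so a later constant-folding or dead-code step can simplify `%4` and anything that depended on the offset.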