Description
I'm having trouble using atomics from atomic_ops.spir. The call fails with the following error message:
...<elided traceback>...
popenargs = (['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir'],)
kwargs = {}, retcode = 1
cmd = ['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir']
    def check_call(*popenargs, **kwargs):
        """Run command with arguments. Wait for command to complete. If
        the exit code was zero then return, otherwise raise
        CalledProcessError. The CalledProcessError object will have the
        return code in the returncode attribute.

        The arguments are the same as for the call function. Example:

        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E subprocess.CalledProcessError: Command '['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir']' returned non-zero exit status 1.
/opt/pyenv/versions/3.9.16/lib/python3.9/subprocess.py:373: CalledProcessError
error: 1: Conflicting SPIR-V versions: 1.4 (input modules 1 through 1) vs 1.0 (input module 2).
Traceback (most recent call last):
File "/opt/venv/lib/python3.9/site-packages/numba_dpex/spirv_generator.py", line 137, in __del__
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpcp_knjn_/2-linked-spirv'
What could cause such a version mismatch? I'm trying to build a minimal reproducer, but the error does not seem to trigger for all atomics calls. Will update.
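One way to confirm which input carries which version is to read it straight from the module header. Per the SPIR-V specification, a module starts with the magic number 0x07230203, followed by a version word whose bytes are 0, major, minor, 0. The following is a minimal sketch under that assumption; the helper names are hypothetical and not part of numba-dpex or spirv-tools.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def spirv_version(header):
    """Return (major, minor) parsed from the first 8 bytes of a SPIR-V module."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of SPIR-V header")
    # The magic number also tells us the endianness of the module.
    for fmt in ("<II", ">II"):
        magic, version = struct.unpack(fmt, header[:8])
        if magic == SPIRV_MAGIC:
            # Version word layout: 0x00 | major | minor | 0x00
            return (version >> 16) & 0xFF, (version >> 8) & 0xFF
    raise ValueError("not a SPIR-V module (bad magic number)")


def spirv_version_of_file(path):
    """Hypothetical convenience wrapper: read the header of a file on disk."""
    with open(path, "rb") as f:
        return spirv_version(f.read(8))
```

Calling `spirv_version_of_file` on each input passed to spirv-link (the generated module and atomic_ops.spir) should show which one the linker is complaining about.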
I'm using a custom numba-dpex build based on 0.19.0, with an up-to-date environment (2023 oneAPI releases, dpctl >= 0.14.1dev1).
(I don't think there are differences between my build environment and the runtime environment. I'm using the spirv-tools binaries from the Ubuntu Jammy repositories.)
For GPU, the error can be circumvented by using native atomics.
Edit: this appears to be a bug that can be summed up as follows. The atomic_ops.spir binary has a SPIR-V version that is fixed at build time, while the JIT can, in some cases, emit a different SPIR-V version for the kernels. Modules with different versions are not compatible, so spirv-link fails when asked to link them. In my case, the SPIR-V version of atomic_ops.spir is 1.0, and I can work around the bug by passing --spirv-max-version 1.0 to the llvm-spirv call at https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/spirv_generator.py#L83 . I cannot, however, explain why llvm-spirv suddenly starts outputting SPIR-V 1.3 for some of my kernels 🤔
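The workaround can be sketched as a small command builder that caps the translator's output version. This is only an illustration: `llvm_spirv_cmd` is a hypothetical helper, the real argument list in spirv_generator.py may differ, and the `--spirv-max-version=<major>.<minor>` spelling assumes the llvm-spirv build accepts the `=` form of the option.

```python
def llvm_spirv_cmd(bitcode_path, spirv_path, max_version=None):
    """Build an llvm-spirv argument list, optionally capping the SPIR-V version.

    max_version is a (major, minor) tuple such as (1, 0), or None for no cap.
    """
    cmd = ["llvm-spirv"]
    if max_version is not None:
        major, minor = max_version
        # Cap the emitted version so it matches the prebuilt atomic_ops.spir.
        cmd.append(f"--spirv-max-version={major}.{minor}")
    cmd += ["-o", spirv_path, bitcode_path]
    return cmd


# Example: cap the JIT output at SPIR-V 1.0 (file names are placeholders).
print(llvm_spirv_cmd("kernel.bc", "kernel.spv", max_version=(1, 0)))
```

The resulting list can be handed to `subprocess.check_call`, in the same way the traceback above shows for spirv-link.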