
Conflicting SPIR-V versions when linking to atomic-ops.spir #868

@fcharras

Description


I'm having trouble using atomics from atomic_ops.spir. Linking fails with the following error message:

...<elided traceback>...

popenargs = (['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir'],)
kwargs = {}, retcode = 1
cmd = ['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir']

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.
    
        The arguments are the same as for the call function.  Example:
    
        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command '['spirv-link', '--allow-partial-linkage', '-o', '/tmp/tmpcp_knjn_/2-linked-spirv', '/tmp/tmpcp_knjn_/1-generated-spirv', '/opt/venv/lib/python3.9/site-packages/numba_dpex/ocl/atomics/atomic_ops.spir']' returned non-zero exit status 1.

/opt/pyenv/versions/3.9.16/lib/python3.9/subprocess.py:373: CalledProcessError

error: 1: Conflicting SPIR-V versions: 1.4 (input modules 1 through 1) vs 1.0 (input module 2).

Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/numba_dpex/spirv_generator.py", line 137, in __del__
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpcp_knjn_/2-linked-spirv'
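The two versions in the linker error come from the headers of the input modules: the SPIR-V specification puts the magic number 0x07230203 in word 0 of every module and the version (major and minor, one byte each) in word 1. A minimal sketch for checking which version each input declares, assuming plain binary SPIR-V files; `spirv_version` is a hypothetical helper, not part of numba-dpex or SPIRV-Tools:

```python
import struct

SPIRV_MAGIC = 0x07230203  # word 0 of every SPIR-V module


def spirv_version(data: bytes) -> tuple:
    """Return (major, minor) from the header of a binary SPIR-V module.

    Hypothetical diagnostic helper (not a numba-dpex or SPIRV-Tools API).
    """
    if len(data) < 8:
        raise ValueError("too short to be a SPIR-V module")
    # The magic number reveals the byte order the module was written in.
    for endian in ("<", ">"):
        magic, version = struct.unpack(endian + "2I", data[:8])
        if magic == SPIRV_MAGIC:
            # Version word layout, high to low byte: 0 | major | minor | 0.
            return (version >> 16) & 0xFF, (version >> 8) & 0xFF
    raise ValueError("missing SPIR-V magic number")
```

Running it on both files passed to spirv-link, e.g. `spirv_version(open(path, "rb").read())`, should report the same pair of versions as the linker error.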

What could cause such a version mismatch? I'm trying to build a minimal reproducer, but the error does not seem to trigger for every atomics call. Will update.

I'm using a custom numba-dpex build based on 0.19.0, with an up-to-date environment (2023 oneAPI releases, dpctl >= 0.14.1dev1).

(I don't think there are any differences between my build and runtime environments. I'm using SPIRV-Tools binaries from the Ubuntu Jammy repositories.)

For GPU, the error can be circumvented by using native atomics.

Edit: the bug can be summed up as follows. The atomic_ops.spir binary has a SPIR-V version that is fixed at build time, while in some cases the JIT produces kernels with a different SPIR-V version. Modules with different versions are not compatible, so spirv-link refuses to link them and fails. In my case, atomic_ops.spir is SPIR-V 1.0, and I can fix the bug by passing --spirv-max-version 1.0 to the llvm-spirv call at https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/spirv_generator.py#L83 . However, I can't explain why llvm-spirv suddenly starts emitting a newer SPIR-V version for some of my kernels (1.4 in the error above) 🤔
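A minimal sketch of the workaround described above, assuming the translator is run as a subprocess. `llvm_spirv_cmd` and `translate` are hypothetical names, not the actual code in numba_dpex/spirv_generator.py; only the `--spirv-max-version` option comes from the issue.

```python
import subprocess
from typing import List, Optional


def llvm_spirv_cmd(bitcode_path: str, out_path: str,
                   max_version: Optional[str] = "1.0") -> List[str]:
    """Build an llvm-spirv command line (hypothetical helper).

    Capping the emitted version keeps the kernel module at the same SPIR-V
    version as a prebuilt library such as atomic_ops.spir (1.0 here), so
    spirv-link does not reject the pair as "Conflicting SPIR-V versions".
    """
    cmd = ["llvm-spirv"]
    if max_version is not None:
        # Upper bound on the SPIR-V version the translator may emit.
        cmd.append(f"--spirv-max-version={max_version}")
    cmd += ["-o", out_path, bitcode_path]
    return cmd


def translate(bitcode_path: str, out_path: str,
              max_version: Optional[str] = "1.0") -> None:
    # Raises CalledProcessError on failure, like the spirv-link call above.
    subprocess.check_call(llvm_spirv_cmd(bitcode_path, out_path, max_version))
```

Hard-coding 1.0 only matches this particular atomic_ops.spir build. A more robust variant could read the version from the library's header and pass that value instead.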

Metadata

Assignees

No one assigned

Labels

user (User submitted issue)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests