Minimal reproducer for regression in #804 #898

@fcharras

Description

Performance issues after #804 have been reported in #891 and #886.

In fact, what I reported as a performance issue is really a JIT issue, maybe related to caching. Sorry for misleading you into thinking it was only about performance. It surfaced in the benchmarks as a general slowdown on GPU because, rather than segfaulting, the kernel returns corrupted results that trigger a slower code branch in our KMeans (cluster relocation). On CPU it segfaults outright, meaning the wrong code was executed, which should have tipped me off sooner that the compiled code itself was wrong.

I think it might be related to caching, but I have searched for the error in the Python layers of numba_dpex (post-#804) without any luck; the bug must be well hidden.

I finally came up with a minimal reproducer of what causes the regression in my code:

import numba_dpex as dpex
import dpctl.tensor as dpt


def make_write_values_kernel(n_rows):

    write_values = make_write_values_kernel_func()

    @dpex.kernel
    def write_values_kernel(array_in):
        # Even rows should get four 1s, odd rows only two.
        for row_idx in range(n_rows):
            is_even = (row_idx % 2) == 0
            write_values(array_in, row_idx, is_even)

    # A single work item is enough to expose the bug.
    return write_values_kernel[1, 1]


def make_write_values_kernel_func():
    # Two instances of the same closure, specialized with different n_cols.
    write_when_even = _make_write_values_kernel_func(4)
    write_when_odd = _make_write_values_kernel_func(2)

    @dpex.func
    def write_values(array_in, row_idx, is_even):
        if is_even:
            write_when_even(array_in, row_idx)
        else:
            write_when_odd(array_in, row_idx)

    return write_values


def _make_write_values_kernel_func(n_cols):
    @dpex.func
    def write_values(array_in, row_idx):
        # Write 1 to the first n_cols entries of the given row.
        for idx in range(n_cols):
            array_in[row_idx, idx] = 1

    return write_values


kernel = make_write_values_kernel(10)
# zeros rather than empty, so that untouched entries are deterministically 0
array_in = dpt.zeros(sh=(10, 10), dtype=dpt.int64)
kernel(array_in)
print(array_in)

Read it attentively: it should output the following:

[[1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]]

but instead the output is:

[[1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]]
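
For comparison, here is a plain-NumPy transcription of the kernel's intended logic, computed on the host (this is not part of the original reproducer, just a reference); it produces the expected matrix shown above:

import numpy as np

# Even rows get four 1s, odd rows get two; the rest stays 0.
expected = np.zeros((10, 10), dtype=np.int64)
for row_idx in range(10):
    n_cols = 4 if row_idx % 2 == 0 else 2
    expected[row_idx, :n_cols] = 1
print(expected)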

(basically, when write_when_odd should be called, write_when_even is called instead; since those two funcs have the same code and the same name, this suggests a caching issue, as sketched below)
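
To illustrate the suspected failure mode (this is only a hypothesis, not something I have confirmed in the numba_dpex sources): if compiled dpex.funcs were cached under a key built from the function's __name__ and bytecode only, the two closures would be indistinguishable, because n_cols is a free variable and does not change the bytecode. A pure-Python sketch, with fake_compile standing in for the real machinery:

_cache = {}

def fake_compile(py_func):
    # Hypothetical cache key: name + bytecode, ignoring closure cells.
    key = (py_func.__name__, py_func.__code__.co_code)
    if key not in _cache:
        _cache[key] = py_func  # stand-in for actual JIT compilation
    return _cache[key]

def make_write_values(n_cols):
    def write_values(row):
        return [1] * n_cols
    return fake_compile(write_values)

write_when_even = make_write_values(4)
write_when_odd = make_write_values(2)

# Same name, same bytecode: the second lookup hits the first entry,
# so write_when_odd silently behaves like write_when_even.
print(write_when_odd(0))  # [1, 1, 1, 1] instead of [1, 1]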

Important notes:

In case you wonder why one would come up with such a convoluted cascade of dpex.funcs: in sklearn_numba_dpex we factor some code using dpex.func kernel functions that are defined as closures and re-instantiated with different parameters every time they're needed. See https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/kmeans/kernels/_base_kmeans_kernel_funcs.py . This is the best solution for factoring redundant code that I've thought of (and it only works as long as there is no local memory allocation or barrier in the factored code). A condensed sketch of the pattern follows.
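
The sketch below uses illustrative names (make_scaled_write_func, scaled_write), not the actual sklearn_numba_dpex code; it just shows the factory shape: a parameter is captured at definition time and baked into a freshly compiled device function on every call to the factory.

import numba_dpex as dpex

def make_scaled_write_func(scale):
    # scale is a compile-time constant for this specialization.
    @dpex.func
    def scaled_write(array_in, idx, value):
        array_in[idx] = scale * value

    return scaled_write

write_x1 = make_scaled_write_func(1)
write_x2 = make_scaled_write_func(2)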

Note that #892 doesn't seem to be related at all, since the output is already wrong with numba_dpex == 0.19.0.
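
For anyone trying the reproducer, the installed release can be checked first with:

import numba_dpex
print(numba_dpex.__version__)  # wrong output observed with 0.19.0 as well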
