Minimal reproducer for regression in #804 #898

@fcharras

Description

Performance issues after #804 have been reported in #891 and #886.

In fact, what I reported as a performance issue is really a JIT issue, maybe related to caching. Sorry for misleading you into thinking it was only about performance. It surfaced in the benchmarks as a general slowdown on GPU because, rather than segfaulting, the kernel returns corrupted results that trigger a slower code branch in our KMeans (cluster relocation). On CPU it segfaults outright, meaning the wrong code was executed, which should have tipped me off sooner that the compiled code itself was wrong.

I think it might be related to caching, but I have searched for the error in the Python layers of numba_dpex (post-#804) without any luck; the bug must be well hidden.

I finally came up with a minimal reproducer of what causes the regression in my code:

import numba_dpex as dpex
import dpctl.tensor as dpt


def make_write_values_kernel(n_rows):

    write_values = make_write_values_kernel_func()

    @dpex.kernel
    def write_values_kernel(array_in):
        # Even rows should get four 1s, odd rows only two.
        for row_idx in range(n_rows):
            is_even = (row_idx % 2) == 0
            write_values(array_in, row_idx, is_even)

    # A single work item is enough to expose the bug.
    return write_values_kernel[1, 1]


def make_write_values_kernel_func():
    # Two instances of the same closure, specialized with different n_cols.
    write_when_even = _make_write_values_kernel_func(4)
    write_when_odd = _make_write_values_kernel_func(2)

    @dpex.func
    def write_values(array_in, row_idx, is_even):
        if is_even:
            write_when_even(array_in, row_idx)
        else:
            write_when_odd(array_in, row_idx)

    return write_values


def _make_write_values_kernel_func(n_cols):
    @dpex.func
    def write_values(array_in, row_idx):
        # Write 1 to the first n_cols entries of the given row.
        for idx in range(n_cols):
            array_in[row_idx, idx] = 1

    return write_values


kernel = make_write_values_kernel(10)
# zeros rather than empty, so that untouched entries are deterministically 0
array_in = dpt.zeros(sh=(10, 10), dtype=dpt.int64)
kernel(array_in)
print(array_in)

Read it attentively: it should output the following:

[[1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0]]

but instead the output is:

[[1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]]
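
For comparison, here is a plain-NumPy transcription of the kernel's intended logic, computed on the host (this is not part of the original reproducer, just a reference); it produces the expected matrix shown above:

import numpy as np

# Even rows get four 1s, odd rows get two; the rest stays 0.
expected = np.zeros((10, 10), dtype=np.int64)
for row_idx in range(10):
    n_cols = 4 if row_idx % 2 == 0 else 2
    expected[row_idx, :n_cols] = 1
print(expected)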

(basically, when write_when_odd should be called, write_when_even is called instead; since those two funcs have the same code and the same name, this suggests a caching issue, as sketched below)
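
To illustrate the suspected failure mode (this is only a hypothesis, not something I have confirmed in the numba_dpex sources): if compiled dpex.funcs were cached under a key built from the function's __name__ and bytecode only, the two closures would be indistinguishable, because n_cols is a free variable and does not change the bytecode. A pure-Python sketch, with fake_compile standing in for the real machinery:

_cache = {}

def fake_compile(py_func):
    # Hypothetical cache key: name + bytecode, ignoring closure cells.
    key = (py_func.__name__, py_func.__code__.co_code)
    if key not in _cache:
        _cache[key] = py_func  # stand-in for actual JIT compilation
    return _cache[key]

def make_write_values(n_cols):
    def write_values(row):
        return [1] * n_cols
    return fake_compile(write_values)

write_when_even = make_write_values(4)
write_when_odd = make_write_values(2)

# Same name, same bytecode: the second lookup hits the first entry,
# so write_when_odd silently behaves like write_when_even.
print(write_when_odd(0))  # [1, 1, 1, 1] instead of [1, 1]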

Important notes:

In case you wonder why one would come up with such a convoluted cascade of dpex.funcs: in sklearn_numba_dpex we factor some code using dpex.func kernel functions that are defined as closures and re-instantiated with different parameters every time they're needed. See https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/kmeans/kernels/_base_kmeans_kernel_funcs.py . This is the best solution for factoring redundant code that I've thought of (and it only works as long as there is no local memory allocation or barrier in the factored code). A condensed sketch of the pattern follows.
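
The sketch below uses illustrative names (make_scaled_write_func, scaled_write), not the actual sklearn_numba_dpex code; it just shows the factory shape: a parameter is captured at definition time and baked into a freshly compiled device function on every call to the factory.

import numba_dpex as dpex

def make_scaled_write_func(scale):
    # scale is a compile-time constant for this specialization.
    @dpex.func
    def scaled_write(array_in, idx, value):
        array_in[idx] = scale * value

    return scaled_write

write_x1 = make_scaled_write_func(1)
write_x2 = make_scaled_write_func(2)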

Note that #892 doesn't seem to be related at all, since the output is already wrong with numba_dpex == 0.19.0.
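
For anyone trying the reproducer, the installed release can be checked first with:

import numba_dpex
print(numba_dpex.__version__)  # wrong output observed with 0.19.0 as well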
