Commit 67400a3

author
Diptorup Deb
committed
launching a kernel section.
1 parent bf21ff8 commit 67400a3

5 files changed: +65 -15 lines changed

docs/source/ext_links.txt

Lines changed: 3 additions & 0 deletions
@@ -31,3 +31,6 @@
 .. _oneAPI GPU optimization guide: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/general-purpose-computing-on-gpu.html
 .. _dpctl.tensor.usm_ndarray: https://intelpython.github.io/dpctl/latest/docfiles/dpctl/usm_ndarray.html#dpctl.tensor.usm_ndarray
 .. _dpnp.ndarray: https://intelpython.github.io/dpnp/reference/ndarray.html
+
+.. _Dispatcher: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html#dispatcher-objects
+.. _Unboxes: https://numba.readthedocs.io/en/stable/extending/interval-example.html#boxing-and-unboxing
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+.. _launching-an-async-kernel:
+
+Async kernel execution
+======================
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+.. _launching-a-kernel:
+
+Launching a kernel
+==================
+
+A ``kernel`` decorated kapi function produces a ``KernelDispatcher`` object,
+which is a type of Numba* `Dispatcher`_ object. However, unlike regular Numba*
+Dispatcher objects, a ``KernelDispatcher`` object cannot be invoked directly
+from either CPython or another compiled Numba* ``jit`` function. To invoke a
+``kernel`` decorated function, a programmer has to use the
+:func:`numba_dpex.experimental.call_kernel` function.
+
+To invoke a ``KernelDispatcher``, the ``call_kernel`` function requires three
+things: the ``KernelDispatcher`` object, the ``Range`` or ``NdRange`` object
+over which the kernel is to be executed, and the list of arguments to be passed
+to the compiled kernel. Once called with the necessary arguments, the
+``call_kernel`` function performs the following steps:
+
+- Compiles the ``KernelDispatcher`` object, specializing it for the provided
+  argument types.
+
+- `Unboxes`_ the kernel arguments by converting CPython objects into Numba* or
+  numba-dpex objects.
+
+- Infers the execution queue on which to submit the kernel from the provided
+  kernel arguments. (TODO: Refer to compute follows data.)
+
+- Submits the kernel to the execution queue.
+
+- Waits for the kernel to finish executing before returning control to the
+  caller.
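
The queue-inference step in the list above can be pictured with a small plain-Python model. This is only a sketch and is not numba-dpex code: ``FakeArray``, ``infer_queue`` and the string queue names are hypothetical stand-ins, and rejecting arrays bound to different queues is an assumption based on the compute-follows-data convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeArray:
    """Hypothetical stand-in for a device array that remembers its queue."""

    queue: str
    size: int


def infer_queue(*kernel_args):
    """Pick the execution queue from the array arguments (toy model)."""
    queues = {arg.queue for arg in kernel_args if isinstance(arg, FakeArray)}
    if not queues:
        # Without an array argument there is nothing to infer a queue from.
        raise ValueError("at least one kernel argument must be an array")
    if len(queues) > 1:
        # Assumption: arrays bound to different queues are rejected.
        raise ValueError("array arguments are bound to different queues")
    return queues.pop()


a = FakeArray(queue="gpu:0", size=16)
b = FakeArray(queue="gpu:0", size=16)
# Scalars such as 3.14 are ignored when inferring the queue.
print(infer_queue(a, b, 3.14))
```

The model also suggests why a kernel needs at least one array argument: a call such as ``infer_queue(1, 2)`` has no array to take a queue from and raises an error.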
+
+The ``call_kernel`` function can be invoked both from CPython and from another
+Numba* compiled function. Note that the ``call_kernel`` function supports only
+synchronous kernel execution; the ``call_kernel_async`` function should be
+used for asynchronous kernel execution (refer to
+:ref:`launching-an-async-kernel`).
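
A synchronous launch from CPython might look like the sketch below. It assumes that the ``kernel`` decorator is exposed from ``numba_dpex.experimental`` alongside ``call_kernel`` (other releases may expose both elsewhere), that ``dpnp`` is installed, and that a SYCL device is available. The ``vecadd`` kernel and the ``launch_vecadd`` helper are illustrative names only. The launch is skipped when the packages cannot be found.

```python
import importlib.util


def launch_vecadd(n=1024):
    """Launch a vector-add range kernel synchronously and return the result."""
    import dpnp
    import numba_dpex.experimental as dpex_exp  # assumed location of kernel/call_kernel
    from numba_dpex import kernel_api as kapi

    @dpex_exp.kernel
    def vecadd(item: kapi.Item, a, b, c):
        i = item.get_id(0)
        c[i] = a[i] + b[i]

    a = dpnp.ones(n)
    b = dpnp.ones(n)
    c = dpnp.zeros(n)

    # Pass the KernelDispatcher, the execution range, then the kernel arguments.
    dpex_exp.call_kernel(vecadd, kapi.Range(n), a, b, c)
    # call_kernel is synchronous, so c can be read as soon as it returns.
    return dpnp.asnumpy(c)


# Only attempt the launch when both packages can be found.
HAVE_DPEX = all(
    importlib.util.find_spec(name) is not None for name in ("dpnp", "numba_dpex")
)

if HAVE_DPEX:
    print(launch_vecadd()[:4])
else:
    print("numba-dpex or dpnp is not installed; skipping the launch")
```

Because all three arguments are arrays allocated on the same default device, the launcher has an array from which to infer the execution queue.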
+
+
+.. seealso::
+
+    Refer to the API documentation for
+    :func:`numba_dpex.experimental.launcher.call_kernel` for more details.

docs/source/user_guide/kernel_programming/index.rst

Lines changed: 8 additions & 4 deletions
@@ -236,8 +236,10 @@ users should first convert their input tensor or ndarray object into either of
 the two supported array types, both of which support DLPack.
 
 
-Launching a kernel
-==================
+.. Launching a kernel
+.. ==================
+
+.. include:: ./call-kernel.rst
 
 Advanced concepts
 *****************
@@ -254,8 +256,10 @@ Group barrier synchronization
 Atomic operations
 =================
 
-Async kernel execution
-======================
+.. Async kernel execution
+.. ======================
+
+.. include:: ./call-kernel-async.rst
 
 Specializing a kernel or a device_func
 ======================================

docs/source/user_guide/kernel_programming/writing-range-kernel.rst

Lines changed: 7 additions & 11 deletions
@@ -121,17 +121,13 @@ kernel:
 * At least one argument of a kernel should be an array. The requirement is so
   that the kernel launcher (:func:`numba_dpex.experimental.call_kernel`) can
   determine the execution queue on which to launch the kernel. Refer
-  the "Launching a kernel" section for more details.
-
-A range kernel has to be executed by calling the
-:py:func:`numba_dpex.experimental.launcher.call_kernel` function. The execution
-range for the kernel is specified by creating an instance of a
-:class:`numba_dpex.kernel_api.Range` class and passing the ``Range`` object as
-an argument to ``call_kernel``. The ``call_kernel`` function does three things:
-compiles the kernel if needed, "unboxes" all kernel arguments by converting
-CPython objects into numba-dpex objects, and finally submitting the kernel to an
-execution queue with the specified execution range. Refer the
-:doc:`../../autoapi/index` for further details.
+  to the :ref:`launching-a-kernel` section for more details.
+
+A range kernel has to be executed via the
+:py:func:`numba_dpex.experimental.launcher.call_kernel` function by passing in
+an instance of the :class:`numba_dpex.kernel_api.Range` class. Refer to the
+:ref:`launching-a-kernel` section for more details on how to launch a range
+kernel.
 
 A range kernel is meant to express a basic `parallel-for` calculation that is
 ideally suited for embarrassingly parallel kernels such as elementwise
