launching a kernel section.

Diptorup Deb · Diptorup Deb · commit 67400a315989 · 2024-03-19T00:37:31.000-05:00
diff --git a/docs/source/ext_links.txt b/docs/source/ext_links.txt
@@ -31,3 +31,6 @@
 .. _oneAPI GPU optimization guide: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/general-purpose-computing-on-gpu.html
 .. _dpctl.tensor.usm_ndarray: https://intelpython.github.io/dpctl/latest/docfiles/dpctl/usm_ndarray.html#dpctl.tensor.usm_ndarray
 .. _dpnp.ndarray: https://intelpython.github.io/dpnp/reference/ndarray.html
+
+.. _Dispatcher: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html#dispatcher-objects
+.. _Unboxes: https://numba.readthedocs.io/en/stable/extending/interval-example.html#boxing-and-unboxing
diff --git a/docs/source/user_guide/kernel_programming/call-kernel-async.rst b/docs/source/user_guide/kernel_programming/call-kernel-async.rst
@@ -0,0 +1,4 @@
+.. _launching-an-async-kernel:
+
+Async kernel execution
+======================
diff --git a/docs/source/user_guide/kernel_programming/call-kernel.rst b/docs/source/user_guide/kernel_programming/call-kernel.rst
@@ -0,0 +1,43 @@
+.. _launching-a-kernel:
+
+Launching a kernel
+==================
+
+A ``kernel``-decorated kapi function produces a ``KernelDispatcher`` object,
+which is a type of Numba* `Dispatcher`_ object. However, unlike a regular Numba*
+Dispatcher object, a ``KernelDispatcher`` object cannot be invoked directly from
+either CPython or another compiled Numba* ``jit`` function. To invoke a
+``kernel``-decorated function, a programmer has to use the
+:func:`numba_dpex.experimental.call_kernel` function.
+
+To invoke a ``KernelDispatcher``, the ``call_kernel`` function requires three
+things: the ``KernelDispatcher`` object, the ``Range`` or ``NdRange`` object
+over which the kernel is to be executed, and the list of arguments to be passed
+to the compiled kernel. Once called with the necessary arguments, the
+``call_kernel`` function performs the following main steps:
+
+- Compiles the ``KernelDispatcher`` object, specializing it for the provided
+  argument types.
+
+- `Unboxes`_ the kernel arguments by converting CPython objects into Numba* or
+  numba-dpex objects.
+
+- Infers the execution queue on which to submit the kernel from the provided
+  kernel arguments. (TODO: Refer to compute follows data.)
+
+- Submits the kernel to the execution queue.
+
+- Waits for the kernel execution to complete before returning control to the
+  caller.
+
+The ``call_kernel`` function can be invoked both from CPython and from another
+Numba* compiled function. Note that the ``call_kernel`` function supports only
+synchronous execution of a kernel; the ``call_kernel_async`` function should be
+used for asynchronous kernel execution (refer to
+:ref:`launching-an-async-kernel`).
+
+
+.. seealso::
+
+    Refer to the API documentation for
+    :func:`numba_dpex.experimental.launcher.call_kernel` for more details.
diff --git a/docs/source/user_guide/kernel_programming/index.rst b/docs/source/user_guide/kernel_programming/index.rst
@@ -236,8 +236,10 @@ users should first convert their input tensor or ndarray object into either of
 the two supported array types, both of which support DLPack.
 
 
-Launching a kernel
-==================
+.. Launching a kernel
+.. ==================
+
+.. include:: ./call-kernel.rst
 
 Advanced concepts
 *****************
@@ -254,8 +256,10 @@ Group barrier synchronization
 Atomic operations
 =================
 
-Async kernel execution
-======================
+.. Async kernel execution
+.. ======================
+
+.. include:: ./call-kernel-async.rst
 
 Specializing a kernel or a device_func
 ======================================
diff --git a/docs/source/user_guide/kernel_programming/writing-range-kernel.rst b/docs/source/user_guide/kernel_programming/writing-range-kernel.rst
@@ -121,17 +121,13 @@ kernel:
 * At least one argument of a kernel should be an array. The requirement is so
   that the kernel launcher (:func:`numba_dpex.experimental.call_kernel`) can
   determine the execution queue on which to launch the kernel. Refer
-  the "Launching a kernel" section for more details.
-
-A range kernel has to be executed by calling the
-:py:func:`numba_dpex.experimental.launcher.call_kernel` function. The execution
-range for the kernel is specified by creating an instance of a
-:class:`numba_dpex.kernel_api.Range` class and passing the ``Range`` object as
-an argument to ``call_kernel``. The ``call_kernel`` function does three things:
-compiles the kernel if needed, "unboxes" all kernel arguments by converting
-CPython objects into numba-dpex objects, and finally submitting the kernel to an
-execution queue with the specified execution range. Refer the
-:doc:`../../autoapi/index` for further details.
+  to the :ref:`launching-a-kernel` section for more details.
+
+A range kernel has to be executed via the
+:py:func:`numba_dpex.experimental.launcher.call_kernel` function by passing in
+an instance of the :class:`numba_dpex.kernel_api.Range` class. Refer to the
+:ref:`launching-a-kernel` section for more details on how to launch a range
+kernel.
 
 A range kernel is meant to express a basic `parallel-for` calculation that is
 ideally suited for embarrassingly parallel kernels such as elementwise