Commit c30ea89

Diptorup Deb committed: Cleanups to overview section.
1 parent 5e09f86, commit c30ea89

File tree: 1 file changed (+50, -146 lines)


docs/source/overview.rst

Lines changed: 50 additions & 146 deletions
@@ -6,12 +6,15 @@ Overview
 
 Data Parallel Extension for Numba* (`numba-dpex`_) is a free and open-source
 LLVM-based code generator for portable accelerator programming in Python.
-numba_dpex defines a new kernel programming domain-specific language (DSL)
-in pure Python called `KAPI` that is modeled after the C++ embedded DSL
-`SYCL*`_.
+numba_dpex defines a new kernel programming domain-specific language (DSL) in
+pure Python called `KAPI` that is modeled after the C++ embedded DSL `SYCL*`_. A
+KAPI function can be JIT compiled by numba-dpex to generate a "data-parallel"
+kernel function that executes in parallel on a supported device. Currently,
+compilation of KAPI is possible for x86 CPU devices (using OpenCL CPU drivers),
+Intel Gen9 integrated GPUs, Intel UHD integrated GPUs, and Intel discrete GPUs.
 
-The following example illustrates a relatively simple pairwise distance matrix
-computation example written in KAPI.
+The following example uses KAPI to code a pairwise distance
+computation.
 
 .. code-block:: python
 
@@ -35,158 +38,59 @@ computation example written in KAPI.
 
 
     data = np.random.ranf((10000, 3)).astype(np.float32)
-    distance = np.empty(shape=(data.shape[0], data.shape[0]), dtype=np.float32)
+    dist = np.empty(shape=(data.shape[0], data.shape[0]), dtype=np.float32)
     exec_range = kapi.Range(data.shape[0], data.shape[0])
-    kapi.call_kernel(pairwise_distance_kernel, exec_range, data, distance)
-
-Skipping over much of the language details, at a high-level the
-``pairwise_distance_kernel`` can be viewed as a "data-parallel" function that
-gets executed individually by a set of "work items". That is, each work item
-runs the same function for a subset of the elements of the input ``data`` and
-``distance`` arrays. For programmers familiar with the CUDA or OpenCL languages,
-it is the same programming model referred to as Single Program Multiple Data
-(SPMD). As Python has no concept of a work item the KAPI function runs
-sequentially resulting in a very slow execution time. Experienced Python
-programmers will most probably write a much faster version of the function using
-NumPy*.
-
-However, using a JIT compiler numba-dpex can compile a function written in the
-KAPI language to a CPython native extension function that executes according to
-the SPMD programming model, speeding up the execution time by orders of
-magnitude. Currently, compilation of KAPI is possible for x86 CPU devices,
-Intel Gen9 integrated GPUs, Intel UHD integrated GPUs, and Intel discrete GPUs.
-
-
-``numba-dpex`` is an open-source project and can be installed as part of `Intel
-AI Analytics Toolkit`_ or the `Intel Distribution for Python*`_. The package is
-also available on Anaconda cloud and as a Docker image on GitHub. Please refer
-the :doc:`getting_started` page to learn more.
-
-Main Features
--------------
-
-Portable Kernel Programming
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``numba-dpex`` kernel programming API has a design similar to Numba's
-``cuda.jit`` sub-module. The API is modeled after the `SYCL*`_ language and uses
-the `DPC++`_ SYCL runtime. Currently, compilation of kernels is supported for
-SPIR-V-based OpenCL and `oneAPI Level Zero`_ devices CPU and GPU devices. In the
-future, compilation support for other types of hardware that are supported by
-DPC++ will be added.
-
-The following example illustrates a vector addition kernel written with
-``numba-dpex`` kernel API.
+    kapi.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
+
+The ``pairwise_distance_kernel`` function is conceptually a "data-parallel"
+function that gets executed individually by a set of "work items". That is, each
+work item runs the same function for a subset of the elements of the input
+**data** and **distance** arrays. For programmers familiar with the CUDA or
+OpenCL languages, it is the programming model referred to as Single Program
+Multiple Data (SPMD). Although a KAPI function conceptually follows the SPMD
+model, Python has no concept of a work item, so a KAPI function runs
+sequentially in Python and needs to be JIT compiled for parallel execution.
+
+JIT compiling a KAPI function only requires adding the ``dpex.kernel`` decorator
+to the function and calling the function via the ``dpex.call_kernel`` function.
+Note that a JIT compiled KAPI function does not support passing in NumPy
+arrays. A KAPI function can only be called using either ``dpnp.ndarray`` or
+``dpctl.tensor.usm_ndarray`` array objects. The restriction is due to a
+compiled KAPI function requiring memory that was allocated on the device where
+the kernel should execute. Refer to the :doc:`programming_model` page and the
+kernel programming user guide for further details. The modifications to the
+``pairwise_distance_kernel`` function for JIT compilation are shown in the next
+example.
 
 .. code-block:: python
 
-    import dpnp
+    from numba_dpex import kernel_api as kapi
     import numba_dpex as dpex
-
-
-    @dpex.kernel
-    def vecadd_kernel(a, b, c):
-        i = dpex.get_global_id(0)
-        c[i] = a[i] + b[i]
-
-
-    a = dpnp.ones(1024, device="gpu")
-    b = dpnp.ones(1024, device="gpu")
-    c = dpnp.empty_like(a)
-
-    vecadd_kernel[dpex.Range(1024)](a, b, c)
-    print(c)
-
-In the above example, three arrays are allocated on a default ``gpu`` device
-using the ``dpnp`` library. The arrays are then passed as input arguments to the
-kernel function. The compilation target and the subsequent execution of the
-kernel is determined by the input arguments and follow the
-"compute-follows-data" programming model as specified in the `Python* Array API
-Standard`_. To change the execution target to a CPU, the device keyword needs to
-be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible
-to leave the ``device`` keyword undefined and let the ``dpnp`` library select a
-default device based on environment flag settings. Refer the
-:doc:`user_guide/kernel_programming/index` for further details.
-
-``dpjit`` decorator
-~~~~~~~~~~~~~~~~~~~
-
-The ``numba-dpex`` package provides a new decorator ``dpjit`` that extends
-Numba's ``njit`` decorator. The new decorator is equivalent to
-``numba.njit(parallel=True)``, but additionally supports compiling ``dpnp``
-functions, ``prange`` loops, and array expressions that use ``dpnp.ndarray``
-objects.
-
-Unlike Numba's NumPy parallelization that only supports CPUs, ``dpnp``
-expressions are first converted to data-parallel kernels and can then be
-`offloaded` to different types of devices. As ``dpnp`` implements the same API
-as NumPy*, an existing ``numba.njit`` decorated function that uses
-``numpy.ndarray`` may be refactored to use ``dpnp.ndarray`` and decorated with
-``dpjit``. Such a refactoring can allow the parallel regions to be offloaded
-to a supported GPU device, providing users an additional option to execute their
-code parallelly.
-
-The vector addition example depicted using the kernel API can also be
-expressed in several different ways using ``dpjit``.
-
-.. code-block:: python
-
+    import math
     import dpnp
-    import numba_dpex as dpex
-
-
-    @dpex.dpjit
-    def vecadd_v1(a, b):
-        return a + b
 
 
-    @dpex.dpjit
-    def vecadd_v2(a, b):
-        return dpnp.add(a, b)
-
-
-    @dpex.dpjit
-    def vecadd_v3(a, b):
-        c = dpnp.empty_like(a)
-        for i in prange(a.shape[0]):
-            c[i] = a[i] + b[i]
-        return c
-
-As with the kernel API example, a ``dpjit`` function if invoked with ``dpnp``
-input arguments follows the compute-follows-data programming model. Refer
-:doc:`user_manual/dpnp_offload/index` for further details.
-
-
-.. Project Goal
-.. ------------
-
-.. If C++ is not your language, you can skip writing data-parallel kernels in SYCL
-.. and directly write them in Python.
-
-.. Our package ``numba-dpex`` extends the Numba compiler to allow kernel creation
-.. directly in Python via a custom compute API
+    @dpex.kernel
+    def pairwise_distance_kernel(item: kapi.Item, data, distance):
+        i = item.get_id(0)
+        j = item.get_id(1)
 
-.. Contributing
-.. ------------
+        data_dims = data.shape[1]
 
-.. Refer the `contributing guide
-.. <https://github.com/IntelPython/numba-dpex/blob/main/CONTRIBUTING>`_ for
-.. information on coding style and standards used in ``numba-dpex``.
+        d = data.dtype.type(0.0)
+        for k in range(data_dims):
+            tmp = data[i, k] - data[j, k]
+            d += tmp * tmp
 
-.. License
-.. -------
+        distance[j, i] = math.sqrt(d)
 
-.. ``numba-dpex`` is Licensed under Apache License 2.0 that can be found in `LICENSE
-.. <https://github.com/IntelPython/numba-dpex/blob/main/LICENSE>`_. All usage and
-.. contributions to the project are subject to the terms and conditions of this
-.. license.
 
+    data = dpnp.random.ranf((10000, 3)).astype(dpnp.float32)
+    dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), dtype=dpnp.float32)
+    exec_range = kapi.Range(data.shape[0], data.shape[0])
+    dpex.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
 
-.. Along with the kernel programming API an auto-offload feature is also provided.
-.. The feature enables automatic generation of kernels from data-parallel NumPy
-.. library calls and array expressions, Numba ``prange`` loops, and `other
-.. "data-parallel by construction" expressions
-.. <https://numba.pydata.org/numba-doc/latest/user/parallel.html>`_ that Numba is
-.. able to parallelize. Following two examples demonstrate the two ways in which
-.. kernels may be written using numba-dpex.
+``numba-dpex`` is an open-source project and can be installed as part of `Intel
+AI Analytics Toolkit`_ or the `Intel Distribution for Python*`_. The package is
+also available on Anaconda cloud, PyPI, and as a Docker image on GitHub.
+Refer to the :doc:`getting_started` page for further details.
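The revised overview says that, absent JIT compilation, a KAPI function "runs sequentially in Python": each point of the launch ``Range`` becomes one ordinary function call. The sketch below emulates that behavior with plain loops. ``Item`` and ``call_kernel_sequential`` are simplified stand-ins written only for this illustration; they are not numba-dpex API.

```python
import math


class Item:
    """Simplified stand-in for kapi.Item: holds one work item's index."""

    def __init__(self, ids):
        self._ids = ids

    def get_id(self, dim):
        return self._ids[dim]


def call_kernel_sequential(kernel, exec_range, *args):
    """Emulate a 2-D kernel launch: invoke the kernel once per work item."""
    rows, cols = exec_range
    for i in range(rows):
        for j in range(cols):
            kernel(Item((i, j)), *args)


def pairwise_distance_kernel(item, data, distance):
    # Same logic as the overview's kernel, on plain Python lists.
    i = item.get_id(0)
    j = item.get_id(1)
    d = 0.0
    for k in range(len(data[0])):
        tmp = data[i][k] - data[j][k]
        d += tmp * tmp
    distance[j][i] = math.sqrt(d)


data = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
distance = [[0.0, 0.0], [0.0, 0.0]]
call_kernel_sequential(pairwise_distance_kernel, (2, 2), data, distance)
print(distance)  # [[0.0, 5.0], [5.0, 0.0]]
```

Every ``(i, j)`` point of the range is one sequential invocation, which is why pure-Python execution is slow and why ``dpex.kernel``/``dpex.call_kernel`` compile the same function for parallel execution on a device.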

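Both the kept and the removed prose rely on the "compute-follows-data" model: the device on which a kernel executes is inferred from where its array arguments were allocated, which is also why a compiled KAPI function rejects NumPy arrays (a plain ``numpy.ndarray`` carries no device). A minimal sketch of that dispatch rule, assuming a hypothetical ``UsmArray`` container and ``choose_execution_device`` helper (neither is numba-dpex API):

```python
from dataclasses import dataclass


@dataclass
class UsmArray:
    """Hypothetical stand-in for a device-allocated array (e.g. dpnp.ndarray)."""

    data: list
    device: str  # e.g. "gpu" or "cpu"


def choose_execution_device(*arrays):
    """Compute-follows-data: all array arguments must share one device,
    and that device is where the kernel executes."""
    devices = {a.device for a in arrays}
    if len(devices) != 1:
        raise ValueError(f"arguments allocated on different devices: {devices}")
    return devices.pop()


a = UsmArray([1.0] * 4, device="gpu")
b = UsmArray([2.0] * 4, device="gpu")
print(choose_execution_device(a, b))  # prints: gpu
```

Changing both allocations to ``device="cpu"`` moves execution to the CPU with no change to the kernel, which is the behavior the deleted ``dpnp`` example described.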