.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: Apache-2.0

.. currentmodule:: cuda.core.experimental

Overview
========

- and much more!

Rather than providing 1:1 equivalents of the CUDA driver and runtime APIs
(for that, see `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest/>`_), ``cuda.core`` provides high-level constructs such as:

- :class:`Device` class for GPU device operations and context management.
- :class:`Buffer` and :class:`MemoryResource` classes for memory allocation and management.
- :class:`Program` for JIT compilation of CUDA kernels.
- :class:`GraphBuilder` for building and executing CUDA graphs.
- :class:`Stream` and :class:`Event` for asynchronous execution and timing.

Example: Compiling and Launching a CUDA kernel
----------------------------------------------

To get a taste for ``cuda.core``, let's walk through a simple example that compiles and launches a vector addition kernel.
You can find the complete example in `vector_add.py <https://github.com/NVIDIA/cuda-python/tree/main/cuda_core/examples/vector_add.py>`_.

First, we define a string containing the CUDA C++ kernel. Note that this is a templated kernel:

.. code-block:: python

   # compute c = a + b
   code = """
   template<typename T>
   __global__ void vector_add(const T* A,
                              const T* B,
                              T* C,
                              size_t N) {
       const unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
       if (tid < N) {
           C[tid] = A[tid] + B[tid];
       }
   }
   """
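To make the per-thread computation concrete, here is a plain NumPy sketch of what the kernel computes: each thread index ``tid`` writes one output element ``C[tid] = A[tid] + B[tid]``. This is illustrative only; the helper name ``vector_add_reference`` is ours and not part of ``cuda.core``:

```python
import numpy as np

def vector_add_reference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Mirrors the CUDA kernel's per-thread logic: one element per "thread" tid.
    c = np.empty_like(a)
    for tid in range(a.size):
        c[tid] = a[tid] + b[tid]
    return c

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
print(vector_add_reference(a, b))  # [5. 7. 9.]
```

On the GPU, of course, all of these per-element additions run in parallel rather than in a Python loop.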

Next, we create a :class:`Device` object
and a corresponding :class:`Stream`.
Don't forget to use :meth:`Device.set_current`!

.. code-block:: python

   dev = Device()
   dev.set_current()
   s = dev.create_stream()

Next, we compile the CUDA C++ kernel from earlier using the :class:`Program` class.
The result of the compilation is saved as a CUBIN.
Note the use of the ``name_expressions`` parameter to the :meth:`Program.compile` method to specify which kernel template instantiations to compile:

.. code-block:: python

   prog = Program(code, code_type="c++")
   mod = prog.compile("cubin", name_expressions=("vector_add<float>",))

Next, we retrieve the compiled kernel from the CUBIN and prepare the arguments and kernel configuration.
We're using `CuPy <https://cupy.dev/>`_ arrays as inputs for this example, but you can use PyTorch tensors too
(we show how to do this in one of our `examples <https://github.com/NVIDIA/cuda-python/tree/main/cuda_core/examples>`_).

.. code-block:: python

   ker = mod.get_kernel("vector_add<float>")

   # Prepare input/output arrays (using CuPy)
   size = 50000
   rng = cp.random.default_rng()
   a = rng.random(size, dtype=cp.float32)
   b = rng.random(size, dtype=cp.float32)
   c = cp.empty_like(a)

   # Configure launch parameters
   block = 256
   grid = (size + block - 1) // block
   config = LaunchConfig(grid=grid, block=block)
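The grid size above is a ceiling division: it chooses just enough blocks so that ``grid * block >= size``, even when ``size`` is not a multiple of the block size (any surplus threads must be guarded by a bounds check in the kernel). A quick sketch, with a helper name of our own choosing:

```python
def blocks_needed(size: int, block: int) -> int:
    # Ceiling division: smallest grid such that grid * block >= size.
    return (size + block - 1) // block

print(blocks_needed(50000, 256))  # 196
```

With ``size = 50000`` and ``block = 256``, 195 full blocks cover only 49920 elements, so a 196th (partially used) block is required.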

Finally, we use the :func:`launch` function to execute our kernel on the specified stream with the given configuration and arguments. Note the use of ``.data.ptr`` to get the pointer to the array data.

.. code-block:: python

   launch(s, config, ker, c.data.ptr, a.data.ptr, b.data.ptr, cp.uint64(size))
   s.sync()

Examples and Recipes
--------------------

As we mentioned before, ``cuda.core`` can do much more than just compile and launch kernels.

The best way to explore and learn the different features of ``cuda.core`` is through
our `examples <https://github.com/NVIDIA/cuda-python/tree/main/cuda_core/examples>`_. Find one that matches your use-case, and modify it to fit your needs!