Skip to content

Commit 4fb95fc

Browse files
gmlueckPennycook
andauthored
[SYCL][Doc] Add "device-if" extension (#3525)
Add a new proposed extension specification that allows device code to conditionally use features that are supported on only some devices. Co-authored-by: John Pennycook <[email protected]>
1 parent 3269c47 commit 4fb95fc

File tree

2 files changed

+392
-0
lines changed

2 files changed

+392
-0
lines changed
Lines changed: 391 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,391 @@
1+
= SYCL_EXT_ONEAPI_DEVICE_IF
2+
:source-highlighter: coderay
3+
:coderay-linenums-mode: table
4+
5+
// This section needs to be after the document title.
6+
:doctype: book
7+
:toc2:
8+
:toc: left
9+
:encoding: utf-8
10+
:lang: en
11+
12+
:blank: pass:[ +]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
Copyright (c) 2021-2021 Intel Corporation. All rights reserved.
23+
24+
NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
25+
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
26+
used by permission by Khronos.
27+
28+
This extension is written against the SYCL 2020 revision 3 specification. All
29+
references below to the "core SYCL specification" or to section numbers in the
30+
SYCL specification refer to that revision.
31+
32+
33+
== Introduction
34+
35+
This extension provides a way for device code to query the device on which it
36+
is running in order to conditionally use features that may not be supported on
37+
all devices. This is different from the existing `device::has()` function
38+
because the extension can be called from device code while `device::has()` can
39+
only be called from host code.
40+
41+
The motivating use case for this extension is for developers who provide device
42+
side libraries. To illustrate, consider an application developer (i.e. someone
43+
who is **not** developing a device-side library) who wants to code a kernel
44+
that conditionally uses a feature that is not available on all devices. This
45+
developer can write two versions of the kernel, one which uses the features and
46+
one that does not. Then the developer can use `device::has()` to test whether
47+
the device supports the feature and submit one or the other kernels according
48+
to the device's capabilities. (To avoid code duplication, the developer could
49+
write the kernel as a template using `if constexpr` and then instantiate the
50+
template according to the device capabilities.)
51+
52+
This technique, however, is not available to a developer writing a device-side
53+
library because such a developer does not have control over the host code that
54+
launches the kernel. The developer could expose the library function as a
55+
template with a template parameter that controls the use of the conditional
56+
feature. For example, consider a library function "frob" that wants to
57+
conditionally use a feature named "fancy":
58+
59+
```
60+
template<bool useFancy>
61+
void frob() { /*...*/ }
62+
```
63+
64+
The caller of the library function would be responsible for calling
65+
`device::has(aspect::fancy)` to check if the device supports this feature, and
66+
then submit a kernel that calls the appropriately instantiated version of the
67+
`frob()` template function. However, this has the serious downside that the
68+
library developer must expose all device features which the library wants
69+
to conditionally use.
70+
71+
This extension solves the problem by providing a way for the library developer
72+
to check for device features from within the library without exposing this to
73+
its callers. For example:
74+
75+
```
76+
void frob(kernel_handler kh) {
77+
kh.ext_oneapi_if_device_has<aspect::fancy>([&]() {
78+
// use "fancy" feature
79+
});
80+
}
81+
```
82+
83+
Callers of the library must pass the `kernel_handler` to the library function,
84+
but callers do not need to know which conditions the library will check.
85+
86+
The structure of the extension has been designed such that the overhead of the
87+
condition check can be entirely eliminated by the device compiler (in both the
88+
AOT case where the ahead-of-time compiler produces native device code and in
89+
the JIT case where the online compiler produces native device code.)
90+
91+
92+
== Feature test macro
93+
94+
This extension provides a feature-test macro as described in the core SYCL
95+
specification section 6.3.3 "Feature test macros". Therefore, an
96+
implementation supporting this extension must predefine the macro
97+
`SYCL_EXT_ONEAPI_DEVICE_IF` to one of the values defined in the table below.
98+
Applications can test for the existence of this macro to determine if the
99+
implementation supports this feature, or applications can test the macro's
100+
value to determine which of the extension's APIs the implementation supports.
101+
102+
[%header,cols="1,5"]
103+
|===
104+
|Value |Description
105+
|1 |Initial extension version. Base features are supported.
106+
|===
107+
108+
109+
== Extension to `kernel_handler`
110+
111+
This extension adds a new member function with two overloads to the
112+
`kernel_handler` class:
113+
114+
```
115+
template<aspect ...Aspects, typename T>
116+
void ext_oneapi_if_device_has(T fnTrue);
117+
118+
template<aspect ...Aspects, typename T1, typename T2>
119+
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse);
120+
```
121+
122+
Both overloads are templated with an initial parameter pack which is a list of
123+
device aspects. These aspects form the condition that is checked. If the
124+
device on which the kernel is running has **all** of the listed aspects, the
125+
condition is true, otherwise the condition is false. Valid aspect names
126+
include the enumerated aspect values defined in the core SYCL specification as
127+
well as any enumerated aspect values defined by other extensions that the
128+
implementation supports.
129+
130+
The first overload takes a single callable (typically a lambda expression),
131+
which the implementation calls only if the condition is true. The second
132+
overload takes two callables; the implementation calls the first when the
133+
condition is true and calls the second when the condition is false.
134+
135+
Typically, an application uses the first callable to use a device feature
136+
which is only available on a device that has all of the listed aspects, and it
137+
uses the second callable as a fallback on other devices.
138+
139+
== Example usage
140+
141+
This non-normative section shows some example usages of the extension.
142+
143+
=== Basic usage
144+
145+
Conditionally use the `sycl::half` type, which is only available on devices
146+
that have the `fp16` aspect.
147+
148+
```
149+
void device_code(kernel_handler kh) {
150+
kh.ext_oneapi_if_device_has<aspect::fp16>([&]() {
151+
sycl::half val = 3.14;
152+
/* ... */
153+
});
154+
}
155+
```
156+
157+
=== Conditional with fallback
158+
159+
Use the `sycl::half` type if the device supports it, otherwise fallback to
160+
using `float`.
161+
162+
```
163+
void device_code(kernel_handler kh) {
164+
kh.ext_oneapi_if_device_has<aspect::fp16>([&]() {
165+
sycl::half val = 3.14;
166+
/* ... */
167+
}, [&]() {
168+
float val = 3.14;
169+
/* ... */
170+
});
171+
}
172+
```
173+
174+
=== Test for multiple aspects
175+
176+
If more than one aspect is listed, the condition is only true if the device has
177+
all of the listed aspects. This code checks for both `fp64` and `atomic64` in
178+
order to tell if atomic operations are allowed on 64-bit floating point values.
179+
180+
```
181+
void device_code(kernel_handler kh) {
182+
kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() {
183+
/* can do atomic operations on "double" */
184+
});
185+
}
186+
```
187+
188+
=== Nested constructs
189+
190+
These calls can be nested to achieve if-then-elseif checks.
191+
192+
```
193+
void device_code(kernel_handler kh) {
194+
kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() {
195+
/* can do atomic operations on "double" */
196+
}, [&]() {
197+
kh.ext_oneapi_if_device_has<aspect::fp64>([&]() {
198+
/* can use "double" but not with atomic operations */
199+
}, [&]() {
200+
/* can not use "double" at all */
201+
});
202+
});
203+
}
204+
```
205+
206+
[NOTE]
207+
====
208+
Although all the examples shown above have tests for feature-based aspects,
209+
it's possible to test any aspect that the implementation supports. In
210+
particular, if the implementation supports aspects that allow the application
211+
to query the architecture of the device, it would be possible to use
212+
`ext_oneapi_if_device_has()` to enable code only when the device has a certain
213+
architecture.
214+
====
215+
216+
217+
== Hypothetical implementation
218+
219+
This non-normative section outlines a possible implementation for this
220+
extension for a compiler-based solution. Different implementations are
221+
described for AOT vs. JIT modes.
222+
223+
The general approach in AOT mode is to lower the `ext_oneapi_if_device_has()`
224+
checks to `if constexpr` statements that are optimized away by the C++
225+
front-end. The approach in JIT mode is to lower the
226+
`ext_oneapi_if_device_has()` checks to code that tests a specialization
227+
constant, and these tests are optimized away by the JIT compiler.
228+
229+
=== AOT mode
230+
231+
For AOT mode, we assume that the user has specified a list of devices on the
232+
compiler's command line. The implementation invokes the device compiler
233+
multiple times over the application's device code, once for each device that
234+
the user specified.
235+
236+
Since the compiler driver knows the device, it can have a simple lookup table
237+
which tells the set of aspects that are supported for each device. The driver
238+
can then predefine a preprocessor macro for each device, indicating whether
239+
that aspect is supported. The implementation of `ext_oneapi_if_device_has()`
240+
can use these preprocessor macros to implement the condition checks. For
241+
example:
242+
243+
```
244+
class kernel_handler {
245+
#ifdef __SYCL_AOT__
246+
public:
247+
template<aspect ...Aspects, typename T>
248+
void ext_oneapi_if_device_has(T fnTrue) {
249+
if constexpr (device_has<Aspects...>()) {
250+
fnTrue();
251+
}
252+
}
253+
254+
template<aspect ...Aspects, typename T1, typename T2>
255+
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) {
256+
if constexpr (device_has<Aspects...>()) {
257+
fnTrue();
258+
} else {
259+
fnFalse();
260+
}
261+
}
262+
263+
private:
264+
static constexpr bool capabilities[] = {
265+
__SYCL_AOT_FP16__,
266+
__SYCL_AOT_FP64__,
267+
__SYCL_AOT_ATOMIC64__
268+
};
269+
270+
template<aspect... Aspect>
271+
constexpr static bool device_has() {
272+
return (capabilities[static_cast<int>(Aspect)] && ...);
273+
}
274+
#else
275+
/* ... */
276+
#endif
277+
};
278+
```
279+
280+
This code snippet assumes that the compiler driver has predefined
281+
`+__SYCL_AOT_FP16__+`, etc. to `true` or `false` depending on the capabilities
282+
of the current device. The `if constexpr` statement in the code then evaluates
283+
the set of aspects in each call to `ext_oneapi_if_device_has()` at compilation
284+
time and normal compiler optimizations throw out all the overhead of the
285+
condition check and the lambda call, leaving only the body of the selected
286+
lambda.
287+
288+
[NOTE]
289+
====
290+
The preprocessor macro names proposed above like `+__SYCL_AOT_FP16__+` are
291+
intended to be an implementation detail that is not exposed to applications.
292+
Application code should use the `ext_oneapi_if_device_has()` construct, not
293+
reference the macros directly.
294+
====
295+
296+
=== JIT mode
297+
298+
In JIT mode, the goal is to generate a single SPIR-V module for the device
299+
code, which contains all the condition checks and lambda calls. The online
300+
compiler evaluates the condition checks at online compilation time, and the
301+
generated native code contains only the body of the selected lambda. The
302+
implementation can make use of specialization constants to represent each
303+
aspect: a specialization constant will be set to `true` if the device supports
304+
that aspect and to `false` if it does not.
305+
306+
One challenge with this approach is that we need some way to guarantee that the
307+
online compiler will not raise a compilation error from an unselected lambda
308+
call. To illustrate, consider a call to `ext_oneapi_if_device_has()` which
309+
conditionally uses `sycl::half` and let's consider the case where the current
310+
device does not have this support. We need to make sure that the online
311+
compiler does not raise a compilation error when attempting to compile the
312+
lambda body which uses `sycl::half`. In such a case the SPIR-V will
313+
conceptually look like:
314+
315+
```
316+
if (__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)) call lambda
317+
```
318+
319+
Of course, the online compiler will know that the value of
320+
`+__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)+` is `false`, so
321+
hopefully the compiler will throw away the call to lambda. However, there is
322+
no guarantee that this will happen. For example, what happens if optimization
323+
is disabled in the online compiler? Even though the lambda will never be
324+
called at runtime, we don't want the online compiler to throw an exception when
325+
it tries to generate device code for the non-existent `sycl::half` type. We
326+
therefore need some way to ensure that the online compiler discards any
327+
unselected lambda calls (or at least a way to ensure that the compiler doesn't
328+
throw an exception when compiling it).
329+
330+
We propose introducing some SPIR-V extension that provides this guarantee.
331+
The extension would take a boolean expression of specialization constant values
332+
to compute an "if" condition. The extension would then guarantee that the
333+
body of the "if" (the lambda call) is discarded if the boolean expression is
334+
`false`.
335+
336+
[NOTE]
337+
====
338+
The description of this hypothetical SPIR-V extension is intentionally vague
339+
here. The purpose of this section is only to point out that such an extension
340+
is needed, not to precisely define it. We expect that a separate document
341+
would describe the SPIR-V extension.
342+
====
343+
344+
We can now outline some pseudo code for the implementation of
345+
`ext_oneapi_if_device_has()` in JIT mode:
346+
347+
```
348+
class kernel_handler {
349+
#ifdef __SYCL_AOT__
350+
/* ... */
351+
#else
352+
public:
353+
template<aspect ...Aspects, typename T>
354+
void ext_oneapi_if_device_has(T fnTrue) {
355+
__builtin_spec_constant_if((__builtin_spec_constant(specid[Aspects]) && ...), fnTrue);
356+
}
357+
358+
template<aspect ...Aspects, typename T1, typename T2>
359+
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) {
360+
__builtin_spec_constant_if(
361+
(__builtin_spec_constant(specid[Aspects]) && ...), fnTrue, fnFalse);
362+
}
363+
364+
private:
365+
static constexpr int specid[] = {
366+
/* SPIR-V specialization constant ID for aspect::fp16 */,
367+
/* SPIR-V specialization constant ID for aspect::fp64 */,
368+
/* SPIR-V specialization constant ID for aspect::atomic64 */
369+
};
370+
#endif
371+
};
372+
```
373+
374+
Here the builtin function `+__builtin_spec_constant(aspect)+` produces SPIR-V
375+
that loads the value of a specialization constant. The builtin function
376+
`+__builtin_spec_constant_if()+` produces the extended SPIR-V described above
377+
which provides a guarantee that the unselected lambda will be discarded.
378+
379+
Of course, the SYCL runtime must also set the values of the specialization
380+
constants appropriately before invoking the online compiler. This is easy to
381+
do since the runtime knows the target device, so it can set their values
382+
according to a lookup table.
383+
384+
385+
== Revision History
386+
387+
[%header,cols="5,15,15,65"]
388+
|===
389+
|Rev |Date |Author |Changes
390+
|1 |2021-04-09 |Greg Lueck |Initial public working draft.
391+
|===

sycl/doc/extensions/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@ DPC++ extensions status:
3737
| [Use Pinned Memory Property](UsePinnedMemoryProperty/UsePinnedMemoryPropery.adoc) | Supported | |
3838
| [Level-Zero backend specification](LevelZeroBackend/LevelZeroBackend.md) | Supported | |
3939
| [ITT annotations support](ITTAnnotations/ITTAnnotations.rst) | Supported | |
40+
| [SYCL_EXT_ONEAPI_DEVICE_IF](DeviceIf/device_if.asciidoc) | Proposal | |
4041

4142
Legend:
4243

0 commit comments

Comments
 (0)