Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
391 changes: 391 additions & 0 deletions sycl/doc/extensions/DeviceIf/device_if.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,391 @@
= SYCL_EXT_ONEAPI_DEVICE_IF
:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

Copyright (c) 2021-2021 Intel Corporation. All rights reserved.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

This extension is written against the SYCL 2020 revision 3 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.


== Introduction

This extension provides a way for device code to query the device on which it
is running in order to conditionally use features that may not be supported on
all devices. This is different from the existing `device::has()` function
because the extension can be called from device code while `device::has()` can
only be called from host code.

The motivating use case for this extension is for developers who provide device
side libraries. To illustrate, consider an application developer (i.e. someone
who is **not** developing a device-side library) who wants to code a kernel
that conditionally uses a feature that is not available on all devices. This
developer can write two versions of the kernel, one which uses the features and
one that does not. Then the developer can use `device::has()` to test whether
the device supports the feature and submit one or the other kernels according
to the device's capabilities. (To avoid code duplication, the developer could
write the kernel as a template using `if constexpr` and then instantiate the
template according to the device capabilities.)

This technique, however, is not available to a developer writing a device-side
library because such a developer does not have control over the host code that
launches the kernel. The developer could expose the library function as a
template with a template parameter that controls the use of the conditional
feature. For example, consider a library function "frob" that wants to
conditionally use a feature named "fancy":

```
template<bool useFancy>
void frob() { /*...*/ }
```

The caller of the library function would be responsible for calling
`device::has(aspect::fancy)` to check if the device supports this feature, and
then submit a kernel that calls the appropriately instantiated version of the
`frob()` template function. However, this has the serious downside that the
library developer must expose all device features which the library wants
to conditionally use.

This extension solves the problem by providing a way for the library developer
to check for device features from within the library without exposing this to
its callers. For example:

```
void frob(kernel_handler kh) {
kh.ext_oneapi_if_device_has<aspect::fancy>([&]() {
// use "fancy" feature
});
}
```

Callers of the library must pass the `kernel_handler` to the library function,
but callers do not need to know which conditions the library will check.

The structure of the extension has been designed such that the overhead of the
condition check can be entirely eliminated by the device compiler (in both the
AOT case where the ahead-of-time compiler produces native device code and in
the JIT case where the online compiler produces native device code.)


== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an
implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_DEVICE_IF` to one of the values defined in the table below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's APIs the implementation supports.

[%header,cols="1,5"]
|===
|Value |Description
|1 |Initial extension version. Base features are supported.
|===


== Extension to `kernel_handler`

This extension adds a new member function with two overloads to the
`kernel_handler` class:

```
template<aspect ...Aspects, typename T>
void ext_oneapi_if_device_has(T fnTrue);

template<aspect ...Aspects, typename T1, typename T2>
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse);
```

Both overloads are templated with an initial parameter pack which is a list of
device aspects. These aspects form the condition that is checked. If the
device on which the kernel is running has **all** of the listed aspects, the
condition is true, otherwise the condition is false. Valid aspect names
include the enumerated aspect values defined in the core SYCL specification as
well as any enumerated aspect values defined by other extensions that the
implementation supports.

The first overload takes a single callable (typically a lambda expression),
which the implementation calls only if the condition is true. The second
overload takes two callables; the implementation calls the first when the
condition is true and calls the second when the condition is false.

Typically, an application uses the first callable to use a device feature
which is only available on a device that has all of the listed aspects, and it
uses the second callable as a fallback on other devices.

== Example usage

This non-normative section shows some example usages of the extension.

=== Basic usage

Conditionally use the `sycl::half` type, which is only available on devices
that have the `fp16` aspect.

```
void device_code(kernel_handler kh) {
kh.ext_oneapi_if_device_has<aspect::fp16>([&]() {
sycl::half val = 3.14;
/* ... */
});
}
```

=== Conditional with fallback

Use the `sycl::half` type if the device supports it, otherwise fallback to
using `float`.

```
void device_code(kernel_handler kh) {
kh.ext_oneapi_if_device_has<aspect::fp16>([&]() {
sycl::half val = 3.14;
/* ... */
}, [&]() {
float val = 3.14;
/* ... */
});
}
```

=== Test for multiple aspects

If more than one aspect is listed, the condition is only true if the device has
all of the listed aspects. This code checks for both `fp64` and `atomic64` in
order to tell if atomic operations are allowed on 64-bit floating point values.

```
void device_code(kernel_handler kh) {
kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() {
/* can do atomic operations on "double" */
});
}
```

=== Nested constructs

These calls can be nested to achieve if-then-elseif checks.

```
void device_code(kernel_handler kh) {
kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() {
/* can do atomic operations on "double" */
}, [&]() {
kh.ext_oneapi_if_device_has<aspect::fp64>([&]() {
/* can use "double" but not with atomic operations */
}, [&]() {
/* can not use "double" at all */
});
});
}
```

[NOTE]
====
Although all the examples shown above have tests for feature-based aspects,
it's possible to test any aspect that the implementation supports. In
particular, if the implementation supports aspects that allow the application
to query the architecture of the device, it would be possible to use
`ext_oneapi_if_device_has()` to enable code only when the device has a certain
architecture.
====


== Hypothetical implementation

This non-normative section outlines a possible implementation for this
extension for a compiler-based solution. Different implementations are
described for AOT vs. JIT modes.

The general approach in AOT mode is to lower the `ext_oneapi_if_device_has()`
checks to `if constexpr` statements that are optimized away by the C++
front-end. The approach in JIT mode is to lower the
`ext_oneapi_if_device_has()` checks to code that tests a specialization
constant, and these tests are optimized away by the JIT compiler.

=== AOT mode

For AOT mode, we assume that the user has specified a list of devices on the
compiler's command line. The implementation invokes the device compiler
multiple times over the application's device code, once for each device that
the user specified.

Since the compiler driver knows the device, it can have a simple lookup table
which tells the set of aspects that are supported for each device. The driver
can then predefine a preprocessor macro for each device, indicating whether
that aspect is supported. The implementation of `ext_oneapi_if_device_has()`
can use these preprocessor macros to implement the condition checks. For
example:

```
class kernel_handler {
#ifdef __SYCL_AOT__
public:
template<aspect ...Aspects, typename T>
void ext_oneapi_if_device_has(T fnTrue) {
if constexpr (device_has<Aspects...>()) {
fnTrue();
}
}

template<aspect ...Aspects, typename T1, typename T2>
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) {
if constexpr (device_has<Aspects...>()) {
fnTrue();
} else {
fnFalse();
}
}

private:
static constexpr bool capabilities[] = {
__SYCL_AOT_FP16__,
__SYCL_AOT_FP64__,
__SYCL_AOT_ATOMIC64__
};

template<aspect... Aspect>
constexpr static bool device_has() {
return (capabilities[static_cast<int>(Aspect)] && ...);
}
#else
/* ... */
#endif
};
```

This code snippet assumes that the compiler driver has predefined
`+__SYCL_AOT_FP16__+`, etc. to `true` or `false` depending on the capabilities
of the current device. The `if constexpr` statement in the code then evaluates
the set of aspects in each call to `ext_oneapi_if_device_has()` at compilation
time and normal compiler optimizations throw out all the overhead of the
condition check and the lambda call, leaving only the body of the selected
lambda.

[NOTE]
====
The preprocessor macro names proposed above like `+__SYCL_AOT_FP16__+` are
intended to be an implementation detail that is not exposed to applications.
Application code should use the `ext_oneapi_if_device_has()` construct, not
reference the macros directly.
====

=== JIT mode

In JIT mode, the goal is to generate a single SPIR-V module for the device
code, which contains all the condition checks and lambda calls. The online
compiler evaluates the condition checks at online compilation time, and the
generated native code contains only the body of the selected lambda. The
implementation can make use of specialization constants to represent each
aspect: a specialization constant will be set to `true` if the device supports
that aspect and to `false` if it does not.

One challenge with this approach is that we need some way to guarantee that the
online compiler will not raise a compilation error from an unselected lambda
call. To illustrate, consider a call to `ext_oneapi_if_device_has()` which
conditionally uses `sycl::half` and let's consider the case where the current
device does not have this support. We need to make sure that the online
compiler does not raise a compilation error when attempting to compile the
lambda body which uses `sycl::half`. In such a case the SPIR-V will
conceptually look like:

```
if (__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)) call lambda
```

Of course, the online compiler will know that the value of
`+__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)+` is `false`, so
hopefully the compiler will throw away the call to lambda. However, there is
no guarantee that this will happen. For example, what happens if optimization
is disabled in the online compiler? Even though the lambda will never be
called at runtime, we don't want the online compiler to throw an exception when
it tries to generate device code for the non-existent `sycl::half` type. We
therefore need some way to ensure that the online compiler discards any
unselected lambda calls (or at least a way to ensure that the compiler doesn't
throw an exception when compiling it).

We propose introducing some SPIR-V extension that provides this guarantee.
The extension would take a boolean expression of specialization constant values
to compute an "if" condition. The extension would then guarantee that the
body of the "if" (the lambda call) is discarded if the boolean expression is
`false`.

[NOTE]
====
The description of this hypothetical SPIR-V extension is intentionally vague
here. The purpose of this section is only to point out that such an extension
is needed, not to precisely define it. We expect that a separate document
would describe the SPIR-V extension.
====

We can now outline some pseudo code for the implementation of
`ext_oneapi_if_device_has()` in JIT mode:

```
class kernel_handler {
#ifdef __SYCL_AOT__
/* ... */
#else
public:
template<aspect ...Aspects, typename T>
void ext_oneapi_if_device_has(T fnTrue) {
__builtin_spec_constant_if((__builtin_spec_constant(specid[Aspects]) && ...), fnTrue);
}

template<aspect ...Aspects, typename T1, typename T2>
void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) {
__builtin_spec_constant_if(
(__builtin_spec_constant(specid[Aspects]) && ...), fnTrue, fnFalse);
}

private:
static constexpr int specid[] = {
/* SPIR-V specialization constant ID for aspect::fp16 */,
/* SPIR-V specialization constant ID for aspect::fp64 */,
/* SPIR-V specialization constant ID for aspect::atomic64 */
};
#endif
};
```

Here the builtin function `+__builtin_spec_constant(aspect)+` produces SPIR-V
that loads the value of a specialization constant. The builtin function
`+__builtin_spec_constant_if()+` produces the extended SPIR-V described above
which provides a guarantee that the unselected lambda will be discarded.

Of course, the SYCL runtime must also set the values of the specialization
constants appropriately before invoking the online compiler. This is easy to
do since the runtime knows the target device, so it can set their values
according to a lookup table.


== Revision History

[%header,cols="5,15,15,65"]
|===
|Rev |Date |Author |Changes
|1 |2021-04-09 |Greg Lueck |Initial public working draft.
|===
1 change: 1 addition & 0 deletions sycl/doc/extensions/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ DPC++ extensions status:
| [Use Pinned Memory Property](UsePinnedMemoryProperty/UsePinnedMemoryPropery.adoc) | Supported | |
| [Level-Zero backend specification](LevelZeroBackend/LevelZeroBackend.md) | Supported | |
| [ITT annotations support](ITTAnnotations/ITTAnnotations.rst) | Supported | |
| [SYCL_EXT_ONEAPI_DEVICE_IF](DeviceIf/device_if.asciidoc) | Proposal | |

Legend:

Expand Down