diff --git a/sycl/doc/extensions/DeviceIf/device_if.asciidoc b/sycl/doc/extensions/DeviceIf/device_if.asciidoc new file mode 100644 index 0000000000000..21e36fbd2da27 --- /dev/null +++ b/sycl/doc/extensions/DeviceIf/device_if.asciidoc @@ -0,0 +1,391 @@ += SYCL_EXT_ONEAPI_DEVICE_IF +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +Copyright (c) 2021-2021 Intel Corporation. All rights reserved. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. +used by permission by Khronos. + +This extension is written against the SYCL 2020 revision 3 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + + +== Introduction + +This extension provides a way for device code to query the device on which it +is running in order to conditionally use features that may not be supported on +all devices. This is different from the existing `device::has()` function +because the extension can be called from device code while `device::has()` can +only be called from host code. + +The motivating use case for this extension is for developers who provide device +side libraries. To illustrate, consider an application developer (i.e. someone +who is **not** developing a device-side library) who wants to code a kernel +that conditionally uses a feature that is not available on all devices. This +developer can write two versions of the kernel, one which uses the features and +one that does not. Then the developer can use `device::has()` to test whether +the device supports the feature and submit one or the other kernels according +to the device's capabilities. (To avoid code duplication, the developer could +write the kernel as a template using `if constexpr` and then instantiate the +template according to the device capabilities.) + +This technique, however, is not available to a developer writing a device-side +library because such a developer does not have control over the host code that +launches the kernel. The developer could expose the library function as a +template with a template parameter that controls the use of the conditional +feature. For example, consider a library function "frob" that wants to +conditionally use a feature named "fancy": + +``` +template +void frob() { /*...*/ } +``` + +The caller of the library function would be responsible for calling +`device::has(aspect::fancy)` to check if the device supports this feature, and +then submit a kernel that calls the appropriately instantiated version of the +`frob()` template function. However, this has the serious downside that the +library developer must expose all device features which the library wants +to conditionally use. + +This extension solves the problem by providing a way for the library developer +to check for device features from within the library without exposing this to +its callers. For example: + +``` +void frob(kernel_handler kh) { + kh.ext_oneapi_if_device_has([&]() { + // use "fancy" feature + }); +} +``` + +Callers of the library must pass the `kernel_handler` to the library function, +but callers do not need to know which conditions the library will check. + +The structure of the extension has been designed such that the overhead of the +condition check can be entirely eliminated by the device compiler (in both the +AOT case where the ahead-of-time compiler produces native device code and in +the JIT case where the online compiler produces native device code.) + + +== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification section 6.3.3 "Feature test macros". Therefore, an +implementation supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_DEVICE_IF` to one of the values defined in the table below. +Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro's +value to determine which of the extension's APIs the implementation supports. + +[%header,cols="1,5"] +|=== +|Value |Description +|1 |Initial extension version. Base features are supported. +|=== + + +== Extension to `kernel_handler` + +This extension adds a new member function with two overloads to the +`kernel_handler` class: + +``` +template +void ext_oneapi_if_device_has(T fnTrue); + +template +void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse); +``` + +Both overloads are templated with an initial parameter pack which is a list of +device aspects. These aspects form the condition that is checked. If the +device on which the kernel is running has **all** of the listed aspects, the +condition is true, otherwise the condition is false. Valid aspect names +include the enumerated aspect values defined in the core SYCL specification as +well as any enumerated aspect values defined by other extensions that the +implementation supports. + +The first overload takes a single callable (typically a lambda expression), +which the implementation calls only if the condition is true. The second +overload takes two callables; the implementation calls the first when the +condition is true and calls the second when the condition is false. + +Typically, an application uses the first callable to use a device feature +which is only available on a device that has all of the listed aspects, and it +uses the second callable as a fallback on other devices. + +== Example usage + +This non-normative section shows some example usages of the extension. + +=== Basic usage + +Conditionally use the `sycl::half` type, which is only available on devices +that have the `fp16` aspect. + +``` +void device_code(kernel_handler kh) { + kh.ext_oneapi_if_device_has([&]() { + sycl::half val = 3.14; + /* ... */ + }); +} +``` + +=== Conditional with fallback + +Use the `sycl::half` type if the device supports it, otherwise fallback to +using `float`. + +``` +void device_code(kernel_handler kh) { + kh.ext_oneapi_if_device_has([&]() { + sycl::half val = 3.14; + /* ... */ + }, [&]() { + float val = 3.14; + /* ... */ + }); +} +``` + +=== Test for multiple aspects + +If more than one aspect is listed, the condition is only true if the device has +all of the listed aspects. This code checks for both `fp64` and `atomic64` in +order to tell if atomic operations are allowed on 64-bit floating point values. + +``` +void device_code(kernel_handler kh) { + kh.ext_oneapi_if_device_has([&]() { + /* can do atomic operations on "double" */ + }); +} +``` + +=== Nested constructs + +These calls can be nested to achieve if-then-elseif checks. + +``` +void device_code(kernel_handler kh) { + kh.ext_oneapi_if_device_has([&]() { + /* can do atomic operations on "double" */ + }, [&]() { + kh.ext_oneapi_if_device_has([&]() { + /* can use "double" but not with atomic operations */ + }, [&]() { + /* can not use "double" at all */ + }); + }); +} +``` + +[NOTE] +==== +Although all the examples shown above have tests for feature-based aspects, +it's possible to test any aspect that the implementation supports. In +particular, if the implementation supports aspects that allow the application +to query the architecture of the device, it would be possible to use +`ext_oneapi_if_device_has()` to enable code only when the device has a certain +architecture. +==== + + +== Hypothetical implementation + +This non-normative section outlines a possible implementation for this +extension for a compiler-based solution. Different implementations are +described for AOT vs. JIT modes. + +The general approach in AOT mode is to lower the `ext_oneapi_if_device_has()` +checks to `if constexpr` statements that are optimized away by the C++ +front-end. The approach in JIT mode is to lower the +`ext_oneapi_if_device_has()` checks to code that tests a specialization +constant, and these tests are optimized away by the JIT compiler. + +=== AOT mode + +For AOT mode, we assume that the user has specified a list of devices on the +compiler's command line. The implementation invokes the device compiler +multiple times over the application's device code, once for each device that +the user specified. + +Since the compiler driver knows the device, it can have a simple lookup table +which tells the set of aspects that are supported for each device. The driver +can then predefine a preprocessor macro for each device, indicating whether +that aspect is supported. The implementation of `ext_oneapi_if_device_has()` +can use these preprocessor macros to implement the condition checks. For +example: + +``` +class kernel_handler { +#ifdef __SYCL_AOT__ + public: + template + void ext_oneapi_if_device_has(T fnTrue) { + if constexpr (device_has()) { + fnTrue(); + } + } + + template + void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) { + if constexpr (device_has()) { + fnTrue(); + } else { + fnFalse(); + } + } + + private: + static constexpr bool capabilities[] = { + __SYCL_AOT_FP16__, + __SYCL_AOT_FP64__, + __SYCL_AOT_ATOMIC64__ + }; + + template + constexpr static bool device_has() { + return (capabilities[static_cast(Aspect)] && ...); + } +#else + /* ... */ +#endif +}; +``` + +This code snippet assumes that the compiler driver has predefined +`+__SYCL_AOT_FP16__+`, etc. to `true` or `false` depending on the capabilities +of the current device. The `if constexpr` statement in the code then evaluates +the set of aspects in each call to `ext_oneapi_if_device_has()` at compilation +time and normal compiler optimizations throw out all the overhead of the +condition check and the lambda call, leaving only the body of the selected +lambda. + +[NOTE] +==== +The preprocessor macro names proposed above like `+__SYCL_AOT_FP16__+` are +intended to be an implementation detail that is not exposed to applications. +Application code should use the `ext_oneapi_if_device_has()` construct, not +reference the macros directly. +==== + +=== JIT mode + +In JIT mode, the goal is to generate a single SPIR-V module for the device +code, which contains all the condition checks and lambda calls. The online +compiler evaluates the condition checks at online compilation time, and the +generated native code contains only the body of the selected lambda. The +implementation can make use of specialization constants to represent each +aspect: a specialization constant will be set to `true` if the device supports +that aspect and to `false` if it does not. + +One challenge with this approach is that we need some way to guarantee that the +online compiler will not raise a compilation error from an unselected lambda +call. To illustrate, consider a call to `ext_oneapi_if_device_has()` which +conditionally uses `sycl::half` and let's consider the case where the current +device does not have this support. We need to make sure that the online +compiler does not raise a compilation error when attempting to compile the +lambda body which uses `sycl::half`. In such a case the SPIR-V will +conceptually look like: + +``` +if (__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)) call lambda +``` + +Of course, the online compiler will know that the value of +`+__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)+` is `false`, so +hopefully the compiler will throw away the call to lambda. However, there is +no guarantee that this will happen. For example, what happens if optimization +is disabled in the online compiler? Even though the lambda will never be +called at runtime, we don't want the online compiler to throw an exception when +it tries to generate device code for the non-existent `sycl::half` type. We +therefore need some way to ensure that the online compiler discards any +unselected lambda calls (or at least a way to ensure that the compiler doesn't +throw an exception when compiling it). + +We propose introducing some SPIR-V extension that provides this guarantee. +The extension would take a boolean expression of specialization constant values +to compute an "if" condition. The extension would then guarantee that the +body of the "if" (the lambda call) is discarded if the boolean expression is +`false`. + +[NOTE] +==== +The description of this hypothetical SPIR-V extension is intentionally vague +here. The purpose of this section is only to point out that such an extension +is needed, not to precisely define it. We expect that a separate document +would describe the SPIR-V extension. +==== + +We can now outline some pseudo code for the implementation of +`ext_oneapi_if_device_has()` in JIT mode: + +``` +class kernel_handler { +#ifdef __SYCL_AOT__ + /* ... */ +#else + public: + template + void ext_oneapi_if_device_has(T fnTrue) { + __builtin_spec_constant_if((__builtin_spec_constant(specid[Aspects]) && ...), fnTrue); + } + + template + void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) { + __builtin_spec_constant_if( + (__builtin_spec_constant(specid[Aspects]) && ...), fnTrue, fnFalse); + } + + private: + static constexpr int specid[] = { + /* SPIR-V specialization constant ID for aspect::fp16 */, + /* SPIR-V specialization constant ID for aspect::fp64 */, + /* SPIR-V specialization constant ID for aspect::atomic64 */ + }; +#endif +}; +``` + +Here the builtin function `+__builtin_spec_constant(aspect)+` produces SPIR-V +that loads the value of a specialization constant. The builtin function +`+__builtin_spec_constant_if()+` produces the extended SPIR-V described above +which provides a guarantee that the unselected lambda will be discarded. + +Of course, the SYCL runtime must also set the values of the specialization +constants appropriately before invoking the online compiler. This is easy to +do since the runtime knows the target device, so it can set their values +according to a lookup table. + + +== Revision History + +[%header,cols="5,15,15,65"] +|=== +|Rev |Date |Author |Changes +|1 |2021-04-09 |Greg Lueck |Initial public working draft. +|=== diff --git a/sycl/doc/extensions/README.md b/sycl/doc/extensions/README.md index e4b4a7bdb52be..0faddf8955ee2 100755 --- a/sycl/doc/extensions/README.md +++ b/sycl/doc/extensions/README.md @@ -37,6 +37,7 @@ DPC++ extensions status: | [Use Pinned Memory Property](UsePinnedMemoryProperty/UsePinnedMemoryPropery.adoc) | Supported | | | [Level-Zero backend specification](LevelZeroBackend/LevelZeroBackend.md) | Supported | | | [ITT annotations support](ITTAnnotations/ITTAnnotations.rst) | Supported | | +| [SYCL_EXT_ONEAPI_DEVICE_IF](DeviceIf/device_if.asciidoc) | Proposal | | Legend: