|
| 1 | += SYCL_EXT_ONEAPI_DEVICE_IF |
| 2 | +:source-highlighter: coderay |
| 3 | +:coderay-linenums-mode: table |
| 4 | + |
| 5 | +// This section needs to be after the document title. |
| 6 | +:doctype: book |
| 7 | +:toc2: |
| 8 | +:toc: left |
| 9 | +:encoding: utf-8 |
| 10 | +:lang: en |
| 11 | + |
| 12 | +:blank: pass:[ +] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | + |
| 20 | +== Notice |
| 21 | + |
| 22 | +Copyright (c) 2021-2021 Intel Corporation. All rights reserved. |
| 23 | + |
| 24 | +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are |
| 25 | +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. |
| 26 | +used by permission by Khronos. |
| 27 | + |
| 28 | +This extension is written against the SYCL 2020 revision 3 specification. All |
| 29 | +references below to the "core SYCL specification" or to section numbers in the |
| 30 | +SYCL specification refer to that revision. |
| 31 | + |
| 32 | + |
| 33 | +== Introduction |
| 34 | + |
| 35 | +This extension provides a way for device code to query the device on which it |
| 36 | +is running in order to conditionally use features that may not be supported on |
| 37 | +all devices. This is different from the existing `device::has()` function |
| 38 | +because the extension can be called from device code while `device::has()` can |
| 39 | +only be called from host code. |
| 40 | + |
| 41 | +The motivating use case for this extension is for developers who provide device |
| 42 | +side libraries. To illustrate, consider an application developer (i.e. someone |
| 43 | +who is **not** developing a device-side library) who wants to code a kernel |
| 44 | +that conditionally uses a feature that is not available on all devices. This |
| 45 | +developer can write two versions of the kernel, one which uses the features and |
| 46 | +one that does not. Then the developer can use `device::has()` to test whether |
| 47 | +the device supports the feature and submit one or the other kernels according |
| 48 | +to the device's capabilities. (To avoid code duplication, the developer could |
| 49 | +write the kernel as a template using `if constexpr` and then instantiate the |
| 50 | +template according to the device capabilities.) |
| 51 | + |
| 52 | +This technique, however, is not available to a developer writing a device-side |
| 53 | +library because such a developer does not have control over the host code that |
| 54 | +launches the kernel. The developer could expose the library function as a |
| 55 | +template with a template parameter that controls the use of the conditional |
| 56 | +feature. For example, consider a library function "frob" that wants to |
| 57 | +conditionally use a feature named "fancy": |
| 58 | + |
| 59 | +``` |
| 60 | +template<bool useFancy> |
| 61 | +void frob() { /*...*/ } |
| 62 | +``` |
| 63 | + |
| 64 | +The caller of the library function would be responsible for calling |
| 65 | +`device::has(aspect::fancy)` to check if the device supports this feature, and |
| 66 | +then submit a kernel that calls the appropriately instantiated version of the |
| 67 | +`frob()` template function. However, this has the serious downside that the |
| 68 | +library developer must expose all device features which the library wants |
| 69 | +to conditionally use. |
| 70 | + |
| 71 | +This extension solves the problem by providing a way for the library developer |
| 72 | +to check for device features from within the library without exposing this to |
| 73 | +its callers. For example: |
| 74 | + |
| 75 | +``` |
| 76 | +void frob(kernel_handler kh) { |
| 77 | + kh.ext_oneapi_if_device_has<aspect::fancy>([&]() { |
| 78 | + // use "fancy" feature |
| 79 | + }); |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +Callers of the library must pass the `kernel_handler` to the library function, |
| 84 | +but callers do not need to know which conditions the library will check. |
| 85 | + |
| 86 | +The structure of the extension has been designed such that the overhead of the |
| 87 | +condition check can be entirely eliminated by the device compiler (in both the |
| 88 | +AOT case where the ahead-of-time compiler produces native device code and in |
| 89 | +the JIT case where the online compiler produces native device code.) |
| 90 | + |
| 91 | + |
| 92 | +== Feature test macro |
| 93 | + |
| 94 | +This extension provides a feature-test macro as described in the core SYCL |
| 95 | +specification section 6.3.3 "Feature test macros". Therefore, an |
| 96 | +implementation supporting this extension must predefine the macro |
| 97 | +`SYCL_EXT_ONEAPI_DEVICE_IF` to one of the values defined in the table below. |
| 98 | +Applications can test for the existence of this macro to determine if the |
| 99 | +implementation supports this feature, or applications can test the macro's |
| 100 | +value to determine which of the extension's APIs the implementation supports. |
| 101 | + |
| 102 | +[%header,cols="1,5"] |
| 103 | +|=== |
| 104 | +|Value |Description |
| 105 | +|1 |Initial extension version. Base features are supported. |
| 106 | +|=== |
| 107 | + |
| 108 | + |
| 109 | +== Extension to `kernel_handler` |
| 110 | + |
| 111 | +This extension adds a new member function with two overloads to the |
| 112 | +`kernel_handler` class: |
| 113 | + |
| 114 | +``` |
| 115 | +template<aspect ...Aspects, typename T> |
| 116 | +void ext_oneapi_if_device_has(T fnTrue); |
| 117 | + |
| 118 | +template<aspect ...Aspects, typename T1, typename T2> |
| 119 | +void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse); |
| 120 | +``` |
| 121 | + |
| 122 | +Both overloads are templated with an initial parameter pack which is a list of |
| 123 | +device aspects. These aspects form the condition that is checked. If the |
| 124 | +device on which the kernel is running has **all** of the listed aspects, the |
| 125 | +condition is true, otherwise the condition is false. Valid aspect names |
| 126 | +include the enumerated aspect values defined in the core SYCL specification as |
| 127 | +well as any enumerated aspect values defined by other extensions that the |
| 128 | +implementation supports. |
| 129 | + |
| 130 | +The first overload takes a single callable (typically a lambda expression), |
| 131 | +which the implementation calls only if the condition is true. The second |
| 132 | +overload takes two callables; the implementation calls the first when the |
| 133 | +condition is true and calls the second when the condition is false. |
| 134 | + |
| 135 | +Typically, an application uses the first callable to use a device feature |
| 136 | +which is only available on a device that has all of the listed aspects, and it |
| 137 | +uses the second callable as a fallback on other devices. |
| 138 | + |
| 139 | +== Example usage |
| 140 | + |
| 141 | +This non-normative section shows some example usages of the extension. |
| 142 | + |
| 143 | +=== Basic usage |
| 144 | + |
| 145 | +Conditionally use the `sycl::half` type, which is only available on devices |
| 146 | +that have the `fp16` aspect. |
| 147 | + |
| 148 | +``` |
| 149 | +void device_code(kernel_handler kh) { |
| 150 | + kh.ext_oneapi_if_device_has<aspect::fp16>([&]() { |
| 151 | + sycl::half val = 3.14; |
| 152 | + /* ... */ |
| 153 | + }); |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +=== Conditional with fallback |
| 158 | + |
| 159 | +Use the `sycl::half` type if the device supports it, otherwise fallback to |
| 160 | +using `float`. |
| 161 | + |
| 162 | +``` |
| 163 | +void device_code(kernel_handler kh) { |
| 164 | + kh.ext_oneapi_if_device_has<aspect::fp16>([&]() { |
| 165 | + sycl::half val = 3.14; |
| 166 | + /* ... */ |
| 167 | + }, [&]() { |
| 168 | + float val = 3.14; |
| 169 | + /* ... */ |
| 170 | + }); |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +=== Test for multiple aspects |
| 175 | + |
| 176 | +If more than one aspect is listed, the condition is only true if the device has |
| 177 | +all of the listed aspects. This code checks for both `fp64` and `atomic64` in |
| 178 | +order to tell if atomic operations are allowed on 64-bit floating point values. |
| 179 | + |
| 180 | +``` |
| 181 | +void device_code(kernel_handler kh) { |
| 182 | + kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() { |
| 183 | + /* can do atomic operations on "double" */ |
| 184 | + }); |
| 185 | +} |
| 186 | +``` |
| 187 | + |
| 188 | +=== Nested constructs |
| 189 | + |
| 190 | +These calls can be nested to achieve if-then-elseif checks. |
| 191 | + |
| 192 | +``` |
| 193 | +void device_code(kernel_handler kh) { |
| 194 | + kh.ext_oneapi_if_device_has<aspect::fp64, aspect::atomic64>([&]() { |
| 195 | + /* can do atomic operations on "double" */ |
| 196 | + }, [&]() { |
| 197 | + kh.ext_oneapi_if_device_has<aspect::fp64>([&]() { |
| 198 | + /* can use "double" but not with atomic operations */ |
| 199 | + }, [&]() { |
| 200 | + /* can not use "double" at all */ |
| 201 | + }); |
| 202 | + }); |
| 203 | +} |
| 204 | +``` |
| 205 | + |
| 206 | +[NOTE] |
| 207 | +==== |
| 208 | +Although all the examples shown above have tests for feature-based aspects, |
| 209 | +it's possible to test any aspect that the implementation supports. In |
| 210 | +particular, if the implementation supports aspects that allow the application |
| 211 | +to query the architecture of the device, it would be possible to use |
| 212 | +`ext_oneapi_if_device_has()` to enable code only when the device has a certain |
| 213 | +architecture. |
| 214 | +==== |
| 215 | + |
| 216 | + |
| 217 | +== Hypothetical implementation |
| 218 | + |
| 219 | +This non-normative section outlines a possible implementation for this |
| 220 | +extension for a compiler-based solution. Different implementations are |
| 221 | +described for AOT vs. JIT modes. |
| 222 | + |
| 223 | +The general approach in AOT mode is to lower the `ext_oneapi_if_device_has()` |
| 224 | +checks to `if constexpr` statements that are optimized away by the C++ |
| 225 | +front-end. The approach in JIT mode is to lower the |
| 226 | +`ext_oneapi_if_device_has()` checks to code that tests a specialization |
| 227 | +constant, and these tests are optimized away by the JIT compiler. |
| 228 | + |
| 229 | +=== AOT mode |
| 230 | + |
| 231 | +For AOT mode, we assume that the user has specified a list of devices on the |
| 232 | +compiler's command line. The implementation invokes the device compiler |
| 233 | +multiple times over the application's device code, once for each device that |
| 234 | +the user specified. |
| 235 | + |
| 236 | +Since the compiler driver knows the device, it can have a simple lookup table |
| 237 | +which tells the set of aspects that are supported for each device. The driver |
| 238 | +can then predefine a preprocessor macro for each device, indicating whether |
| 239 | +that aspect is supported. The implementation of `ext_oneapi_if_device_has()` |
| 240 | +can use these preprocessor macros to implement the condition checks. For |
| 241 | +example: |
| 242 | + |
| 243 | +``` |
| 244 | +class kernel_handler { |
| 245 | +#ifdef __SYCL_AOT__ |
| 246 | + public: |
| 247 | + template<aspect ...Aspects, typename T> |
| 248 | + void ext_oneapi_if_device_has(T fnTrue) { |
| 249 | + if constexpr (device_has<Aspects...>()) { |
| 250 | + fnTrue(); |
| 251 | + } |
| 252 | + } |
| 253 | + |
| 254 | + template<aspect ...Aspects, typename T1, typename T2> |
| 255 | + void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) { |
| 256 | + if constexpr (device_has<Aspects...>()) { |
| 257 | + fnTrue(); |
| 258 | + } else { |
| 259 | + fnFalse(); |
| 260 | + } |
| 261 | + } |
| 262 | + |
| 263 | + private: |
| 264 | + static constexpr bool capabilities[] = { |
| 265 | + __SYCL_AOT_FP16__, |
| 266 | + __SYCL_AOT_FP64__, |
| 267 | + __SYCL_AOT_ATOMIC64__ |
| 268 | + }; |
| 269 | + |
| 270 | + template<aspect... Aspect> |
| 271 | + constexpr static bool device_has() { |
| 272 | + return (capabilities[static_cast<int>(Aspect)] && ...); |
| 273 | + } |
| 274 | +#else |
| 275 | + /* ... */ |
| 276 | +#endif |
| 277 | +}; |
| 278 | +``` |
| 279 | + |
| 280 | +This code snippet assumes that the compiler driver has predefined |
| 281 | +`+__SYCL_AOT_FP16__+`, etc. to `true` or `false` depending on the capabilities |
| 282 | +of the current device. The `if constexpr` statement in the code then evaluates |
| 283 | +the set of aspects in each call to `ext_oneapi_if_device_has()` at compilation |
| 284 | +time and normal compiler optimizations throw out all the overhead of the |
| 285 | +condition check and the lambda call, leaving only the body of the selected |
| 286 | +lambda. |
| 287 | + |
| 288 | +[NOTE] |
| 289 | +==== |
| 290 | +The preprocessor macro names proposed above like `+__SYCL_AOT_FP16__+` are |
| 291 | +intended to be an implementation detail that is not exposed to applications. |
| 292 | +Application code should use the `ext_oneapi_if_device_has()` construct, not |
| 293 | +reference the macros directly. |
| 294 | +==== |
| 295 | + |
| 296 | +=== JIT mode |
| 297 | + |
| 298 | +In JIT mode, the goal is to generate a single SPIR-V module for the device |
| 299 | +code, which contains all the condition checks and lambda calls. The online |
| 300 | +compiler evaluates the condition checks at online compilation time, and the |
| 301 | +generated native code contains only the body of the selected lambda. The |
| 302 | +implementation can make use of specialization constants to represent each |
| 303 | +aspect: a specialization constant will be set to `true` if the device supports |
| 304 | +that aspect and to `false` if it does not. |
| 305 | + |
| 306 | +One challenge with this approach is that we need some way to guarantee that the |
| 307 | +online compiler will not raise a compilation error from an unselected lambda |
| 308 | +call. To illustrate, consider a call to `ext_oneapi_if_device_has()` which |
| 309 | +conditionally uses `sycl::half` and let's consider the case where the current |
| 310 | +device does not have this support. We need to make sure that the online |
| 311 | +compiler does not raise a compilation error when attempting to compile the |
| 312 | +lambda body which uses `sycl::half`. In such a case the SPIR-V will |
| 313 | +conceptually look like: |
| 314 | + |
| 315 | +``` |
| 316 | +if (__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)) call lambda |
| 317 | +``` |
| 318 | + |
| 319 | +Of course, the online compiler will know that the value of |
| 320 | +`+__builtin_spec_constant(/*SPIR-V ID for aspect::fp16*/)+` is `false`, so |
| 321 | +hopefully the compiler will throw away the call to lambda. However, there is |
| 322 | +no guarantee that this will happen. For example, what happens if optimization |
| 323 | +is disabled in the online compiler? Even though the lambda will never be |
| 324 | +called at runtime, we don't want the online compiler to throw an exception when |
| 325 | +it tries to generate device code for the non-existent `sycl::half` type. We |
| 326 | +therefore need some way to ensure that the online compiler discards any |
| 327 | +unselected lambda calls (or at least a way to ensure that the compiler doesn't |
| 328 | +throw an exception when compiling it). |
| 329 | + |
| 330 | +We propose introducing some SPIR-V extension that provides this guarantee. |
| 331 | +The extension would take a boolean expression of specialization constant values |
| 332 | +to compute an "if" condition. The extension would then guarantee that the |
| 333 | +body of the "if" (the lambda call) is discarded if the boolean expression is |
| 334 | +`false`. |
| 335 | + |
| 336 | +[NOTE] |
| 337 | +==== |
| 338 | +The description of this hypothetical SPIR-V extension is intentionally vague |
| 339 | +here. The purpose of this section is only to point out that such an extension |
| 340 | +is needed, not to precisely define it. We expect that a separate document |
| 341 | +would describe the SPIR-V extension. |
| 342 | +==== |
| 343 | + |
| 344 | +We can now outline some pseudo code for the implementation of |
| 345 | +`ext_oneapi_if_device_has()` in JIT mode: |
| 346 | + |
| 347 | +``` |
| 348 | +class kernel_handler { |
| 349 | +#ifdef __SYCL_AOT__ |
| 350 | + /* ... */ |
| 351 | +#else |
| 352 | + public: |
| 353 | + template<aspect ...Aspects, typename T> |
| 354 | + void ext_oneapi_if_device_has(T fnTrue) { |
| 355 | + __builtin_spec_constant_if((__builtin_spec_constant(specid[Aspects]) && ...), fnTrue); |
| 356 | + } |
| 357 | + |
| 358 | + template<aspect ...Aspects, typename T1, typename T2> |
| 359 | + void ext_oneapi_if_device_has(T1 fnTrue, T2 fnFalse) { |
| 360 | + __builtin_spec_constant_if( |
| 361 | + (__builtin_spec_constant(specid[Aspects]) && ...), fnTrue, fnFalse); |
| 362 | + } |
| 363 | + |
| 364 | + private: |
| 365 | + static constexpr int specid[] = { |
| 366 | + /* SPIR-V specialization constant ID for aspect::fp16 */, |
| 367 | + /* SPIR-V specialization constant ID for aspect::fp64 */, |
| 368 | + /* SPIR-V specialization constant ID for aspect::atomic64 */ |
| 369 | + }; |
| 370 | +#endif |
| 371 | +}; |
| 372 | +``` |
| 373 | + |
| 374 | +Here the builtin function `+__builtin_spec_constant(aspect)+` produces SPIR-V |
| 375 | +that loads the value of a specialization constant. The builtin function |
| 376 | +`+__builtin_spec_constant_if()+` produces the extended SPIR-V described above |
| 377 | +which provides a guarantee that the unselected lambda will be discarded. |
| 378 | + |
| 379 | +Of course, the SYCL runtime must also set the values of the specialization |
| 380 | +constants appropriately before invoking the online compiler. This is easy to |
| 381 | +do since the runtime knows the target device, so it can set their values |
| 382 | +according to a lookup table. |
| 383 | + |
| 384 | + |
| 385 | +== Revision History |
| 386 | + |
| 387 | +[%header,cols="5,15,15,65"] |
| 388 | +|=== |
| 389 | +|Rev |Date |Author |Changes |
| 390 | +|1 |2021-04-09 |Greg Lueck |Initial public working draft. |
| 391 | +|=== |
0 commit comments