-
Notifications
You must be signed in to change notification settings - Fork 791
[SYCL][Doc] SYCL 2020 specialization constants design #3331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] SYCL 2020 specialization constants design #3331
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 design question, 1 nit.
This is a draft patch to get early feedback. Spec is currently under review here - intel#3331 Signed-off-by: Elizabeth Andrews <[email protected]>
|
|
||
| > A constant variable where the value is not known until compilation of the | ||
| > SYCL kernel function. | ||
| > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get clarification about exactly what constitutes compilation time of the SYCL kernel function?
In ahead of time compilation does this mean that all specialization constants must be known at that time?
In JIT compilation, does this mean the the first time the kernel is jitted for a device the specialization constants can all be resolved, and that the kernel never needs to be jitted for that device again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SYCL 2020 specification describes specialization constants this way:
Device code can make use of specialization constants which represent constants whose values can be set dynamically during execution of the SYCL application. The values of these constants are fixed when a SYCL kernel function is invoked, and they do not change during the execution of the kernel. However, the application is able to set a new value for a specialization constant each time a kernel is invoked, so the values can be tuned differently for each invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the above should change to say "A constant variable where the value is not known until invocation of the SYCL kernel function." rather than compilation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the above should change to say "A constant variable where the value is not known until invocation of the SYCL kernel function." rather than compilation.
@kbsmith-intel, @romanovvlad, please note that this was a quote from the spec (Glossary section), so if we change it here we should also reflect that in the spec.
Can we get clarification about exactly what constitutes compilation time of the SYCL kernel function?
I guess it depends on whether we are talking about online or offline compilation. Plus, even for offline compilation SYCL spec doesn't specify the exact formats used by the compiler:
When a SYCL application contains SYCL kernel function objects, the SYCL implementation must provide
an offline compilation mechanism that enables the integration of the device binaries into the SYCL application. The output of the offline compiler can be an intermediate representation, such as SPIR-V, that
will be finalized during execution or a final device ISA.3.5. The SYCL platform model
In ahead of time compilation does this mean that all specialization constants must be known at that time?
For AOT, we produce a native executable object and the idea that it don't need any further online compilation/linking before being executed. Therefore, we can only two ways of dealing with spec constants:
- Capture some default values of them and embed them into generated binary
- Emulate support of specialization constants without needing to recompile the binary
SYCL 2020 spec mandates us to support changing specialization constants values even for AOT-compiled programs, which means that we go with (2) here.
In JIT compilation, does this mean the the first time the kernel is jitted for a device the specialization constants can all be resolved, and that the kernel never needs to be jitted for that device again?
This is not entirely true: it is possible that kernel needs to be jitted again for the same device, because it was launched with different values of specialization constants. See the following note from the spec:
Implementations that support online compilation of kernel bundles will likely implement both methods of specialization constants using kernel bundles. Therefore, applications should expect that there is some overhead associated with invoking a kernel with
new values for its specialization constants. A typical implementation records the values
of specialization constants set viahandler::set_specialization_constant()and remembers these values until a kernel is invoked (e.g. viaparallel_for()). At this point, the
implementation determines the bundle that contains the invoked kernel. If that bundle
has already been compiled for the handler’s device and compiled with the correct values
for the specialization constants, the kernel is scheduled for invocation. Otherwise, the
implementation compiles the bundle before scheduling the kernel for invocation. Therefore, applications that frequently change the values of specialization constants may see
an overhead associated with recompilation of the kernel’s bundle.4.9.5. Specialization constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is resolved for the purposes of the design discussion. I think the SYCL spec wording is still confusing :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbsmith-intel, I quoted the SYCL spec's description of specialization constant above. If you can tell me the part that seems confusing, I can update the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These statements, from 4.9.5, seem very clear.
Device code can make use of specialization constants which represent constants whose values can be set
dynamically during execution of the SYCL application. The values of these constants are fixed when a
SYCL kernel function is invoked, and they do not change during the execution of the kernel. However,
the application is able to set a new value for a specialization constant each time a kernel is invoked, so
the values can be tuned differently for each invocation.
This statement, from page 502 of the glossary, and what is quoted above in the design document is wrong/confusing:
specialization constant
A constant variable where the value is not known until compilation of the SYCL kernel function.
I think in the design document above, we should change the wording from the glossary wording to:
"SYCL 2020, 4.9.5: Device code can make use of specialization constants which represent constants whose values can be set
dynamically during execution of the SYCL application. The values of these constants are fixed when a
SYCL kernel function is invoked, and they do not change during the execution of the kernel. However,
the application is able to set a new value for a specialization constant each time a kernel is invoked, so
the values can be tuned differently for each invocation."
And I think the SYCL 2020 glossary should be updated to be consistent with 4.9.5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think the SYCL 2020 glossary should be updated to be consistent with 4.9.5.
Oh yes, that should be updated. I'll fix it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Greg
|
Will update wording today. |
Added overview of new mapping design, detailed description for each component TBD.
This is still WIP, see TODOs in the document
| template<> | ||
| const char *get_spec_constant_symbolic_ID<Wrapper::float_const>() { | ||
| return "unique_name_for_Wrapper_float_const"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these template functions need to be defined inline because you might get duplicates across translation units. For example, consider a case where Wrapper is declared in a second translation unit and that second TU also uses Wrapper::float_const. That second TU will also have a definition of get_spec_constant_symbolic_ID<Wrapper::float_const>(). Unless the two definitions are inline, the linker will give a multiply-defined-symbol error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added inline in edb1586
| template<> | ||
| const char *get_spec_constant_symbolic_ID<Wrapper::float_const>() { | ||
| return "unique_name_for_Wrapper_float_const"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere (either here are below in "DPC++ Compiler: front-end") there should be a description about the requirements for the strings that are returned by get_spec_constant_symbolic_ID(). I think the requirement is:
-
If the
specialization_idvariable has internal linkage, the string must be unique across all translation units. Since the mangled name is not necessarily unique, you cannot use that. Instead, the FE probably needs to generate a GUID. -
If the
specialization_idvariable has external linkage, the string must be consistent in all translation units that reference this same variable. The mangled name would be a good choice because it will have the same value in every translation unit that references this same variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added those requirements in ce4788a. Note: I've also added a requirement that the built-in should return the same unique string if it was called twice within a single translation unit for the same variable, i.e. the compiler should preserve a mapping internal linkage variable -> GUID in order to do that.
This additional requirement is related to another conversation we have below in this PR - let's wait for some feedback from @erichkeane about what what be easier for FE to implement - stable unique names for variable with internal linkage or including integration footer into device code compilation, depending on that we can update our design document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we already decided on a integration footer, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The built-in is used twice in the codebase:
- within
kernel_handler::get_specialization_constantmethod - device code - within
handler::set_specialization_constantmethod - host code
The first one is used by sycl-post-link to identify specialization constants and generate corresponding device image properties.
The second one is used by DPC++ RT to connect specialization_ids to device image properties.
So, for (1) we call the built-in directly as we are in device code and we can rely on its availability. For (2) we can't do that, because we would like to support 3rd-party host compilers and therefore we rely on the compiler here, in particular on the fact that integration footer will provide to us pre-computed result of calling that built-in without actually putting the call into the source code.
This comes down to a requirement that even though we should get unique strings for internal variables from different translation units, we should get the same string if we call the built-in second time for the same variable within a single translation unit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, ok, I think we're on the same page. The value at host-side would be pre-computed by the device side. We can do the 'same TU, same answer' thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... maybe? that would be really rough and I'm not sure makes it unique. There are some constexpr/template tricks to cause 'counting' that might defeat it, but none I'm smart enough to write myself.
I think I like the idea of having the driver pass us a unique token per-TU during the device compile instead. That way it can just pass us some random value and have it be the same to each invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of having the driver pass some sort of unique token to each is possible, at that point we can just do away with the GUID and do the normal mangling for the internal types. Though, of course, that can be affected by macros in some cases...
This seems like a workable direction. The driver creates a GUID and passes it via a command line parameter to the device compiler. For a specialization constant with internal linkage, the device compiler generates a string "GUID" + "mangled name".
I didn't understand @erichkeane's concern about "that can be affected by macros in some cases...". Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck : A sometimes used pattern (particularly in some of the boost code generators IIRC) is to generate a single .cpp file to represent multiple TUs and execute it multiple times with predefined macros. So something like:
clang++ My_File.cpp -DALL
clang++ My_File.cpp -DSECTION1
clang++ My_File.cpp -DSECTION2
clang++ My_File.cpp -DSECTION3
etc. The nasty part about this is sometimes there are code-generation/modification steps in between each, so My_File.cpp isn't actually the same file (let alone having different sections compiled each time!). So something like:
clang++ My_File.cpp -DALL
clang++ My_File.cpp -DSECTION1
./SomeScriptThatInsertsInfoFromThePreviousThingIntoMyFile
clang++ My_File.cpp -DSECTION2
clang++ My_File.cpp -DSECTION3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think this is not a problem for the proposed design where the driver creates the GUID. In the case you describe, each invocation of "clang++" (e.g. "clang++ My_File.cpp -DALL") would generate a different GUID, so each TU will still get unique strings for internal spec constants, despite being compiled from the same source file.
Of course, another solution would be for the driver to have some new pass that created the integration footer before any invocations of the device compiler. In this case, the footer could be appended to the source file before invoking the device compiler, and then we wouldn't need __builtin_unique_ID() at all. In some ways, this seems like a cleaner design, but I presume we are concerned about extra compilation time overhead by adding a new pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me point out that anything involving full path name is not going to work from a privacy perspective. That potentially leaks private user identifiable data, such as login id.
The driver creating GUID sounds OK
| [sycl-registry]: https://www.khronos.org/registry/SYCL/ | ||
| [sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html | ||
|
|
||
| TODO: feature overview? code example? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pressed to 'LGTM' for a bit here... can we fix the "TODOs" before we do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular TODO fixed in 0e756c1, but there are still at least one more left
| // }; | ||
| template<> | ||
| inline const char *get_spec_constant_symbolic_ID<int_const>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still hasn't been closed on (anonymous namespaces being 'hidden'). At the moment, we're likely going to cause a host-error.
Co-authored-by: Dmitry Vodopyanov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just a couple of comments.
Co-authored-by: Romanov Vlad <[email protected]>
…thub.com:romanovvlad/llvm into private/vromanov/sycl-2020-spec-constants-design
|
@erichkeane, I've updated FE/integration footer sections of the doc in e88f402 Could you please take a look? |
I think you mean fc2faf3? That looks accurate/correct to me. |
I think I meant 71ece59. Note: |
| doing so we can still use this non-standard built-in function and preserve | ||
| support for third-party host compilers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any instruction how a user can build an app with a third-party host compiler. Specifically - what is our recommendation how user can "include" integration footer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any instruction how a user can build an app with a third-party host compiler.
AFAIK, we have a command line option which allows to specify the host compiler and its options, but it is likely that we don't have an specific guide for that.
| }; | ||
| constexpr specialization_id<int> id_int; | ||
| struct Wraper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any description why this is needed in the doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a user code snippet which is added to demonstrate the content which should be generated by the compiler in integration footer.
| constexpr specialization_id<int> id_int; | ||
| struct Wraper { | ||
| public: | ||
| static constexpr specialization_id<A> id_A; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also mention wrappers which are being introduced in the patch which does "integration footer integration".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are mentioned in the section Ambiguous references to specialization_id
The author of the doc is @AlexeySachkov