Skip to content

Conversation

aelovikov-intel
Copy link
Contributor

@aelovikov-intel aelovikov-intel commented Sep 26, 2025

Compilation of #include <sycl/sycl.hpp> is slow and that's especially problematic for SYCL RTC (run-time compilation). One way to overcome this is fine-grained includes that are being pursued separately. Another way is to employ clang's precompiled headers support which this PR is doing. Those two approaches can be combined, and this PR adds test-e2e/PerformanceTests/KernelCompiler/auto-pch.cpp that gives some idea of the PCH impact. The test shows PCH benefits when compiling some of the fine-grained includes on top of absolute minimum required to compiled SYCL RTC's "Hello world". From one of the CI runs:

Extra Headers Without PCH With auto-PCH
176ms 137ms 136ms 136ms 136ms 226ms 64ms 64ms 64ms 64ms
sycl/half_type.hpp 165ms 165ms 165ms 165ms 165ms 267ms 71ms 72ms 72ms 72ms
sycl/ext/oneapi/bfloat16.hpp 174ms 173ms 173ms 173ms 173ms 279ms 76ms 73ms 73ms 74ms
sycl/marray.hpp 142ms 143ms 142ms 142ms 143ms 235ms 66ms 66ms 66ms 66ms
sycl/vector.hpp 296ms 290ms 290ms 290ms 290ms 487ms 124ms 125ms 125ms 125ms
sycl/multi_ptr.hpp 278ms 278ms 276ms 275ms 274ms 441ms 125ms 125ms 125ms 125ms
sycl/builtins.hpp 537ms 533ms 531ms 531ms 531ms 883ms 218ms 218ms 219ms 218ms

It misses sycl/sycl.hpp line because that currently crashes FE when reading the generated PCH, the crash is being investigated/fixed separately.

Implementation-wise I'm reusing existing upstream clang::PrecompiledPreamble with one minor modification. It seems that PrecompiledPreamble's main usage is for things like clangd so it ignores errors in the code. I've modified it so that those errors would break pch-generation the same way normal compilation would break. I'm also not sure if we'd want that long-term, because it seems that making such "auto-pch" persistent would deviate from the upstream version of PrecompiledPreamble even more. I can imagine that in some near future we'd need to "fork" it into a separate utility. Still, seems to be fine for the first step.

Driver modifications are for the --auto-pch option support that should only be present on the SYCL RTC path and not for the regular clang invocations from the command line. I'm relatively confident those will stay in future.

@aelovikov-intel aelovikov-intel changed the title [DRAFT][SYCL RTC] Implement --auto-pch support [SYCL RTC] Introduce --auto-pch support Sep 30, 2025
@aelovikov-intel aelovikov-intel marked this pull request as ready for review September 30, 2025 20:51
Copy link
Contributor

@srividya-sundaram srividya-sundaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver changes LGTM!

Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rocking!

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cc @tahonermann for awareness.

@aelovikov-intel
Copy link
Contributor Author

I'm also not sure if we'd want that long-term, because it seems that making such "auto-pch" persistent would deviate from the upstream version of PrecompiledPreamble even more. I can imagine that in some near future we'd need to "fork" it into a separate utility. Still, seems to be fine for the first step.

I've sketched the changes to make the cache persistent, PrecompiledPreamble would only need this additional member (in both .hpp/.cpp for declaration/definition):

llvm::StringRef PrecompiledPreamble::memoryContents() const {
  return Storage->memoryContents();
}

and the SYCL RTC code would do something like this:

if (!llvm::sys::fs::exists(PCHPath)) {
  auto PCHPreamble = PrecompiledPreamble::Build(...);
  raw_fd_ostream{PCHPath, EC} << PCHPreamble->memoryContents();
}

adjustInvocation(PCHPath); // Impl mostly copy-pasted from PrecompiledPreamble.

@aelovikov-intel
Copy link
Contributor Author

@gmlueck , your approval is the only remaining blocker.

@tahonermann
Copy link
Contributor

@premanandrao should review this.

This seems like useful functionality we should consider upstreaming if we can demonstrate it to be a robust solution.

Copy link
Contributor

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice and clean. I noted one FIXME comment that should ideally be addressed/removed. Otherwise, this looks good to me.

- No persistency between invocations
- Currently there is no eviction mechanism, so application is expected to use
the option only when number of preambles is limited.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an immediate concern, but future support for C++20 modules might require changes to include module imports in the preamble. Module imports don't look like preprocessor control lines, but they are. The language has rules that prevent use of macros that expand to import declarations so that recognition of them isn't dependent on preprocessing.

import std;      // Named module import
import <vector>; // Header unit import

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This seems like a great feature. I have some comments / questions on the documentation and exact behavior, though. See below.


Enable auto-detection of the preamble and use it as a pre-compiled header to
speed up subsequent compilations of TUs matching the preamble/compilation
options. Example of the code that can benefit from this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some more description here in order for people to understand what this does. I suggest something like:

Automatically create and use a precompiled header (PCH) to speed up compilation. The first time this option is passed, the compiler finds the initial set of #include directives in the compiled source string (the preamble) and creates a PCH from these header files. On subsequent compilations, if the compiled source string has the same preamble, the PCH is used instead of the header files, which speeds up compilation. If the compiled source string has a different preamble, a new PCH is generated, and that PCH can also be used to speed up subsequent compilations. These PCH files are stored internally in memory, so they do not persist from one execution of the application to the next.

The preamble ends with the first statement that is not a preprocessing directive. For example, in the code below, the preamble ends immediately before the namespace syclext = statement because this is the first statement that is not a preprocessing directive.

[Example here. I suggest an example that has a #define before the #include to illustrate that this is legal. For example, you could #define SYCL_SIMPLE_SWIZZLES.]

The compiler uses the following factors when deciding whether a previously generated PCH can be used:

  • The preamble must exactly match (including whitespace and comments).
  • The compilation options must match (including the same order).

There are also certain restrictions that the user must avoid:

  • The content of each header file in the preamble must not change from one compilation to another.
  • The header files in the preamble must not use the __DATE__ or __TIME__ macros.

I guessed at some of the details above like whitespace and the order of options. If I guessed wrong, please correct.

// Auto-detected preamble ends in the middle of `#else` and would fail to compile.
void foo() {}
#endif
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limitation isn't clear to me. Does the limitation exist whenever there is a #if / #endif? Or, is it a limitation because the preamble ends inside the body of an #if / #endif?

How is this handled? Does the user get an obvious error, so they know it's related to --auto-pch? Can we instead make this work by ending the preamble at the #if that encloses the non-directive statement?


* No support (including not reporting any errors) for `+__DATE__+`/`+__TIME__+`
macros inside auto-detected preamble (transitively in regards to the
includes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "no support" mean? Does this generate an error? If so, does the error message tell the user it's related to --auto-pch? Does it generate wrong code? Does it cache the value of the date / time in the PCH and use the cached values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I planned that as UB at this stage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently generating bad code is not a good option. I think any of these would be acceptable:

  • Issue an error and abort compilation with a clear message.
  • Cache the value of the date / time in the PCH and use that in subsequent compilations that use the PCH.
  • Do not generate the PCH if it contains these macros, optionally issuing a warning message.

@gmlueck
Copy link
Contributor

gmlueck commented Oct 1, 2025

It misses sycl/sycl.hpp line because that currently crashes FE when reading the generated PCH, the crash is being investigated/fixed separately.

@tahonermann I assume your team is the one investigating this? I think this needs to be fixed before we can make a release with the --auto-pch option because the main use case (and the example in the documentation) does #include <sycl/sycl.hpp>. How should we handle this dependency? Is it safe to merge this PR with the expectation that the FE will be fixed before the next release? Or, should we hold this PR until there is a fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants