-
Notifications
You must be signed in to change notification settings - Fork 13.7k
SYCL: using graphs is configurable by environment variable and compile option #12371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ggml/src/ggml-sycl/ggml-sycl.cpp
Outdated
| if (!initialized) { | ||
| g_ggml_sycl_debug = get_sycl_env("GGML_SYCL_DEBUG", 0); | ||
| g_ggml_sycl_disable_optimize= get_sycl_env("GGML_SYCL_DISABLE_OPT", 0); | ||
| g_ggml_sycl_graphs = get_sycl_env("GGML_SYCL_GRAPHS", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is some doc here where we could add this
llama.cpp/docs/backend/SYCL.md
Line 668 in be7c303
| #### Runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the doc and renamed runtime and compile time variable names to align with similar names. Please check if it looks OK to you.
|
It's great job!
|
Rbiessy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no concern with the changes here. I tested without the graph enabled and confirmed no issues.
Thank you. This goes also to @Alcpz , whose work I've continued here.
It is currently slower, than non-graph version because of whole graph update approach, which is going to be changed into small modifications of existing graph. I'd like to have this merged in this first shape, in order to set up some internal benchmarking on llama with graphs and to split my work of making graphs version finally the fastest one.
If a function is not used outside cpp file compilation unit, then its symbol does not need to be exported and can be hidden. Newest llvm complain in a warning about lack of 'static' keyword in such a case.
You are right. This is the goal and I hope I will be able to change default values of the variable soon. |
Co-authored-by: Romain Biessy <[email protected]>
How to improve to make the SYCL graph increase performance in next step? If don't bring perforemance, that will impact the SYCL graph reputation. |
docs/backend/SYCL.md
Outdated
| | GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. | | ||
| | GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. | | ||
| | GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. | | ||
| | GGML_SYCL_GRAPH | OFF *(default)* \|ON *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest it enable sycl graph in building and running.
GGML_SYCL_ENABLE_GRAPH is rename to GGML_SYCL_DISABLE_GRAPH.
That will make the user experience be simple.
1 GGML_SYCL_GRAPH , open sycl graph in building and running by one variable.
2. GGML_SYCL_DISABLE_GRAPH, close sycl graph for debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of points as to consider when making this decision.
- Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.
- Running by default: As discussed in the other thread, performance is worse with SYCL-Graph enabled right now. If we enable this code path by default, users may see there application performance drop after this commit. Which is both surprising and undesirable from a user perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.
I find confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?
My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?
I would agree this is probably the best path forward, as you say two different switches introduces the chances of misconfiguration. I just wanted to note the experimental API point so that maintainers can make an informed choice, the impact on compilation time should be negligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged this discussion with #12371 (comment) and decided to:
| stream->parallel_for( | ||
| sycl::nd_range<3>(num_blocks * block_size, block_size), | ||
| [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] { | ||
| [=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need to change?
Are you using official oneAPI compiler 2025.0? or internal compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not strictly needed in this PR. I believe intel::reqd_sub_group_size will become deprecated in the next compiler release. This works fine with 2025.0 so I don't mind updating this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am compiling with both: oneAPI and latest intel llvm compiler (next oneAPI candidate). Latest llvm issues deprecated warning, hence the change.
See llvm release notes: Deprecated intel::reqd_sub_group_size, the official SYCL 2020 spelling should be used instead (with sycl:: namespace).
| } | ||
| return false; | ||
| } break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After remove so many break, the code path will be changed.
Have you test with the CI?
This change will impact the CI test.
Suggest to run CI locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran our CI and got no issues, see #12371 (review)
The break are not needed as there is always a return before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.
docs/backend/SYCL.md
Outdated
| | GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. | | ||
| | GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. | | ||
| | GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. | | ||
| | GGML_SYCL_GRAPH | OFF *(default)* \|ON *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.
I find confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?
My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.
| stream->parallel_for( | ||
| sycl::nd_range<3>(num_blocks * block_size, block_size), | ||
| [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] { | ||
| [=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.
| } | ||
| return false; | ||
| } break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.
We are going to use a new feature in graphs that make it possible to update graph more often rather than recreate it. It needs some work outside llama.
Good point. I did it. |
|
OK, it's clear to me. Because the feature is not good enough to be enabled, I suggest pending this PR until the feature bring active performance impact. Thank you! |
|
Merged now as all the comments have been addressed. This will simplify the work to make SYCL-Graph more performant. There is already a comment to manage the expectations. |
Changes: