Skip to content

Conversation

@lslusarczyk
Copy link
Contributor

Changes:

  • using sycl graphs configurable by compile option and environment variable
  • fixed warnings in gml-sycl

@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Mar 13, 2025
if (!initialized) {
g_ggml_sycl_debug = get_sycl_env("GGML_SYCL_DEBUG", 0);
g_ggml_sycl_disable_optimize= get_sycl_env("GGML_SYCL_DISABLE_OPT", 0);
g_ggml_sycl_graphs = get_sycl_env("GGML_SYCL_GRAPHS", 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is some doc here where we could add this

#### Runtime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the doc and renamed runtime and compile time variable names to align with similar names. Please check if it looks OK to you.

@NeoZhangJianyu
Copy link
Collaborator

It's great job!

  1. what's the performance data after use sycl graph?
  2. why add static in the functions?
  3. If sycl graph can increase the performance in more cases, set it as default in building and running. User could use the env variable to disable it, instead of enable it.

Copy link
Collaborator

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no concern with the changes here. I tested without the graph enabled and confirmed no issues.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 14, 2025
@lslusarczyk
Copy link
Contributor Author

It's great job!

Thank you. This goes also to @Alcpz , whose work I've continued here.

what's the performance data after use sycl graph?

It is currently slower, than non-graph version because of whole graph update approach, which is going to be changed into small modifications of existing graph. I'd like to have this merged in this first shape, in order to set up some internal benchmarking on llama with graphs and to split my work of making graphs version finally the fastest one.

why add static in the functions?

If a function is not used outside cpp file compilation unit, then its symbol does not need to be exported and can be hidden. Newest llvm complain in a warning about lack of 'static' keyword in such a case.

If sycl graph can increase the performance in more cases, set it as default in building and running. User could use the env variable to disable it, instead of enable it.

You are right. This is the goal and I hope I will be able to change default values of the variable soon.

@lslusarczyk lslusarczyk marked this pull request as ready for review March 14, 2025 11:26
@arthw
Copy link
Contributor

arthw commented Mar 15, 2025

It's great job!

Thank you. This goes also to @Alcpz , whose work I've continued here.

what's the performance data after use sycl graph?

It is currently slower, than non-graph version because of whole graph update approach, which is going to be changed into small modifications of existing graph. I'd like to have this merged in this first shape, in order to set up some internal benchmarking on llama with graphs and to split my work of making graphs version finally the fastest one.

why add static in the functions?

If a function is not used outside cpp file compilation unit, then its symbol does not need to be exported and can be hidden. Newest llvm complain in a warning about lack of 'static' keyword in such a case.

If sycl graph can increase the performance in more cases, set it as default in building and running. User could use the env variable to disable it, instead of enable it.

You are right. This is the goal and I hope I will be able to change default values of the variable soon.

How to improve to make the SYCL graph increase performance in next step?
Update the application code or use newer compiler/running time?

If don't bring perforemance, that will impact the SYCL graph reputation.
I suggest highlight the status in guide to avoid user misunstanding.

| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
| GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
| GGML_SYCL_GRAPH | OFF *(default)* \|ON *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest it enable sycl graph in building and running.
GGML_SYCL_ENABLE_GRAPH is rename to GGML_SYCL_DISABLE_GRAPH.

That will make the user experience be simple.
1 GGML_SYCL_GRAPH , open sycl graph in building and running by one variable.
2. GGML_SYCL_DISABLE_GRAPH, close sycl graph for debug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of points as to consider when making this decision.

  • Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.
  • Running by default: As discussed in the other thread, performance is worse with SYCL-Graph enabled right now. If we enable this code path by default, users may see there application performance drop after this commit. Which is both surprising and undesirable from a user perspective.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.

I find confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

I would agree this is probably the best path forward, as you say two different switches introduces the chances of misconfiguration. I just wanted to note the experimental API point so that maintainers can make an informed choice, the impact on compilation time should be negligible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged this discussion with #12371 (comment) and decided to:

  • rename GGML_SYCL_ENABLE_GRAPH to GGML_SYCL_DISABLE_GRAPH (as @arthw suggested)
  • enable build graphs by default (as @EwanC and @Alcpz suggested)
  • disable running graphs by default (as @EwanC and @Alcpz , temporary contrary to @arthw until graphs perf is better than non-graph)

stream->parallel_for(
sycl::nd_range<3>(num_blocks * block_size, block_size),
[=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] {
[=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need to change?
Are you using official oneAPI compiler 2025.0? or internal compiler?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not strictly needed in this PR. I believe intel::reqd_sub_group_size will become deprecated in the next compiler release. This works fine with 2025.0 so I don't mind updating this now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am compiling with both: oneAPI and latest intel llvm compiler (next oneAPI candidate). Latest llvm issues deprecated warning, hence the change.
See llvm release notes: Deprecated intel::reqd_sub_group_size, the official SYCL 2020 spelling should be used instead (with sycl:: namespace).

}
return false;
} break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After remove so many break, the code path will be changed.
Have you test with the CI?
This change will impact the CI test.
Suggest to run CI locally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran our CI and got no issues, see #12371 (review)
The break are not needed as there is always a return before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.

| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
| GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
| GGML_SYCL_GRAPH | OFF *(default)* \|ON *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling building by default: The SYCL-Graph extension API is still in experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make a API breaking change that will break the llama build.

I find confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.

stream->parallel_for(
sycl::nd_range<3>(num_blocks * block_size, block_size),
[=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] {
[=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.

}
return false;
} break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.

@lslusarczyk
Copy link
Contributor Author

How to improve to make the SYCL graph increase performance in next step? Update the application code or use newer compiler/running time?

We are going to use a new feature in graphs that make it possible to update graph more often rather than recreate it. It needs some work outside llama.

If don't bring performance, that will impact the SYCL graph reputation. I suggest highlight the status in guide to avoid user misunderstanding.

Good point. I did it.

@NeoZhangJianyu
Copy link
Collaborator

OK, it's clear to me.

Because the feature is not good enough to be enabled, I suggest pending this PR until the feature bring active performance impact.
Unused code will increase the maintain cost and technical debt.

Thank you!

@Rbiessy Rbiessy merged commit 35cae5b into ggml-org:master Mar 18, 2025
47 checks passed
@Rbiessy
Copy link
Collaborator

Rbiessy commented Mar 18, 2025

Merged now as all the comments have been addressed. This will simplify the work to make SYCL-Graph more performant. There is already a comment to manage the expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants