# Integrate Intel GPU to OpenXLA

| Status | Proposed |
| ------ | -------- |
| RFC# | [99](https://github.com/openxla/community/pull/99) |
| Author(s) | Teng, Lu ([email protected]), Yang, Sheng ([email protected]), Zhoulong, Jiang ([email protected]), Jianhui, Li ([email protected]) |
| Updated | 2023-11-30 |

## Objective

`XLA GPU` is a mechanism for extending OpenXLA with new GPUs as in-tree build devices. This RFC introduces the changes needed to integrate Intel GPU into `XLA GPU`.
More generally, it will support the `SPIRV target` based on the `SYCL` runtime.

## Motivation

Intel has released the experimental [Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla), based on the `PJRT C API`, to support running applications on Intel GPU with OpenXLA.
However, an in-tree build is a better way to maximize the capabilities of OpenXLA and improve the user experience,
so Intel would like to upstream the relevant changes from **Intel® Extension for OpenXLA*** to make Intel GPU an in-tree build device.
In addition, this will extend OpenXLA to support new `SPIRV target` devices based on the `SYCL` runtime.

## User benefit

This RFC allows users to run their applications on Intel GPU directly with OpenXLA, without installing any extra extensions or modifying any code.

## Design proposal

### Overview

The components of OpenXLA marked below will be modified to support Intel GPU:

![image](https://github.com/Zantares/community/blob/tenglu/intel_gpu_rfc/rfcs/20231102-intel-gpu/structre.png?raw=true)

We distinguish these components by 2 different priorities:
* **P1**: Related to basic functionality and covered by this RFC
  - `LLVM IR`: Basic code generator for Intel GPU
  - `Lib Call`: Advanced `oneDNN` library calls that replace `LLVM IR` for core ops, to improve performance on Intel GPU
  - `XLA GPU Runtime` (based on Stream Executor): Basic runtime for Intel GPU
* **P2**: Related to performance or user experience and not covered by this RFC. We will propose new RFCs to track these features
  - `Global Cost Model`
  - `Tool` (including Profiler, Debug API, etc.)

In the future, we will follow the community to enable a more **advanced code generator** than `LLVM IR` for Intel GPU.

### Code integration & binary release

**Note: The macro-based integration method below is an intermediate state;
all code will be merged with other in-tree devices once the software stack is stable.**

We would like to introduce a new macro `INTEL_GPU` (tentative) for code integration:
```c++
#ifndef INTEL_GPU
// Original functions
#else
// Intel GPU functions
#endif
```
It will only be enabled with `config=xpu` (tentative) in OpenXLA, to differentiate Intel GPU from other devices.
In this way we separate the binary release of Intel GPU from the original OpenXLA release and minimize the impact on other in-tree devices.
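For illustration, the build-side wiring for this macro might look like the following `.bazelrc` sketch. This is an assumption, not the actual OpenXLA configuration: both the `xpu` config name and the `INTEL_GPU` define are tentative per the note above, and the exact flags may differ.

```shell
# Hypothetical .bazelrc fragment: `bazel build --config=xpu ...` would
# define INTEL_GPU so the Intel GPU code paths above are compiled in.
build:xpu --copt=-DINTEL_GPU
build:xpu --host_copt=-DINTEL_GPU
```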

### LLVM IR

Most `LLVM IR` work in the community can be reused directly; only a few modifications are needed for Intel GPU, as below:
* Integrate the [SPIRV translator](https://github.com/KhronosGroup/SPIRV-LLVM-Translator). Intel GPU can't use `LLVM IR` directly, so `LLVM IR` needs to be converted to `SPIRV IR` by this component first
* Add target-specific intrinsics. Here's an example of what the OpenXLA function [`TargetIntrinsicID()`](https://github.com/openxla/xla/blob/fb9e7064dade52134a0858a865f4be97e894bb81/xla/service/gpu/target_util.cc#L52) looks like for Intel GPU:
```c++
// Gets the llvm intrinsic ids on different platforms (NVPTX, AMDGPU)
// corresponding to the given TargetIntrinsicID.
struct TargetIntrinsics GetIntrinsic(TargetIntrinsicID intrin) {
  switch (intrin) {
    case TargetIntrinsicID::kThreadIdx: {
      return {
          llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x,
          llvm::Intrinsic::amdgcn_workitem_id_x,
          [](llvm::IRBuilder<>* b_) -> llvm::CallInst* {
            return EmitDeviceFunctionCall("__builtin_IB_get_local_id_x", {}, {},
                                          U32, {b_->getContext()}, b_);
          },
      };
    }
    ...
```
* Change the index of the address space. Intel GPU has no extra pass in OpenXLA to handle its address space,
  so it needs to use index `1` in the OpenXLA function [`BuildKernelPrototype()`](https://github.com/openxla/xla/blob/main/xla/service/gpu/fusions/fusion_emitter.cc#L83C1-L116), which differs from other in-tree devices:
```c++
IrEmitterUnnested::KernelAndIrArrays IrEmitterUnnested::BuildKernelPrototype(
    absl::string_view suggested_name,
    absl::Span<const KernelArgument> arguments,
    const LaunchDimensions& launch_dimensions) {
  ...
  // Create the kernel and add it to the module.
  llvm::LLVMContext& context = module_->getContext();
  llvm::FunctionType* kernel_type = llvm::FunctionType::get(
      /*Result=*/llvm::Type::getVoidTy(context),
      // SYCL: Hardcode to global device addrspace.
      std::vector<llvm::Type*>(
          kNumLlvmArgs,
          llvm::Type::getInt8PtrTy(b_.getInt8PtrTy()->getContext(), 1)),
      /*isVarArg=*/false);
  ...
```
* Turn off advanced LLVM optimization passes to avoid unsupported LLVM features on Intel GPU

**~250 LoC** are estimated for all `LLVM IR` changes.

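As a rough sketch of the translation step above: the SPIRV-LLVM-Translator project ships a standalone `llvm-spirv` tool that lowers LLVM bitcode to a SPIR-V binary. The commands below are illustrative only (file names are hypothetical, and the in-tree integration would call the translator as a library rather than shelling out):

```shell
# Illustrative offline pipeline for the SPIRV translation step.
llvm-as kernel.ll -o kernel.bc      # textual LLVM IR -> LLVM bitcode
llvm-spirv kernel.bc -o kernel.spv  # bitcode -> SPIR-V (SPIRV-LLVM-Translator)
```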
### Lib Call

Some core ops (Conv/MatMul) will be lowered to [`oneDNN`](https://github.com/oneapi-src/oneDNN) library calls instead of `LLVM IR` for better performance, so `oneDNN` will be integrated as a third-party dependency.
Currently the lib call list is hard-coded for specific core ops; it will be combined with the `Global Cost Model` in the future for dynamic dispatching.

### XLA GPU Runtime

Intel GPU is based on the `SYCL` runtime from the [Intel® oneAPI DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html),
so the `Intel® oneAPI DPC++/C++ Compiler` will be required as the runtime environment for users to execute applications on Intel GPU.
Based on the current `XLA GPU` runtime implementation, we chose `Stream Executor` as the runtime infrastructure and reimplemented it with the `SYCL` runtime, including `Allocator`, `Event`, `Executor`, `Kernel`, `Platform`, `Stream`, etc.
The initial implementation can be found in [Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla/tree/main/xla/stream_executor/sycl). It will be revised to align with the OpenXLA code style before upstreaming.

**~3000 LoC** are estimated for all `XLA GPU Runtime` changes.

### Performance Implications

We don't expect a performance impact from this RFC. The functions described in this RFC are realized at the initialization stage.

### Dependencies

* Build dependencies:
  - [oneDNN](https://github.com/oneapi-src/oneDNN)
  - [oneMKL](https://github.com/oneapi-src/oneMKL)
  - [SPIRV-LLVM-Translator](https://github.com/KhronosGroup/SPIRV-LLVM-Translator)
* Execution (runtime) dependency: [Intel® oneAPI DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html)

This RFC also relies on some upcoming `XLA GPU` RFCs from the OpenXLA team, so some details will change as those RFCs progress. For example:
- Command Buffer: A new proposal from the OpenXLA community that we haven't implemented in **Intel® Extension for OpenXLA**.
  At this early initialization stage, this should not block the current work based on `LLVM IR` + `Thunk` + `Stream Executor`

### Engineering Impact

The impact on binary size / startup time / build time is minimal, but test time will increase due to the newly added device.

The whole OpenXLA community (Intel is a contributor, as part of the community) will maintain this code. Intel will help set up CI in the following way to ensure project quality:
* Enable CI on Intel Dev Cloud with the Intel® Data Center GPU MAX Series

### Platforms and Environments

Intel GPU hardware (with the correct driver) and the `Intel® oneAPI DPC++/C++ Compiler` runtime environment are required. Other dependencies are the same as for the original OpenXLA.
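As a quick sanity check of such an environment, the `sycl-ls` tool shipped with the Intel® oneAPI DPC++/C++ Compiler lists the SYCL devices visible to the runtime. The install path below is the default oneAPI location and is an assumption; adjust it for your system:

```shell
# Hypothetical environment check; /opt/intel/oneapi is the default
# oneAPI install prefix and may differ on your machine.
source /opt/intel/oneapi/setvars.sh
sycl-ls  # the Intel GPU should be listed as a Level Zero / OpenCL device
```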
### Compatibility

This RFC follows the `XLA GPU` [roadmap](https://docs.google.com/presentation/d/1FPVjZUkTApV80TKJ-WbPvLynjIxb3sdFGwn6Qs9UCrw/edit#slide=id.g224a3cf318c_0_1047) (WW33'2023) to integrate a new GPU into OpenXLA.
We don't expect this proposal to impact other parts of the OpenXLA ecosystem. At this moment it only supports basic OpenXLA functionality; some advanced features, including the `Profiler` and `Debug API`, are not yet supported and will be covered in a future RFC.