ggml : add unified SYCL backend for Intel GPUs (ggml-org#2690)
* first update for migration
* update init_cublas
* add debug function, commit all helper code
* step 1
* step 2
* step 3: add fp16, slower 31->28
* add GGML_LIST_DEVICE function
* step 5: format device and print
* step 6: enhance error check, remove CUDA macro, fix non-zero device id issue
* support non-zero main device
* step 7: add debug for code path, rm log
* step 8: rename all macros & funcs from cuda to sycl
* fix error when selecting non-zero device, format device list
* ren ggml-sycl.hpp -> ggml-sycl.h
* clear CMAKE to rm unused lib and options
* correct queue: rm dpct::get_queue
* add print tensor function to debug
* fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481
* summarize dpct definitions in one header file to replace folder dpct
* refactor device log
* mv dpct definition from folder dpct to ggml-sycl.h
* update readme, refactor build script
* fix build with sycl
* set nthread=1 for sycl to increase performance
* add run script, comment debug code
* add ls-sycl-device tool (see the device-listing sketch after this list)
* add ls-sycl-device, rm unused files
* rm trailing spaces
* dos2unix
* Update README_sycl.md
* fix return type
* remove sycl version from include path
* restore rm code to fix hang issue
* add syc and link for sycl readme
* rm original sycl code before refactor
* fix code err
* add known issue for PVC hang issue
* enable SYCL_F16 support
* align pr4766
* check for sycl blas, better performance
* cleanup 1
* remove extra endif
* add build & run scripts, clean CMake file, update guide per review comments
* rename macro to intel hardware
* editor config format
* format fixes
* format fixes
* editor format fix
* Remove unused headers
* skip building sycl tool for other code paths
* replace tab by space
* fix blas matmul function
* fix mac build
* restore hip dependency
* fix conflict
* rename as per review comments
* mv internal function to .cpp file
* export function print_sycl_devices(), mv dpct class definition to source file
* update CI/action for sycl code, fix CI error of repeat/dup
* fix action ID format issue
* rm unused strategy
* enable llama_f16 in ci
* fix conflict
* fix build break on macOS: macOS CI depends on external ggml instead of internal ggml
* fix ci cases for unsupported data type
* revert unrelated changes in cuda cmake
remove useless nommq
fix typo of GGML_USE_CLBLAS_SYCL
* revert hip cmake changes
* fix indent
* add prefix in func name
* revert no mmq
* rm cpu blas duplicate
* fix no_new_line
* fix src1->type==F16 bug.
* pass batch offset for F16 src1
* fix batch error
* fix wrong code
* revert sycl checking in test-sampling
* pass void as the argument of ggml_backend_sycl_print_sycl_devices
* remove extra blank line in test-sampling
* revert setting n_threads in sycl
* implement std::isinf for icpx with fast math (see the sketch after this list).
* Update ci/run.sh
Co-authored-by: Georgi Gerganov <[email protected]>
* Update examples/sycl/run-llama2.sh
Co-authored-by: Georgi Gerganov <[email protected]>
* Update examples/sycl/run-llama2.sh
Co-authored-by: Georgi Gerganov <[email protected]>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <[email protected]>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <[email protected]>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <[email protected]>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <[email protected]>
* add copyright and MIT license declaration
* update the cmd example
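
Two sketches follow for the more subtle items above. First, the device-listing work (`GGML_LIST_DEVICE`, `ls-sycl-device`, `print_sycl_devices()`) boils down to querying the standard SYCL runtime. This minimal stand-alone sketch is ours; the printed fields are illustrative rather than the tool's exact output:

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

// Enumerate every SYCL device the runtime exposes and print key properties.
int main(void) {
    int id = 0;
    for (const auto & dev : sycl::device::get_devices()) {
        printf("Device %d: %s | compute units: %u | max work-group size: %zu\n",
               id++,
               dev.get_info<sycl::info::device::name>().c_str(),
               dev.get_info<sycl::info::device::max_compute_units>(),
               dev.get_info<sycl::info::device::max_work_group_size>());
    }
    return 0;
}
```

The "pass void as the argument" bullet reflects the related C convention: `void ggml_backend_sycl_print_sycl_devices(void);` declares a function taking no arguments, whereas an empty parameter list in a C declaration leaves the arguments unspecified.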
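Second, the `std::isinf` item: icpx enables fast math by default, and under fast math the optimizer may assume infinities never occur and fold `std::isinf(x)` to false. The usual workaround checks the IEEE-754 bit pattern directly; this helper is a sketch under that assumption, not necessarily the code the PR added:

```cpp
#include <cstdint>
#include <cstring>

// Fast math lets the compiler assume no infinities exist, so std::isinf(x)
// can be constant-folded to false. Inspecting the raw bits sidesteps that.
static bool isinf_bits(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));        // bit-exact view of the float
    return (bits & 0x7fffffffu) == 0x7f800000u;  // exponent all ones, mantissa zero
}
```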
---------
Co-authored-by: jianyuzh <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
README.md (10 additions, 1 deletion):
@@ -63,7 +63,7 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
 - AVX, AVX2 and AVX512 support for x86 architectures
 - Mixed F16 / F32 precision
 - 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support
-- CUDA, Metal and OpenCL GPU backend support
+- CUDA, Metal, OpenCL, SYCL GPU backend support

 The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
 Since then, the project has improved significantly thanks to many contributions. This project is mainly for educational purposes and serves
@@ -599,6 +599,15 @@ Building the program with BLAS support may lead to some performance improvements

 You can get a list of platforms and devices from the `clinfo -l` command, etc.

+- #### SYCL
+
+  SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.
+
+  llama.cpp based on SYCL is used to support Intel GPU (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).
+
+  For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
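
To make the README's one-line description of SYCL concrete, here is a minimal single-source SYCL 2020 program (a generic illustration, not code from this PR): the same C++ file carries host and device code, and the runtime dispatches the kernel to whatever accelerator the default selector picks, such as an Intel GPU. It compiles with a SYCL compiler, e.g. `icpx -fsycl`.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

// Minimal SYCL 2020 vector addition: host and device code in one C++ source.
int main() {
    sycl::queue q;  // default selector: picks the preferred device (e.g. an Intel GPU)
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024, 0.0f);
    {
        sycl::buffer bufA(a), bufB(b), bufC(c);
        q.submit([&](sycl::handler & h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(1024), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];  // runs on the selected device
            });
        });
    }  // buffer destructors wait for the kernel and copy results back to the vectors
    printf("c[0] = %.1f\n", c[0]);  // expected: 3.0
    return 0;
}
```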