
Conversation

@AidanBeltonS
Contributor

Reverts #7777. That PR broke llama-bench and main: when pinned memory is allocated during model creation, the backend has not yet been initialized, so g_sycl_gpu_mgr is not constructed with the relevant devices. This causes a segfault because no devices exist within the manager.

I think we should try to reintroduce #7777 in a more suitable way that addresses this issue.
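
For context, here is a minimal standalone SYCL sketch of the ordering that pinned (host USM) allocation depends on. It is not llama.cpp code, only an illustration: sycl::malloc_host needs a queue whose device and context already exist, which is exactly what is missing when the allocation runs before g_sycl_gpu_mgr has been populated with devices.

// Minimal sketch, assuming a standard SYCL 2020 toolchain (e.g. oneAPI DPC++).
// It only shows the required ordering; it is not the llama.cpp code path.
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    // 1) Device selection / backend initialization has to happen first;
    //    this is what gives the queue a valid device and context.
    sycl::queue q{sycl::default_selector_v};

    // 2) Only then can pinned (host USM) memory be allocated against it.
    //    Reaching this call without a properly constructed queue is how the
    //    segfault inside sycl::queue::get_context() arises.
    void *pinned = sycl::malloc_host(1024, q);
    if (pinned == nullptr) {
        std::fprintf(stderr, "host USM allocation failed\n");
        return 1;
    }
    sycl::free(pinned, q);
    return 0;
}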

@github-actions bot added the SYCL label Jun 10, 2024
@AidanBeltonS
Contributor Author

Ping @bashbaug, @joeatodd

@bashbaug
Contributor

Sorry about that, how can I reproduce this issue?

@abhilash1910
Contributor

@abhilash1910 left a comment

LGTM !

@OuadiElfarouki
Contributor

OuadiElfarouki commented Jun 11, 2024

Sorry about that, how can I reproduce this issue?

We've encountered this on Nvidia GPUs for both llama-bench and main; instructions to build the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

@abhilash1910
Contributor

@AidanBeltonS could you rebase to fix CI? Thanks

@mofosyne added the Review Complexity : Low label Jun 12, 2024
@airMeng
Contributor

airMeng commented Jun 12, 2024

We've encountered this on Nvidia GPUs for both llama-bench and main; instructions to build the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

I can't reproduce this on an Intel GPU. Could you take a deeper look into why the issue only appears on NVIDIA GPUs? Maybe an issue for the Intel SYCL team would be more appropriate.

cc some SYCL mates @Nuullll

@AidanBeltonS force-pushed the revert-7777-host-usm-context-fix branch from 4e4ff76 to a9cae48 on June 12, 2024 15:08
@AidanBeltonS reopened this Jun 12, 2024
@AidanBeltonS
Contributor Author

We've encountered this on Nvidia GPUs for both llama-bench and main; instructions to build the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

I can't reproduce this on an Intel GPU. Could you take a deeper look into why the issue only appears on NVIDIA GPUs? Maybe an issue for the Intel SYCL team would be more appropriate.

cc some SYCL mates @Nuullll

I'm currently working on a reproducer. It requires a model that uses pinned memory; it should not be a backend- or hardware-specific problem.

@AidanBeltonS
Contributor Author

@airMeng the problem also affects Intel devices. I have reproduced the error on a Data Center GPU Max 1100.

To reproduce:
./bin/llama-bench -m ~/llama_models/Llama-2-7b-chat-Q4_K.gguf -ngl 77 --mmap 0

Backtrace:

| model                          |       size |     params | backend    | ngl | mmap |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | ---------------: |
[SYCL] call ggml_init_sycl
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: yes
found 4 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|         Intel Data Center GPU Max 1100|    1.3|    448|    1024|   32| 51539M|            1.3.29138|
| 1|     [opencl:gpu:0]|         Intel Data Center GPU Max 1100|    3.0|    448|    1024|   32| 48946M|       24.13.29138.29|
| 2|     [opencl:cpu:0]|                  Intel Xeon Gold 5418Y|    3.0|      2|    8192|   64|201419M|2024.17.3.0.08_160000|
| 3|     [opencl:acc:0]|            Intel FPGA Emulation Device|    1.2|      2|67108864|   64|201419M|2024.17.3.0.08_160000|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:448
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory

Thread 1 "llama-bench" received signal SIGSEGV, Segmentation fault.
0x00007fffead4e644 in sycl::_V1::queue::get_context() const () from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
(gdb) bt
#0  0x00007fffead4e644 in sycl::_V1::queue::get_context() const () from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
#1  0x00007fffeacfd46e in sycl::_V1::malloc_host(unsigned long, sycl::_V1::queue const&, sycl::_V1::detail::code_location const&) ()
   from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
#2  0x000000000055587a in ggml_sycl_host_malloc(unsigned long) ()
#3  0x00000000005e7f42 in ggml_backend_sycl_host_buffer_type_alloc_buffer(ggml_backend_buffer_type*, unsigned long) ()
#4  0x00000000006eeafa in alloc_tensor_range ()
#5  0x00000000006eea40 in ggml_backend_alloc_ctx_tensors_from_buft ()
#6  0x00000000006669bf in llm_load_tensors(llama_model_loader&, llama_model&, int, llama_split_mode, int, float const*, bool, bool (*)(float, void*), void*) ()
#7  0x0000000000636eb2 in llama_load_model_from_file ()
#8  0x000000000043768d in main ()

@bashbaug
Contributor

the problem also affects Intel devices. I have reproduced the error on a Data Center GPU Max 1100.

Thanks, I can reproduce the error with these steps on an A750 also. Looking now...

@bashbaug
Contributor

I suspect this change will fix the problem: #7909.

To be clear: I'm fine merging this PR (to revert #7777) if needed to get things moving again, especially if it's going to take some time to review #7909 - thanks!
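
For illustration, one way to make the host allocation path robust is to take the host USM route only when an initialized queue is actually available, and fall back to pageable memory otherwise. The sketch below is standalone and hedged: the helper names are made up for this example, and the actual change in #7909 may well take a different approach.

// Standalone sketch of the fallback idea; host_malloc_with_fallback and
// host_free_with_fallback are hypothetical helpers, not ggml-sycl functions.
#include <sycl/sycl.hpp>
#include <cstdlib>

static void *host_malloc_with_fallback(std::size_t size, const sycl::queue *q) {
    if (q == nullptr) {
        // Backend not initialized yet: pageable memory keeps model loading
        // working, at the cost of losing the pinned-memory transfer speedup.
        return std::malloc(size);
    }
    return sycl::malloc_host(size, *q);
}

static void host_free_with_fallback(void *ptr, const sycl::queue *q) {
    if (q == nullptr) {
        std::free(ptr);
        return;
    }
    sycl::free(ptr, *q);
}

int main() {
    // With no initialized queue, the fallback path is taken and nothing
    // touches the SYCL runtime, so there is no crash.
    void *buf = host_malloc_with_fallback(1024, nullptr);
    host_free_with_fallback(buf, nullptr);
    return 0;
}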

joeatodd added a commit that referenced this pull request Jun 13, 2024
@airMeng closed this Jun 17, 2024
Alcpz pushed a commit to Alcpz/llama.cpp that referenced this pull request Jun 20, 2024