[SYCL] Fix SIGKILL after setting large WG sizes #2641

dm-vodopyanov · 2020-10-14T19:56:31Z

This patch fixes a SIGKILL (out-of-memory error) caused by a large global work size. Now, if the work-group size is not specified in the kernel source or IL, `WGSize` is set to `{1, 1, 1}`.
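The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `Range3` and `chooseLocalSize` are hypothetical names, and the sketch assumes that an unspecified work-group size is reported as `{0, 0, 0}`, as OpenCL does for `CL_KERNEL_COMPILE_WORK_GROUP_SIZE`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Range3 = std::array<std::size_t, 3>;

// Hypothetical helper: ReqdWGSize is the work-group size the kernel requested
// at compile time, or {0, 0, 0} if it requested none (assumption, see above).
Range3 chooseLocalSize(const Range3 &ReqdWGSize) {
  const bool Unspecified = ReqdWGSize[0] == 0;
  if (Unspecified)
    return {1, 1, 1}; // always valid: 1 divides any global size

  // A partially zero request would be malformed.
  assert(ReqdWGSize[1] != 0 && ReqdWGSize[2] != 0);
  return ReqdWGSize;
}
```

With this sketch, `chooseLocalSize({0, 0, 0})` falls back to `{1, 1, 1}`, while an explicit request such as `{8, 4, 2}` is passed through unchanged.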

dm-vodopyanov · 2020-10-14T19:58:17Z

/summary:run

alexbatashev

LGTM

kbobrovs · 2020-10-15T15:47:30Z

@bader, @dm-vodopyanov
I don't agree with the fix, as users frequently let the RT figure out the WG size, which means performance will likely suffer.

bader · 2020-10-15T15:51:54Z

If I understand correctly, this patch fixes a functional problem. In general, it's better to have a slow application than a broken one.

Do you have a better solution in mind?

kbobrovs · 2020-10-15T16:40:02Z

> In general, it's better to have a slow application than a broken one.

It is hard to disagree, but the fixes can differ. To me it seems the problem is not in the removed code, but in the device/kernel property APIs reporting incorrect values.
If some urgent W/A is needed, then I'd suggest adding a hard WG size limit, e.g. 64, or a smaller value if 64 still causes the problem.
Then continue to investigate.
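The hard limit suggested above could look roughly like this. It is only a sketch: `HardWGSizeLimit` and `cappedWGSize` are hypothetical names, the value 64 is the example from the comment rather than a tuned number, and the code assumes the work-group size must divide the global size evenly (as OpenCL 1.x requires).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical cap taken from the suggestion above; not a value from the runtime.
constexpr std::size_t HardWGSizeLimit = 64;

// Returns the largest work-group size that exceeds neither the cap nor the
// device-reported maximum and that divides GlobalSize evenly.
std::size_t cappedWGSize(std::size_t GlobalSize, std::size_t DeviceMaxWGSize) {
  assert(GlobalSize > 0 && DeviceMaxWGSize > 0);
  std::size_t Candidate =
      std::min({GlobalSize, DeviceMaxWGSize, HardWGSizeLimit});
  while (GlobalSize % Candidate != 0)
    --Candidate; // terminates at 1 at the latest, since 1 divides everything
  return Candidate;
}
```

Even if the device reports an implausibly large maximum, the result stays at or below 64, and it degrades to 1 only when no larger divisor fits (for example, a prime global size above the cap).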

bader · 2020-10-15T16:56:07Z

> If some urgent W/A is needed, then I'd suggest adding a hard WG size limit, e.g. 64, or a smaller value if 64 still causes the problem.
> Then continue to investigate.

That's exactly what Dmitry did, isn't it? He used a WG size limit of 1 instead of 64, which seems to be the most portable value.

dm-vodopyanov · 2020-10-16T07:22:24Z

@kbobrovs I agree that this is a controversial patch and that performance may be lower. The problem reproduces only on the OpenCL FPGA Emu on one specific machine. The minimum of `WGSize1D` and `MaxWGSizes[2]` (in our case it is `WGSize1D`) is correct, but we are running out of memory. It could happen on any machine someday. I find 1 to be the safest way to get rid of the problem. The OpenCL FPGA Emu developers don't expect such a huge WG size in a call to `clEnqueueNDRangeKernel`. Anyway, I can contact them again and double-check whether the OpenCL runtime can be improved to handle this situation.

kbobrovs · 2020-10-16T20:01:51Z

> That's exactly what Dmitry did, isn't it? He used a WG size limit of 1 instead of 64, which seems to be the most portable value.

It definitely isn't. And it will likely make a difference for performance.

> I agree that this is a controversial patch and that performance may be lower. The problem reproduces only on the OpenCL FPGA Emu on one specific machine.

It seems that we are sacrificing performance on, e.g., GPU to hack around a potential FPGA Emu bug. Can the device at least be checked for being an accelerator before using `{1,1,1}`? Can a TODO be added as well?
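A device-type check of the kind requested above might be sketched as follows. `DeviceType` and `localSizeForDevice` are hypothetical stand-ins for the runtime's own device query, and `HeuristicSize` stands for whatever size the existing heuristic would have produced.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Range3 = std::array<std::size_t, 3>;

// Hypothetical stand-in for the runtime's device-type query.
enum class DeviceType { CPU, GPU, Accelerator };

// TODO: drop the accelerator special case once the FPGA emulator issue is
// understood.
Range3 localSizeForDevice(DeviceType Type, const Range3 &HeuristicSize) {
  assert(HeuristicSize[0] != 0 && HeuristicSize[1] != 0 &&
         HeuristicSize[2] != 0);
  if (Type == DeviceType::Accelerator)
    return {1, 1, 1}; // conservative workaround, accelerators only
  return HeuristicSize; // other devices keep the heuristic result
}
```

Under these assumptions, only accelerator devices take the `{1, 1, 1}` fallback, so GPU and CPU launches keep their previous work-group size.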

alexbatashev · 2020-10-16T20:22:00Z

We can move the responsibility for picking the right WG size from the SYCL RT to the individual plugins. IIRC, the OpenCL spec allows for such behavior. Other plugins may implement more platform-specific logic to achieve better performance.
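One way to picture this delegation is the sketch below. The interface and all names in it (`Plugin`, `pickWGSize`, `resolveWGSize`, the two example plugins, and the limit of 256) are hypothetical and do not come from the SYCL runtime. For reference, OpenCL's `clEnqueueNDRangeKernel` accepts a `NULL` `local_work_size`, in which case the implementation chooses how to split the global range into work-groups.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plugin interface; names do not come from the SYCL runtime.
struct Plugin {
  virtual ~Plugin() = default;
  // Called only when the user did not specify a work-group size.
  virtual std::size_t pickWGSize(std::size_t GlobalSize) const = 0;
};

// A plugin that plays it safe, e.g. for an emulator.
struct ConservativePlugin : Plugin {
  std::size_t pickWGSize(std::size_t) const override { return 1; }
};

// A plugin that prefers wide groups: the largest power of two up to a
// hypothetical limit of 256 that divides the global size.
struct WidePlugin : Plugin {
  std::size_t pickWGSize(std::size_t GlobalSize) const override {
    std::size_t Size = 256;
    while (GlobalSize % Size != 0)
      Size /= 2; // reaches 1 at the latest, which always divides
    return Size;
  }
};

// UserWGSize == 0 means "not specified" in this sketch.
std::size_t resolveWGSize(const Plugin &P, std::size_t GlobalSize,
                          std::size_t UserWGSize) {
  assert(GlobalSize > 0);
  return UserWGSize != 0 ? UserWGSize : P.pickWGSize(GlobalSize);
}
```

In this arrangement the runtime only forwards the user's choice, and each backend decides what a sensible default is for its own hardware.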


[SYCL] Fix SIGKILL after setting large WG sizes

a1daace


dm-vodopyanov requested a review from a team as a code owner October 14, 2020 19:56

dm-vodopyanov requested review from alexbatashev, kbobrovs and romanovvlad October 14, 2020 19:56

alexbatashev approved these changes Oct 15, 2020


bader merged commit 4d76de4 into intel:sycl Oct 15, 2020

kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025

Merge pull request intel#2641 from igchor/add_test_mecpy

39fa054

[CTS] add simple test that combines kernel launch and memcpy

Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025

Merge pull request #2641 from igchor/add_test_mecpy

72d8641

[CTS] add simple test that combines kernel launch and memcpy
