Skip to content

Conversation

@dm-vodopyanov
Copy link
Contributor

This patch fixes SIGKILL (out of memory error) caused by large number
of global work size. Now, if work group size wasn't specified in the
kernel source or IL, WGSize sets to {1, 1, 1}.

This patch fixes SIGKILL (out of memory error) caused by large number
of global work size. Now, if work group size wasn't specified in the
kernel source or IL, `WGSize` sets to `{1, 1, 1}`.
@dm-vodopyanov
Copy link
Contributor Author

/summary:run

Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 4d76de4 into intel:sycl Oct 15, 2020
@kbobrovs
Copy link
Contributor

@bader, @dm-vodopyanov
I don't agree with the fix, as users frequently let RT figure out the WG size. Which means performance will likely suffer.

@bader
Copy link
Contributor

bader commented Oct 15, 2020

If I understand correctly, this patch fixes functional problem. In general it's better to have slowly working application than broken application.

Do you have better solution in mind?

@kbobrovs
Copy link
Contributor

In general it's better to have slowly working application than broken application.

it is hard to disagree. But the fixes can different. To me it seems the problem is not in the removed code, but in the device/kernel property APIs reporting incorrect values.
If some urgent W/A is needed, then I'd suggest to add hard WG size limit - e.g. 64. Or smaller value if 64 still causes the problem.
Then continue to investigate.

@bader
Copy link
Contributor

bader commented Oct 15, 2020

If some urgent W/A is needed, then I'd suggest to add hard WG size limit - e.g. 64. Or smaller value if 64 still causes the problem.
Then continue to investigate.

That's exactly what Dmitry did, isn't it? He used 1 WG size limit instead of 64, which seems to be most portable value.

@dm-vodopyanov
Copy link
Contributor Author

@kbobrovs I agree that this is controversial patch and performance may be lower. It reproduces only on OpenCL FPGA Emu on the specific machine. The minimum value from WGSize1D and MaxWGSizes[2] (in our case it is WGSize1D) is correct but we are running out of memory. It could happen on any machine someday. I find 1 the safest way to get rid of the problem. OpenCL FPGA Emu developers don't expect such huge number in WG size when calling clEnqueueNDRangeKernel. Anyway, I can contact with them again and double-check can OpenCL runtime be improved to handle such situation.

@kbobrovs
Copy link
Contributor

That's exactly what Dmitry did, isn't it? He used 1 WG size limit instead of 64, which seems to be most portable value.

It definitely isn't. And it will likely make a difference for performance.

I agree that this is controversial patch and performance may be lower. It reproduces only on OpenCL FPGA Emu on the specific machine.

It seems that are we sacrificing performance on e.g. GPU hacking around potential FPGA Emu bug. Can at least device be checked if it is accelerator before using {1,1,1}? A TODO be added as well?

@alexbatashev
Copy link
Contributor

We can move responsibility of picking the right WG size from SYCL RT to particular plugins. IIRC, OpenCL spec allows for such behavior. Other plugins may implement more platform-specific logic to achieve better performance.

kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
[CTS] add simple test that combines kernel launch and memcpy
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
[CTS] add simple test that combines kernel launch and memcpy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants