Skip to content

Conversation

uditagarwal97
Copy link
Contributor

@uditagarwal97 uditagarwal97 commented Sep 22, 2025

Problem

As an optimization in ext_oneapi_submit_barrier, for in-order queue, when waitlist is empty, we return the last event, if it is available. However, after #18277, we no longer store last event and instead submit a marker to get the last event. And this caused performance regression in some of our internal benchmarks.

Solution
Don't return last event and instead submit a barrier with empty waitlist.

@uditagarwal97 uditagarwal97 self-assigned this Sep 22, 2025
@uditagarwal97 uditagarwal97 marked this pull request as ready for review September 22, 2025 21:49
@uditagarwal97 uditagarwal97 requested a review from a team as a code owner September 22, 2025 21:49
@uditagarwal97
Copy link
Contributor Author

@jbrodman @vinser52 fyi.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little surprised that a marker event would be slower than actually submitting a barrier. Of course, the adapters/backends could be optimizing the barriers on in-order queues themselves, i.e. by storing the latest event themselves or by submitting markers internally. As such, I am all for dropping the complexity from the runtime.

@againull
Copy link
Contributor

I am a little surprised that a marker event would be slower than actually submitting a barrier. Of course, the adapters/backends could be optimizing the barriers on in-order queues themselves, i.e. by storing the latest event themselves or by submitting markers internally. As such, I am all for dropping the complexity from the runtime.

Yes, L0 adapter has the same optimization, that's why removing optimization from sycl level restores performance, because markers are not inserted, but optimization is still in place at adapter level.
Optimization was introduced at sycl level to have same benefit for all backends.

Comment on lines +12 to +18
# https://github.com/intel/llvm/pull/20159 prevents returning last event as an
# optimization for submitting barrier to an empty IOQ. However, the test
# actually checks whether last event is returned or not, so it needs to be
# updated to match the new behavior. ext_oneapi_submit_barrier spec doesn't
# require last event to be returned, so this is not an ABI break.
InorderQueue/in_order_ext_oneapi_submit_barrier.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xtian-github @gmlueck I'd need approval for excluding InorderQueue/in_order_ext_oneapi_submit_barrier.cpp from 6.3 ABI compatibility testing. This test checks whether last event is returned by ext_oneapi_submit_barrier() but after this PR, we no longer return last event. Returning last event is not required by ext_oneapi_submit_barrier spec, so it is not strictly an ABI break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK, but just checking to make sure I understand ... The spec for queue::ext_oneapi_submit_barrier does require that function to return an event. However, the test was checking to see if the returned event was the last event (i.e. the event that was returned from the previous submit). This is an implementation detail, not part of the specified API. Therefore, the test being excluded is not really testing the API; it's testing the implementation. Since the implementation changed, we need to change the test also.

Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming. I approved.

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclusion of "InorderQueue/in_order_ext_oneapi_submit_barrier.cpp" is OK.

@uditagarwal97 uditagarwal97 merged commit 5f4c956 into sycl Sep 24, 2025
41 of 44 checks passed
@uditagarwal97 uditagarwal97 deleted the private/udit/fix_perf_reg_last_event branch September 24, 2025 13:52
AlexeySachkov pushed a commit to AlexeySachkov/llvm that referenced this pull request Sep 29, 2025
…20159)

**Problem**

As an optimization in `ext_oneapi_submit_barrier`, for in-order queue,
when waitlist is empty, we return the last event, if it is available.
However, after intel#18277, we no longer
store last event and instead submit a marker to get the last event. And
this caused performance regression in some of our internal benchmarks.

**Solution**
Don't return last event and instead submit a barrier with empty
waitlist.
AlexeySachkov added a commit that referenced this pull request Sep 30, 2025
This is a cherry-pick of #20159 (and #20241)

**Problem**

As an optimization in `ext_oneapi_submit_barrier`, for in-order queue,
when waitlist is empty, we return the last event, if it is available.
However, after #18277, we no longer
store last event and instead submit a marker to get the last event. And
this caused performance regression in some of our internal benchmarks.

**Solution**
Don't return last event and instead submit a barrier with empty
waitlist.


Patch-by: Udit Kumar Agarwal <[email protected]>
Comment on lines -67 to -76
{
// Test cast 3 - empty queue.
std::cout << "Test 3" << std::endl;
sycl::queue EmptyQ({sycl::property::queue::in_order{}});
auto BarrierEvent = EmptyQ.ext_oneapi_submit_barrier();
assert(
BarrierEvent.get_info<sycl::info::event::command_execution_status>() ==
sycl::info::event_command_status::complete);
BarrierEvent.wait();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexeySachkov Here's my line of reasoning:
IIUC, for IOQs, ext_submit_barrier() is non-blocking and returns an event, whose state will transition to complete, when all previously submitted commands to IOQ also completes. Now, when we submit a barrier to an empty IOQ, (1) should the implementation immediately return a completed event or (2) can a non-blocking implementation return an event, take some time to figure out if queue is empty or not, and if empty, transition the event to be completed?
The test checks for (1) but IIUC, spec doesn't mandate that.
Looking at UR's implementation of urEnqueueEventsWaitWithBarrierExt, it follows (2) - it returns an event and submits zeCommandListAppendSignalEvent to L0. That's why this test was flakily failing in pre-commit CI of this PR. Before this PR, at SYCL RT level, we check if queue is empty or not and according return a completed event or submit urEnqueueEventsWaitWithBarrierExt. Since getting last event and checking whether queue is empty or not is expensive and (1) is not mandated by spec, I removed the SYCL RT check along with this test.

steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Oct 1, 2025
The changes in intel#20159 removed a test
case checking that barriers on empty queues would be considered complete
immediately. This commit reintroduces it with a fix for the case.

Signed-off-by: Larsen, Steffen <[email protected]>
steffenlarsen added a commit that referenced this pull request Oct 2, 2025
…0263)

The changes in #20159 removed a test
case checking that barriers on empty queues would be considered complete
immediately. This commit reintroduces it with a fix for the case.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants