[SYCL] Implement event_profiling::command_submit for level-zero #7403
Signed-off-by: Rauf, Rana <[email protected]>
Not sure the approach is correct.
So, it sounds like the
The What about having something like: In
In the
In the
I don't fully understand what this is trying to do and why; please elaborate. How are the currently returned times not OK?
Currently, Level Zero always returns 0 when queried for the command submission time, which is very misleading.
I am rather asking about the code in piDeviceTime::get
Ohh, sorry I misunderstood. All
I'm not sure I understand correctly what you mean. If you're asking at what point it's called when submitting a command to L0, I've set it at an arbitrary point: right before the call to
@smaslov-intel If you mean which functions it should be used in: it should be called wherever commands are enqueued, such as piEnqueueEventsWaitWithBarrier, piEnqueueKernelLaunch, etc.
@@ -5408,6 +5408,11 @@ piEnqueueKernelLaunch(pi_queue Queue, pi_kernel Kernel, pi_uint32 WorkDim,
   // reference count on the kernel, using the kernel saved in CommandData.
   PI_CALL(piKernelRetain(Kernel));

   auto res=Queue->Device->getSubmitTime(*Event);
can this be moved to executeCommandList?
I would need to add a pi_event parameter to executeCommandList and executeOpenCommandList. Is that okay?
Why don't you just record ALL events in the command-list? Maybe we should actually do this at batch submission?
> Why don't you just record ALL events in the command-list?
Wouldn't it be redundant to record all events, given that their profiling information wouldn't be visible to the user?
> Maybe we should actually do this at batch submission?
I might be wrong, but doesn't executeCommandList submit the command to a batch if we're not using immediate command lists?
executeCommandList does add commands to a batch, but it closes and submits the batch once it is full. That is the point at which I think you should record the "command_submit" time. All events in the command-list come from these enqueue interfaces, so you are already recording the submit time for all of the commands.
According to the SYCL specification, the submission time has to be recorded before the submit method returns. If we record the submission time when the batch is submitted, couldn't the time end up being recorded after queue.submit() returns?
I think you are right here, still you'd do it inside executeCommandList
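The trade-off discussed in this thread can be illustrated with a small host-only sketch. It does not use the plugin's real types: SketchEvent, SketchQueue, enqueue, and closeAndSubmitBatch are hypothetical names, and std::chrono::steady_clock stands in for the device wall-clock query. It only shows that stamping at batch close can leave an event without a submit time when the enqueue call returns, whereas stamping inside the enqueue path cannot.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an event; not the plugin's pi_event.
struct SketchEvent {
  bool HasSubmitTime = false;
  std::uint64_t SubmitTimeNs = 0;
};

// Assumption: a host monotonic clock stands in for the device wall clock.
inline std::uint64_t nowNs() {
  using namespace std::chrono;
  return static_cast<std::uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
          .count());
}

inline void stamp(SketchEvent &E, std::uint64_t T) {
  E.SubmitTimeNs = T;
  E.HasSubmitTime = true;
}

// Hypothetical queue that batches commands and submits when the batch is full.
struct SketchQueue {
  std::size_t BatchCapacity;
  bool StampAtEnqueue; // true: stamp before enqueue() returns
                       // false: stamp only when the batch is closed
  std::vector<SketchEvent *> OpenBatch;

  // Models an enqueue entry point that returns to the caller.
  void enqueue(SketchEvent &E) {
    if (StampAtEnqueue)
      stamp(E, nowNs());
    OpenBatch.push_back(&E);
    if (OpenBatch.size() >= BatchCapacity)
      closeAndSubmitBatch();
  }

  // Models closing the batch and handing it to the device.
  void closeAndSubmitBatch() {
    if (!StampAtEnqueue) {
      const std::uint64_t T = nowNs();
      for (SketchEvent *E : OpenBatch)
        stamp(*E, T);
    }
    OpenBatch.clear();
  }
};

// Enqueues one command into a batch of capacity 4, so the batch stays open,
// and reports whether the event already carries a submit time.
inline bool stampedWhenEnqueueReturns(bool StampAtEnqueue) {
  SketchQueue Q{/*BatchCapacity=*/4, StampAtEnqueue, {}};
  SketchEvent E;
  Q.enqueue(E);
  return E.HasSubmitTime;
}
```

In this model, stampedWhenEnqueueReturns(true) holds while stampedWhenEnqueueReturns(false) does not, which is the concern raised above about recording the time only at batch submission.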
There seems to be an issue with this design and the SYCL RT as well, as @romanovvlad highlighted. The following code hangs indefinitely under the current design:
Seems like |
I've lost track of where this is going.
There is no "waiting" in the L0 Plugin, just querying: llvm/sycl/plugins/level_zero/pi_level_zero.cpp Line 5780 in 7b47ebb
There is no waiting in the plugin layer but there is at SYCL runtime, where all specializations of |
So, are you going to remove the wait() since it's against the spec? Why is "0" always returned after that?
Yes, I'll prepare a separate PR to resolve that issue soon.
I'm not sure. I still need to figure that out.
Seems like the host_accessor is blocking the enqueue of the kernel in "Command1" (from the example in my earlier post), which in turn queries the submit time
This feature implements calculating and returning the submission time of a command group associated with an event in Level Zero. The submission time is calculated by querying the device's wall-clock time right before a command list is submitted for execution.
Signed-off-by: Rauf, Rana [email protected]
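As a rough illustration of the description above, the following host-only sketch stamps the submit time immediately before a command list is "executed". Everything in it is an assumption made for illustration: ProfiledEvent, deviceWallClockNs, and executeCommandListSketch are hypothetical names, std::chrono::steady_clock replaces the device wall clock, and the start/end times are taken on the host rather than from device timestamps.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical event carrying the three profiling time points.
struct ProfiledEvent {
  std::uint64_t SubmitNs = 0;
  std::uint64_t StartNs = 0;
  std::uint64_t EndNs = 0;
};

// Assumption: a host monotonic clock stands in for the device wall clock.
inline std::uint64_t deviceWallClockNs() {
  using namespace std::chrono;
  return static_cast<std::uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
          .count());
}

// Hypothetical submission path: record the submit time first, then run the
// commands. Execution is synchronous here only to keep the sketch small.
inline void executeCommandListSketch(ProfiledEvent &E,
                                     const std::function<void()> &Commands) {
  E.SubmitNs = deviceWallClockNs(); // taken right before submission
  E.StartNs = deviceWallClockNs();  // simplification: host-side start time
  Commands();
  E.EndNs = deviceWallClockNs();    // simplification: host-side end time
}

// The ordering a profiling consumer would expect: submit, then start, then end.
inline bool timestampsOrdered(const ProfiledEvent &E) {
  return E.SubmitNs <= E.StartNs && E.StartNs <= E.EndNs;
}
```

Because the submit time is taken before the commands are handed over and the clock is monotonic, timestampsOrdered holds for any event passed through executeCommandListSketch in this model.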