-
Notifications
You must be signed in to change notification settings - Fork 125
[HIP][CUDA] Refactor using profiling events #1634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
source/adapters/hip/queue.hpp
Outdated
// Stream used solely when profiling is enabled | ||
native_type ProfStream; | ||
bool IsProfStreamCreated{false}; | ||
std::once_flag ProfStreamFlag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep this in the queue struct? Or could we have it local to createProfilingStream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, having static std::once_flag
local to createProfilingStream
was breaking some tests, not sure why...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, having
static std::once_flag
local tocreateProfilingStream
was breaking some tests, not sure why...
Looks like this worked now.
Also I wouldn't mind if |
178dc5e
to
d0cc077
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. LGTM
ffc00a1
to
d082057
Compare
oneapi-src/unified-runtime#1634 is believed to have fixed the issues for HIP in the profiling tag extension. This commit reenables the tests for HIP. Closes intel#12904. Signed-off-by: Larsen, Steffen <[email protected]>
HIP changes:
EvBase
in HIP was moved todevice
andgetElapsedTime
function now handles the profiling events' synchronization. Also, we were usinghipStreamDefault
flag as default, but in CUDA, we useCU_STREAM_NON_BLOCKING
, this was also changed to match with cuda, commits 1-3Queue
which is only created when profiling is enabled and it is used to recordEvQueued
. This was necessary because before we were recording it on theNULL
stream and this might not be the best solution for HIP, see HIP profiling submission time query returns weird values intel/llvm#12904, commit 4CUDA changes:
intel/llvm CI: intel/llvm#13861