[CUDA] ProfilingTag/profiling_queue.cpp failing on unrelated changes #14053
Comments
See #14053 Signed-off-by: Sarnie, Nick <[email protected]>
Edited the description as it looks like this affects other ProfilingTag tests as well.
The CUDA backend is currently failing the profiling tag tests because it sporadically returns times that do not correspond with the timings of relative time queries (e.g. start happening before submission) or times that are before previous events finish. This commit disables these tests while intel#14053 is being addressed. Signed-off-by: Larsen, Steffen <[email protected]>
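For reference, the two symptoms described in that commit message can be written down as a small ordering check. This is only a sketch and not the code of the ProfilingTag tests: `ProfilingTimes` and `checkProfilingOrder` are hypothetical names, the timestamps are assumed to be nanosecond values already read from the events, and the "previous event" rule assumes the events were submitted to an in-order queue.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one event's profiling timestamps (nanoseconds).
struct ProfilingTimes {
  std::uint64_t submit;
  std::uint64_t start;
  std::uint64_t end;
};

// Returns an empty string if all timestamps are consistent, otherwise a
// description of the first violation found.
std::string checkProfilingOrder(const std::vector<ProfilingTimes> &events) {
  for (std::size_t i = 0; i < events.size(); ++i) {
    const ProfilingTimes &e = events[i];
    // Symptom 1: start reported as happening before submission.
    if (e.start < e.submit)
      return "event " + std::to_string(i) + ": start before submit";
    if (e.end < e.start)
      return "event " + std::to_string(i) + ": end before start";
    // Symptom 2 (in-order assumption): an event must not start before
    // the previous event has finished.
    if (i > 0 && e.start < events[i - 1].end)
      return "event " + std::to_string(i) + ": start before previous end";
  }
  return "";
}
```

A sporadic failure of the kind reported here would show up as a non-empty result on some runs only, which is consistent with a timing-dependent cause.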
Surprisingly, this still fails on CUDA with the latest changes to the timing events: oneapi-src/unified-runtime#1634. The only tests that I am able to trigger to fail are the
Those only fail when other tests are run concurrently, i.e. running Since we are now using an extra stream to record the submission time (the
Without having had a look, the fact that it only happens for default_queue.cpp and in_order_queue.cpp could point towards some differing behavior w.r.t. when profiling is enabled on the queue. This could come from either the runtime or the UR adapter, though given that it only happens for CUDA I would assume the latter. That said, it could also be timing-based, so this could be a red herring.
I think the reason why the other two tests are not failing could be because when profiling is enabled on the queue, that triggers the I think this could be timing-based as you say. In the end, we don't enforce any dependency between the submit time event and the start time event, so we may not be able to guarantee that they are recorded in that order.
Describe the bug
https://github.com/intel/llvm/actions/runs/9374542664/job/25810825578
This seems to also affect other ProfilingTag tests, e.g. in_order_queue.cpp in https://github.com/intel/llvm/actions/runs/9348977102/job/25729306575.
To reproduce
No response
Environment
No response
Additional context
No response