[UR] Fix some tests that are broken when run with multiple cuda devices available. #17216

Merged: 2 commits into intel:sycl on Mar 4, 2025

Conversation

aarongreig
Contributor

@aarongreig aarongreig commented Feb 27, 2025

Also removes a test and adds known failures where appropriate (typically where the test only runs when multiple devices are available, so the skip doesn't affect the behaviour of single-device runs).

The test removed is cudaUrContextCreateTest.ActiveContext. This test seems to be testing the assumption that a urQueueCreate followed by a urMemBufferCreate will set the active CUDA context to the one associated with the UR context passed to those calls. Neither of these calls sets the active context; this may have changed at some point, as the test dates back to a PI unit test. The test currently passes as long as only one device is available, because an earlier urDeviceGetInfo sets the active context to the one associated with that device, which is inevitably the same as the one associated with the UR context used in the test. Since the test is based on a faulty assumption about the adapter, I think we can just delete it.
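Roughly, the deleted test's assumption looks like this (a pseudocode sketch based on my reading above, not the exact test source; the UR and CUDA Driver entry points are real, but `nativeContextOf` is a hypothetical helper):

```
// Sketch of what cudaUrContextCreateTest.ActiveContext assumed.
urQueueCreate(context, device, nullptr, &queue);
urMemBufferCreate(context, UR_MEM_FLAG_READ_WRITE, size, nullptr, &buffer);

// Assumed: the active CUDA context is now the one backing `context`.
CUcontext active;
cuCtxGetCurrent(&active);
// assert(active == nativeContextOf(context));
// In practice this only held in single-device runs, because an earlier
// urDeviceGetInfo call happened to make that device's context active.
```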

@aarongreig aarongreig force-pushed the aaron/fixTestsForMultiDevRunner branch from b5c390b to 2b62065 Compare February 28, 2025 13:07
@aarongreig aarongreig marked this pull request as ready for review February 28, 2025 13:13
@aarongreig aarongreig requested review from a team as code owners February 28, 2025 13:13
@aarongreig aarongreig requested a review from npmiller February 28, 2025 13:13
@aarongreig
Contributor Author

e2e fails are unrelated: this only touches UR test code

@lukaszstolarczuk
Contributor

just want to confirm, after this is merged do we want to change the cuda runner to the multi-gpu one? or perhaps we want to add a separate workflow for the new runner...?

@aarongreig
Contributor Author

> just want to confirm, after this is merged do we want to change the cuda runner to the multi-gpu one? or perhaps we want to add a separate workflow for the new runner...?

I think the idea was to use it to replace the current one because the multi-gpu one is faster, maybe @pbalcer can confirm

@aarongreig
Contributor Author

ping @intel/llvm-reviewers-cuda

@aarongreig
Contributor Author

@intel/llvm-gatekeepers please merge, the e2e fails aren't related as this only touches UR tests

@kbenzie kbenzie merged commit a7774f2 into intel:sycl Mar 4, 2025
28 of 30 checks passed
@lukaszstolarczuk
Contributor

FYI, I've enabled the new runner and turned off the old one. If you see any issues, please let me know.
