[UR] Fix some tests that are broken when run with multiple cuda devices available. #17216

aarongreig · 2025-02-27T11:01:32Z

Also removes a test and adds known failures where appropriate (typically where the test only runs when multiple devices are available so the skip doesn't affect behaviour of single-device runs).

The removed test is cudaUrContextCreateTest.ActiveContext. It appears to test the assumption that a urQueueCreate followed by a urMemBufferCreate sets the active CUDA context to the one associated with the UR context passed to those calls. Neither call sets the active context. This may have changed at some point, since the test dates back to a PI unit test. The test currently passes as long as only one device is available, because an earlier urDeviceGetInfo sets the active context to the one associated with that device, which is inevitably the same as the one associated with the UR context used in the test. Since the test is based on a faulty assumption about the adapter, I think we can just delete it.
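
To illustrate why the removed test only passes with a single device, here is a minimal sketch that models the behaviour described above. Everything in it is a hypothetical stand-in: `NativeContext`, `Adapter`, `deviceGetInfo`, `createQueueAndBuffer` and `activeContextMatches` are not the real CUDA driver or UR adapter API. It also assumes that earlier setup queries every available device in order, which the description does not state explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a native CUDA context tied to one device.
struct NativeContext {
    int device;
};

// Models the per-thread "active" context. This is an illustration only,
// not the real driver state.
thread_local NativeContext *activeContext = nullptr;

// Hypothetical model of the adapter: one native context per device.
struct Adapter {
    std::vector<NativeContext> contexts;

    explicit Adapter(int numDevices) {
        contexts.reserve(numDevices);
        for (int i = 0; i < numDevices; ++i) {
            contexts.push_back({i});
        }
    }

    // Models a device query that, as a side effect, makes that device's
    // context the active one.
    void deviceGetInfo(int device) { activeContext = &contexts[device]; }

    // Models queue creation followed by buffer creation: per the PR
    // description, neither call changes the active context.
    NativeContext *createQueueAndBuffer(int device) {
        return &contexts[device];
    }
};

// Returns true if the active context happens to match the context the
// test uses, which is what the removed test asserted.
bool activeContextMatches(int numDevices, int deviceUnderTest) {
    activeContext = nullptr;
    Adapter adapter(numDevices);
    // Assumption: earlier setup queries every device; the last query wins.
    for (int d = 0; d < numDevices; ++d) {
        adapter.deviceGetInfo(d);
    }
    NativeContext *expected = adapter.createQueueAndBuffer(deviceUnderTest);
    bool matches = (activeContext == expected);
    activeContext = nullptr; // avoid leaving a dangling pointer behind
    return matches;
}
```

In this model, `activeContextMatches(1, 0)` is true because the only device queried is also the device under test, while `activeContextMatches(2, 0)` is false because the last query left a different device's context active.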

aarongreig · 2025-02-28T13:57:09Z

e2e fails are unrelated: this only touches UR test code

lukaszstolarczuk · 2025-03-03T16:53:04Z

just want to confirm, after this is merged do we want to change the cuda runner to the multi-gpu one? or perhaps we want to add a separate workflow for the new runner...?

aarongreig · 2025-03-03T16:56:45Z

> just want to confirm, after this is merged do we want to change the cuda runner to the multi-gpu one? or perhaps we want to add a separate workflow for the new runner...?

I think the idea was to use it to replace the current one because the multi-gpu one is faster, maybe @pbalcer can confirm

aarongreig · 2025-03-04T10:10:20Z

ping @intel/llvm-reviewers-cuda

aarongreig · 2025-03-04T11:18:11Z

@intel/llvm-gatekeepers please merge, the e2e fails aren't related as this only touches UR tests

kbenzie · 2025-03-04T12:52:49Z

Fails are known issues:

HostInteropTask/interop-task-cuda-buffer-migrate.cpp fails flakily on unrelated changes #17026
SYCL :: Basic/accessor/accessor.cpp is failing on PVC/Battlemage with opencl:gpu #17251

lukaszstolarczuk · 2025-03-04T13:56:04Z

FYI, enabled new runner, turned off the old one. If you see any issues please let me know

Refactor cuda adapter tests to work on multi-device runner.

b410c0d

aarongreig force-pushed the aaron/fixTestsForMultiDevRunner branch from 45dac1b to b5c390b Compare February 27, 2025 13:54

Allow pseudo-multi-device tests to work in a multi-device environment.

2b62065

aarongreig force-pushed the aaron/fixTestsForMultiDevRunner branch from b5c390b to 2b62065 Compare February 28, 2025 13:07

aarongreig marked this pull request as ready for review February 28, 2025 13:13

aarongreig requested review from a team as code owners February 28, 2025 13:13

aarongreig requested a review from npmiller February 28, 2025 13:13

kbenzie approved these changes Mar 3, 2025

Seanst98 approved these changes Mar 4, 2025

kbenzie merged commit a7774f2 into intel:sycl Mar 4, 2025
28 of 30 checks passed
