
E2E on AWS CUDA Not finding devices #14071

Closed
Alcpz opened this issue Jun 6, 2024 · 4 comments · Fixed by #14074
Labels
bug Something isn't working infrastructure

Comments

@Alcpz
Contributor

Alcpz commented Jun 6, 2024

Describe the bug

E2E on AWS CUDA is currently failing for multiple PRs because the runner doesn't seem to detect a CUDA device.

Run ninja -C build-e2e check-sycl-e2e > e2e.log 2>&1
ninja: Entering directory `build-e2e'
[0/1] Running SYCL End-to-End tests
lit.py: /__w/llvm/llvm/llvm/sycl/test-e2e/lit.cfg.py:412: note: Targeted devices: ext_oneapi_cuda:gpu
lit.py: /__w/llvm/llvm/llvm/sycl/test-e2e/lit.cfg.py:580: warning: Couldn't find pre-installed AOT device compiler ocloc
lit.py: /__w/llvm/llvm/llvm/sycl/test-e2e/lit.cfg.py:577: note: Found pre-installed AOT device compiler opencl-aot
lit.py: /__w/llvm/llvm/llvm/sycl/test-e2e/lit.cfg.py:599: note: Kernel fusion extension enabled
lit.py: /__w/llvm/llvm/llvm/sycl/test-e2e/lit.cfg.py:665: error: Cannot detect device aspect for cuda:gpu
stdout:

Platforms: 0
default_selector()      : No device of requested type available. -1 (PI_ERRO...
accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
gpu_selector()          : No device of requested type available. -1 (PI_ERRO...
custom_selector(gpu)    : No device of requested type available. -1 (PI_ERRO...
custom_selector(cpu)    : No device of requested type available. -1 (PI_ERRO...
custom_selector(acc)    : No device of requested type available. -1 (PI_ERRO...

To reproduce

Run the "E2E on AWS CUDA" GitHub Actions job.

Environment

No response

Additional context

No response

@Alcpz Alcpz added bug Something isn't working infrastructure labels Jun 6, 2024
@Alcpz
Contributor Author

Alcpz commented Jun 6, 2024

@KornevNikita
Contributor

FYI @aelovikov-intel

@JackAKirk
Contributor

I think this is caused by #14049.
I will point the Docker image back to the previous commit.

@JackAKirk
Contributor

#14074 points back to the Docker image from the previous commit and should fix these failures.

steffenlarsen pushed a commit that referenced this issue Jun 6, 2024
temp fix for problems from cuda 12.5 uplift that were caused by
#14049. Should fix
#14071

---------

Signed-off-by: JackAKirk <[email protected]>
ianayl pushed a commit to ianayl/sycl that referenced this issue Jun 13, 2024
temp fix for problems from cuda 12.5 uplift that were caused by
intel#14049. Should fix
intel#14071

---------

Signed-off-by: JackAKirk <[email protected]>
3 participants