[E2E][CUDA] NonUniformGroups/ballot_group_algorithms.cpp failed on CUDA #12995

Closed
uditagarwal97 opened this issue Mar 12, 2024 · 12 comments

Assignees: uditagarwal97
Labels: bug (Something isn't working), confirmed, cuda (CUDA back-end)

Comments

@uditagarwal97 (Contributor)

Describe the bug

NonUniformGroups/ballot_group_algorithms.cpp failed on the self-hosted CUDA runner during SYCL Nightly testing: https://github.com/intel/llvm/actions/runs/8242960746/job/22543077484

FAIL: SYCL :: NonUniformGroups/ballot_group_algorithms.cpp (1450 of 1931)
******************** TEST 'SYCL :: NonUniformGroups/ballot_group_algorithms.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 1
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp -o /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp -o /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 2
env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# .---command stderr------------
# | ballot_group_algorithms.cpp.tmp.out: /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp:1: int main(): Assertion `AllAcc[WI] == true' failed.
# `-----------------------------
# error: command failed with exit status: -6

To reproduce

intel/llvm commit id: ad6085c
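
For reference, below is a minimal sketch of the kind of check this test performs. It is a hypothetical reduced reproducer, not the actual test source: the work-group size, predicate, and the `AnyAcc`/`AllAcc`/`NoneAcc` names are assumptions chosen to mirror the `AllAcc[WI] == true` assertion in the log, using `get_ballot_group` and the group algorithms from the sycl_ext_oneapi_non_uniform_groups extension.

```c++
// A minimal sketch, assuming the sycl_ext_oneapi_non_uniform_groups extension
// API; this is NOT the actual test source.
#include <sycl/sycl.hpp>

#include <cassert>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  sycl::queue Q;
  constexpr size_t WGSize = 32; // matches the reported sub_group_sizes: 32

  sycl::buffer<bool> AnyBuf{sycl::range{WGSize}};
  sycl::buffer<bool> AllBuf{sycl::range{WGSize}};
  sycl::buffer<bool> NoneBuf{sycl::range{WGSize}};

  Q.submit([&](sycl::handler &CGH) {
    sycl::accessor AnyAcc{AnyBuf, CGH, sycl::write_only};
    sycl::accessor AllAcc{AllBuf, CGH, sycl::write_only};
    sycl::accessor NoneAcc{NoneBuf, CGH, sycl::write_only};
    CGH.parallel_for(
        sycl::nd_range<1>{sycl::range{WGSize}, sycl::range{WGSize}},
        [=](sycl::nd_item<1> It) {
          size_t WI = It.get_global_id(0);
          auto SG = It.get_sub_group();

          // Partition the sub-group into two ballot groups using a predicate.
          bool Predicate = WI < WGSize / 2;
          auto BallotGroup = syclex::get_ballot_group(SG, Predicate);

          // The predicate is uniform within each ballot group, so these
          // relationships should hold for every work-item.
          AnyAcc[WI] = (sycl::any_of_group(BallotGroup, Predicate) == Predicate);
          AllAcc[WI] = (sycl::all_of_group(BallotGroup, Predicate) == Predicate);
          NoneAcc[WI] = (sycl::none_of_group(BallotGroup, Predicate) == !Predicate);
        });
  });

  // Constructing host accessors waits for the kernel to finish.
  sycl::host_accessor AnyAcc{AnyBuf, sycl::read_only};
  sycl::host_accessor AllAcc{AllBuf, sycl::read_only};
  sycl::host_accessor NoneAcc{NoneBuf, sycl::read_only};
  for (size_t WI = 0; WI < WGSize; ++WI) {
    assert(AnyAcc[WI] == true);
    assert(AllAcc[WI] == true); // the kind of check that fires in the log above
    assert(NoneAcc[WI] == true);
  }
  return 0;
}
```

Built like the RUN line above (`clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda`) and run with `ONEAPI_DEVICE_SELECTOR=cuda:gpu`, a wrong `all_of_group` result over a ballot group would fire an assertion of the same shape as the one in the log.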

Environment

sycl-ls --verbose output:

> sycl-ls --verbose

ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]

Platforms: 1
Platform [#1]:
    Version  : CUDA 12.4
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 8.6
        Name       : NVIDIA GeForce RTX 3090
        Vendor     : NVIDIA Corporation
        Driver     : CUDA 12.4
        Aspects    : gpu fp fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group cl_khr_fp64 cl_khr_subgroups pi_ext_intel_devicelib_assert ur_exp_command_buffer cl_khr_fp16 ext_oneapi_graph
        info::device::sub_group_sizes: 32
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]
accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]

Additional context

No response

uditagarwal97 added the bug (Something isn't working) and cuda (CUDA back-end) labels Mar 12, 2024
@uditagarwal97 (Contributor, Author)

@steffenlarsen FYI

@JackAKirk (Contributor)

CUDA 12.4 is not tested yet, as it was only released last week and we don't have that machine.
If you comment out https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp#L121, does it pass? I guess that "none" would also fail if "any" does, so you may want to comment that out too.

It would be useful to know whether it is only the "any" check that is failing.

@uditagarwal97 (Contributor, Author)

@JackAKirk Since we don't support CUDA 12.4 yet, I think it would be better to downgrade the CUDA version on the CI machine instead of changing the test case, to prevent issues like this in the future.

uditagarwal97 self-assigned this Mar 14, 2024
@JackAKirk (Contributor) commented Mar 14, 2024

> @JackAKirk Since we don't support CUDA 12.4 yet, I think it would be better to downgrade the CUDA version on the CI machine instead of changing the test case, to prevent issues like this in the future.

I think that the CI is using CUDA 12.1 (with an A10 GPU). Isn't CUDA 12.4 just for your self-hosted runner?

@uditagarwal97 (Contributor, Author)

> > @JackAKirk Since we don't support CUDA 12.4 yet, I think it would be better to downgrade the CUDA version on the CI machine instead of changing the test case, to prevent issues like this in the future.
>
> I think that the CI is using CUDA 12.1 (with an A10 GPU). Isn't CUDA 12.4 just for your self-hosted runner?

Yes. I mean downgrading the CUDA version on the self-hosted CUDA runner.

@JackAKirk (Contributor)

> > > @JackAKirk Since we don't support CUDA 12.4 yet, I think it would be better to downgrade the CUDA version on the CI machine instead of changing the test case, to prevent issues like this in the future.
> >
> > I think that the CI is using CUDA 12.1 (with an A10 GPU). Isn't CUDA 12.4 just for your self-hosted runner?
>
> Yes. I mean downgrading the CUDA version on the self-hosted CUDA runner.

This is the first I have heard of this self-hosted runner. TBH, new CUDA versions should not normally be an issue, although 12.4 does make some interesting changes to ptxas. When one of the systems I have access to gets 12.4, I will test it.

@bader (Contributor) commented Mar 14, 2024

I expect CI to use CUDA SDK 12.1 which is installed into the docker container we use.

@uditagarwal97 (Contributor, Author)

> I expect CI to use CUDA SDK 12.1 which is installed into the docker container we use.

Yes. For now, I'll get the CUDA version on the self-hosted runner downgraded to 12.1. In the future, if we decide to upgrade the CUDA version, we should do so uniformly across all CI machines, including the AWS ones. We would also have to update some Dockerfiles (like https://github.com/intel/llvm/blob/sycl/devops/containers/ubuntu2204_build.Dockerfile) in that case.

@uditagarwal97 (Contributor, Author)

Closing this issue, as we have downgraded the CUDA version to 12.1 and this test failure is gone: https://github.com/intel/llvm/actions/runs/8495325861/job/23286803900

@JackAKirk (Contributor)

I've now tested this on an A100 using CUDA 12.4 and the test passes.

@JackAKirk (Contributor) commented May 3, 2024

I reproduced this test failing on an RTX 30-series GPU (sm_86) using CUDA 12.4, but it passes on an A100 (sm_80). I'll look into it.

JackAKirk reopened this May 3, 2024
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this issue Jun 5, 2024
Fails in Nightly testing on the self-hosted CUDA runner:
intel#12995.
aelovikov-intel added a commit that referenced this issue Jun 6, 2024
…UDA (#14058)

Fails in Nightly testing on the self-hosted CUDA runner:
#12995.
ianayl pushed a commit to ianayl/sycl that referenced this issue Jun 13, 2024
…UDA (intel#14058)

Fails in Nightly testing on the self-hosted CUDA runner:
intel#12995.
@JackAKirk (Contributor)

This was identified as a CUDA runtime issue that was fixed in later versions of the CUDA runtime and has nothing to do with DPC++, so I'm closing the issue.
