[SYCL][CUDA] Default selector behaviour #1665


Closed
Ruyk opened this issue May 11, 2020 · 6 comments
Assignees
Labels
cuda (CUDA back-end), runtime (Runtime library related issue)

Comments

@Ruyk
Contributor

Ruyk commented May 11, 2020

There were some interesting discussions about the behaviour of the SYCL device selector on the now-closed PR #1543.
I think it is better if we discuss this particular topic separately and make a decision on what is best for the community.
The current default device selection works like this:

  • [A] If there is no CUDA device on the system:
  1. Give scores to devices (I think: GPU > CPU > FPGA > host)
  2. Return the device with the highest score
  • [B] If there is a CUDA platform on the system, and no SYCL_BE is passed, or SYCL_BE=PI_OPENCL:
  1. The OpenCL backend is preferred, so the GPU from the NVIDIA OpenCL platform is returned first
  2. The user gets an invalid-triple error, since the default compilation doesn't generate the right binary format. If using the CUDA triple, it still fails with different errors we haven't investigated.
  • [C] If there is a CUDA platform on the system, and SYCL_BE=PI_CUDA is passed:
  1. The CUDA backend is preferred, so the PI CUDA backend is returned.
  2. The default compilation will fail, since the user hasn't passed the right triple. If the user has passed the right triple for the CUDA backend, the program runs.

On the Codeplay side, we get a lot of feedback and user questions from people accidentally running their SYCL application on an NVIDIA OpenCL platform and getting strange failures.
We recommend that users remove the NVIDIA OpenCL ICD from the system so it is not considered during device selection, leaving only the PI CUDA backend available for DPC++ applications.
This is not always possible, since not everyone has permission to edit the ICD file. There are environment variables that can be used with the Khronos ICD loader, such as OPENCL_ICD_VENDORS, which alters the path from which the ICD files are loaded, but this is not a practical solution.
For that reason, we propose to remove the NVIDIA OpenCL platform from device selection altogether, so users don't accidentally use a platform that won't work with the default configuration.
We do not want to remove any other OpenCL 1.2 platform, since others may work by default (e.g., POCL or ComputeAorta).

Even after users have dealt with their NVIDIA OpenCL problems, the issue of selecting the CUDA backend remains.
When the NVIDIA OpenCL platform is removed from the system and there is no SYCL_BE preference, the CUDA backend is selected first by the default_selector (or even the gpu_selector), since it exposes the GPU. This causes problems for users deploying a SYCL application with a default selector on a system that has an NVIDIA GPU, because that GPU will be selected first, and selection fails if the application has not been built with the right triple for the CUDA backend.
When using SYCL_BE=PI_OPENCL, this problem goes away, as the CUDA backend is not selected.

To prevent users from accidentally triggering the selection of the NVIDIA CUDA backend, we suggested on the ill-fated PR to make the selection of the CUDA backend explicit: the CUDA backend will not be used in default (or GPU) selection unless SYCL_BE=PI_CUDA is exported.
If it is, then the CUDA device will likely be selected first by the default or GPU selector, but the user has opted in, so it is not accidental.
This shouldn't prevent the selection of other devices (e.g., when using the cpu_selector or accelerator_selector), so using multiple devices in the same SYCL application is still possible.

Note that all of this is for the default device selection, and users can still write their own custom device selectors to bypass this. If a user wants to use a CUDA device, she can write a CUDA selector that will force usage of a CUDA device or bail out.

To summarize (and conclude):

  1. I'll make a PR to expunge the NVIDIA OpenCL platform from device selection altogether (it won't be possible even to write a custom device selector to use it).
  2. I'll make a PR to make usage of the CUDA backend in default selectors an "opt-in". This won't affect custom selectors.

Does this seem like a good approach? Are there any alternative proposals?

@Ruyk
Contributor Author

Ruyk commented May 11, 2020

Ping @hiaselhans

@hiaselhans
Contributor

hiaselhans commented May 11, 2020

thanks for involving me on this @Ruyk :)

to point 2, i don't have a lot to add, i just want to verify current upstream behaviour:

  • SYCL_BE is empty: opencl and cuda backend are available
  • SYCL_BE=PI_OPENCL: only opencl backend available
  • SYCL_BE=PI_CUDA: only cuda backend

I don't have a strong preference on how to do it, but i think we should enable all of these choices. So having cuda as opt-in should still provide a way to have both cuda and opencl available.

1: it might be good timing to hide (or put a low rating on) other unsupported devices too. You are right not to hide 1.2 devices altogether, but i think isDeviceBinaryTypeSupported's logic is quite well suited and is enforced at a later point anyway..?

static bool isDeviceBinaryTypeSupported(const context &C,

@bader bader added the cuda CUDA back-end label May 12, 2020
@bader bader assigned romanovvlad and unassigned bader May 12, 2020
@Ruyk
Contributor Author

Ruyk commented May 15, 2020

#1689 deals with the removal of NVIDIA OpenCL from the platform list, and includes a better error message in isDeviceBinaryTypeSupported for the CUDA backend case.

@romanovvlad
Contributor

> This fails if the application has not been built with the right triple for the CUDA backend.
> I'll make a PR to expunge NVIDIA OpenCL platform from device selection all together (Won't be possible even to write a custom device selector to use it).

This can prevent users from using NVIDIA OpenCL with online compilation (aka sycl::program::build_with_source). I don't think a lot of people use it, so I support this suggestion. Probably we could use a SYCL_DEVICE_ALLOWLIST config which would be set in sycl.conf (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/config.cpp#L25), which would need to be created for this, so the user will be able to override it if they really want to.

> I'll make a PR to make usage of CUDA backend on default selectors an "opt-in". This won't affect custom selectors.

I'm not a proper SYCL user, but I think I would be confused if I needed to set some env var just to use a working target. I remember there was a suggestion to have a driver option which enables building the device code for both SPIR-V and PTX triples, and to advertise it everywhere in the examples, so novice users get the most "portable" binaries.

@npmiller
Contributor

The PR #6203 should partially address point 2 by giving extra preference to devices with available images in the fat binary.

So, for example, a program with only a SPIR-V image will prefer any device with SPIR-V support over the CUDA backend GPU device, so we shouldn't need the "opt-in": simply using the CUDA triple or not during compilation will influence device selection.

Note that with this patch it may still incorrectly select the CUDA backend if there are no other suitable devices, and it won't fall back on the host device, but I'm not sure whether that would be desirable.

pvchupin pushed a commit that referenced this issue Jun 7, 2022
This patch solves:

* [SYCL] Default selector should filter devices based on available device images #2004
* [SYCL][CUDA] Default selector behaviour #1665

In some cases the current selector may select a device for which we don't have an AOT or SPIR-V binary; this patch ensures that such devices get skipped.
@AerialMantis
Contributor

Since #6203 has been merged, I think this issue can be closed. Support for NVIDIA GPUs via the OpenCL backend is still disabled, so if we were to enable it in the future we may want to revisit this, but in that case we could re-open this ticket or create a new one.

preethi-intel pushed a commit to preethi-intel/llvm that referenced this issue Oct 26, 2022
Add the `n` argument in `vloadn.spvasm`, `vload_halfn.spvasm`, and
`vloada_halfn.spvasm`.  Add a trailing `(` to the CL20 LLVM IR
patterns.

Add extra integer arguments in `OpImageWrite.cl` which hold the Image
Operands `SignExtend`/`ZeroExtend`.

Add the accumulator arguments in `SPV_KHR_integer_dot_product-sat.ll`.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@9f00637