[SYCL][CUDA] Improvements to CUDA device selection #1689

Ruyk · 2020-05-14T20:02:03Z

* Prevents the NVIDIA OpenCL platform from being selected by a SYCL application
* The NVIDIA OpenCL platform is not reported as a valid GPU platform for LIT testing
* Introduces device selection logic to reject devices
* Changes the name of the NVIDIA CUDA backend to differentiate it from OpenCL
* Provides a better error message when SPIR-V is passed to the CUDA backend
* Uses backend types instead of strings to check for the CUDA backend

Signed-off-by: Ruyman Reyes [email protected]
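To illustrate the last bullet, here is a minimal standalone sketch of why a name-based check is fragile: NVIDIA's OpenCL driver reports its platform name as "NVIDIA CUDA", so a substring test for "CUDA" cannot tell it apart from the CUDA plugin. The `Backend` enum, the `PlatformInfo` struct, the helper functions and the plugin platform name "NVIDIA CUDA BACKEND" in the usage note are hypothetical stand-ins, not the DPC++ runtime API.

```cpp
#include <cassert>
#include <string>

// Hypothetical local stand-in for the runtime's backend enumeration.
enum class Backend { host, opencl, cuda };

// Minimal model of a platform: the name it reports plus the plugin backend.
struct PlatformInfo {
  std::string Name;
  Backend PluginBackend;
};

// Fragile: matches both the NVIDIA OpenCL platform ("NVIDIA CUDA")
// and any CUDA-plugin platform whose name also contains "CUDA".
bool isCudaByName(const PlatformInfo &P) {
  return P.Name.find("CUDA") != std::string::npos;
}

// Robust: ask which plugin backend the platform came from.
bool isCudaByBackend(const PlatformInfo &P) {
  return P.PluginBackend == Backend::cuda;
}
```

With `{"NVIDIA CUDA", Backend::opencl}` and `{"NVIDIA CUDA BACKEND", Backend::cuda}`, `isCudaByName` accepts both, while `isCudaByBackend` accepts only the second.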

vladimirlaz · 2020-05-17T09:18:18Z

Wouldn't it be better to use SYCL_DEVICE_ALLOWLIST (https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md) to skip the NVIDIA OpenCL platform? (The best design does not require coding :-))

sycl/source/detail/context_impl.cpp

s-kanaev · 2020-05-18T08:57:13Z

sycl/source/detail/context_impl.cpp

@@ -41,7 +41,8 @@ context_impl::context_impl(const vector_class<cl::sycl::device> Devices,
    DeviceIds.push_back(getSyclObjImpl(D)->getHandleRef());
  }

-  if (MPlatform->is_cuda()) {
+  const auto Backend = getPlugin().getBackend();


Also, why auto? See https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

From the link you reference:

Don’t “almost always” use auto, but do use auto with initializers like cast<Foo>(...) or other places where the type is already obvious from the context.

In this case the type is obvious from the context.

sycl/source/detail/platform_impl.cpp

s-kanaev · 2020-05-18T09:02:01Z

sycl/source/detail/program_manager/program_manager.cpp

  // TODO: Implement `piProgramCreateWithBinary` to not require extra logic for
  //       the CUDA backend.


I believe you didn't remove this TODO because it's still an issue.

Yes, piProgramCreateWithBinary is still not implemented in the CUDA backend (someone else is working on that patch; it should land soon!)

sycl/source/detail/program_manager/program_manager.cpp

sycl/source/device_selector.cpp

Ruyk · 2020-05-18T12:43:50Z

Wouldn't it be better to use SYCL_DEVICE_ALLOWLIST (https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md) to skip the NVIDIA OpenCL platform? (The best design does not require coding :-))

It is not very user-friendly: you would need to know which platforms you are enabling, whereas the only thing you want to disable, in all cases, is the NVIDIA OpenCL platform.
So after some discussion we thought it would be better to simply disable it altogether and make users' lives easier.
Alternatively, the opposite flag (SYCL_DEVICE_DISABLELIST) would work better, since you could simply pass the NVIDIA OpenCL platform and allow everything else.
There may be a good case for a disable list, but for the particular case of the NVIDIA OpenCL device it is better to just ignore it in DPC++.
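A disable list only has to name what should be dropped, whereas an allowlist has to enumerate everything that should stay. A minimal sketch of that filtering, assuming a hypothetical comma-separated value of exact platform names (this is not the format of any existing DPC++ environment variable, and the function names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a hypothetical deny-list value such as "NVIDIA CUDA,Foo" on commas,
// ignoring empty entries.
std::vector<std::string> parseDenyList(const std::string &Value) {
  std::vector<std::string> Entries;
  std::stringstream SS(Value);
  std::string Item;
  while (std::getline(SS, Item, ','))
    if (!Item.empty())
      Entries.push_back(Item);
  return Entries;
}

// A platform is dropped only if its name exactly matches a deny-list entry;
// everything else is kept without having to be listed.
std::vector<std::string>
applyDenyList(const std::vector<std::string> &PlatformNames,
              const std::vector<std::string> &Deny) {
  std::vector<std::string> Kept;
  for (const std::string &Name : PlatformNames)
    if (std::find(Deny.begin(), Deny.end(), Name) == Deny.end())
      Kept.push_back(Name);
  return Kept;
}
```

Passing just `"NVIDIA CUDA"` removes the NVIDIA OpenCL platform and leaves every other platform untouched, which is the usability argument made above.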

Signed-off-by: Ruyman Reyes <[email protected]>

Ruyk · 2020-05-18T14:33:53Z

Thanks for the review @s-kanaev, I think I've addressed all comments.

s-kanaev

LGTM

smaslov-intel · 2020-05-14T20:14:34Z

sycl/source/detail/platform_impl.cpp

+        detail::getSyclObjImpl(Platform)->getPlugin().getBackend();
+    return (HasCUDA && Backend == backend::opencl);
+  };
+  return IsNVIDIAOpenCL(Platform);


It looks like you unconditionally ban OpenCL CUDA forever. Why is that OK?
Should you at least make sure that the CUDA platform is present?

The intention is to disable the NVIDIA OpenCL platform for the foreseeable future, among other reasons because it's not really needed when the CUDA backend is available. See #1665 for a longer discussion about this.

Thanks for pointing to the discussion. Should we at least check that PI CUDA backend is available before shooting the OpenCL CUDA backend?

So you mean that if DPCPP is not built with CUDA support, the NVIDIA OpenCL platform should still be available for device selection? That is still untested.
Maybe it's better to have an environment flag to disable the banned-platform list and let users shoot themselves in the foot if they want. But I think that should happen in a separate PR.

Can't we use the existing whitelist functionality to filter/ban this? I am OK with doing it separately.
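The reviewer's suggestion can be sketched as a platform filter that drops the NVIDIA OpenCL platform only when a CUDA-plugin platform is also present. The `isNvidiaOpenCL` check mirrors the shape of the diff above (a CUDA-named platform served by the OpenCL plugin); the types and function names are hypothetical, and this is not the logic the PR merged, which bans the platform unconditionally.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical local stand-ins for the runtime's types.
enum class Backend { host, opencl, cuda };

struct PlatformInfo {
  std::string Name;
  Backend PluginBackend;
};

// Same shape as the diff: CUDA in the name, but served by the OpenCL plugin.
bool isNvidiaOpenCL(const PlatformInfo &P) {
  const bool HasCUDA = P.Name.find("CUDA") != std::string::npos;
  return HasCUDA && P.PluginBackend == Backend::opencl;
}

// Drop the NVIDIA OpenCL platform only if a CUDA-plugin platform exists
// to take its place; otherwise keep every platform.
std::vector<PlatformInfo>
filterPlatforms(const std::vector<PlatformInfo> &All) {
  const bool HaveCudaPlugin =
      std::any_of(All.begin(), All.end(), [](const PlatformInfo &P) {
        return P.PluginBackend == Backend::cuda;
      });
  std::vector<PlatformInfo> Kept;
  for (const PlatformInfo &P : All)
    if (!(HaveCudaPlugin && isNvidiaOpenCL(P)))
      Kept.push_back(P);
  return Kept;
}
```

The trade-off raised in the thread shows up directly: in a build without the CUDA plugin, the NVIDIA OpenCL platform stays selectable, which is the untested configuration mentioned in the reply.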

sycl/source/device_selector.cpp

Signed-off-by: Ruyman Reyes <[email protected]>

sycl/source/detail/platform_impl.cpp

sycl/source/detail/program_manager/program_manager.cpp

sycl/source/device_selector.cpp

Signed-off-by: Ruyman Reyes <[email protected]>

kbobrovs

LGTM

Ruyk · 2020-05-21T16:31:55Z

@v-klochkov any further comments?

bader

LGTM, just a few nits.

sycl/include/CL/sycl/device_selector.hpp

sycl/source/device_selector.cpp

sycl/source/detail/program_manager/program_manager.cpp

Co-authored-by: Alexey Bader <[email protected]>

Signed-off-by: Ruyman Reyes <[email protected]>

s-kanaev

LGTM.
Just one question, out of curiosity.

s-kanaev · 2020-05-25T08:54:51Z

sycl/source/detail/program_manager/program_manager.cpp

@@ -272,8 +260,7 @@ static bool isDeviceBinaryTypeSupported(const context &C,
  }

  // OpenCL 2.1 and greater require clCreateProgramWithIL
-  backend CBackend = (detail::getSyclObjImpl(C)->getPlugin()).getBackend();
-  if ((CBackend == backend::opencl) &&
+  if ((ContextBackend == backend::opencl) &&
      C.get_platform().get_info<info::platform::version>() >= "2.1")


Just curious: does this really work as intended?
Will operator>= for std::string be called here?
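The question has a concrete answer in standard C++: in SYCL 1.2.1 the platform version query returns `string_class` (a `std::string`), so the comparison resolves to the `std::string` `operator>=` overload and is lexicographic. The OpenCL specification defines the platform version string as `OpenCL <major>.<minor> <platform-specific information>`; it starts with 'O', which sorts after '2', so the check is true for every OpenCL platform. A standalone sketch of the pitfall and of a numeric comparison, where `naiveCheck` and `isAtLeast` are hypothetical helpers rather than DPC++ functions:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Lexicographic comparison, as in the diff: "OpenCL 1.2 ..." >= "2.1"
// is true because 'O' (0x4F) sorts after '2' (0x32).
bool naiveCheck(const std::string &Version) { return Version >= "2.1"; }

// Numeric check: strip the "OpenCL " prefix and read <major>.<minor>.
bool isAtLeast(const std::string &Version, int WantMajor, int WantMinor) {
  const std::string Prefix = "OpenCL ";
  if (Version.compare(0, Prefix.size(), Prefix) != 0)
    return false; // unexpected format: treat as not supported
  std::istringstream In(Version.substr(Prefix.size()));
  int Major = 0, Minor = 0;
  char Dot = 0;
  if (!(In >> Major >> Dot >> Minor) || Dot != '.')
    return false;
  return Major > WantMajor || (Major == WantMajor && Minor >= WantMinor);
}
```

For a version string such as "OpenCL 1.2 CUDA 11.0.194", `naiveCheck` returns true while `isAtLeast(Version, 2, 1)` returns false.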

romanovvlad · 2020-05-26T10:14:26Z

sycl/source/device_selector.cpp

                << "SYCL_PI_TRACE[all]: "
                << "  platform: " << PlatformVersion << std::endl
                << "SYCL_PI_TRACE[all]: "
                << "  device: " << DeviceName << std::endl;
    }

+    // Device is discarded if is marked with REJECT_DEVICE_SCORE
+    if (dev_score == REJECT_DEVICE_SCORE)


I think it is not quite correct. The SYCL spec (4.6.1.1 Device selector interface) says:
If a negative score is returned then the corresponding SYCL device will never be chosen.
So if a user provides a custom selector that returns -2, this issue will still be there.
I've created a PR which resolves the same issue: #1751.
Please let me know whether you want to fix the issue in your PR or whether we should commit #1751.
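A minimal sketch of selection logic that follows the quoted spec wording, treating every negative score as a rejection rather than comparing against a single sentinel such as `REJECT_DEVICE_SCORE`. `DeviceInfo` and `selectDevice` are hypothetical; in SYCL 1.2.1 the corresponding entry point is `device_selector::select_device()`, whose DPC++ implementation is not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical device record; a real selector receives a sycl::device.
struct DeviceInfo {
  std::string Name;
};

// Returns the index of the highest-scoring device, or -1 if every device
// scored negative. Any negative score rejects the device, not just one value.
int selectDevice(const std::vector<DeviceInfo> &Devices,
                 const std::function<int(const DeviceInfo &)> &Score) {
  int Best = -1;
  int BestScore = -1;
  for (int I = 0; I < static_cast<int>(Devices.size()); ++I) {
    const int S = Score(Devices[I]);
    if (S < 0)
      continue; // rejected, whatever the negative value is
    if (S > BestScore) {
      BestScore = S;
      Best = I;
    }
  }
  return Best;
}
```

A custom scorer that returns -2 for one device simply causes that device to be skipped, which covers the case raised in the review.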

@romanovvlad, are you okay if we merge this PR and rebase #1751?

I would prefer that we do not merge incorrect implementation.

I don't think this patch makes it worse than it is today.

I don't think so either.

Ruyk requested review from kbobrovs and a team as code owners May 14, 2020 20:02

Ruyk requested review from smaslov-intel and v-klochkov May 14, 2020 20:02

Ruyk self-assigned this May 14, 2020

Ruyk added the cuda CUDA back-end label May 14, 2020

This was referenced May 14, 2020

[SYCL] gpu_selector not failing when no GPU devices #1679

Closed

[SYCL][CUDA] Default selector behaviour #1665

Closed

s-kanaev reviewed May 18, 2020

Ruyk added 4 commits May 18, 2020 13:01

Merge branch 'sycl' into remove-nvidia-ocl

0451bd6

[SYCL] Renamed variable as suggested by reviewers

aedaacc

Signed-off-by: Ruyman Reyes <[email protected]>

[SYCL] Promoted REJECT_DEVICE_SCORE as class member

255af95

Signed-off-by: Ruyman Reyes <[email protected]>

[SYCL] Added missing const

ec971f9

Signed-off-by: Ruyman Reyes <[email protected]>

s-kanaev previously approved these changes May 18, 2020

smaslov-intel reviewed May 18, 2020

[SYCL] Removed spurious initialization

8fd3208

Signed-off-by: Ruyman Reyes <[email protected]>

Ruyk dismissed s-kanaev’s stale review via 8fd3208 May 18, 2020 19:45

kbobrovs reviewed May 18, 2020

sycl/source/detail/platform_impl.cpp

sycl/source/detail/program_manager/program_manager.cpp (outdated)

sycl/source/detail/program_manager/program_manager.cpp

keryell reviewed May 19, 2020

sycl/source/device_selector.cpp (outdated)

Ruyk added 2 commits May 19, 2020 09:23

[SYCL] Score is negative for rejection

6697fff

Signed-off-by: Ruyman Reyes <[email protected]>

[SYCL] Addressing feedback from reviewers

bcd9a41

Signed-off-by: Ruyman Reyes <[email protected]>

kbobrovs previously approved these changes May 20, 2020

v-klochkov previously approved these changes May 22, 2020

bader requested a review from smaslov-intel May 22, 2020 10:10

bader previously approved these changes May 22, 2020

sycl/include/CL/sycl/device_selector.hpp (outdated)

sycl/source/device_selector.cpp

sycl/source/detail/program_manager/program_manager.cpp (outdated)

Apply suggestions from code review

5c92d5f

Co-authored-by: Alexey Bader <[email protected]>

Ruyk dismissed stale reviews from bader, v-klochkov, and kbobrovs via 5c92d5f May 22, 2020 16:40

Ruyk added 2 commits May 22, 2020 16:44

[SYCL][CUDA] Moving comment to relevant part of the code

1a56184

Signed-off-by: Ruyman Reyes <[email protected]>

Removed comment from device selector

76b279b

v-klochkov approved these changes May 22, 2020

s-kanaev approved these changes May 25, 2020

bader requested a review from kbobrovs May 26, 2020 09:31

bader mentioned this pull request May 26, 2020

[SYCL] Do not select device with a negative score #1751

Merged

romanovvlad reviewed May 26, 2020

smaslov-intel approved these changes May 26, 2020

kbobrovs approved these changes May 26, 2020

bader merged commit 7146426 into intel:sycl May 27, 2020

againull added a commit to againull/llvm that referenced this pull request Jun 5, 2020

Revert "[SYCL][CUDA] Improvements to CUDA device selection (intel#1689)"

b631fd8

This reverts commit 7146426.

This was referenced Jun 8, 2020

runtime error when using CUDA plugin #1240

Closed

CUDA version simultaneously needs and cannot have OpenCL #1559

Closed

npmiller mentioned this pull request Oct 19, 2022

Using the nvidia opencl runtime #7114

Closed
