Skip to content

[SYCL][CUDA] Improvements to CUDA device selection #1689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 27, 2020

Conversation

Ruyk
Copy link
Contributor

@Ruyk Ruyk commented May 14, 2020

  • Prevents NVIDIA OpenCL platform to be selected by a SYCL application
  • NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing
  • Introduces device selection logic to reject devices
  • Changes name of NVIDIA CUDA Backend to differentiate from OpenCL
  • Provides better error message when SPIRV is passed to CUDA backend
  • Using backend types to check for CUDA backend instead of strings

Signed-off-by: Ruyman Reyes [email protected]

* Prevents NVIDIA OpenCL platform to be selected by a SYCL application
* NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing
* Introduces device selection logic to reject devices
* Changes name of NVIDIA CUDA Backend to differentiate from OpenCL
* Provides better error message when SPIRV is passed to CUDA backend
* Using backend types to check for CUDA backend instead of strings

Signed-off-by: Ruyman Reyes <[email protected]>
@Ruyk Ruyk requested review from kbobrovs and a team as code owners May 14, 2020 20:02
@Ruyk Ruyk requested review from smaslov-intel and v-klochkov May 14, 2020 20:02
@Ruyk Ruyk self-assigned this May 14, 2020
@Ruyk Ruyk added the cuda CUDA back-end label May 14, 2020
@vladimirlaz
Copy link
Contributor

vladimirlaz commented May 17, 2020

  • Prevents NVIDIA OpenCL platform to be selected by a SYCL application
  • NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing
  • Introduces device selection logic to reject devices
  • Changes name of NVIDIA CUDA Backend to differentiate from OpenCL
  • Provides better error message when SPIRV is passed to CUDA backend
  • Using backend types to check for CUDA backend instead of strings

Signed-off-by: Ruyman Reyes [email protected]

Isn't it better to use SYCL_DEVICE_ALLOWLIST (https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md) to skip NVidia OpenCL platform (the best design does not require coding:-))

@@ -41,7 +41,8 @@ context_impl::context_impl(const vector_class<cl::sycl::device> Devices,
DeviceIds.push_back(getSyclObjImpl(D)->getHandleRef());
}

if (MPlatform->is_cuda()) {
const auto Backend = getPlugin().getBackend();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the link you reference:

Don’t “almost always” use auto, but do use auto with initializers like cast(...) or other places where the type is already obvious from the context.

In this case the type is obvious from the context.

Comment on lines 89 to 90
// TODO: Implement `piProgramCreateWithBinary` to not require extra logic for
// the CUDA backend.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you didn't remove this TODO because it's still an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the piProgramCreateWithBinary is still not implemented in the CUDA backend (someone else is working on that patch, should be there soon!)

@Ruyk
Copy link
Contributor Author

Ruyk commented May 18, 2020

  • Prevents NVIDIA OpenCL platform to be selected by a SYCL application
  • NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing
  • Introduces device selection logic to reject devices
  • Changes name of NVIDIA CUDA Backend to differentiate from OpenCL
  • Provides better error message when SPIRV is passed to CUDA backend
  • Using backend types to check for CUDA backend instead of strings

Signed-off-by: Ruyman Reyes [email protected]

Isn't it better to use SYCL_DEVICE_ALLOWLIST (https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md) to skip NVidia OpenCL platform (the best design does not require coding:-))

It is not very user-friendly, because you would need to know what platforms you are enabling, whereas the only thing that you want to disable, in all cases is the NVIDIA OpenCL platform.
So after some discussions we thought it would be better to simply disable it all together, and make users life easier.
Alternatively, the opposite flag (SYCL_DEVICE_DISABLELIST) would be better, since then you can simply pass the NVIDIA OpenCL platform and allow whatever else is there.
There may be a good case for a disable list, but, for the particular case of the NVIDIA OpenCL device is better if we just ignore it in DPC++.

@Ruyk
Copy link
Contributor Author

Ruyk commented May 18, 2020

Thanks for the review @s-kanaev , I think i've addressed all comments

s-kanaev
s-kanaev previously approved these changes May 18, 2020
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

detail::getSyclObjImpl(Platform)->getPlugin().getBackend();
return (HasCUDA && Backend == backend::opencl);
};
return IsNVIDIAOpenCL(Platform);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you unconditionally ban OpenCL CUDA forever. Why is it OK?
Should you at least make sure that the CUDA Platform is present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to disable NVIDIA OpenCL platform for the foreseeable future, among many reasons, because its not really needed when having the CUDA backend. See #1665 for a longer discussion about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing to the discussion. Should we at least check that PI CUDA backend is available before shooting the OpenCL CUDA backend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean, if the DPCPP is not built with CUDA support, the NVIDIA OpenCL should still be available for device selection? That is still untested.
Maybe its better to have an env flag to disable the banned platform list and let users shoot themselves in the foot if they want. But I think that should happen on a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use existing whitelist functionality to filter/ban this? I am OK with doing it separately.

kbobrovs
kbobrovs previously approved these changes May 20, 2020
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Ruyk
Copy link
Contributor Author

Ruyk commented May 21, 2020

@v-klochkov any further comments?

v-klochkov
v-klochkov previously approved these changes May 22, 2020
@bader bader requested a review from smaslov-intel May 22, 2020 10:10
bader
bader previously approved these changes May 22, 2020
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits.

@Ruyk Ruyk dismissed stale reviews from bader, v-klochkov, and kbobrovs via 5c92d5f May 22, 2020 16:40
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Just a question there out of curiosity.

@@ -272,8 +260,7 @@ static bool isDeviceBinaryTypeSupported(const context &C,
}

// OpenCL 2.1 and greater require clCreateProgramWithIL
backend CBackend = (detail::getSyclObjImpl(C)->getPlugin()).getBackend();
if ((CBackend == backend::opencl) &&
if ((ContextBackend == backend::opencl) &&
C.get_platform().get_info<info::platform::version>() >= "2.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious does this really work as it's intended to?
Will operator>= for std::string be called here?

<< "SYCL_PI_TRACE[all]: "
<< " platform: " << PlatformVersion << std::endl
<< "SYCL_PI_TRACE[all]: "
<< " device: " << DeviceName << std::endl;
}

// Device is discarded if is marked with REJECT_DEVICE_SCORE
if (dev_score == REJECT_DEVICE_SCORE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not quite correct. The SYCL spec(4.6.1.1 Device selector interface) says:
If a negative score is returned then the corresponding SYCL device will never be chosen.
So if a user provides a custom selector which returns -2 this issue will be still here.
I've created a PR which resolves the same issue: #1751.
Please, let me know if you want to fix the issue in your PR or we commit 1751.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanovvlad, are you okay if we merge this PR and rebase #1751?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that we do not merge incorrect implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this patch makes it worse than it is today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think either.

@bader bader merged commit 7146426 into intel:sycl May 27, 2020
againull added a commit to againull/llvm that referenced this pull request Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants