
[SYCL][CUDA] Improvements to CUDA device selection #1689

Merged: 11 commits, May 27, 2020
4 changes: 4 additions & 0 deletions sycl/include/CL/sycl/device_selector.hpp
@@ -19,6 +19,10 @@ namespace sycl {
class device;

class __SYCL_EXPORT device_selector {
protected:
// SYCL 1.2.1 defines a negative score to reject a device from selection
static constexpr int REJECT_DEVICE_SCORE = -1;

public:
virtual ~device_selector() = default;

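The contract this constant encodes (any negative score means the device is never chosen) can be sketched in plain C++ with mock types. `Device`, `score_device`, and `select` below are illustrative stand-ins, not the real SYCL API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a SYCL device; not the real sycl::device.
constexpr int REJECT_DEVICE_SCORE = -1;

struct Device {
  std::string name;
  bool is_gpu;
};

// A custom "selector": a negative score rejects the device outright.
int score_device(const Device &d) {
  if (!d.is_gpu)
    return REJECT_DEVICE_SCORE; // reject non-GPU devices
  return 1000;
}

// Mimics device_selector::select_device(): highest non-negative score wins.
const Device *select(const std::vector<Device> &devs) {
  int best = REJECT_DEVICE_SCORE;
  const Device *res = nullptr;
  for (const auto &d : devs) {
    int s = score_device(d);
    if (s < 0)
      continue; // rejected devices are never considered
    if (s > best) {
      best = s;
      res = &d;
    }
  }
  return res; // nullptr if every device was rejected
}
```

The real runtime reports an error when every device is rejected; the sketch returns `nullptr` instead to stay self-contained.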
2 changes: 1 addition & 1 deletion sycl/plugins/cuda/pi_cuda.cpp
@@ -684,7 +684,7 @@ pi_result cuda_piPlatformGetInfo(pi_platform platform,
switch (param_name) {
case PI_PLATFORM_INFO_NAME:
return getInfo(param_value_size, param_value, param_value_size_ret,
"NVIDIA CUDA");
"NVIDIA CUDA BACKEND");
case PI_PLATFORM_INFO_VENDOR:
return getInfo(param_value_size, param_value, param_value_size_ret,
"NVIDIA Corporation");
3 changes: 2 additions & 1 deletion sycl/source/detail/context_impl.cpp
@@ -41,7 +41,8 @@ context_impl::context_impl(const vector_class<cl::sycl::device> Devices,
DeviceIds.push_back(getSyclObjImpl(D)->getHandleRef());
}

if (MPlatform->is_cuda()) {
const auto Backend = getPlugin().getBackend();
Author:
From the link you reference:

> Don't "almost always" use auto, but do use auto with initializers like cast(...) or other places where the type is already obvious from the context.

In this case the type is obvious from the context.

if (Backend == backend::cuda) {
#if USE_PI_CUDA
const pi_context_properties props[] = {PI_CONTEXT_PROPERTIES_CUDA_PRIMARY,
UseCUDAPrimaryContext, 0};
30 changes: 29 additions & 1 deletion sycl/source/detail/platform_impl.cpp
@@ -20,6 +20,33 @@ __SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {

static bool IsBannedPlatform(platform Platform) {
// The NVIDIA OpenCL platform is currently not compatible with DPC++:
// it only supports OpenCL 1.2, yet it gets selected by default on many
// systems. There is also no support in the PTX backend for OpenCL
// consumption, and there have been some internal reports.
// To avoid problems for users and for deployment of DPC++ on platforms
// where CUDA is available, the OpenCL support is disabled.
//
auto IsNVIDIAOpenCL = [](platform Platform) {
if (Platform.is_host())
return false;

const bool HasCUDA = Platform.get_info<info::platform::name>().find(
"NVIDIA CUDA") != std::string::npos;
const auto Backend =
detail::getSyclObjImpl(Platform)->getPlugin().getBackend();
const bool IsCUDAOCL = (HasCUDA && Backend == backend::opencl);
if (detail::pi::trace(detail::pi::TraceLevel::PI_TRACE_ALL) && IsCUDAOCL) {
std::cout << "SYCL_PI_TRACE[all]: "
<< "NVIDIA CUDA OpenCL platform found but is not compatible."
<< std::endl;
}
return IsCUDAOCL;
};
return IsNVIDIAOpenCL(Platform);
Reviewer:
It looks like you unconditionally ban OpenCL CUDA forever. Why is it OK?
Should you at least make sure that the CUDA Platform is present?

Author:
The intention is to disable the NVIDIA OpenCL platform for the foreseeable future, among many reasons because it's not really needed once the CUDA backend is available. See #1665 for a longer discussion about this.

Reviewer:
Thanks for pointing to the discussion. Should we at least check that the PI CUDA backend is available before shooting down the OpenCL CUDA backend?

Author:
So you mean that if DPC++ is not built with CUDA support, the NVIDIA OpenCL platform should still be available for device selection? That configuration is still untested.
Maybe it's better to have an env flag to disable the banned-platform list and let users shoot themselves in the foot if they want. But I think that should happen in a separate PR.

Reviewer:
Can't we use the existing whitelist functionality to filter/ban this? I am OK with doing it separately.

}

vector_class<platform> platform_impl::get_platforms() {
vector_class<platform> Platforms;
vector_class<plugin> Plugins = RT::initialize();
@@ -39,7 +66,8 @@ vector_class<platform> platform_impl::get_platforms() {
platform Platform = detail::createSyclObjFromImpl<platform>(
std::make_shared<platform_impl>(PiPlatform, Plugins[i]));
// Skip platforms which do not contain requested device types
if (!Platform.get_devices(ForcedType).empty())
if (!Platform.get_devices(ForcedType).empty() &&
!IsBannedPlatform(Platform))
Platforms.push_back(Platform);
}
}
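The ban amounts to a name substring check combined with a backend check, applied as a filter during platform enumeration. A self-contained sketch of that logic, where `MockPlatform`, the `backend` enum, and the function names are simplified stand-ins for the real SYCL types:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class backend { opencl, cuda, host };

struct MockPlatform {
  std::string name;
  backend be;
};

// Mirrors IsBannedPlatform(): ban only the NVIDIA *OpenCL* platform;
// the same device exposed through the CUDA backend stays available.
bool is_banned(const MockPlatform &p) {
  const bool has_cuda = p.name.find("NVIDIA CUDA") != std::string::npos;
  return has_cuda && p.be == backend::opencl;
}

// Mirrors the filter step added to get_platforms().
std::vector<MockPlatform> filter_platforms(std::vector<MockPlatform> all) {
  all.erase(std::remove_if(all.begin(), all.end(), is_banned), all.end());
  return all;
}
```

Note the ordering: the name test alone would also ban the CUDA-backend platform (renamed "NVIDIA CUDA BACKEND" in this PR partly to avoid that ambiguity), so the backend check is what keeps the ban narrow.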
8 changes: 0 additions & 8 deletions sycl/source/detail/platform_impl.hpp
@@ -73,14 +73,6 @@ class platform_impl {
/// \return true if this SYCL platform is a host platform.
bool is_host() const { return MHostPlatform; };

bool is_cuda() const {
const string_class CUDA_PLATFORM_STRING = "NVIDIA CUDA";
const string_class PlatformName =
get_platform_info<string_class, info::platform::name>::get(MPlatform,
getPlugin());
return PlatformName == CUDA_PLATFORM_STRING;
}

/// \return an instance of OpenCL cl_platform_id.
cl_platform_id get() const {
if (is_host())
35 changes: 11 additions & 24 deletions sycl/source/detail/program_manager/program_manager.cpp
@@ -86,29 +86,10 @@ static RT::PiProgram createBinaryProgram(const ContextImplPtr Context,

RT::PiProgram Program;

bool IsCUDA = false;

// TODO: Implement `piProgramCreateWithBinary` to not require extra logic for
// the CUDA backend.
Reviewer (on lines 89 to 90):

I believe you didn't remove this TODO because it's still an issue.

Author:
Yes, piProgramCreateWithBinary is still not implemented in the CUDA backend (someone else is working on that patch; it should be there soon!).

#if USE_PI_CUDA
// All devices in a context are from the same platform.
RT::PiDevice Device = getFirstDevice(Context);
RT::PiPlatform Platform = nullptr;
Plugin.call<PiApiKind::piDeviceGetInfo>(Device, PI_DEVICE_INFO_PLATFORM, sizeof(Platform),
&Platform, nullptr);
size_t PlatformNameSize = 0u;
Plugin.call<PiApiKind::piPlatformGetInfo>(Platform, PI_PLATFORM_INFO_NAME, 0u, nullptr,
&PlatformNameSize);
std::vector<char> PlatformName(PlatformNameSize, '\0');
Plugin.call<PiApiKind::piPlatformGetInfo>(Platform, PI_PLATFORM_INFO_NAME,
PlatformName.size(), PlatformName.data(), nullptr);
if (PlatformNameSize > 0u &&
std::strncmp(PlatformName.data(), "NVIDIA CUDA", PlatformNameSize) == 0) {
IsCUDA = true;
}
#endif // USE_PI_CUDA

if (IsCUDA) {
const auto Backend = Context->getPlugin().getBackend();
if (Backend == backend::cuda) {
// TODO: Replace CreateWithSource with CreateWithBinary in CUDA backend
const char *SignedData = reinterpret_cast<const char *>(Data);
Plugin.call<PiApiKind::piclProgramCreateWithSource>(Context->getHandleRef(), 1 /*one binary*/, &SignedData,
@@ -259,6 +240,13 @@ RetT *getOrBuild(KernelProgramCache &KPCache, KeyT &&CacheKey,

static bool isDeviceBinaryTypeSupported(const context &C,
RT::PiDeviceBinaryType Format) {
const backend ContextBackend =
detail::getSyclObjImpl(C)->getPlugin().getBackend();

// The CUDA backend cannot consume SPIR-V
if (ContextBackend == backend::cuda && Format == PI_DEVICE_BINARY_TYPE_SPIRV)
return false;

// All formats except PI_DEVICE_BINARY_TYPE_SPIRV are supported.
if (Format != PI_DEVICE_BINARY_TYPE_SPIRV)
return true;
@@ -272,8 +260,7 @@ static bool isDeviceBinaryTypeSupported(const context &C,
}

// OpenCL 2.1 and greater require clCreateProgramWithIL
backend CBackend = (detail::getSyclObjImpl(C)->getPlugin()).getBackend();
if ((CBackend == backend::opencl) &&
if ((ContextBackend == backend::opencl) &&
C.get_platform().get_info<info::platform::version>() >= "2.1")
Reviewer:
Just curious: does this really work as intended?
Will operator>= for std::string be called here?

return true;

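The reviewer's question above is worth pinning down: `operator>=` on `std::string` compares lexicographically, and OpenCL platform version strings normally have the form "OpenCL &lt;major&gt;.&lt;minor&gt; &lt;platform-specific&gt;". Since 'O' (0x4F) is greater than '2' (0x32), any such string compares greater than "2.1", so the check passes even for an OpenCL 1.2 platform. A minimal demonstration, assuming the usual version-string format (the helper name is hypothetical):

```cpp
#include <cassert>
#include <string>

// Reproduces the questionable check from isDeviceBinaryTypeSupported():
// a plain lexicographic std::string comparison against "2.1".
// Because platform version strings start with "OpenCL ", and 'O' > '2'
// in ASCII, this returns true for every OpenCL-formatted string.
bool version_at_least_2_1(const std::string &platform_version) {
  return platform_version >= "2.1";
}
```

`version_at_least_2_1("OpenCL 1.2 CUDA")` returns true, so a robust check would parse the numeric major.minor out of the string before comparing.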
@@ -337,7 +324,7 @@ RT::PiProgram ProgramManager::createPIProgram(const RTDeviceBinaryImage &Img,

if (!isDeviceBinaryTypeSupported(Context, Format))
throw feature_not_supported(
"Online compilation is not supported in this context",
"SPIR-V online compilation is not supported in this context",
PI_INVALID_OPERATION);

// Load the image
25 changes: 17 additions & 8 deletions sycl/source/device_selector.cpp
@@ -30,28 +30,36 @@ static bool isDeviceOfPreferredSyclBe(const device &Device) {

device device_selector::select_device() const {
vector_class<device> devices = device::get_devices();
int score = -1;
int score = REJECT_DEVICE_SCORE;
const device *res = nullptr;

for (const auto &dev : devices) {
int dev_score = (*this)(dev);

if (detail::pi::trace(detail::pi::TraceLevel::PI_TRACE_ALL)) {
string_class PlatformVersion = dev.get_info<info::device::platform>()
.get_info<info::platform::version>();
string_class DeviceName = dev.get_info<info::device::name>();
std::cout << "SYCL_PI_TRACE[all]: "
<< "select_device(): -> score = " << score << std::endl
<< "select_device(): -> score = " << score
<< ((score == REJECT_DEVICE_SCORE) ? "(REJECTED)" : " ")
<< std::endl
<< "SYCL_PI_TRACE[all]: "
<< " platform: " << PlatformVersion << std::endl
<< "SYCL_PI_TRACE[all]: "
<< " device: " << DeviceName << std::endl;
}

// The device is discarded if it is marked with REJECT_DEVICE_SCORE
if (dev_score == REJECT_DEVICE_SCORE)
Reviewer:
I think it is not quite correct. The SYCL spec (4.6.1.1, Device selector interface) says:

> If a negative score is returned then the corresponding SYCL device will never be chosen.

So if a user provides a custom selector which returns -2, this issue will still be here.
I've created a PR which resolves the same issue: #1751.
Please let me know if you want to fix the issue in your PR, or whether we should commit #1751.

Reviewer:
@romanovvlad, are you okay if we merge this PR and rebase #1751?

Reviewer:
I would prefer that we do not merge incorrect implementation.

Reviewer:
I don't think this patch makes it worse than it is today.

Reviewer:
I don't think so either.

continue;

// SYCL spec says: "If more than one device receives the high score then
// one of those tied devices will be returned, but which of the devices
// from the tied set is to be returned is not defined". Here we give a
// preference to the device of the preferred BE.
//
if (score < dev_score ||
if ((score < dev_score) ||
(score == dev_score && isDeviceOfPreferredSyclBe(dev))) {
res = &dev;
score = dev_score;
@@ -79,7 +87,7 @@ device device_selector::select_device() const {

int default_selector::operator()(const device &dev) const {

int Score = -1;
int Score = REJECT_DEVICE_SCORE;

// Give preference to device of SYCL BE.
if (isDeviceOfPreferredSyclBe(dev))
@@ -102,7 +110,8 @@ int default_selector::operator()(const device &dev) const {
}

int gpu_selector::operator()(const device &dev) const {
int Score = -1;
int Score = REJECT_DEVICE_SCORE;

if (dev.is_gpu()) {
Score = 1000;
// Give preference to device of SYCL BE.
@@ -113,7 +122,7 @@ int gpu_selector::operator()(const device &dev) const {
}

int cpu_selector::operator()(const device &dev) const {
int Score = -1;
int Score = REJECT_DEVICE_SCORE;
if (dev.is_cpu()) {
Score = 1000;
// Give preference to device of SYCL BE.
@@ -124,7 +133,7 @@ int cpu_selector::operator()(const device &dev) const {
}

int accelerator_selector::operator()(const device &dev) const {
int Score = -1;
int Score = REJECT_DEVICE_SCORE;
if (dev.is_accelerator()) {
Score = 1000;
// Give preference to device of SYCL BE.
@@ -135,7 +144,7 @@ int accelerator_selector::operator()(const device &dev) const {
}

int host_selector::operator()(const device &dev) const {
int Score = -1;
int Score = REJECT_DEVICE_SCORE;
if (dev.is_host()) {
Score = 1000;
// Give preference to device of SYCL BE.
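The selection loop in this file combines two rules: devices scoring REJECT_DEVICE_SCORE are skipped, and among equal top scores a device on the preferred SYCL backend wins the tie. A compact sketch of just that decision, using illustrative mock types rather than the SYCL API:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Dev {
  std::string name;
  int score;              // result of the selector's operator()
  bool preferred_backend; // stand-in for isDeviceOfPreferredSyclBe()
};

// Mirrors the tie-break in select_device(): a device replaces the current
// best if it scores strictly higher, or ties the best score while sitting
// on the preferred backend.
const Dev *pick(const std::vector<Dev> &devs) {
  int best = -1;
  const Dev *res = nullptr;
  for (const auto &d : devs) {
    if (d.score < 0)
      continue; // rejected
    if (best < d.score || (best == d.score && d.preferred_backend)) {
      res = &d;
      best = d.score;
    }
  }
  return res;
}
```

Note the tie-break is order-independent for the preferred device: whether it appears before or after a tied rival, it ends up selected, which is exactly the latitude the spec allows ("which of the devices from the tied set is to be returned is not defined").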
12 changes: 12 additions & 0 deletions sycl/tools/get_device_count_by_type.cpp
@@ -21,6 +21,7 @@

#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>
@@ -88,6 +89,17 @@ static bool queryOpenCL(cl_device_type deviceType, cl_uint &deviceCount,
}

for (cl_uint i = 0; i < platformCount; i++) {
const size_t MAX_PLATFORM_VENDOR = 100u;
char info[MAX_PLATFORM_VENDOR];
// Query the platform vendor string
clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, MAX_PLATFORM_VENDOR,
info, NULL);
const auto IsNVIDIAOpenCL = strstr(info, "NVIDIA") != NULL;
if (IsNVIDIAOpenCL) {
// Ignore NVIDIA OpenCL platform for testing
continue;
}

cl_uint deviceCountPart = 0;
iRet =
clGetDeviceIDs(platforms[i], deviceType, 0, nullptr, &deviceCountPart);
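The filter added to queryOpenCL() reduces to a substring test on the CL_PLATFORM_VENDOR string. Isolated as a helper (`skip_platform` is a hypothetical name, not part of the tool):

```cpp
#include <cassert>
#include <cstring>

// Skip any OpenCL platform whose vendor string contains "NVIDIA",
// mirroring the check added to get_device_count_by_type.cpp.
bool skip_platform(const char *vendor) {
  return std::strstr(vendor, "NVIDIA") != nullptr;
}
```

Filtering on the vendor string ("NVIDIA Corporation") rather than the platform name is slightly broader than the runtime's own ban, which matches "NVIDIA CUDA" in the platform name; for a test-counting tool that broader net is the safer choice.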