-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL][CUDA] Improvements to CUDA device selection #1689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Prevents NVIDIA OpenCL platform to be selected by a SYCL application * NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing * Introduces device selection logic to reject devices * Changes name of NVIDIA CUDA Backend to differentiate from OpenCL * Provides better error message when SPIRV is passed to CUDA backend * Using backend types to check for CUDA backend instead of strings Signed-off-by: Ruyman Reyes <[email protected]>
Isn't it better to use SYCL_DEVICE_ALLOWLIST (https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md) to skip NVidia OpenCL platform (the best design does not require coding:-)) |
@@ -41,7 +41,8 @@ context_impl::context_impl(const vector_class<cl::sycl::device> Devices, | |||
DeviceIds.push_back(getSyclObjImpl(D)->getHandleRef()); | |||
} | |||
|
|||
if (MPlatform->is_cuda()) { | |||
const auto Backend = getPlugin().getBackend(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the link you reference:
Don’t “almost always” use auto, but do use auto with initializers like cast(...) or other places where the type is already obvious from the context.
In this case the type is obvious from the context.
// TODO: Implement `piProgramCreateWithBinary` to not require extra logic for | ||
// the CUDA backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you didn't remove this TODO
because it's still an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the piProgramCreateWithBinary
is still not implemented in the CUDA backend (someone else is working on that patch, should be there soon!)
It is not very user-friendly, because you would need to know what platforms you are enabling, whereas the only thing that you want to disable, in all cases is the NVIDIA OpenCL platform. |
Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
Thanks for the review @s-kanaev , I think i've addressed all comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
detail::getSyclObjImpl(Platform)->getPlugin().getBackend(); | ||
return (HasCUDA && Backend == backend::opencl); | ||
}; | ||
return IsNVIDIAOpenCL(Platform); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you unconditionally ban OpenCL CUDA forever. Why is it OK?
Should you at least make sure that the CUDA Platform is present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is to disable NVIDIA OpenCL platform for the foreseeable future, among many reasons, because its not really needed when having the CUDA backend. See #1665 for a longer discussion about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing to the discussion. Should we at least check that PI CUDA backend is available before shooting the OpenCL CUDA backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean, if the DPCPP is not built with CUDA support, the NVIDIA OpenCL should still be available for device selection? That is still untested.
Maybe its better to have an env flag to disable the banned platform list and let users shoot themselves in the foot if they want. But I think that should happen on a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use existing whitelist functionality to filter/ban this? I am OK with doing it separately.
Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@v-klochkov any further comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits.
Co-authored-by: Alexey Bader <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just a question there out of curiosity.
@@ -272,8 +260,7 @@ static bool isDeviceBinaryTypeSupported(const context &C, | |||
} | |||
|
|||
// OpenCL 2.1 and greater require clCreateProgramWithIL | |||
backend CBackend = (detail::getSyclObjImpl(C)->getPlugin()).getBackend(); | |||
if ((CBackend == backend::opencl) && | |||
if ((ContextBackend == backend::opencl) && | |||
C.get_platform().get_info<info::platform::version>() >= "2.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious does this really work as it's intended to?
Will operator>=
for std::string
be called here?
<< "SYCL_PI_TRACE[all]: " | ||
<< " platform: " << PlatformVersion << std::endl | ||
<< "SYCL_PI_TRACE[all]: " | ||
<< " device: " << DeviceName << std::endl; | ||
} | ||
|
||
// Device is discarded if is marked with REJECT_DEVICE_SCORE | ||
if (dev_score == REJECT_DEVICE_SCORE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is not quite correct. The SYCL spec(4.6.1.1 Device selector interface) says:
If a negative score is returned then the corresponding SYCL device will never be chosen.
So if a user provides a custom selector which returns -2 this issue will be still here.
I've created a PR which resolves the same issue: #1751.
Please, let me know if you want to fix the issue in your PR or we commit 1751.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romanovvlad, are you okay if we merge this PR and rebase #1751?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we do not merge incorrect implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this patch makes it worse than it is today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either.
This reverts commit 7146426.
Signed-off-by: Ruyman Reyes [email protected]