-
Notifications
You must be signed in to change notification settings - Fork 769
Rewrite OpenCL device discovery to SYCL. #1401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
35b9347
to
dbdad23
Compare
Signed-off-by: Cory Levels <[email protected]>
@@ -8,6 +8,7 @@ | |||
|
|||
#include <CL/cl.h> | |||
#include <CL/cl_ext.h> | |||
#include <CL/sycl.hpp> | |||
|
|||
#ifdef USE_PI_CUDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need CUDA specific code now?
@@ -4,7 +4,9 @@ set(CMAKE_CXX_EXTENSIONS OFF) | |||
|
|||
add_executable(get_device_count_by_type get_device_count_by_type.cpp) | |||
add_dependencies(get_device_count_by_type ocl-headers ocl-icd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_dependencies(get_device_count_by_type ocl-headers ocl-icd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot remember why this was added but I ran into situations where the use of the OpenCL::Headers
target was not enough. I suggest to test this when removing it, e.g., by starting from a completely fresh repo with and without the OCL headers and Khronos ICD.
if (iRet == CL_SUCCESS) { | ||
deviceCount += deviceCountPart; | ||
} | ||
std::vector<platform> platforms(platform::get_platforms()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it's right approach to use SYCL here.
Instead I suggest removing get_device_count_by_type completely and updating tests so they try to run on all device types so if such a device is not available just print "Skip".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on not using SYCL RT. For example, if there is a bug in platforms or device implementation, or something triggered by the plugin discovered, all test will fail and could be very difficult to figure out what the problem is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to run all SYCL LIT testing if SYCL RT platforms/device discovery is broken, i.e. can't detect correctly what's there. I think that for the sake of SYCL LIT testing we should run it on all platforms/devices detected by SYCL RT. There should be a separate single test that verifies that SYCL RT is doing the correct discovery as compared to what's discovered by low-level run-times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on not using SYCL RT. For example, if there is a bug in platforms or device implementation, or something triggered by the plugin discovered, all test will fail and could be very difficult to figure out what the problem is.
To check that the plugin functionality or platform/device implementations are working fine, we can add a test case which checks the listed devices by calling all low level APIs and then checking it w.r.t. the devices returned from plugin. This doesn't need to be checked by all tests.
We would ideally like to start recognizing all devices from all plugins. As that code is being changed, this logic will also need to change in parallel. It is better to use the SYCL APIs, so that it is maintained at one location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it's right approach to use SYCL here.
Instead I suggest removing get_device_count_by_type completely and updating tests so they try to run on all device types so if such a device is not available just print "Skip".
I agree, the tests could be written such that they execute on all attached devices and report any error found with any attached device.
Any test needing a special device (opencl only)/ avoiding any specific device testing (like skipping cuda), should use custom device_selectors. We can create common utilities for this purpose. This approach will mean lesser maintenance and customization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check that the plugin functionality or platform/device implementations are working fine, we can add a test case which checks the listed devices by calling all low level APIs and then checking it w.r.t. the devices returned from plugin. This doesn't need to be checked by all tests.
Seems to me that, somehow, that test should run before any other, so that if that code is broken, no other test is run and we avoid a massive list of failure which can waste a lot of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if LIT has a mode to stop all further testing if a specific test breaks.
Hey Vlad, Yes I know there is some discussion about this change with @garimagu and @smaslov-intel . It was unclear to me what the next steps should be so I thought the conversation should continue here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides this PR this code looks strange at the first place.
Like someone tried for each line to do a demo of a different way/style to express a thing in C/C++...
For example line 35 & 36.
std::vector<platform> platforms(platform::get_platforms()); | ||
for (cl_uint i = 0; i < platforms.size(); i++) { | ||
std::vector<device> result = platforms[i].get_devices(device_type); | ||
deviceCount += result.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the SYCL interface is used, then the CUDA backend check could be done in this loop as well, are you planning to do that?
At the moment our LIT tests are annoted with Dealing with this could/should be part of another PR though I wonder what the approach should be? We could completely drive this from LIT parameters or env vars (or a combination of both), though then we still would need a way to tell this tool to only focus on platforms/backends that have the expected features. As an example, the following PR does a less high-level fix for the tool but adds extra backend parameters for it to focus either on OpenCL or on CUDA, and the LIT config has been adapted accordingly: #1300 |
#1300 got merged which cleans up the functionality expected from |
Co-Authored-By: Alexey Bader <[email protected]>
Still think that it does not make sense to rewrite get_device_count_by_type to use SYCL API.
|
I like the idea of makeing tests more "atomic" in the device they are testing. It feels cleaner - though it might add time overhead to run due to going through more LIT/Python (just a guess). Though even with that route, won't we need a way to query the platform we run on to understand which tests can be run, e.g., which backends are available, and what kind of devices per backend, and perhaps even what kind of extensions per device? |
Here is an example how it can be made: https://github.com/vladimirlaz/llvm-test-suite/tree/sycl_poc/SYCL/Wimpy. Target device is passed to LIT framework as a parameter. |
tag @rbegam |
In the proposal tests just early exit with message "Skipped" if they cannot find a device requested. |
Closing due to lack of activity. Feel free to reopen if needed. |
Signed-off-by: Cory Levels [email protected]