-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Enable querying kernel's number of registers #4665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Some small comments, but otherwise it looks good.
These changes don't add anything to the OpenCL backend, which is technically fine as clGetKernelWorkGroupInfo
should fail with CL_INVALID_VALUE
when not recognizing a descriptor. However, other query functions with an extended set of descriptors (like piDeviceGetInfo
) define cases for unsupported descriptors. In principle the effect is the same, but it makes it easier to keep track of which queries are unimplemented/unsupported in the backend. Would you mind giving piKernelGetGroupInfo
a similar treatment?
@@ -250,6 +250,7 @@ enum class kernel_device_specific : cl_kernel_work_group_info { | |||
preferred_work_group_size_multiple = | |||
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, | |||
private_mem_size = CL_KERNEL_PRIVATE_MEM_SIZE, | |||
ext_num_regs = PI_KERNEL_GROUP_INFO_NUM_REGS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other extension info descriptors have ext_intel_
or ext_oneapi_
as their prefix. To keep this consistent, could this be renamed to something like ext_oneapi_num_regs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is motivated by oneAPI project then I'd also expect to see the Level-Zero support added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed ext_num_regs
to ext_codeplay_num_regs
. For Level-Zero, I don't think that it exposes this information, so I'm not sure how I could implement it in the level zero PI.
@@ -346,7 +346,8 @@ typedef enum { | |||
PI_KERNEL_GROUP_INFO_LOCAL_MEM_SIZE = CL_KERNEL_LOCAL_MEM_SIZE, | |||
PI_KERNEL_GROUP_INFO_PREFERRED_WORK_GROUP_SIZE_MULTIPLE = | |||
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, | |||
PI_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE = CL_KERNEL_PRIVATE_MEM_SIZE | |||
PI_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE = CL_KERNEL_PRIVATE_MEM_SIZE, | |||
PI_KERNEL_GROUP_INFO_NUM_REGS = 0x10112 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this one doesn't have analog in OpenCL it needs to be documented here: what exactly is being queried/returned.
Also let's follow
llvm/sycl/include/CL/sycl/detail/pi.h
Line 115 in b3977b5
PI_ERROR_UNKNOWN = -999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment to explain what is being returned. Could you explain a little more what you mean about adding new values from the top? I thought I had done this - the largest value I found was
llvm/sycl/include/CL/sycl/detail/pi.h
Line 299 in cd48622
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES = 0x10111 |
@steffenlarsen Thanks for your feedback! I've added |
Looks good, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I've added the new symbols to the ABI. I did this automatically on Linux but I don't have access to a windows environment to add the Windows symbol, so I added it manually. I put it in lexicographical order - I hope that's right. |
This patch defines a new member of
kernel_device_specific
infoext_num_regs
which returns the number of registers used by the given compiled kernel. This is implemented for the CUDA plugin, but not for HIP or level zero.