Propose adding a new API "urKernelSuggestGroupSize" #1270
Comments
We also need to consider whether CUDA and HIP could support this API.
This seems fairly straightforward. I don't envision any issues implementing this for CUDA and HIP: we already need to be able to pick a local work size when enqueuing kernels, and this would use the same mechanism.
Hi all, I drafted a PR to implement the API in #1385. Would you care to review that PR? Thank you.
Might be best to take it out of draft, then the reviewer groups will be notified @yingcong-wu.
Sure.
Closing this issue since #1385 has been merged.
For the sanitizer layer, we need to calculate the total number of workgroups in urEnqueueKernelLaunch. But sometimes users omit the "pLocalWorkSize" parameter, which leaves us unable to calculate the number of workgroups.
Therefore, I propose adding a new API "urKernelSuggestGroupSize" that returns the same local work size urEnqueueKernelLaunch would choose.
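For context, here is a minimal sketch of the workgroup count the sanitizer layer needs once a local work size is known. `total_work_groups` is a hypothetical helper, not part of Unified Runtime, and rounding up partial trailing groups is an assumption on my side. It returns 0 when no local size is available, which is exactly the case the proposed API is meant to cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Unified Runtime API): total number of
 * workgroups for a launch, or 0 if the local size is missing or invalid. */
size_t total_work_groups(uint32_t work_dim,
                         const size_t *global_size,
                         const size_t *local_size) {
    if (global_size == NULL || local_size == NULL) {
        /* No local size given: the caller would first have to ask the
         * runtime for a suggested one. */
        return 0;
    }
    size_t total = 1;
    for (uint32_t i = 0; i < work_dim; ++i) {
        if (local_size[i] == 0) {
            return 0;
        }
        /* Assumption: round up so a partial trailing group is counted. */
        total *= (global_size[i] + local_size[i] - 1) / local_size[i];
    }
    return total;
}
```

With a global size of {1024, 16, 1} and a local size of {64, 4, 1}, this gives 16 * 4 * 1 = 64 workgroups.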
LevelZero has "zeKernelSuggestGroupSize", and OpenCL has "clGetKernelSuggestedLocalWorkSizeKHR", but the OpenCL side is currently a work in progress.
LevelZero
OpenCL
Proposed prototype: