Skip to content

Propose adding a new API "urKernelSuggestGroupSize" #1270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AllanZyne opened this issue Jan 23, 2024 · 6 comments
Closed

Propose adding a new API "urKernelSuggestGroupSize" #1270

AllanZyne opened this issue Jan 23, 2024 · 6 comments
Labels
specification Changes or additions to the specification

Comments

@AllanZyne
Copy link
Contributor

AllanZyne commented Jan 23, 2024

For the sanitizer layer, we need to calculate the total number of workgroups in urEnqueueKernelLaunch. But sometimes, users omit to pass "pLocalWorkSize" parameter, which causes us to be unable to calculate workgroups.

Therefore, I propose adding a new API "urKernelSuggestGroupSize" which will return the LocalWorkSize which is the same as urEnqueueKernelLaunch.

LevelZero has "zeKernelSuggestGroupSize", and OpenCL has "clGetKernelSuggestedLocalWorkSizeKHR". But currently, OpenCL is WIP.

LevelZero

ze_result_t ZE_APICALL
zeKernelSuggestGroupSize(
    ze_kernel_handle_t hKernel,                     ///< [in] handle of the kernel object
    uint32_t globalSizeX,                           ///< [in] global width for X dimension
    uint32_t globalSizeY,                           ///< [in] global width for Y dimension
    uint32_t globalSizeZ,                           ///< [in] global width for Z dimension
    uint32_t* groupSizeX,                           ///< [out] recommended size of group for X dimension
    uint32_t* groupSizeY,                           ///< [out] recommended size of group for Y dimension
    uint32_t* groupSizeZ                            ///< [out] recommended size of group for Z dimension
    );

OpenCL

cl_int clGetKernelSuggestedLocalWorkSizeKHR(
cl_command_queue command_queue,
cl_kernel kernel,
cl_uint work_dim,
const size_t* global_work_offset,
const size_t* global_work_size,
size_t* suggested_local_work_size);

Proposed prototype:

ur_result_t UR_APICALL urKernelSuggestGroupSize(
    ur_queue_handle_t hQueue, ur_kernel_handle_t hKernel, uint32_t workDim,
    const size_t *pGlobalWorkOffset, const size_t *pGlobalWorkSize,
    size_t *pSuggestedLocalWorkSize);
@AllanZyne AllanZyne changed the title Propose adding a new API "urGetKernelSuggestedLocalWorkSize" Propose adding a new API "urKernelSuggestGroupSize" Jan 23, 2024
@kbenzie
Copy link
Contributor

kbenzie commented Jan 23, 2024

We also need to consider if CUDA and HIP could support this API.

@kbenzie kbenzie added the needs-discussion This needs further discussion label Jan 23, 2024
@npmiller
Copy link
Contributor

This seems fairly straightforward, I don't envision any issues implementing this for CUDA and HIP since we already need to be able to pick local work size when enqueuing kernels, this would use the same mechanism.

@yingcong-wu
Copy link
Contributor

Hi all, I drafted a PR to implement the API in #1385, would you care to review that PR? Thank you.

@kbenzie
Copy link
Contributor

kbenzie commented Mar 6, 2024

Might be best to take it out of draft, then the reviewer groups will be notified @yingcong-wu.

@yingcong-wu
Copy link
Contributor

Sure.

@kbenzie kbenzie removed the needs-discussion This needs further discussion label Apr 9, 2024
@AllanZyne
Copy link
Contributor Author

Close issue since #1385 has been merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
specification Changes or additions to the specification
Projects
None yet
Development

No branches or pull requests

4 participants