-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Make local accessor argument allocation 1-byte minimum #6032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Make local accessor argument allocation 1-byte minimum #6032
Conversation
Some DPC++ backends, such as OpenCL, does not allow setting local memory kernel arguments with a size of zero bytes. As such, the runtime library can fail with a backend error if a local accessor has a zero-size. These changes make a special case for local accessors with a zero byte size making it allocate a single byte of memory. Signed-off-by: Larsen, Steffen <[email protected]>
This tests intel/llvm#6032. Signed-off-by: Larsen, Steffen <[email protected]>
/verify with intel/llvm-test-suite#1001 |
code-wise this is fine. But do we really need to special case 0 size? Why isn't failure an option? Are zero sized local accessors a thing? |
0-sized local accessors are not disallowed by the SYCL 2020 specification so we should allow it through although it is an edge-case. We cannot simply skip setting the kernel argument as OpenCL will start complaining about unset kernel arguments, so using a minimal-sized allocation seems to be the best solution. |
@steffenlarsen I think you're right that the SYCL specification doesn't specify whether zero-sized local accessors, a clarification was added recently which allows zero-sized buffers so there is a precedent to allow it, though perhaps we should raise a ticket for the specification to clarify this? On a separate note, would I be correct in thinking this change would make #5682 redundant as the backend would never receive a zero-size local memory argument? |
I am not opposed to having a clarification for it, though I'd argue that since we don't specify an error for it, it should be assumed to work. The minimum allocation here is just an implementation detail and the user should still not access the byte as it is undefined behavior to do so (from the spec's perspective).
Thank you for pointing that one out! Yes, I would expect so. I assume the CUDA backend is fine without this change, so if you think it would be better off with simply a zero-sized allocation, we can make this strategy backend-dependent, though in that case #5682 would still be an issue. |
I agree, since it's not specified explicitly it's really an implementation detail, and I think this is a fair interpretation.
That's a good point, the CUDA backend would be fine with this change, we could potentially make this backend-dependent which would allow us to optimize for the zero-size case, though without eliding the argument entirely I suspect this wouldn't have much of an impact, so I'm tempted to say this is better done in the runtime, this also enforces consistent behaviour and avoids errors across all backends. |
@cperkinsintel Because this is good quality software and you might want to keep your job. :-) But AMD is hiring... ;-) |
@keryell - it may not be relevant here, but my question about failure being an option is that if a user is making a 0 sized accessor that seems like it is most likely a mistake in their code. By us making it pretend to work, we are hiding their mistake which may lead them to spend a lot of time hunting down some obscure bug in the future. But if it fails the first time they run it, then they'll find their mistake and fix it. That's why I was asking if 0 sized accessors are a real thing. Also, I like my job. ;-) |
@cperkinsintel joking aside, when writing some generic code it makes sense to accept a 0-sized accessor instead of asking the programmer to write an exponential number of functions specialized for all the cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This tests intel/llvm#6032. Signed-off-by: Larsen, Steffen <[email protected]>
…t-suite#1001) This tests intel#6032. Signed-off-by: Larsen, Steffen <[email protected]>
Some DPC++ backends, such as OpenCL, does not allow setting local memory kernel arguments with a size of zero bytes. As such, the runtime library can fail with a backend error if a local accessor has a zero-size. These changes make a special case for local accessors with a zero byte size making it allocate a single byte of memory.