[SYCL][CUDA] Fix alignment of local arguments (#5113)
The issue is that for local kernel arguments the CUDA plugin uses
CUDA dynamic shared memory, which gives us a single chunk of shared
memory to work with.
The CUDA plugin then lays out all the local kernel arguments
consecutively in this single chunk of memory.
This can cause issues, because simply laying the arguments out one
after the other can result in misaligned arguments.
This patch changes the argument layout so that each argument is aligned
to the maximum necessary alignment, which is the size of the largest
vector type. Additionally, if a local buffer is smaller than this
maximum alignment, the size of that buffer is used as its alignment.
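
The layout rule described above can be sketched roughly as follows. This
is an illustration only, not the plugin's actual code: the helper names
align_local_offset and layout_local_args are hypothetical, and the
maximum alignment of sizeof(double) * 16 (128 bytes on typical platforms)
is an assumption about what "the largest vector type" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the largest vector type is a 16-element vector of double.
constexpr std::size_t kMaxAlignment = sizeof(double) * 16;

// Hypothetical helper: round `offset` up to the alignment required by a
// local argument of `size` bytes. Arguments smaller than the maximum
// alignment are aligned to their own size.
std::size_t align_local_offset(std::size_t offset, std::size_t size) {
  assert(size > 0 && "a local argument must have a non-zero size");
  const std::size_t alignment = std::min(kMaxAlignment, size);
  const std::size_t remainder = offset % alignment;
  return remainder == 0 ? offset : offset + (alignment - remainder);
}

// Hypothetical helper: place the arguments one after the other in a single
// chunk of shared memory and return the start offset of each argument.
std::vector<std::size_t>
layout_local_args(const std::vector<std::size_t> &sizes) {
  std::vector<std::size_t> offsets;
  std::size_t end = 0; // first free byte after the previous argument
  for (std::size_t size : sizes) {
    const std::size_t start = align_local_offset(end, size);
    offsets.push_back(start);
    end = start + size;
  }
  return offsets;
}
```

Under these assumptions, local arguments of 4, 8 and 256 bytes start at
offsets 0, 8 and 128, whereas the purely consecutive layout would place
them at 0, 4 and 12.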
This fixes the issue in #5007.
See also the discussion on #5104 for an alternative solution that may be
more efficient but would require a more intrusive, ABI-changing patch.