-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL] Generalize GlobalOffset and enable it for AMDGPU #5855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libclc changes look okay to me.
It would be interesting to get the feedback from AMDGPU target maintainers on LLVM changes.
@jchlanda, are you planning to upstream new intrinsic and passes?
629c977
to
593e51f
Compare
I'm not sure if it would ever make sense outside of SYCL. And it relies on the metadata generated through SYCL driver and |
Hi @jchlanda and reviewers! |
Yeap, will do that, thanks. |
33031fe
to
f4c4abb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't notice there were no FE tests earlier. Can you add a FE test since you're changing CodeGen?
@elizabethandrews |
d0c99fe
to
c450a3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! FE changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes Ok.
@bader would you like me to do anything else for this PR?
look unrelated, |
@jchlanda, two things: please, resolve merge conflicts and ping reviewers. I see approvals only from front-end team and Greg, so merge tip of the |
In order to emulate this and make generated kernel compliant, an intrinsic | ||
`llvm.nvvm.implicit.offset` (clang builtin `__builtin_ptx_implicit_offset`) was | ||
introduced materializing the use of this implicit parameter for the NVPTX | ||
backend. AMDGCN uses the same approach with `llvm.andgpu.implicit.offset` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking comment (may be fixed in separate PR. It does not deserve a fix that would dismiss existing approvals):
Is it a misprint here? "llvm.andgpu." -> "llvm.amdgpu."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, that's a typo, will fix now.
@asudarsa would you please review the changes in files owned by @intel/dpcpp-tools-reviewers . I see that automatically @psamolysov-intel assigned is out-of-office for unknown time. |
@jchlanda - if you decide to update the patch again, then please use "git merge ...; git push ..." instead of "git rebase ...; git push -f ...". |
@gmlueck, can you review please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nitpicks. But looks good overall. Thanks
My bad, sorry. |
2c433e9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
* Fix links to configure.py and compile.py * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway. * Remove CUDA requiring device selector * Since intel#6203 the device selector should handle this case well. * Update HIP backend limitations * HIP is no longer in beta * Windows isn't supported but intel#17702 made the build work with it so it might work for some users. * Global offset has been supported since intel#5855 * Add common limitations * Update HIP for Nvidia section * This might work but is not supported * Update HIP section * Recommended HIP version + testing platforms * HIP is no longer in beta * Add note on target aliases for CUDA and HIP
* Fix links to configure.py and compile.py * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway. * Remove CUDA requiring device selector * Since #6203 the device selector should handle this case well. * Update HIP backend limitations * HIP is no longer in beta * Windows isn't supported but #17702 made the build work with it so it might work for some users. * Global offset has been supported since #5855 * Add common limitations * Update HIP for Nvidia section * This might work but is not supported * Update HIP section * Recommended HIP version + testing platforms * HIP is no longer in beta * Add note on target aliases for CUDA and HIP
The purpose of this patch is to generalize SYCL global offset pass and enable it for AMDGPU.
In order of the commits:
MDNode
: 0abb088 This removes the need for command line options added by the SYCL driver, discussed here: [SYCL] Generalize local accessor to shared mem pass #5149 (comment)builtin_amdgcn_implicit_offset
and enable the pass for ADMGPU: 5a751a8spirv_GlobalOffset_[x,y,z]
: 53f950dThe main deviation from the NVPTX is the need for supporting address spaces. For AMD kernel arguments reside in constant address space, which for the case with offset forces a copy to private AS, in order to keep the call-graph interface coherent (we can't allocate const address space for the case without offset).
Corresponding test-suit PR: intel/llvm-test-suite#941