-
Notifications
You must be signed in to change notification settings - Fork 769
[clang] Modify linkage and register initialization of device_global
#15148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 5 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
8106a51
Modify linkage and initialization of SYCL device_global
hdelan f5d0483
Add test for static device global
hdelan 86fa136
Update comments for clarity
hdelan 1c6a7e2
Clang format
hdelan ffae8c0
Make external linkage only apply to AMDGCN/NVPTX
hdelan ee7f6a4
Prefix static dev global symbol with UID
hdelan 885bfeb
Don't use llvm.compiler.used for device_globals
hdelan f7a3069
Change order of ops
hdelan 2d80de9
Defer renaming of SYCL static device globals
hdelan 2076358
Update codegen test
hdelan ebb92a8
Clang format
hdelan 85ebf51
Update SYCL tests
hdelan 0c0aacd
Merge branch 'sycl' into dev-global-static
hdelan File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
hdelan marked this conversation as resolved.
Show resolved
Hide resolved
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
// RUN: %{build} -o %t.out | ||
// RUN: %{run} %t.out | ||
// | ||
// The OpenCL GPU backends do not currently support device_global backend | ||
// calls. | ||
// UNSUPPORTED: opencl && gpu | ||
// | ||
// Tests static device_global access through device kernels. | ||
|
||
#include "common.hpp" | ||
|
||
static device_global<int[4], TestProperties> DeviceGlobalVar; | ||
|
||
int main() { | ||
queue Q; | ||
|
||
Q.single_task([=]() { DeviceGlobalVar.get()[0] = 42; }); | ||
// Make sure that the write happens before subsequent read | ||
Q.wait(); | ||
|
||
int OutVal = 0; | ||
{ | ||
buffer<int, 1> OutBuf(&OutVal, 1); | ||
Q.submit([&](handler &CGH) { | ||
auto OutAcc = OutBuf.get_access<access::mode::write>(CGH); | ||
CGH.single_task([=]() { OutAcc[0] = DeviceGlobalVar.get()[0]; }); | ||
}); | ||
} | ||
assert(OutVal == 42 && "Read value does not match."); | ||
return 0; | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smanna12 do you think we can set this universally for all
device_globals
? In NVPTX/AMDGCN this is correct but I'm not entirely sure if we should consider device globals for SPIR-V targets to be externally initialized.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the change to the generated IR? I believe we attempted to achieve this by marking the
device_global
inllvm.compiler.used
, which seems to avoid the optimizer optimizing it when it looks like it won't be initialized. I am not sure why that doesn't reach the CUDA and HIP path though.If this works with SPIR-V too, I am curious as to whether we could drop the
llvm.compiler.used
marking fordevice_global
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the tests with the updated IR as soon as I think I have the right behaviour. I'm curious as well about the
llvm.compiler.used
not being honoured. It looks like thellvm.compiler.used
persists until the end of CC1 phase, but it is discarded by the timellvm-link
is happening. I don't know why this is. Another way to fix this behaviour would be just to make surellvm-link
is not discardingllvm.compiler.used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that
file-table-tform
is getting rid of thellvm.compiler.used
symbol. But this is before thellc
step which also involvesllvm-linking
libclc and libdevice for some reason.llvm-link
ing libclc and libdevice results in a IPO pipeline which includes some GlobalOpt stuff, which messes with the global symbols.So I think this could be fixed just by removing linking with libclc and libdevice at the
llc
step. I'll verify and if that works I'll close this PR and open a new one.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this a little bit on the team, and we have decided that it is more robust not to rely on
llvm.compiler.used
so I am going to stick with this approach and remove thellvm.compiler.used
global var for staticdevice_global
s.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a patch here to modify
sycl-post-link
behaviour here. I think this should solve the problemThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, but let's make sure we're on the same page:
llvm.compiler.used
is expected to be dropped by tools, so that behavior is reasonable.llvm.used
is expected to not be dropped because it needs to survive to the final linking phase. So my suggestion is for us to usellvm.used
from Clang instead ofllvm.compiler.used
, and then fix the tooling that was droppingllvm.used
. Does that make sense?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for explanation. I've changed use of
llvm.compiler.used
tollvm.used
for static device globals in the PR here. Only the NVPTX backend removesllvm[.compiler].used
at lowering time, so the other backends keep the current behaviour wheresycl-post-link
removes the symbols forllvm.used
/llvm.compiler.used
.It was because of this patchy and inconsistent handling of
llvm[.compiler].used
that I thought it would be better to use an approach that relied on more universally supported concepts such as linkage. But the current approach should work so I am happy keeping as is in the aforementioned PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm concerned, the ideal fix would be to have all SYCL backends (NVPTX, AMDGCN and SPIR-V) treat
llvm.compiler.used
in the same way, by removing the global symbol at bc -> asm lowering. This is what NVPTX does currently and I think it is sensible behaviour. In this way we can make sure the symbol lasts for all IR parts of the pipeline. For device code we don't need to worry about post lowering linking optimisations that might remove the symbol, at least we don't currently need the symbol at this stage using thesycl-post-link
method, so I thinkllvm.compiler.used
is preferable tollvm.used
.In this way we would stop removing
llvm.compiler.used
atsycl-post-link
, which is unintuitive behaviour especially given that it happens before further LLVM passes are run (not honouringllvm.compiler.used
, which is supposed to last until lowering).@AaronBallman is that a solution you can get behind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can get behind that, yeah. My biggest concern is regarding changing the linkage; so long as we don't need to do that, I'll be happy!