-
Notifications
You must be signed in to change notification settings - Fork 769
[clang] Modify linkage and register initialization of device_global
#15148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A SYCL device_global is initialized externally. Also a device_global should be given external linkage even when it is static. If linkage is internal then the compiler may SRA a device_global, which results in different symbols for host and device for the same device_global.
Make sure that static device globals work.
e419f0d
to
f5d0483
Compare
Also make sure that the device global is only considered externally initialized on the device compilation pass.
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
const RecordDecl *RD = D->getType()->getAsRecordDecl(); | ||
if (RD && RD->hasAttr<SYCLDeviceGlobalAttr>()) { | ||
// SYCL device globals are initialized externally | ||
GV->setExternallyInitialized(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smanna12 do you think we can set this universally for all device_globals
? In NVPTX/AMDGCN this is correct but I'm not entirely sure if we should consider device globals for SPIR-V targets to be externally initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the change to the generated IR? I believe we attempted to achieve this by marking the device_global
in llvm.compiler.used
, which seems to avoid the optimizer optimizing it when it looks like it won't be initialized. I am not sure why that doesn't reach the CUDA and HIP path though.
If this works with SPIR-V too, I am curious as to whether we could drop the llvm.compiler.used
marking for device_global
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the tests with the updated IR as soon as I think I have the right behaviour. I'm curious as well about the llvm.compiler.used
not being honoured. It looks like the llvm.compiler.used
persists until the end of CC1 phase, but it is discarded by the time llvm-link
is happening. I don't know why this is. Another way to fix this behaviour would be just to make sure llvm-link
is not discarding llvm.compiler.used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that file-table-tform
is getting rid of the llvm.compiler.used
symbol. But this is before the llc
step which also involves llvm-linking
libclc and libdevice for some reason. llvm-link
ing libclc and libdevice results in a IPO pipeline which includes some GlobalOpt stuff, which messes with the global symbols.
So I think this could be fixed just by removing linking with libclc and libdevice at the llc
step. I'll verify and if that works I'll close this PR and open a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this a little bit on the team, and we have decided that it is more robust not to rely on llvm.compiler.used
so I am going to stick with this approach and remove the llvm.compiler.used
global var for static device_global
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a patch here to modify sycl-post-link
behaviour here. I think this should solve the problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of device globals from llvm.compiler.used is actually happening deliberately at sycl-post-link, not file-table-tform. Apologies for the misdirection.
No worries, but let's make sure we're on the same page: llvm.compiler.used
is expected to be dropped by tools, so that behavior is reasonable. llvm.used
is expected to not be dropped because it needs to survive to the final linking phase. So my suggestion is for us to use llvm.used
from Clang instead of llvm.compiler.used
, and then fix the tooling that was dropping llvm.used
. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for explanation. I've changed use of llvm.compiler.used
to llvm.used
for static device globals in the PR here. Only the NVPTX backend removes llvm[.compiler].used
at lowering time, so the other backends keep the current behaviour where sycl-post-link
removes the symbols for llvm.used
/llvm.compiler.used
.
It was because of this patchy and inconsistent handling of llvm[.compiler].used
that I thought it would be better to use an approach that relied on more universally supported concepts such as linkage. But the current approach should work so I am happy keeping as is in the aforementioned PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm concerned, the ideal fix would be to have all SYCL backends (NVPTX, AMDGCN and SPIR-V) treat llvm.compiler.used
in the same way, by removing the global symbol at bc -> asm lowering. This is what NVPTX does currently and I think it is sensible behaviour. In this way we can make sure the symbol lasts for all IR parts of the pipeline. For device code we don't need to worry about post lowering linking optimisations that might remove the symbol, at least we don't currently need the symbol at this stage using the sycl-post-link
method, so I think llvm.compiler.used
is preferable to llvm.used
.
In this way we would stop removing llvm.compiler.used
at sycl-post-link
, which is unintuitive behaviour especially given that it happens before further LLVM passes are run (not honouring llvm.compiler.used
, which is supposed to last until lowering).
@AaronBallman is that a solution you can get behind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can get behind that, yeah. My biggest concern is regarding changing the linkage; so long as we don't need to do that, I'll be happy!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test LGTM!
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
const RecordDecl *RD = D->getType()->getAsRecordDecl(); | ||
if (RD && RD->hasAttr<SYCLDeviceGlobalAttr>()) { | ||
// SYCL device globals are initialized externally | ||
GV->setExternallyInitialized(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the change to the generated IR? I believe we attempted to achieve this by marking the device_global
in llvm.compiler.used
, which seems to avoid the optimizer optimizing it when it looks like it won't be initialized. I am not sure why that doesn't reach the CUDA and HIP path though.
If this works with SPIR-V too, I am curious as to whether we could drop the llvm.compiler.used
marking for device_global
.
llvm.compiler.used is brittle and shouldn't be used in this instance. Better to make static device_globals externally linked with an alias.
Change name before making externally initialized.
addSYCLUniqueID(GV, D, Context); | ||
if (Linkage == llvm::GlobalValue::InternalLinkage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm....This seems ok to me but I am not sure if there are repercussions I am not thinking of . @AaronBallman can you weigh in here?
Only rename the static SYCL device globals at the end of module codegen. If renaming is not deferred the compiler may generate multiple symbols for the same device global - one before renaming and another after.
- Make all global vars externally initialized - Don't use llvm.compiler.used - Change the names of static dev global vars to be prefixed with THE_PREFIX
Ping @intel/dpcpp-cfe-reviewers |
A SYCL device_global is initialized externally. We should make sure the
frontend registers it as such to avoid it being removed in IPO
optimizations.
Also a device_global should be given external linkage even when it
is static. If linkage is internal then the compiler may SRA a
device_global
,which results in different symbols for host and device for the same
device_global
.Ping @frasercrmck @ldrumm @rafbiels