Skip to content

[clang] Modify linkage and register initialization of device_global #15148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Aug 20, 2024

A SYCL device_global is initialized externally. We should make sure the
frontend registers it as such to avoid it being removed in IPO
optimizations.

Also a device_global should be given external linkage even when it
is static. If linkage is internal then the compiler may SRA a device_global,
which results in different symbols for host and device for the same
device_global.

Ping @frasercrmck @ldrumm @rafbiels

A SYCL device_global is initialized externally. Also a device_global
should be given external linkage even when it is static. If linkage is
internal then the compiler may SRA a device_global, which results in
different symbols for host and device for the same device_global.
@hdelan hdelan requested review from a team as code owners August 20, 2024 15:54
@hdelan hdelan requested a review from steffenlarsen August 20, 2024 15:54
Make sure that static device globals work.
Also make sure that the device global is only considered externally
initialized on the device compilation pass.
const RecordDecl *RD = D->getType()->getAsRecordDecl();
if (RD && RD->hasAttr<SYCLDeviceGlobalAttr>()) {
// SYCL device globals are initialized externally
GV->setExternallyInitialized(true);
Copy link
Contributor Author

@hdelan hdelan Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smanna12 do you think we can set this universally for all device_globals? In NVPTX/AMDGCN this is correct but I'm not entirely sure if we should consider device globals for SPIR-V targets to be externally initialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the change to the generated IR? I believe we attempted to achieve this by marking the device_global in llvm.compiler.used, which seems to avoid the optimizer optimizing it when it looks like it won't be initialized. I am not sure why that doesn't reach the CUDA and HIP path though.

If this works with SPIR-V too, I am curious as to whether we could drop the llvm.compiler.used marking for device_global.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the tests with the updated IR as soon as I think I have the right behaviour. I'm curious as well about the llvm.compiler.used not being honoured. It looks like the llvm.compiler.used persists until the end of CC1 phase, but it is discarded by the time llvm-link is happening. I don't know why this is. Another way to fix this behaviour would be just to make sure llvm-link is not discarding llvm.compiler.used

Copy link
Contributor Author

@hdelan hdelan Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that file-table-tform is getting rid of the llvm.compiler.used symbol. But this is before the llc step which also involves llvm-linking libclc and libdevice for some reason. llvm-linking libclc and libdevice results in a IPO pipeline which includes some GlobalOpt stuff, which messes with the global symbols.

So I think this could be fixed just by removing linking with libclc and libdevice at the llc step. I'll verify and if that works I'll close this PR and open a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this a little bit on the team, and we have decided that it is more robust not to rely on llvm.compiler.used so I am going to stick with this approach and remove the llvm.compiler.used global var for static device_globals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a patch here to modify sycl-post-link behaviour here. I think this should solve the problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of device globals from llvm.compiler.used is actually happening deliberately at sycl-post-link, not file-table-tform. Apologies for the misdirection.

No worries, but let's make sure we're on the same page: llvm.compiler.used is expected to be dropped by tools, so that behavior is reasonable. llvm.used is expected to not be dropped because it needs to survive to the final linking phase. So my suggestion is for us to use llvm.used from Clang instead of llvm.compiler.used, and then fix the tooling that was dropping llvm.used. Does that make sense?

Copy link
Contributor Author

@hdelan hdelan Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for explanation. I've changed use of llvm.compiler.used to llvm.used for static device globals in the PR here. Only the NVPTX backend removes llvm[.compiler].used at lowering time, so the other backends keep the current behaviour where sycl-post-link removes the symbols for llvm.used/llvm.compiler.used.

It was because of this patchy and inconsistent handling of llvm[.compiler].used that I thought it would be better to use an approach that relied on more universally supported concepts such as linkage. But the current approach should work so I am happy keeping as is in the aforementioned PR.

Copy link
Contributor Author

@hdelan hdelan Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned, the ideal fix would be to have all SYCL backends (NVPTX, AMDGCN and SPIR-V) treat llvm.compiler.used in the same way, by removing the global symbol at bc -> asm lowering. This is what NVPTX does currently and I think it is sensible behaviour. In this way we can make sure the symbol lasts for all IR parts of the pipeline. For device code we don't need to worry about post lowering linking optimisations that might remove the symbol, at least we don't currently need the symbol at this stage using the sycl-post-link method, so I think llvm.compiler.used is preferable to llvm.used.

In this way we would stop removing llvm.compiler.used at sycl-post-link, which is unintuitive behaviour especially given that it happens before further LLVM passes are run (not honouring llvm.compiler.used, which is supposed to last until lowering).

@AaronBallman is that a solution you can get behind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can get behind that, yeah. My biggest concern is regarding changing the linkage; so long as we don't need to do that, I'll be happy!

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test LGTM!

const RecordDecl *RD = D->getType()->getAsRecordDecl();
if (RD && RD->hasAttr<SYCLDeviceGlobalAttr>()) {
// SYCL device globals are initialized externally
GV->setExternallyInitialized(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the change to the generated IR? I believe we attempted to achieve this by marking the device_global in llvm.compiler.used, which seems to avoid the optimizer optimizing it when it looks like it won't be initialized. I am not sure why that doesn't reach the CUDA and HIP path though.

If this works with SPIR-V too, I am curious as to whether we could drop the llvm.compiler.used marking for device_global.

llvm.compiler.used is brittle and shouldn't be used in this instance.
Better to make static device_globals externally linked with an alias.
Change name before making externally initialized.
addSYCLUniqueID(GV, D, Context);
if (Linkage == llvm::GlobalValue::InternalLinkage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm....This seems ok to me but I am not sure if there are repercussions I am not thinking of . @AaronBallman can you weigh in here?

Only rename the static SYCL device globals at the end of module codegen.
If renaming is not deferred the compiler may generate multiple symbols
for the same device global - one before renaming and another after.
- Make all global vars externally initialized
- Don't use llvm.compiler.used
- Change the names of static dev global vars to be prefixed with
  THE_PREFIX
@hdelan
Copy link
Contributor Author

hdelan commented Aug 28, 2024

Ping @intel/dpcpp-cfe-reviewers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants