-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Generalize local accessor to shared mem pass #5149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
87800ad
to
7add2f1
Compare
Looks like the pass is a bit too eager and runs on all amdgcn amdhsa kernels, moving it to WIP, till I update it to be only triggered on SYCL kernels. |
6d31ac3
to
417ba47
Compare
I decided to hide the pass behind |
As well as enabling the pass for amdgcn amdhsa.
417ba47
to
46f6752
Compare
46f6752
to
437d5c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pass changes LGTM, a couple of minor comments about tests
llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-triple.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-valid-triple.ll
Outdated
Show resolved
Hide resolved
…-triple.ll Co-authored-by: Alexey Sachkov <[email protected]>
@intel/dpcpp-clang-driver-reviewers, @intel/llvm-reviewers-runtime, ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime changes LGTM
@intel/dpcpp-clang-driver-reviewers, ping. |
a49c751
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for Driver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime changes lgtm
@@ -5787,6 +5787,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||
CmdArgs.push_back("-treat-scalable-fixed-error-as-warning"); | |||
} | |||
|
|||
// Enable local accessor to shared memory pass for SYCL. | |||
if (isa<BackendJobAction>(JA) && IsSYCL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdtoguchi, shouldn't the condition check IsSYCLOffloadDevice
instead of IsSYCL
?
abi/user_mangling.cpp
and regression/fsycl-save-temps.cpp
from check-sycl suite fails on my system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. If this is to be only set for device, then this should be using IsSYCLOffloadDevice
, as IsSYCL
is for all compilations (host and device) when SYCL device offloading is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get following error:
clang (LLVM option parsing): Unknown command line argument '-sycl-enable-local-accessor'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--enable-local-reassign'?
Honestly, I don't really understand why the options is not visible for FE in host mode, but using IsSYCLOffloadDevice
fixed the problem and I don't think running the pass enabled by -sycl-enable-local-accessor
is needed in the host mode.
Another mystery is why this issue is not exposed by CI system. I suppose it some how related to the difference in cmake configuration - I don't build NVPTX and AMDGPU targets, which I suppose link the library with "unknown" option.
We definitely need to do more investigation on this issue.
@jchlanda, FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional problem: -sycl-enable-local-accessor
is only being set when we do some kind of code generation step. As the device compilation does not do this, -sycl-enable-local-accessor
is never set for device. It is only being emitted for host compilations as that goes through the assembling step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at a high level, -sycl-enable-local-accessor
does only emit for the code generation step with the nvptx64
target. The steps are not combined for nvptx64
allowing the option to only be emitted for device there. Kind of a round-about way to restrict the option, but it can leak out to host if -S
is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, @jchlanda is away this week.
@AerialMantis, can someone else to take a look into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, I'll have someone else take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a PR up to fix this #5408
I believe the CI did not catch this as it builds for both cuda and hip then runs for each backend, correct me if I am wrong. If you build for CUDA or HIP then -sycl-enable-local-accessor is then usable.
I think that -sycl-enable-local-accessor
is not available when not building for CUDA/HIP because the pass is initialised within the llvm NVPTX and AMDGPU backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost glad that this bug surfaced @bader , @mdtoguchi . TBH, -sycl-enable-local-accessor
is a workaround, that I never liked. The problem I had was that there wasn't a way to tell that a kernel was compiled from SYCL. Simply relying on the calling convention (CallingConv::AMDGPU_KERNEL
) is not enough, as there are multiple paths that use it (OpenCL, OpenMP, SYCL). I was wondering if it would be better to follow NVIDIA here and use metadata nodes to denote kernels (https://github.com/intel/llvm/blob/HEAD/clang/lib/CodeGen/TargetInfo.cpp#L7242). This would work for all the passes that we only want to run on SYCL kernels (for instance we'd like to generalize https://github.com/intel/llvm/blob/sycl/llvm/lib/Target/NVPTX/SYCL/GlobalOffset.cpp would benefit from it).
This PR resolves an issue raised in #5149. It changes the the check from `IsSYCL` to `IsSYCLOffloadDevice` and limits its usage to nvptx and amdgcn. A new test is added, to prevent regression.
The purpose of this patch is to generalize SYCL global offset pass and enable it for AMDGPU. * enable global offset in AMD's HIP * decorate SYCL kernel with dedicated MDNode: This removes the need for command line options added by the SYCL driver, discussed here: [SYCL] Generalize local accessor to shared mem pass #5149 (comment) * extract common helpers for local accessor and global offset passes * generalize the pass * introduce builtin_amdgcn_implicit_offset and enable the pass for ADMGPU * implement spirv_GlobalOffset_[x,y,z] * update the docs The main deviation from the NVPTX is the need for supporting address spaces. For AMD kernel arguments reside in constant address space, which for the case with offset forces a copy to private AS, in order to keep the call-graph interface coherent (we can't allocate const address space for the case without offset). Corresponding test-suit PR: intel/llvm-test-suite#941
Now, that it lives in
SYCLLowerIR
it can be easily shared between AMDGCN and NVPTX backends.This requires the same alignment fix as for Cuda, see: #5113
Fixes #5013