[SYCL] Generalize local accessor to shared mem pass #5149

jchlanda · 2021-12-15T10:28:31Z

Now that it lives in SYCLLowerIR, it can easily be shared between the AMDGCN and NVPTX backends.

This requires the same alignment fix as for CUDA; see #5113.

jchlanda · 2021-12-15T12:20:15Z

It looks like the pass is a bit too eager and runs on all amdgcn amdhsa kernels. Moving it to WIP until I update it to be triggered only on SYCL kernels.

jchlanda · 2021-12-17T08:48:44Z

I decided to hide the pass behind the sycl-enable-local-accessor option, as there is no way for the AMDGCN backend to tell whether a kernel comes from SYCL. The same is true for NVPTX, but the code gets away with checking the triple's getOS() == Triple::CUDA.

As well as enabling the pass for amdgcn amdhsa.
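The gating described above can be sketched as a small stand-alone model. This is not the code from LocalAccessorToSharedMemory.cpp: the `TargetTriple` struct, the enums and both function names are hypothetical, and the exact combination of "option set" and "supported triple" is an assumption based on this comment.

```cpp
// Hypothetical stand-ins for llvm::Triple fields; only the shape matters here.
enum class Arch { X86_64, NVPTX64, AMDGCN };
enum class OS { Linux, CUDA, AMDHSA };

struct TargetTriple {
  Arch arch;
  OS os;
};

// True for the two device triples discussed in this PR
// (nvptx64 with CUDA, amdgcn with amdhsa).
constexpr bool isSupportedDeviceTriple(TargetTriple T) {
  return (T.arch == Arch::NVPTX64 && T.os == OS::CUDA) ||
         (T.arch == Arch::AMDGCN && T.os == OS::AMDHSA);
}

// Assumption: the pass is opt-in. It does nothing unless the caller set the
// switch modelled by EnableLocalAccessor (sycl-enable-local-accessor), because
// an amdgcn amdhsa triple alone does not say the kernel came from SYCL.
constexpr bool shouldRunLocalAccessorPass(TargetTriple T,
                                          bool EnableLocalAccessor) {
  return EnableLocalAccessor && isSupportedDeviceTriple(T);
}
```

With this shape, an amdgcn amdhsa module compiled from OpenCL or OpenMP is left alone unless something explicitly sets the option, which is the behaviour the comment is after.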

AlexeySachkov

The pass changes LGTM, a couple of minor comments about tests

llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-triple.ll

llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-valid-triple.ll

…-triple.ll Co-authored-by: Alexey Sachkov <[email protected]>

llvm/lib/SYCLLowerIR/LocalAccessorToSharedMemory.cpp

bader · 2022-01-13T18:04:32Z

@intel/dpcpp-clang-driver-reviewers, @intel/llvm-reviewers-runtime, ping.

alexbatashev

Runtime changes LGTM

bader · 2022-01-18T20:24:09Z

@intel/dpcpp-clang-driver-reviewers, ping.

clang/lib/Driver/ToolChains/HIPAMD.cpp

clang/lib/Driver/ToolChains/Clang.cpp

mdtoguchi

OK for Driver

alexbatashev

Runtime changes lgtm

bader · 2022-01-25T14:16:40Z

clang/lib/Driver/ToolChains/Clang.cpp

@@ -5787,6 +5787,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
    CmdArgs.push_back("-treat-scalable-fixed-error-as-warning");
  }

+  // Enable local accessor to shared memory pass for SYCL.
+  if (isa<BackendJobAction>(JA) && IsSYCL) {


@mdtoguchi, shouldn't the condition check IsSYCLOffloadDevice instead of IsSYCL?
abi/user_mangling.cpp and regression/fsycl-save-temps.cpp from the check-sycl suite fail on my system.

Ah, yes. If this is to be only set for device, then this should be using IsSYCLOffloadDevice, as IsSYCL is for all compilations (host and device) when SYCL device offloading is enabled.

I get the following error:

clang (LLVM option parsing): Unknown command line argument '-sycl-enable-local-accessor'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--enable-local-reassign'?

Honestly, I don't really understand why the option is not visible to the FE in host mode, but using IsSYCLOffloadDevice fixed the problem, and I don't think running the pass enabled by -sycl-enable-local-accessor is needed in host mode.
Another mystery is why this issue is not exposed by the CI system. I suppose it is somehow related to a difference in CMake configuration: I don't build the NVPTX and AMDGPU targets, which I suppose link the library with the "unknown" option.

We definitely need to do more investigation on this issue.

@jchlanda, FYI.

Additional problem: -sycl-enable-local-accessor is only being set when we do some kind of code generation step. As the device compilation does not do this, -sycl-enable-local-accessor is never set for device. It is only being emitted for host compilations as that goes through the assembling step.

At a high level, -sycl-enable-local-accessor is only emitted for the code generation step with the nvptx64 target. The steps are not combined for nvptx64, which allows the option to be emitted only for device there. It is a somewhat roundabout way to restrict the option, and it can leak out to host if -S is used.

IIRC, @jchlanda is away this week.
@AerialMantis, can someone else take a look into this?

That's right, I'll have someone else take a look.

There is a PR up to fix this: #5408
I believe the CI did not catch this because it builds for both CUDA and HIP and then runs each backend; correct me if I am wrong. If you build for CUDA or HIP, then -sycl-enable-local-accessor is usable.
I think that -sycl-enable-local-accessor is not available when not building for CUDA/HIP because the pass is initialised within the LLVM NVPTX and AMDGPU backends.
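The explanation above can be illustrated with a toy option table. It is only a sketch of the suspected mechanism: `OptionRegistry` and `makeRegistry` are hypothetical names, LLVM's real command-line machinery is more involved, and the two option spellings are simply taken from the error message quoted earlier in this thread.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a global table of known options.
class OptionRegistry {
public:
  void registerOption(const std::string &Name) {
    assert(!Name.empty() && "option name must not be empty");
    Known.insert(Name);
  }

  // Returns the arguments that no linked-in component has registered.
  std::vector<std::string>
  unknownArgs(const std::vector<std::string> &Args) const {
    std::vector<std::string> Unknown;
    for (const std::string &A : Args)
      if (Known.count(A) == 0)
        Unknown.push_back(A);
    return Unknown;
  }

private:
  std::set<std::string> Known;
};

// Models one build configuration. Assumption: the option is only registered
// when a backend that initialises the pass was built and linked in.
OptionRegistry makeRegistry(bool BuildNVPTX, bool BuildAMDGPU) {
  OptionRegistry R;
  R.registerOption("-enable-local-reassign"); // registered regardless of target
  if (BuildNVPTX || BuildAMDGPU)
    R.registerOption("-sycl-enable-local-accessor");
  return R;
}
```

In this model a build without NVPTX and AMDGPU reports -sycl-enable-local-accessor as unknown, while a CI-style build with either backend accepts it, which would match the difference observed between the local build and CI.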

I'm almost glad that this bug surfaced, @bader, @mdtoguchi. TBH, -sycl-enable-local-accessor is a workaround that I never liked. The problem I had was that there wasn't a way to tell that a kernel was compiled from SYCL. Simply relying on the calling convention (CallingConv::AMDGPU_KERNEL) is not enough, as there are multiple paths that use it (OpenCL, OpenMP, SYCL). I was wondering if it would be better to follow NVIDIA here and use metadata nodes to denote kernels (https://github.com/intel/llvm/blob/HEAD/clang/lib/CodeGen/TargetInfo.cpp#L7242). This would work for all the passes that we only want to run on SYCL kernels (for instance, https://github.com/intel/llvm/blob/sycl/llvm/lib/Target/NVPTX/SYCL/GlobalOffset.cpp, which we'd like to generalize, would benefit from it).
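A minimal model of the argument above, assuming the front end could attach a marker to each SYCL kernel. `KernelInfo`, `emitKernel` and the `HasSYCLKernelMetadata` flag are hypothetical; real LLVM metadata is attached to the module or function rather than stored as a boolean.

```cpp
// Hypothetical model types, not LLVM's own.
enum class CallingConv { C, AMDGPU_KERNEL };
enum class Frontend { OpenCL, OpenMP, SYCL };

struct KernelInfo {
  CallingConv CC;
  bool HasSYCLKernelMetadata; // stand-in for a dedicated metadata node
};

// All three front ends emit AMDGPU kernels with the same calling convention;
// only the SYCL path would add the marker.
constexpr KernelInfo emitKernel(Frontend FE) {
  return KernelInfo{CallingConv::AMDGPU_KERNEL, FE == Frontend::SYCL};
}

// The check that is "not enough": it cannot tell the front ends apart.
constexpr bool isKernelByCallingConv(KernelInfo K) {
  return K.CC == CallingConv::AMDGPU_KERNEL;
}

// The proposed check: only kernels carrying the SYCL marker qualify.
constexpr bool isSYCLKernelByMetadata(KernelInfo K) {
  return K.HasSYCLKernelMetadata;
}
```

The calling-convention check accepts OpenCL and OpenMP kernels as well, whereas the metadata check selects only SYCL kernels, so a pass could use it without any driver-supplied option.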

This PR resolves an issue raised in #5149. It changes the check from `IsSYCL` to `IsSYCLOffloadDevice` and limits its usage to nvptx and amdgcn. A new test is added to prevent regression.
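The difference between the two conditions can be sketched outside the driver. `JobInfo`, `originalCheck`, `fixedCheck` and `addLocalAccessorArgs` are hypothetical names; keeping the backend-job test in the fixed condition and forwarding the option through `-mllvm` are assumptions (the latter inferred from the "LLVM option parsing" error quoted in the review thread), not a copy of the change in #5408.

```cpp
#include <string>
#include <vector>

enum class Arch { X86_64, NVPTX64, AMDGCN };

// Hypothetical summary of what the driver knows about one job.
struct JobInfo {
  bool IsSYCL;              // SYCL offloading enabled (host and device jobs)
  bool IsSYCLOffloadDevice; // this job compiles device code
  bool IsBackendJob;        // this job is a code generation step
  Arch TargetArch;
};

// Condition from the merged diff: also true for host jobs.
constexpr bool originalCheck(const JobInfo &J) {
  return J.IsBackendJob && J.IsSYCL;
}

// Condition as described for the follow-up: device only, nvptx or amdgcn only.
constexpr bool fixedCheck(const JobInfo &J) {
  return J.IsBackendJob && J.IsSYCLOffloadDevice &&
         (J.TargetArch == Arch::NVPTX64 || J.TargetArch == Arch::AMDGCN);
}

// Appends the option to the command line only when the fixed check holds.
inline void addLocalAccessorArgs(const JobInfo &J,
                                 std::vector<std::string> &CmdArgs) {
  if (!fixedCheck(J))
    return;
  CmdArgs.push_back("-mllvm");
  CmdArgs.push_back("-sycl-enable-local-accessor");
}
```

A host job that reaches a code generation step, for example `JobInfo{true, false, true, Arch::X86_64}`, satisfies the original condition but not the fixed one, which is the leak reported in the review thread.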

The purpose of this patch is to generalize the SYCL global offset pass and enable it for AMDGPU.

* enable global offset in AMD's HIP
* decorate SYCL kernel with dedicated MDNode: This removes the need for command line options added by the SYCL driver, discussed here: [SYCL] Generalize local accessor to shared mem pass #5149 (comment)
* extract common helpers for local accessor and global offset passes
* generalize the pass
* introduce builtin_amdgcn_implicit_offset and enable the pass for AMDGPU
* implement spirv_GlobalOffset_[x,y,z]
* update the docs

The main deviation from NVPTX is the need to support address spaces. For AMD, kernel arguments reside in the constant address space, which, for the case with offset, forces a copy to the private AS in order to keep the call-graph interface coherent (we can't allocate constant address space for the case without offset).

Corresponding test-suite PR: intel/llvm-test-suite#941

jchlanda requested review from bader, kbobrovs, kychendev, smaslov-intel, sndmitriev and v-klochkov as code owners December 15, 2021 10:28

jchlanda force-pushed the jakub/sycl_local_accessors branch from 87800ad to 7add2f1 Compare December 15, 2021 10:33

jchlanda marked this pull request as draft December 15, 2021 12:20

jchlanda changed the title ~~[SYCL] Generalize local accessor to shared mem pass~~ [WIP] [SYCL] Generalize local accessor to shared mem pass Dec 15, 2021

jchlanda force-pushed the jakub/sycl_local_accessors branch from 6d31ac3 to 417ba47 Compare December 17, 2021 08:28

jchlanda changed the title ~~[WIP] [SYCL] Generalize local accessor to shared mem pass~~ [SYCL] Generalize local accessor to shared mem pass Dec 17, 2021

jchlanda marked this pull request as ready for review December 17, 2021 08:43

jchlanda requested review from AGindinson, hchilama and mdtoguchi as code owners December 17, 2021 08:43

jchlanda and others added 2 commits January 3, 2022 18:57

[SYCL] Generalize local accessor to shared mem pass

2022ae9

As well as enabling the pass for amdgcn amdhsa.

[SYCL] [HIP] Fix alignment of local arguments

df343be

jchlanda force-pushed the jakub/sycl_local_accessors branch from 417ba47 to 46f6752 Compare January 3, 2022 18:57

jchlanda requested review from a team as code owners January 3, 2022 18:57

jchlanda and others added 2 commits January 3, 2022 19:27

[SYCL] Port local acessors tests to amdgcn

82dbbdb

[SYCL] Add command line option for local accessor to shared mem pass

437d5c8

jchlanda force-pushed the jakub/sycl_local_accessors branch from 46f6752 to 437d5c8 Compare January 3, 2022 19:27

AlexeySachkov previously approved these changes Jan 10, 2022

llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-triple.ll

llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-valid-triple.ll

AlexeySachkov mentioned this pull request Jan 11, 2022

[SYCL] Turn off part of SimplifyCFG optimizations in SYCL mode. #5283

Merged

Update llvm/test/CodeGen/AMDGPU/local-accessor-to-shared-memory-valid…

2b2a968

…-triple.ll Co-authored-by: Alexey Sachkov <[email protected]>

bader requested a review from AlexeySachkov January 12, 2022 14:05

AlexeySachkov previously approved these changes Jan 12, 2022

mlychkov reviewed Jan 12, 2022

llvm/lib/SYCLLowerIR/LocalAccessorToSharedMemory.cpp

alexbatashev previously approved these changes Jan 14, 2022

jchlanda mentioned this pull request Jan 14, 2022

[SYCL] Port LocalAccessorToSharedMemory and GlobalOffset to new PM #5310

Closed

AerialMantis mentioned this pull request Jan 17, 2022

[SYCL][CUDA][HIP] warp misaligned address on CUDA and results mismatch on HIP #5007

Closed

mdtoguchi reviewed Jan 18, 2022

clang/lib/Driver/ToolChains/HIPAMD.cpp

mdtoguchi reviewed Jan 18, 2022

clang/lib/Driver/ToolChains/Clang.cpp

Add test for -sycl-enable-local-accessor

a49c751

jchlanda dismissed stale reviews from alexbatashev and AlexeySachkov via a49c751 January 19, 2022 15:04

jchlanda requested a review from mdtoguchi January 19, 2022 15:05

mdtoguchi approved these changes Jan 19, 2022

jchlanda requested review from mdtoguchi, mlychkov, a team, alexbatashev and AlexeySachkov January 20, 2022 12:00

alexbatashev approved these changes Jan 20, 2022

mlychkov approved these changes Jan 20, 2022

bader merged commit 58508ba into intel:sycl Jan 20, 2022

bader reviewed Jan 25, 2022

AidanBeltonS mentioned this pull request Jan 27, 2022

[SYCL] Change clang toolchain local accessor check #5408

Merged

vmaksimo mentioned this pull request Feb 2, 2022

LLVM and SPIRV-LLVM-Translator pulldown (WW06) #5427

Merged

jingwan2 mentioned this pull request Feb 17, 2022

Compiler crash at pass 'SYCL Local Accessor to Shared Memory' on CUDA backend caused by https://github.com/intel/llvm/pull/5149 #5600

Closed

jchlanda mentioned this pull request Mar 22, 2022

[SYCL] Generalize GlobalOffset and enable it for AMDGPU #5855

Merged



bader Jan 25, 2022

mdtoguchi Jan 25, 2022

bader Jan 25, 2022

mdtoguchi Jan 25, 2022

mdtoguchi Jan 25, 2022

bader Jan 26, 2022

AerialMantis Jan 26, 2022

AidanBeltonS Jan 27, 2022

jchlanda Jan 31, 2022
