Skip to content

[SYCL] Add barrier before leader guard in LowerWGScope pass #2208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 30, 2020

Conversation

againull
Copy link
Contributor

@againull againull commented Jul 29, 2020

Currently barrier is added only to merge basic block. But barrier must
be added before leader guard too. As an example, let's consider the
following pseudo code which is generated by LowerWGScope:

1  __local int *done;
2  kernel test() {
3   int done_wi;
4   int count = 0;
5   do {
6     count++;
7     if (get_local_linear_id() == 0)
8       *done = (count == 2);
9     barrier(CLK_LOCAL_MEM_FENCE);
10    done_wi = *done;
11  } while (!done_wi);
12 }

Step 1. All WIs execute stmt 3 - stmt 9. This is the first time all WIs
encounter the barrier.
Step 2. WI0 execute stmt 10 - stmt 11 and this time done is false, and
then execute stmt 5 - stmt 9, it set done to true. This is the second
time WI0 encounter the barrier.
Step 3. Other WIs begin executing stmt 10 - stmt 11 they will see done
is true so they can't reach barrier now.

To resolve this problem barrier must be added before leader guard:

...
barrier(CLK_LOCAL_MEM_FENCE);
if (get_local_linear_id() == 0)
...

Signed-off-by: Artur Gainullin [email protected]

@againull againull requested a review from bader as a code owner July 29, 2020 17:49
Currently barrier is added only to merge basic block. But barrier must
be added before leader guard too. As an example, let's consider the
following pseudo code which is generated by LowerWGScope:

1  __local int *done;
2  kernel test() {
3   int done_wi;
4   int count = 0;
5   do {
6     count++;
7     if (get_local_linear_id() == 0)
8       *done = (count == 2);
9     barrier(CLK_LOCAL_MEM_FENCE);
10    done_wi = *done;
11  } while (!done_wi);
12 }

Step 1. All WIs execute stmt 3 - stmt 9. This is the first time all WIs
encounter the barrier.
Step 2. WI0 execute stmt 10 - stmt 11 and this time done is false, and
then execute  stmt 5 - stmt 9, it set done to true. This is the second
time WI0 encounter the barrier.
Step 3. Other WIs begin executing stmt 10 - stmt 11 they will see done
is true so they can't reach barrier now.

To resolve this problem barrier must be added before leader guard:
...
barrier(CLK_LOCAL_MEM_FENCE);
if (get_local_linear_id() == 0)
...

Signed-off-by: Artur Gainullin <[email protected]>
@@ -265,6 +265,7 @@ static void guardBlockWithIsLeaderCheck(BasicBlock *IfBB, BasicBlock *TrueBB,
auto *Ty = LinearLocalID->getType();
Value *Zero = Constant::getNullValue(Ty);
IRBuilder<> Builder(IfBB->getContext());
spirv::genWGBarrier(*(IfBB->getTerminator()), TT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: would be nice to add a comment why the extra barrier is needed.

@@ -0,0 +1,65 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -LowerWGScope -S | FileCheck %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: would be nice to add the C++ source of the SYCL kernel used to generate this IR as a comment.

@bader bader merged commit a4a7950 into intel:sycl Jul 30, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Jul 31, 2020
…rogram

* upstream/sycl:
  [SYCL] Add barrier before leader guard in LowerWGSCope pass (intel#2208)
  [SYCL] Add prototype of atomic_ref<T*> (intel#2177)
  Revert "[SYCL] Disallow mutable lambdas (intel#1785)" (intel#2213)
againull added a commit to againull/llvm that referenced this pull request Sep 1, 2020
@againull againull deleted the barrier branch December 3, 2022 00:03
jsji pushed a commit that referenced this pull request Feb 8, 2024
* [SPV-IR to OCL] Fix mutated builtin call attribute copying

When SPIRVToOCL20Pass translates following SPV IR
```
  %call = tail call spir_func noundef ptr @_Z42__spirv_GenericCastToPtrExplicit_ToPrivatePvi(ptr addrspace(4) noundef %Ptr, i32 noundef 0)
declare spir_func ptr @_Z42__spirv_GenericCastToPtrExplicit_ToPrivatePvi(ptr addrspace(4), i32)
```
to OCL IR:
```
  %call = tail call spir_func noundef ptr @__to_private(ptr addrspace(4) noundef %Ptr)
declare spir_func ptr @__to_private(ptr addrspace(4))
```
, there is error `Attribute after last parameter` at copying attribute
to the new call because the old call has an additional AttributeSet for
the last argument. The last argument is eliminated.

Set tail call for new call if old call is tail call.

Revert cafd7e0.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@a31a0a6
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
update to L0 loader version v1.18.3 and don't scan _deps with Trivy or CodeQL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants