Skip to content

[SYCL] Share PFWG lambda object through shared memory #1455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 3, 2020

Conversation

againull
Copy link
Contributor

@againull againull commented Apr 2, 2020

In the current implementation private address of the PFWG lambda object
is shared by leader work item through local memory to other work items.
This is not correct. That is why perform the copy of the PFWG lambda
object to shared memory and make work items work with address of the
object in shared memory. I.e. this case should be handled in the
similar way as for byval parameters.

Signed-off-by: Artur Gainullin [email protected]

@againull againull requested review from kbobrovs and bader April 2, 2020 07:45
kbobrovs
kbobrovs previously approved these changes Apr 2, 2020
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, few minor notes

@@ -679,10 +688,19 @@ static void fixupPrivateMemoryPFWILambdaCaptures(CallInst *PFWICall) {
// Go through "byval" parameters which are passed as AS(0) pointers
// and: (1) create local shadows for them (2) and initialize them from the
// leader's copy and (3) replace usages with pointer to the shadow
//
// Do the same with PFWG lamda object parameter ('this' pointer) which is also
// passed as AS(0) pointer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment that this is based on assumption that '*this' coming from PFWG is not modified down the callgraph. Otherwise a copy-back would be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, added the comment why copy back is not needed.

@@ -679,10 +688,19 @@ static void fixupPrivateMemoryPFWILambdaCaptures(CallInst *PFWICall) {
// Go through "byval" parameters which are passed as AS(0) pointers
// and: (1) create local shadows for them (2) and initialize them from the
// leader's copy and (3) replace usages with pointer to the shadow
//
// Do the same with PFWG lamda object parameter ('this' pointer) which is also
// passed as AS(0) pointer.
static void shareByValParams(Function &F, const Triple &TT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the function purpose is extended, I suggest to change the name. E.g. sharePFWGPrivateObjects or something like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

RepVal = ConstantExpr::getPointerBitCastOrAddrSpaceCast(Shadow,
Arg.getType());
}
} else if (Arg.getArgNo() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment that this is 'this' processing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

; CHECK-NEXT: [[TMP0:%.*]] = load i64, i64 addrspace(1)* @__spirv_BuiltInLocalInvocationIndex
; CHECK-NEXT: [[CMPZ3:%.*]] = icmp eq i64 [[TMP0]], 0
; CHECK-NEXT: br i1 [[CMPZ3]], label [[LEADER:%.*]], label [[MERGE:%.*]]
; CHECK: leader:
; CHECK-NEXT: [[TMP1:%.*]] = bitcast %struct.zot* [[ARG1:%.*]] to i8*
; CHECK-NEXT: call void @llvm.memcpy.p3i8.p0i8.i64(i8 addrspace(3)* align 16 bitcast (%struct.zot addrspace(3)* @[[GROUP_SHADOW]] to i8 addrspace(3)*), i8* align 8 [[TMP1]], i64 96, i1 false)
; CHECK-NEXT: call void @llvm.memcpy.p3i8.p0i8.i64(i8 addrspace(3)* align 16 bitcast (%struct.zot addrspace(3)* @ArgShadow.2 to i8 addrspace(3)*), i8* align 8 [[TMP1]], i64 96, i1 false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider replacing ArgShadow.2 with a match (added above at the declaration point)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

In the current implementation private address of the PFWG lambda object
is shared by leader work item through local memory to other work items.
This is not correct. That is why perform the copy of the PFWG lambda
object to shared memory and make work items work with address of the
object in shared memory. I.e. this case should be handled in the
similar way as for byval parameters.

Signed-off-by: Artur Gainullin <[email protected]>
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 16f64b8 into intel:sycl Apr 3, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 6, 2020
…_private_api

* origin/sycl: (614 commits)
  [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466)
  [SYCL][USM] Remove vestigial dead code (intel#1474)
  [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451)
  [Driver][SYCL] Emit an error if c compilation is forced (intel#1438)
  [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454)
  [SYCL] Change priority of devices in default_selector (intel#1264)
  [CI] Update CODEOWNERS matching rules order (intel#1468)
  [SYCL] Share PFWG lambda object through shared memory (intel#1455)
  [CI] Fix CODEOWNERS file syntax (intel#1464)
  [SYCL][CUDA] Fix active context when creating base event (intel#1447)
  [SYCL] Diagnose implicit declaration of kernel function type (intel#1450)
  [BuildBot] Modify configure script (intel#1421)
  [SYCL] Resolve min/max conflict (intel#1339)
  [CI][BuildBot] Fix configure parameter to turn on/off assertions (intel#1449)
  [SYCL] XFAIL LIT test due to duplicate diagnostic
  [SYCL] Remove explicit sycl_device attribute requirement
  Apply more suggestions
  Apply suggestions
  Translate new set of Intel FPGA Loop Controls
  Translate Intel FPGA force_pow2_depth memory attribute
  ...
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 15, 2020
…c_abi_checks

* origin/sycl: (625 commits)
  [SYCL][Test] Disable spec_const_redefine.cpp on all devices but HOST (intel#1488)
  [SYCL] Only export public API (intel#1456)
  [SYCL][CUDA] Fix selected_binary argument in piextDeviceSelectBinary (intel#1475)
  [SYCL] Enable LIT testing with CUDA BE (intel#1458)
  [SYCL] Fix float to half-type conversion (intel#1395)
  [NFC] Cleanup unneded macro from builtins implementation (intel#1445)
  Enable cfg-printer LLVM lit tests only if LLVM linked statically (intel#1479)
  [SYCL][NFC] Reflect the "allowlist" renaming in the code (intel#1480)
  [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466)
  [SYCL][USM] Remove vestigial dead code (intel#1474)
  [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451)
  [Driver][SYCL] Emit an error if c compilation is forced (intel#1438)
  [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454)
  [SYCL] Change priority of devices in default_selector (intel#1264)
  [CI] Update CODEOWNERS matching rules order (intel#1468)
  [SYCL] Share PFWG lambda object through shared memory (intel#1455)
  [CI] Fix CODEOWNERS file syntax (intel#1464)
  [SYCL][CUDA] Fix active context when creating base event (intel#1447)
  [SYCL] Diagnose implicit declaration of kernel function type (intel#1450)
  [BuildBot] Modify configure script (intel#1421)
  ...
bader added a commit that referenced this pull request Jun 15, 2022
…sm (#6212)

This patch refactors #1455 to avoid uses of deprecated `getPointerElementType` function.
#1455 introduces the code that uses pointer type information to create a shadow copy of SYCL kernel object. 

The same can be achieved by applying `work-group` scope attribute the SYCL kernel object. Compiler allocates such object in local address space, so object is shared among all work-items in the work-group.
@againull againull deleted the share_pfwg branch December 3, 2022 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants