[SYCL] Refactor SYCL kernel object handling in hierarchical parallelism #6212
Conversation
Author: againull <[email protected]>
Date: Fri Apr 3 00:59:46 2020 -0700

[SYCL] Share PFWG lambda object through shared memory (intel#1455)

In the current implementation, the private address of the PFWG lambda object is shared by the leader work-item through local memory with the other work-items. This is not correct. That is why we copy the PFWG lambda object to shared memory and make the work-items work with the address of the object in shared memory, i.e. this case should be handled in a similar way as byval parameters.

Signed-off-by: Artur Gainullin <[email protected]>
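Roughly, the handling described in that commit can be pictured with the conceptual C++ sketch below. It is not the actual generated IR; `KernelObject`, `is_leader`, `work_group_barrier`, and `shadow` are hypothetical stand-ins for compiler-generated constructs. The leader work-item copies its private kernel object into a local-memory shadow, and after a barrier every work-item uses that shared copy.

```cpp
// Conceptual sketch only: illustrates the byval-style handling described in the
// commit message. is_leader, work_group_barrier, KernelObject and shadow are
// hypothetical stand-ins, not real compiler APIs.
struct KernelObject { int captured; };

bool is_leader();                         // hypothetical: true for exactly one work-item
void work_group_barrier();                // hypothetical: work-group synchronization
void pfwg_body(const KernelObject &obj);  // the parallel_for_work_group lambda body

static KernelObject shadow;               // stands in for the copy placed in local memory

void lowered_kernel(const KernelObject &privateObj) {
  if (is_leader())
    shadow = privateObj;    // leader copies its private object into the shared copy
  work_group_barrier();     // make the copy visible to the whole work-group
  pfwg_body(shadow);        // every work-item now works with the shared object's address
}
```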
@kbobrovs, @againull, I think I hit another bug/limitation of the pass. The pass doesn't look through function calls when it analyses the execution scope, i.e. work-group vs. work-item:

```cpp
void foo(sycl::group<1> group, ...) {
  group.parallel_for_work_item(range<1>(), [&](h_item<1> i) { ... });
}
...
cgh.parallel_for_work_group<class kernel>(
    range<1>(...), range<1>(...), [=](group<1> g) {
      foo(g, ...);
    });
```

The pass emits the code to call …
@bader, yes, this is a limitation of the pass. It should have been added as a TODO. As I recall, it was considered not practical to spend resources on adding support for such scenarios. A possible solution is known.
Thanks for the update. I tried to mark the generated kernel function with the work-group scope attribute, so that the LowerWGScope pass would put the parallel_for_work_group lambda object into local memory, but it also puts the parallel_for_work_item lambda object, one layer down in the call stack, there as well. I think I'll try another idea: I'm going to change the pass to process kernel functions that call functions with the work-group scope attribute, in addition to just functions marked with the work-group scope attribute. I'll move the code added by #1455 to the portion that processes kernel functions.
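A minimal sketch of the detection step that idea implies, assuming the pass keys off a string function attribute; the attribute name `"work_group_scope"` below is an assumption for illustration, not necessarily the spelling LowerWGScope actually uses:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Returns true if F directly calls a function carrying the (assumed)
// work-group scope attribute.
static bool callsWGScopeFunction(const Function &F) {
  for (const Instruction &I : instructions(F))
    if (const auto *CB = dyn_cast<CallBase>(&I))
      if (const Function *Callee = CB->getCalledFunction())
        if (Callee->hasFnAttribute("work_group_scope"))
          return true;
  return false;
}

// Process a function either when it is itself marked with the attribute,
// or when it is a kernel that calls such a function.
static bool needsWGScopeLowering(const Function &F) {
  return F.hasFnAttribute("work_group_scope") || callsWGScopeFunction(F);
}
```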
This reverts commit 0e13ee9.
Kernel objects passed to the parallel_for_work_group function must be shared among all work-items within a work-group.
```cpp
LLVMContext &Ctx = At.getContext();
IRBuilder<> Builder(Ctx);
Builder.SetInsertPoint(&LeaderBB->front());
if (!Arg.hasByValAttr())
```
Nit: we skip `this` because it is allocated in the proper AS by the FE, correct? A comment would be helpful for the reader.
Right. I just reverted the changes from #1455 and tried to re-implement it by fixing the address space in clang instead.
Do you want me to comment that `this` points to the object in local address space, so we don't need a shadow copy for that argument?
LowerWGScope.cpp LGTM
Thanks for the detailed review @erichkeane and for the change @bader. FE changes LGTM.
This patch refactors #1455 to avoid uses of the deprecated `getPointerElementType` function. #1455 introduced code that uses pointer type information to create a shadow copy of the SYCL kernel object. The same can be achieved by applying the work-group scope attribute to the SYCL kernel object. The compiler allocates such an object in the local address space, so the object is shared among all work-items in the work-group.
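For illustration, a minimal SYCL snippet of the situation the description refers to (the kernel name `shared_obj` and the captured value are made up for the example): with the kernel object allocated in the local address space, every work-item in a group reads the same instance of the captured data.

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  int captured = 42;

  q.submit([&](sycl::handler &cgh) {
    // The PFWG lambda object (holding `captured`) is the kernel object that
    // ends up in local memory, shared by all work-items of a work-group.
    cgh.parallel_for_work_group<class shared_obj>(
        sycl::range<1>(8), sycl::range<1>(4), [=](sycl::group<1> g) {
          g.parallel_for_work_item([&](sycl::h_item<1>) {
            int v = captured;  // all work-items observe the same shared instance
            (void)v;
          });
        });
  });
  q.wait();
  return 0;
}
```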