Skip to content

[SYCL] Refactor SYCL kernel object handling in hierarchical parallelism #6212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 15, 2022

Conversation

bader
Copy link
Contributor

@bader bader commented May 29, 2022

This patch refactors #1455 to avoid uses of deprecated getPointerElementType function.
#1455 introduces the code that uses pointer type information to create a shadow copy of SYCL kernel object.

The same can be achieved by applying work-group scope attribute the SYCL kernel object. Compiler allocates such object in local address space, so object is shared among all work-items in the work-group.

bader added 4 commits May 29, 2022 07:56
Author: againull <[email protected]>
Date:   Fri Apr 3 00:59:46 2020 -0700

    [SYCL] Share PFWG lambda object through shared memory (intel#1455)

    In the current implementation private address of the PFWG lambda object
    is shared by leader work item through local memory to other work items.
    This is not correct. That is why perform the copy of the PFWG lambda
    object to shared memory and make work items work with address of the
    object in shared memory. I.e. this case should be handled in the
    similar way as for byval parameters.

    Signed-off-by: Artur Gainullin <[email protected]>
@bader
Copy link
Contributor Author

bader commented May 29, 2022

/summary:run

@bader
Copy link
Contributor Author

bader commented May 30, 2022

@kbobrovs, @againull, I think I hit another bug/limitation of the pass. The pass doesn't look through a function calls when it analyses the execution scope i.e. work-group vs work-item.

void foo(sycl::group<1> group, ...) {
  group.parallel_for_work_item(range<1>(), [&](h_item<1> i) { ... });
}
...

  cgh.parallel_for_work_group<class kernel>(
    range<1>(...), range<1>(...), [=](group<1> g) {
      foo(g, ...);
    });

The pass emits the code to call foo once per work-group, but I can't find anything like this in the specification.
@intel/dpcpp-specification-reviewers, what is the expected behavior in this case?

@kbobrovs
Copy link
Contributor

@bader, yes, this is a limitation of the pass. It should have been added as a TODO. As I recall, it was considered not practical to spend resourced on adding support for such scenarios. Possible solution is known.

@bader
Copy link
Contributor Author

bader commented May 31, 2022

@bader, yes, this is a limitation of the pass. It should have been added as a TODO. As I recall, it was considered not practical to spend resourced on adding support for such scenarios. Possible solution is known.

Thanks for the update. I tried to mark generated kernel function with work-group scope attribute, so that LowerWGScope pass will put parallel_for_work_group lambda object into local memory, but it also puts parallel_for_work_item one layer down in the call stack.

I think I'll try another idea. I'm going to change the pass to process kernel functions calling functions with work-group scope attribute in addition to just functions marked with work-group scope attribute. I'll move the code added by #1455 to the portion processing kernel functions.

bader added 4 commits May 31, 2022 06:07
Kernel objects passed to parallel_for_work_group function must be shared
among all work-items withing a work-group.
@bader bader marked this pull request as ready for review June 7, 2022 19:03
@bader bader requested review from a team as code owners June 7, 2022 19:03
@bader bader changed the title [SYCL] Refactor lower work-group scope pass [SYCL] Refactor SYCK kernel object handling in hierarchical parallelism Jun 7, 2022
@bader bader changed the title [SYCL] Refactor SYCK kernel object handling in hierarchical parallelism [SYCL] Refactor SYCL kernel object handling in hierarchical parallelism Jun 8, 2022
@bader bader requested review from againull and kbobrovs June 8, 2022 15:21
LLVMContext &Ctx = At.getContext();
IRBuilder<> Builder(Ctx);
Builder.SetInsertPoint(&LeaderBB->front());
if (!Arg.hasByValAttr())
Copy link
Contributor

@kbobrovs kbobrovs Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we skip "this" because it is allocated in the proper AS by the FE, correct? Comment would be helpful for the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I just reverted the changes from #1455 and tried to re-implement it by fixing address space in clang instead.
Do you want me to comment that this points to the object in local address space, so we don't need a shadow copy for that argument?

kbobrovs
kbobrovs previously approved these changes Jun 9, 2022
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LowerWGScope.cpp LGTM

@bader bader requested a review from erichkeane June 14, 2022 17:35
@bader bader requested a review from erichkeane June 14, 2022 18:00
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks or the detailed review @erichkeane and for the change @bader. FE changes LGTM

@bader bader merged commit 0c7a1e1 into intel:sycl Jun 15, 2022
@bader bader deleted the lower-wg-scope-refactor branch June 15, 2022 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants