Skip to content

[SYCL] Parallel-for range correction to improve group size selection by GPU driver #2703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Dec 17, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
a23f9ad
[SYCL] Parallel-for range rounding-up for improved group size selecti…
rdeodhar Oct 29, 2020
047123f
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 2, 2020
73f50bd
Correction to wrapper kernel name.
rdeodhar Nov 4, 2020
2aad33d
Test correction to backend interoperability interface.
rdeodhar Nov 5, 2020
ac6bf28
Environment var control to disable optimization; correction to one test.
rdeodhar Nov 6, 2020
da1a3ab
Fixes for set_args and this_item usage, and for review comments.
rdeodhar Nov 16, 2020
e186605
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 16, 2020
eaacd8a
Formatting changes.
rdeodhar Nov 16, 2020
535745f
Formatting change.
rdeodhar Nov 16, 2020
5c6c841
Removed unneeded files.
rdeodhar Nov 16, 2020
dc20fa1
Added some comments requested by reviewers.
rdeodhar Nov 17, 2020
d62e2f1
Added a test for integration header and one execution test.
rdeodhar Nov 17, 2020
b860666
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 17, 2020
8677a2b
Adjustment to test to account for added lines in sycl.hpp.
rdeodhar Nov 18, 2020
c340ccf
Changed runtime test.
rdeodhar Nov 18, 2020
e08a478
Changes requested by reviewers, and test modifications.
rdeodhar Nov 19, 2020
0b878dc
Fixed EOL.
rdeodhar Nov 19, 2020
d6773cb
Modified a test.
rdeodhar Nov 19, 2020
59ae778
Modified this_item search, and env var names.
rdeodhar Nov 20, 2020
81b777c
Added env var documentation.
rdeodhar Nov 20, 2020
900aca8
Restrict rounding-up optimization to GPU devices.
rdeodhar Nov 21, 2020
094c01d
Merge branch 'sycl' into iwgo5
rdeodhar Nov 21, 2020
700c056
Made a method private.
rdeodhar Nov 22, 2020
6bfede9
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 22, 2020
383ae96
Merge correction.
rdeodhar Nov 22, 2020
e8e7f74
Merge branch 'iwgo5' of https://github.com/rdeodhar/llvm into iwgo5
rdeodhar Nov 22, 2020
e8b0872
Test changes and improved this_item call detection.
rdeodhar Nov 26, 2020
e6d42a0
Necessary addition to SYCL symbols.
rdeodhar Nov 27, 2020
4b9093e
Minor corrections.
rdeodhar Dec 3, 2020
a2a6ded
Enabled rounding for CPU also.
rdeodhar Dec 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions clang/lib/Sema/SemaSYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -510,6 +510,22 @@ class MarkDeviceFunction : public RecursiveASTVisitor<MarkDeviceFunction> {
FunctionDecl *FD = WorkList.back().first;
FunctionDecl *ParentFD = WorkList.back().second;

// To implement rounding-up of a parallel-for range (Jira 20239)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove reference to internal tracking number.

Suggested change
// To implement rounding-up of a parallel-for range (Jira 20239)
// To implement rounding-up of a parallel-for range

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

// a kernel call is modified like this:
// auto Wrapper = [=](TransformedArgType Arg) {
// if (Arg[0] >= NumWorkItems[0])
// return;
// Arg.set_allowed_range(NumWorkItems);
// KernelFunc(Arg);
// };
//
// This transformation leads to a condition where a kernel body
// function becomes callable from a new kernel body function.
// Hence this test.
if ((ParentFD == KernelBody) && isSYCLKernelBodyFunction(FD)) {
KernelBody = FD;
}

if ((ParentFD == SYCLKernel) && isSYCLKernelBodyFunction(FD)) {
assert(!KernelBody && "inconsistent call graph - only one kernel body "
"function can be called");
Expand Down
53 changes: 44 additions & 9 deletions sycl/include/CL/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -728,23 +728,58 @@ class __SYCL_EXPORT handler {
void parallel_for_lambda_impl(range<Dims> NumWorkItems,
KernelType KernelFunc) {
throwIfActionIsCreated();
using NameT =
typename detail::get_kernel_name_t<KernelName, KernelType>::name;
using LambdaArgType = sycl::detail::lambda_arg_type<KernelType, item<Dims>>;

// If 1D kernel argument is an integral type, convert it to sycl::item<1>
using TransformedArgType =
typename std::conditional<std::is_integral<LambdaArgType>::value &&
Dims == 1,
item<Dims>, LambdaArgType>::type;
using NameT =
typename detail::get_kernel_name_t<KernelName, KernelType>::name;
constexpr size_t GoodLocalSizeX = 32;
std::string KName = typeid(NameT *).name();
bool DisableRounding =
KName.find("SYCL_OPT_PFWGS_DISABLE") != std::string::npos;
if (!DisableRounding && NumWorkItems[0] % GoodLocalSizeX != 0) {
// Not a multiple
size_t NewValX =
((NumWorkItems[0] + GoodLocalSizeX - 1) / GoodLocalSizeX) *
GoodLocalSizeX;
if (getenv("SYCL_OPT_PFWGS_TRACE") != nullptr)
std::cerr << "***** Adjusted size from " << NumWorkItems[0] << " to "
<< NewValX << " *****\n";
auto Wrapper = [=](TransformedArgType Arg) {
if (Arg[0] >= NumWorkItems[0])
return;
Arg.set_allowed_range(NumWorkItems);
KernelFunc(Arg);
};

using NameWT = NameT *;
range<Dims> AdjustedRange = NumWorkItems;
AdjustedRange.set_range(NewValX);
#ifdef __SYCL_DEVICE_ONLY__
(void)NumWorkItems;
kernel_parallel_for<NameT, TransformedArgType>(KernelFunc);
kernel_parallel_for<NameWT, TransformedArgType>(Wrapper);
#else
detail::checkValueRange<Dims>(NumWorkItems);
MNDRDesc.set(std::move(NumWorkItems));
StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
std::move(KernelFunc));
MCGType = detail::CG::KERNEL;
detail::checkValueRange<Dims>(AdjustedRange);
MNDRDesc.set(std::move(AdjustedRange));
StoreLambda<NameWT, decltype(Wrapper), Dims, TransformedArgType>(
std::move(Wrapper));
MCGType = detail::CG::KERNEL;
#endif
} else {
#ifdef __SYCL_DEVICE_ONLY__
(void)NumWorkItems;
kernel_parallel_for<NameT, TransformedArgType>(KernelFunc);
#else
detail::checkValueRange<Dims>(NumWorkItems);
MNDRDesc.set(std::move(NumWorkItems));
StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
std::move(KernelFunc));
MCGType = detail::CG::KERNEL;
#endif
}
}

/// Defines and invokes a SYCL kernel function for the specified range.
Expand Down
2 changes: 2 additions & 0 deletions sycl/include/CL/sycl/id.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,8 @@ template <int dimensions = 1> class id : public detail::array<dimensions> {
return result;
}

void set_allowed_range(range<dimensions> rnwi) { (void)rnwi[0]; }

#ifndef __SYCL_DISABLE_ID_TO_INT_CONV__
/* Template operator is not allowed because it disables further type
* conversion. For example, the next code will not work in case of template
Expand Down
2 changes: 2 additions & 0 deletions sycl/include/CL/sycl/item.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ template <int dimensions = 1, bool with_offset = true> class item {

bool operator!=(const item &rhs) const { return rhs.MImpl != MImpl; }

void set_allowed_range(const range<dimensions> rnwi) { MImpl.MExtent = rnwi; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this new function is public. Should it be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This seems to introduce a lot of new public functions. I don't think we should be doing that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of this is to be transparent to the programmer, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


protected:
template <bool has_offset = with_offset>
item(detail::enable_if_t<has_offset, const range<dimensions>> &extent,
Expand Down
3 changes: 3 additions & 0 deletions sycl/include/CL/sycl/range.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,9 @@ template <int dimensions = 1> class range : public detail::array<dimensions> {
return size;
}

// Adjust the first dim of the range
void set_range(const size_t dim0) { this->common_array[0] = dim0; }

range(const range<dimensions> &rhs) = default;
range(range<dimensions> &&rhs) = default;
range<dimensions> &operator=(const range<dimensions> &rhs) = default;
Expand Down