Skip to content

[SYCL] Parallel-for range correction to improve group size selection by GPU driver #2703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Dec 17, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
a23f9ad
[SYCL] Parallel-for range rounding-up for improved group size selecti…
rdeodhar Oct 29, 2020
047123f
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 2, 2020
73f50bd
Correction to wrapper kernel name.
rdeodhar Nov 4, 2020
2aad33d
Test correction to backend interoperability interface.
rdeodhar Nov 5, 2020
ac6bf28
Environment var control to disable optimization; correction to one test.
rdeodhar Nov 6, 2020
da1a3ab
Fixes for set_args and this_item usage, and for review comments.
rdeodhar Nov 16, 2020
e186605
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 16, 2020
eaacd8a
Formatting changes.
rdeodhar Nov 16, 2020
535745f
Formatting change.
rdeodhar Nov 16, 2020
5c6c841
Removed unneeded files.
rdeodhar Nov 16, 2020
dc20fa1
Added some comments requested by reviewers.
rdeodhar Nov 17, 2020
d62e2f1
Added a test for integration header and one execution test.
rdeodhar Nov 17, 2020
b860666
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 17, 2020
8677a2b
Adjustment to test to account for added lines in sycl.hpp.
rdeodhar Nov 18, 2020
c340ccf
Changed runtime test.
rdeodhar Nov 18, 2020
e08a478
Changes requested by reviewers, and test modifications.
rdeodhar Nov 19, 2020
0b878dc
Fixed EOL.
rdeodhar Nov 19, 2020
d6773cb
Modified a test.
rdeodhar Nov 19, 2020
59ae778
Modified this_item search, and env var names.
rdeodhar Nov 20, 2020
81b777c
Added env var documentation.
rdeodhar Nov 20, 2020
900aca8
Restrict rounding-up optimization to GPU devices.
rdeodhar Nov 21, 2020
094c01d
Merge branch 'sycl' into iwgo5
rdeodhar Nov 21, 2020
700c056
Made a method private.
rdeodhar Nov 22, 2020
6bfede9
Merge branch 'sycl' of https://github.com/intel/llvm into iwgo5
rdeodhar Nov 22, 2020
383ae96
Merge correction.
rdeodhar Nov 22, 2020
e8e7f74
Merge branch 'iwgo5' of https://github.com/rdeodhar/llvm into iwgo5
rdeodhar Nov 22, 2020
e8b0872
Test changes and improved this_item call detection.
rdeodhar Nov 26, 2020
e6d42a0
Necessary addition to SYCL symbols.
rdeodhar Nov 27, 2020
4b9093e
Minor corrections.
rdeodhar Dec 3, 2020
a2a6ded
Enabled rounding for CPU also.
rdeodhar Dec 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -346,6 +346,9 @@ class SYCLIntegrationHeader {
/// Registers a specialization constant to emit info for it into the header.
void addSpecConstant(StringRef IDName, QualType IDType);

/// Notes that this_item is called within the kernel.
void setCallsThisItem(bool B);

private:
// Kernel actual parameter descriptor.
struct KernelParamDesc {
Expand Down Expand Up @@ -382,6 +385,9 @@ class SYCLIntegrationHeader {
/// Descriptor of kernel actual parameters.
SmallVector<KernelParamDesc, 8> Params;

// Whether kernel calls this_item()
bool CallsThisItem;

KernelDesc() = default;
};

Expand Down
128 changes: 116 additions & 12 deletions clang/lib/Sema/SemaSYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -99,10 +99,23 @@ class Util {
/// \param Tmpl whether the class is template instantiation or simple record
static bool isSyclType(const QualType &Ty, StringRef Name, bool Tmpl = false);

/// Checks whether given function is a standard SYCL API function with given
/// name.
/// \param FD the function being checked.
/// \param Name the function name to be checked against.
static bool isSyclFunction(const FunctionDecl *FD, StringRef Name);

/// Checks whether given clang type is a full specialization of the SYCL
/// specialization constant class.
static bool isSyclSpecConstantType(const QualType &Ty);

// Checks declaration context hierarchy.
/// \param DC the context of the item to be checked.
/// \param Scopes the declaration scopes leading from the item context to the
/// translation unit (excluding the latter)
static bool matchContext(const DeclContext *DC,
ArrayRef<Util::DeclContextDesc> Scopes);

/// Checks whether given clang type is declared in the given hierarchy of
/// declaration contexts.
/// \param Ty the clang type being checked
Expand Down Expand Up @@ -511,6 +524,21 @@ class MarkDeviceFunction : public RecursiveASTVisitor<MarkDeviceFunction> {
FunctionDecl *FD = WorkList.back().first;
FunctionDecl *ParentFD = WorkList.back().second;

// To implement rounding-up of a parallel-for range the
// SYCL header implementation modifies the kernel call like this:
// auto Wrapper = [=](TransformedArgType Arg) {
// if (Arg[0] >= NumWorkItems[0])
// return;
// Arg.set_allowed_range(NumWorkItems);
// KernelFunc(Arg);
// };
//
// This transformation leads to a condition where a kernel body
// function becomes callable from a new kernel body function.
// Hence this test.
if ((ParentFD == KernelBody) && isSYCLKernelBodyFunction(FD))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I found this while working on something else. Is there anything we can do to make this MUCH more selective? The problem we have now is that someone who uses a lambda (or operator()) inside their top-level lambda will have things mis-diagnose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSYCLKernelBodyFunction has a simplistic implementation, not introduced by this PR, by the way. One way to improve matters is to recognize the kernel early during parsing, and use an internal attribute to mark it as a KernelBody.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but this use of it actually is a pretty nasty breaking change.

I'm not sure what opportunity the parser has to do that marking, AND it would likely break your patch (since there is no way to mark the 2nd lambda there).

I think we might need some sort of way of having this library opt-into pulling the body-attributes in from the child.

KernelBody = FD;

if ((ParentFD == SYCLKernel) && isSYCLKernelBodyFunction(FD)) {
assert(!KernelBody && "inconsistent call graph - only one kernel body "
"function can be called");
Expand Down Expand Up @@ -2691,15 +2719,63 @@ class SyclKernelIntHeaderCreator : public SyclKernelFieldHandler {
return !SemaRef.getASTContext().hasSameType(FD->getType(), Ty);
}

// Sets a flag if the kernel is a parallel_for that calls the
// free function API "this_item".
void setThisItemIsCalled(const CXXRecordDecl *KernelObj,
FunctionDecl *KernelFunc) {
if (getKernelInvocationKind(KernelFunc) != InvokeParallelFor)
return;

const CXXMethodDecl *WGLambdaFn = getOperatorParens(KernelObj);
if (!WGLambdaFn)
return;

// The call graph for this translation unit.
CallGraph SYCLCG;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK we already build a callgraph in SemaSYCL. Can we try to re-use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be nice to reuse that infrastructure. I first tried pursuing that approach. The result of a scan for calls to this_item would have to be saved somewhere. The existing callgraph traversal lead to various function "attributes" being set. This would be fine, except that calls_this_item is not an attribute. We could define an internal attribute for that. Would that be acceptable? If yes, it would simplify the SemaSYCL changes quite a bit. How to add such an attribute?

Copy link
Contributor

@elizabethandrews elizabethandrews Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why not. You can check SYCLRequiresDecomposition for an example of internal attribute. @premanandrao @Fznamznon could you please confirm this is ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be ok.

SYCLCG.addToCallGraph(SemaRef.getASTContext().getTranslationUnitDecl());
using ChildParentPair =
std::pair<const FunctionDecl *, const FunctionDecl *>;
llvm::SmallPtrSet<const FunctionDecl *, 16> Visited;
llvm::SmallVector<ChildParentPair, 16> WorkList;
WorkList.push_back({WGLambdaFn, nullptr});

while (!WorkList.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems also pretty similar to code which walks through another CallGraph in SemaSYCL. I think we need to merge them one day, but this is not in scope of your PR, I just wanted to capture the problem.

const FunctionDecl *FD = WorkList.back().first;
WorkList.pop_back();
if (!Visited.insert(FD).second)
continue; // We've already seen this Decl

// Check whether this call is to sycl::this_item().
if (Util::isSyclFunction(FD, "this_item")) {
Header.setCallsThisItem(true);
return;
}

CallGraphNode *N = SYCLCG.getNode(FD);
if (!N)
continue;

for (const CallGraphNode *CI : *N) {
if (auto *Callee = dyn_cast<FunctionDecl>(CI->getDecl())) {
Callee = Callee->getMostRecentDecl();
if (!Visited.count(Callee))
WorkList.push_back({Callee, FD});
}
}
}
}

public:
static constexpr const bool VisitInsideSimpleContainers = false;
SyclKernelIntHeaderCreator(Sema &S, SYCLIntegrationHeader &H,
const CXXRecordDecl *KernelObj, QualType NameType,
StringRef Name, StringRef StableName)
StringRef Name, StringRef StableName,
FunctionDecl *KernelFunc)
: SyclKernelFieldHandler(S), Header(H) {
bool IsSIMDKernel = isESIMDKernelType(KernelObj);
Header.startKernel(Name, NameType, StableName, KernelObj->getLocation(),
IsSIMDKernel);
setThisItemIsCalled(KernelObj, KernelFunc);
}

bool handleSyclAccessorType(const CXXRecordDecl *RD,
Expand Down Expand Up @@ -3146,7 +3222,7 @@ void Sema::ConstructOpenCLKernel(FunctionDecl *KernelCallerFunc,
SyclKernelIntHeaderCreator int_header(
*this, getSyclIntegrationHeader(), KernelObj,
calculateKernelNameType(Context, KernelCallerFunc), KernelName,
StableName);
StableName, KernelCallerFunc);

KernelObjVisitor Visitor{*this};
Visitor.VisitRecordBases(KernelObj, kernel_decl, kernel_body, int_header);
Expand Down Expand Up @@ -3858,6 +3934,9 @@ void SYCLIntegrationHeader::emit(raw_ostream &O) {
O << " __SYCL_DLL_LOCAL\n";
O << " static constexpr bool isESIMD() { return " << K.IsESIMDKernel
<< "; }\n";
O << " __SYCL_DLL_LOCAL\n";
O << " static constexpr bool callsThisItem() { return ";
O << K.CallsThisItem << "; }\n";
O << "};\n";
CurStart += N;
}
Expand Down Expand Up @@ -3916,6 +3995,12 @@ void SYCLIntegrationHeader::addSpecConstant(StringRef IDName, QualType IDType) {
SpecConsts.emplace_back(std::make_pair(IDType, IDName.str()));
}

void SYCLIntegrationHeader::setCallsThisItem(bool B) {
KernelDesc *K = getCurKernelDesc();
assert(K && "no kernels");
K->CallsThisItem = B;
}

SYCLIntegrationHeader::SYCLIntegrationHeader(DiagnosticsEngine &_Diag,
bool _UnnamedLambdaSupport,
Sema &_S)
Expand Down Expand Up @@ -3983,6 +4068,21 @@ bool Util::isSyclType(const QualType &Ty, StringRef Name, bool Tmpl) {
return matchQualifiedTypeName(Ty, Scopes);
}

bool Util::isSyclFunction(const FunctionDecl *FD, StringRef Name) {
if (!FD->isFunctionOrMethod() || !FD->getIdentifier() ||
FD->getName().empty() || Name != FD->getName())
return false;

const DeclContext *DC = FD->getDeclContext();
if (DC->isTranslationUnit())
return false;

std::array<DeclContextDesc, 2> Scopes = {
Util::DeclContextDesc{clang::Decl::Kind::Namespace, "cl"},
Util::DeclContextDesc{clang::Decl::Kind::Namespace, "sycl"}};
return matchContext(DC, Scopes);
}

bool Util::isAccessorPropertyListType(const QualType &Ty) {
const StringRef &Name = "accessor_property_list";
std::array<DeclContextDesc, 4> Scopes = {
Expand All @@ -3993,21 +4093,15 @@ bool Util::isAccessorPropertyListType(const QualType &Ty) {
return matchQualifiedTypeName(Ty, Scopes);
}

bool Util::matchQualifiedTypeName(const QualType &Ty,
ArrayRef<Util::DeclContextDesc> Scopes) {
// The idea: check the declaration context chain starting from the type
bool Util::matchContext(const DeclContext *Ctx,
ArrayRef<Util::DeclContextDesc> Scopes) {
// The idea: check the declaration context chain starting from the item
// itself. At each step check the context is of expected kind
// (namespace) and name.
const CXXRecordDecl *RecTy = Ty->getAsCXXRecordDecl();

if (!RecTy)
return false; // only classes/structs supported
const auto *Ctx = cast<DeclContext>(RecTy);
StringRef Name = "";

for (const auto &Scope : llvm::reverse(Scopes)) {
clang::Decl::Kind DK = Ctx->getDeclKind();

if (DK != Scope.first)
return false;

Expand All @@ -4021,11 +4115,21 @@ bool Util::matchQualifiedTypeName(const QualType &Ty,
Name = cast<NamespaceDecl>(Ctx)->getName();
break;
default:
llvm_unreachable("matchQualifiedTypeName: decl kind not supported");
llvm_unreachable("matchContext: decl kind not supported");
}
if (Name != Scope.second)
return false;
Ctx = Ctx->getParent();
}
return Ctx->isTranslationUnit();
}

bool Util::matchQualifiedTypeName(const QualType &Ty,
ArrayRef<Util::DeclContextDesc> Scopes) {
const CXXRecordDecl *RecTy = Ty->getAsCXXRecordDecl();

if (!RecTy)
return false; // only classes/structs supported
const auto *Ctx = cast<DeclContext>(RecTy);
return Util::matchContext(Ctx, Scopes);
}
12 changes: 12 additions & 0 deletions clang/test/CodeGenSYCL/Inputs/sycl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,18 @@ struct id {
int Data;
};

template <int dim> struct item {
template <typename... T>
item(T... args) {} // fake constructor
private:
// Some fake field added to see using of item arguments in the
// kernel wrapper
int Data;
};

template <int Dims> item<Dims>
this_item() { return item<Dims>{}; }

template <int dim>
struct range {
template <typename... T>
Expand Down
4 changes: 2 additions & 2 deletions clang/test/CodeGenSYCL/kernel-by-reference.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,15 @@ int simple_add(int i) {
int main() {
queue q;
#if defined(SYCL2020)
// expected-warning@Inputs/sycl.hpp:286 {{Passing kernel functions by value is deprecated in SYCL 2020}}
// expected-warning@Inputs/sycl.hpp:298 {{Passing kernel functions by value is deprecated in SYCL 2020}}
// expected-note@+3 {{in instantiation of function template specialization}}
#endif
q.submit([&](handler &h) {
h.single_task_2017<class sycl2017>([]() { simple_add(10); });
});

#if defined(SYCL2017)
// expected-warning@Inputs/sycl.hpp:281 {{Passing of kernel functions by reference is a SYCL 2020 extension}}
// expected-warning@Inputs/sycl.hpp:293 {{Passing of kernel functions by reference is a SYCL 2020 extension}}
// expected-note@+3 {{in instantiation of function template specialization}}
#endif
q.submit([&](handler &h) {
Expand Down
114 changes: 114 additions & 0 deletions clang/test/CodeGenSYCL/parallel_for_this_item.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
// RUN: %clang_cc1 -fsycl -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown-sycldevice -fsycl-int-header=%t.h %s -fsyntax-only
// RUN: FileCheck -input-file=%t.h %s

// This test checks that compiler generates correct kernel description
// for parallel_for kernels that use the this_item API.

// CHECK: __SYCL_INLINE_NAMESPACE(cl) {
// CHECK-NEXT: namespace sycl {
// CHECK-NEXT: namespace detail {

// CHECK: static constexpr
// CHECK-NEXT: const char* const kernel_names[] = {
// CHECK-NEXT: "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3GNU",
// CHECK-NEXT: "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3EMU",
// CHECK-NEXT: "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3OWL",
// CHECK-NEXT: "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3RAT"
// CHECK-NEXT: };

// CHECK:template <> struct KernelInfo<class GNU> {
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const char* getName() { return "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3GNU"; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr unsigned getNumParams() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const kernel_param_desc_t& getParamDesc(unsigned i) {
// CHECK-NEXT: return kernel_signatures[i+0];
// CHECK-NEXT: }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool isESIMD() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool callsThisItem() { return 0; }
// CHECK-NEXT:};
// CHECK-NEXT:template <> struct KernelInfo<class EMU> {
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const char* getName() { return "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3EMU"; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr unsigned getNumParams() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const kernel_param_desc_t& getParamDesc(unsigned i) {
// CHECK-NEXT: return kernel_signatures[i+0];
// CHECK-NEXT: }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool isESIMD() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool callsThisItem() { return 1; }
// CHECK-NEXT:};
// CHECK-NEXT:template <> struct KernelInfo<class OWL> {
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const char* getName() { return "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3OWL"; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr unsigned getNumParams() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const kernel_param_desc_t& getParamDesc(unsigned i) {
// CHECK-NEXT: return kernel_signatures[i+0];
// CHECK-NEXT: }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool isESIMD() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool callsThisItem() { return 0; }
// CHECK-NEXT:};
// CHECK-NEXT:template <> struct KernelInfo<class RAT> {
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const char* getName() { return "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE3RAT"; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr unsigned getNumParams() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr const kernel_param_desc_t& getParamDesc(unsigned i) {
// CHECK-NEXT: return kernel_signatures[i+0];
// CHECK-NEXT: }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool isESIMD() { return 0; }
// CHECK-NEXT: __SYCL_DLL_LOCAL
// CHECK-NEXT: static constexpr bool callsThisItem() { return 1; }
// CHECK-NEXT:};

#include "sycl.hpp"

using namespace cl::sycl;

SYCL_EXTERNAL item<1> g() { return this_item<1>(); }
SYCL_EXTERNAL item<1> f() { return g(); }

// This is a similar-looking this_item function but not the real one.
template <int Dims> item<Dims> this_item(int i) { return item<1>{i}; }

// This is a method named this_item but not the real one.
class C {
public:
template <int Dims> item<Dims> this_item() { return item<1>{66}; };
};

int main() {
queue myQueue;
myQueue.submit([&](::handler &cgh) {
// This kernel does not call sycl::this_item
cgh.parallel_for<class GNU>(range<1>(1),
[=](item<1> I) { this_item<1>(55); });

// This kernel calls sycl::this_item
cgh.parallel_for<class EMU>(range<1>(1),
[=](::item<1> I) { this_item<1>(); });

// This kernel does not call sycl::this_item
cgh.parallel_for<class OWL>(range<1>(1), [=](id<1> I) {
class C c;
c.this_item<1>();
});

// This kernel calls sycl::this_item
cgh.parallel_for<class RAT>(range<1>(1), [=](id<1> I) { f(); });
});

return 0;
}
2 changes: 2 additions & 0 deletions sycl/doc/EnvironmentVariables.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ subject to change. Do not rely on these variables in production code.
| SYCL_PI_LEVEL_ZERO_MAX_COMMAND_LIST_CACHE | Positive integer | Maximum number of oneAPI Level Zero Command lists that can be allocated with no reuse before throwing an "out of resources" error. Default is 20000, threshold may be increased based on resource availabilty and workload demand. |
| SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR | Any(\*) | Disable USM allocator in Level Zero plugin (each memory request will go directly to Level Zero runtime) |
| SYCL_PI_LEVEL_ZERO_BATCH_SIZE | Integer | Sets a preferred number of commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| SYCL_PARALLEL_FOR_RANGE_ROUNDING_TRACE | Any(\*) | Enables tracing of parallel_for invocations with rounded-up ranges. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. Suggest reusing some level of SYCL_PI_TRACE instead of introducing new variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use the SYCL_PI_TRACE env var to control this output. This trace has nothing to do with the plugin.

| SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING | Any(\*) | Disables automatic rounding-up of parallel_for invocation ranges. |

`(*) Note: Any means this environment variable is effective when set to any non-null value.`

Expand Down
Loading