Skip to content

[SYCL] Support for array kernel parameters #1423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

[SYCL] Support for array kernel parameters #1423

wants to merge 6 commits into from

Conversation

rdeodhar
Copy link
Contributor

Signed-off-by: rdeodhar [email protected]
This PR adds support for:

  1. standard-layout arrays as kernel parameters
  2. arrays of Accessors as kernel parameters
  3. structs containing arrays of Accessors
    The design is described in the attached document.

Array_Kernel_Parameters.txt

@bader
Copy link
Contributor

bader commented Mar 29, 2020

@rdeodhar, please, add documentation to sycl/doc directory similar to other *.md documents.

@Fznamznon Fznamznon changed the title [SYC] Support for array kernel parameters [SYCL] Support for array kernel parameters Mar 30, 2020
@bader bader mentioned this pull request Mar 30, 2020
Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to give comments to the doc itself. Please convert it to markdown format and add to the patch.
BTW, the doc sometimes that this is an extension. Does this patch actually implements some SYCL language extension?
Could you please add a CodeGen test checking resulting IR module?

@Fznamznon Fznamznon self-assigned this Mar 30, 2020
@rdeodhar rdeodhar requested a review from Fznamznon March 30, 2020 14:59
@keryell
Copy link
Contributor

keryell commented Mar 31, 2020

This is an interesting feature, but also strange.

Imagine some wrong (SYCL) C++ code where a C-style array is captured by reference but goes out of scope when the kernel is really executed. You have some crash on host device and non host device.
With this extension the wrong SYCL C++ code appears to work on non host device but still crashes on host device... Nice for the debug and headache...
Otherwise, just use... std::array :-)

@bader
Copy link
Contributor

bader commented Mar 31, 2020

C-style array is captured by reference

According to my understanding this is not allowed (except some USM modes).
Arrays (as any other host variable) should be captured by value. Right?

@rdeodhar
Copy link
Contributor Author

The array is captured by value.

(And the incorrect argument about capture of an array by reference would apply to non-arrays as well, such as scalars, and structs. But it doesn't.)

@keryell
Copy link
Contributor

keryell commented Mar 31, 2020

C-style array is captured by reference

To clarify my short-cut here: the C-style array decays to pointer which is captured by value. At the end in normal C++ code, the user can see this lambda capture as a way to address the array by reference.

According to my understanding this is not allowed (except some USM modes).
Arrays (as any other host variable) should be captured by value. Right?

C arrays do not really exist. You just copy the pointer value in the kernel.
Then you can use how you want this scalar value with its inherent... limitations. :-)

All this reminds me some lengthy discussions in the Khronos OpenCL committee around the open-source Clang implementation and about how to capture arrays in Clang blocks used to device-side enqueue kernels with OpenCL C 2.0 syntax...
I do not remember how this story ended... @AnastasiaStulova ?

If I understand correctly all this, at the end I am a little puzzled by the motivating use case.
Having some incorrect crashing C++ code that happens to do something at the end by using this extension in some cases...

void crash_me(sycl::queue &q) {
  int soon_out_of_scope[] = { 1, 2 };
  // The single task will be executed in the future, probably after 
  q.single_task([=] { f(soon_out_of_scope[1]); });
}

// Suspense... What is the device of the queue picked by the runtime?
sycl::queue q;
crash_me(q);
a_function_that_reuses_the_stack();
// With the magical extension, it does not crash if the device is not the host
q.wait();

So what is the plan next?
Can you handle arrays of unknown size? It would be nice to have C99 VLA extension too.
Push this C++ extension in C++23? Good luck. :-)
Extend UBSan, ThreadSanitizer, Valgrind, HellGrind... so they can understand this extension too?
Do you plan to handle also array of arrays of arrays of... ?

Any C programmer knows since the 70's that if you want to pass arrays by value to functions you put them in structures. But I am unsure that we want a C++ extension here to does this automatically...

Also any programmer using std::thread or std::async today has this issue.

Follow the https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rsl-arrays and use std::array instead for this purpose. :-)

The philosophy behind SYCL C++ DSL is that it is just plain C++ without extension, if I remember correctly.

Can you restate what is a real compelling code sample for all this?

@rdeodhar
Copy link
Contributor Author

rdeodhar commented Mar 31, 2020

This surprised me too, but for the purposes of lambda capture, the whole C-style array is captured by value. It is not the case that the base pointer alone is captured. See https://godbolt.org/z/PyW6fT.

So, the motivation for this change is to match the user expectation of array capture by value, even when using SYCL.

@keryell
Copy link
Contributor

keryell commented Mar 31, 2020

This surprised me too, but for the purposes of lambda capture, the whole C-style array is captured by value. It is not the case that the base pointer alone is captured. See https://godbolt.org/z/PyW6fT.

Very convincing ! \o/

I have forgotten about this behavior that has bitten me already once and is hidden in C++ draft "7.5.5.2 Captures [expr.prim.lambda.capture]" as

(For array members, the array elements are direct-initialized in increasing subscript order.)

So, the motivation for this change is to match the user expectation of array capture by value, even when using SYCL.

I see. I did a complete misinterpretation of your PR. :-( I thought you had some applications you wanted to simplify when ported to SYCL by extending the behavior of the C++ with some flavors of OpenMP or OpenACC. That is the opposite. It was something which was working in C++ and with SYCL and the host device but not with the current SYCL implementation with non-host devices because of the way the serialization of the arguments is done currently between the host and device.

So, could you clarify your design document accordingly?

For example, at least:

The motivation for this extension is this: On the CPU, a lambda function is allowed to access an element of an array defined outside the lambda. The implementation captures the entire array by value. A user would naturally expect this to work in SYCL as well.

"The implementation captures the entire array by value." is not about this implementation but about the C++ standard where "lambdas capture the entire array by value". Otherwise I understand there is something in this implementation we want to generalize...

"On the CPU, a lambda function is allowed to access an element of an array defined outside the lambda." should come as a consequence of the sentence above. And it is not really a question of "allowed or not" but the fact that when using a capture by value, it can just work on a copy of the array.

So good news, I can answer myself: no extension to do in C++23, no change to do to ThreadSanitizer, Valgrind, HellGrind... :-)

Thank you for the clarification and sorry for the noise.

@rolandschulz
Copy link
Contributor

I have forgotten about this behavior that have bitten me already once and is hidden in C++ draft "7.5.5.2 Captures [expr.prim.lambda.capture]" as

This seems to be confusing to everyone. Do you think we should add something to the SYCL spec to clarify this?

@AnastasiaStulova
Copy link
Contributor

All this reminds me some lengthy discussions in the Khronos OpenCL committee around the open-source Clang implementation and about how to capture arrays in Clang blocks used to device-side enqueue kernels with OpenCL C 2.0 syntax...
I do not remember how this story ended... @AnastasiaStulova ?

Yes, there was also upstream discussion regarding this http://lists.llvm.org/pipermail/cfe-dev/2016-March/047849.html and Khronos BugZilla issue 15659.

I think we ended up with this as a conclusion:

  • Block capture semantics follows regular C argument passing convention i.e. arrays are captured by reference (decayed to pointers) and structs are captured by value (Example 5).

@Fznamznon Fznamznon linked an issue Apr 1, 2020 that may be closed by this pull request
@keryell
Copy link
Contributor

keryell commented Apr 2, 2020

This seems to be confusing to everyone. Do you think we should add something to the SYCL spec to clarify this?

Yes, why not? Feel free to contribute a note.

@keryell
Copy link
Contributor

keryell commented Apr 2, 2020

Yes, there was also upstream discussion regarding this http://lists.llvm.org/pipermail/cfe-dev/2016-March/047849.html and Khronos BugZilla issue 15659.

I think we ended up with this as a conclusion:

  • Block capture semantics follows regular C argument passing convention i.e. arrays are captured by reference (decayed to pointers) and structs are captured by value (Example 5).

Ah yes https://www.khronos.org/members/login/bugzilla/show_bug.cgi?id=15659

Thanks for being a good librarian on this! :-)

And 4 years ago I made the same misinterpretation on C++ lambda and array captures... :-(

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the design document!

Fznamznon
Fznamznon previously approved these changes Apr 3, 2020
Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, otherwise seems to be OK.

FieldDecl *Field,
const CXXRecordDecl *CRD,
Expr *Base,
FieldDecl *Field, Expr *SpecialObjME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FieldDecl *Field, Expr *SpecialObjME,
FieldDecl *Field, MemberExpr *SpecialObjME,

The name implies this is going to be a member expression, so we should have it be that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move the SpecialObjME parameter to the end with a default value of nullptr.
However, its type needs to remain as Expr*. In the cases where a nullptr is passed in, this function will create a MemberExpr and assign it to SpecialObjME. In the cases where a value is passed in, it is an Expr*, an index operation applied to a MemberExpr. That's why the declared type needs to be the more general Expr*.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what you mean here? If it is a member expression, it should be stored as one. It is a pointer, so it shouldn't matter what gets passed in...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's passed in is either nullptr or a member[index]. So what's passed in is not a MemberExpr.
However, when nullptr is passed in, the function creates a MemberExpr to use in place of the nullptr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it isn't a Special Object member expression. The more you've talked about this, the more I'm not sure we have the correct abstraction here. It seems that passing it like this isn't the correct place to break the ownership model then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what a Special Object member expression is. I took SpecialObjMe to be a name with nothing special meant by the "Special". I seem to be missing something here.
What change do you suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ME typically stands for member expression. I'll have to look during the day tomorrow and better understand this code and the patch before making a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also look at the design doc added to sycl/doc, which will explain what is happening here.

@@ -749,8 +750,8 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,

llvm::SmallVector<Expr *, 4> ParamStmts;
const auto *Proto = cast<FunctionProtoType>(Method->getType());
S.GatherArgumentsForCall(SourceLocation(), Method, Proto, 0,
ParamDREs, ParamStmts);
S.GatherArgumentsForCall(SourceLocation(), Method, Proto, 0, ParamDREs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here? I realize that the formatting tool probably got this, but please don't change this unless you do it in a patch specifically to fix the formatting.

@@ -761,48 +762,79 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,
// Recursively search for accessor fields to initialize them with kernel
// parameters
std::function<void(const CXXRecordDecl *, Expr *)>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this NOT std::function. I realize this is existing, but this is a lambda, the type erasure cost of a std::function here is pretty horrible. Just replace the type above with 'auto'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested change fails to compile because there is a recursive call to getExprForWrappedAccessorInit, and because of that, "auto" cannot be used.
I would rather not get into making this change since it is in existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed that it is calling itself. That said,you ARE modifying everything else with it, so it is completely within the scope of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When auto is used the compiler does not accept the call within the body, because the type deduction hasn't happened yet. Can you suggest a way to write the type of this lambda, that doesn't use auto, and also doesn't use std::function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot name the type of a lambda as anything but 'auto'.

You cant use lambdas recursively unfortunately, the type of the lambda isn't complete until after the closing brace. SO the only alternative is a function object. In this case, I'm OK with std::function for now. Normally, it would be in the scope of the review, but I don't think its worth changing since you cannot just use the lambda's type.

@@ -826,8 +858,8 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,
InitializationSequence InitSeq(S, Entity, InitKind, None);
ExprResult MemberInit = InitSeq.Perform(S, Entity, InitKind, None);
InitExprs.push_back(MemberInit.get());
getExprForSpecialSYCLObj(FieldType, Field, CRD, KernelObjCloneRef,
InitMethodName, BodyStmts);
getExprForSpecialSYCLObj(FieldType, Field, nullptr, CRD,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I see how often you're calling this with nullptr. I'd prefer that you either move it to the end and make it a default parameter, or make sure that every place you pass nullptr you put the variable-name-comment in it.

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will continue reviewing the array-specific part.

<h3>Introduction</h3>

This document describes the changes to support passing arrays to SYCL kernels
and special treatment of Accessor arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not clear what is 'Accessor arrays', please elaborate a bit - e.g. an array of sycl::accessor objects (?)

The following cases are handled:

1. arrays of standard-layout type as top-level arguments
2. arrays of Accessors as top-level arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. arrays of Accessors as top-level arguments
2. Arrays of accessors as top-level arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bulleted lists if each bullet forms a complete sentence then that sentence begins with a capitalized letter and is terminated with a period. If bullets follow a colon and are not each a complete sentence then the format used in the document is recommended.

<h3>Introduction</h3>

This document describes the changes to support passing arrays to SYCL kernels
and special treatment of Accessor arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and special treatment of Accessor arrays.
and special treatment of accessor arrays.

1. arrays of standard-layout type as top-level arguments
2. arrays of Accessors as top-level arguments
3. arrays of accessors within structs that are top-level arguments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the limitation really structs or std-layout objects?

and special treatment of Accessor arrays.
The following cases are handled:

1. arrays of standard-layout type as top-level arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. arrays of standard-layout type as top-level arguments
1. Arrays of standard-layout type as top-level arguments.

As may be observed from the example above, standard-layout lambda capture
components are passed by value to the device as separate parameters.
This includes scalars, pointers, and standard-layout structs.
Certain SYCL struct types that are not standard-layout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Certain SYCL struct types that are not standard-layout,
Certain SYCL standard API types that are not standard-layout,

components are passed by value to the device as separate parameters.
This includes scalars, pointers, and standard-layout structs.
Certain SYCL struct types that are not standard-layout,
such as Accessors and Samplers, are treated specially.
Copy link
Contributor

@kbobrovs kbobrovs Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
such as Accessors and Samplers, are treated specially.
such as sycl::accessor and sycl::sampler, are treated specially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for accessor and sampler objects to get disassembled/reassembled is not that they are of non-standard layout, but because their content is very different on host and device, even the set of fields which constitute these types are different. Runtime performs a (not necessarily 1:1) translation of host object components into device object components to be passed via arguments.

Certain SYCL struct types that are not standard-layout,
such as Accessors and Samplers, are treated specially.
The arguments to their init functions are passed as separate parameters
and used within the kernel caller function to initialize Accessors/Samplers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and everywhere, using capitalized first letter to name these types does not seem correct.

};
```

Each entry in the kernel_signatures table contains three values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Each entry in the kernel_signatures table contains three values:
Each entry in the kernel_signatures table is a kernel_param_desc_t object which contains three values:

Each entry in the kernel_signatures table contains three values:
1) an encoding of the type of capture object member
2) a field that encodes additional properties, and
3) an offset within a block of memory where the value of that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3) an offset within a block of memory where the value of that
3) an offset within the lambda object where the value of that

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice doc, thanks. Several places just need some clarification.

As described earlier, each variable captured by a lambda that comprises a
SYCL kernel becomes a parameter of the kernel caller function.
For arrays, simply allowing them through would result in a
function parameter of array type. This is not supported in C++.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier it was mentioned that in std C++ arrays are captured. Is SYCL solution repeating the std C++ solution or not? Would be good to mention here to have a baseline for comparison and better understanding, since std C++ was mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++, a special case is made for an array captured in a lambda. The whole array is captured.
For a SYCL lambda the behavior is now the same.

The sharp-eyed reviewer of `SemaSYCL.cpp` will notice that the array
is actually double-wrapped in structs. This was done simply to preserve
the interface to an existing function (`CreateAndAddPrmDsc`) which
processes each kernel caller parameter as a capture object member.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought that this is done to be able to pass the array into the kernel, as the top level capture struct is broken down into components, and the wrapping becomes single, and it needs to be at least single as bare (unwrapped) arrays can't be passed in C++ as mentied above.
So if the reason is different and if it is possible to create AST for an argument of bare (unwrapped) array type then you might want to consider not wrapping the array and introducing a new kind of parameter - kind_array - instead of using kind_std_layout. That would probably simplify code generation and generated code. Maybe a TODO for future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated code uses a struct to wrap the array. The double-wrapping is an implementation detail, just the way some existing functions are used to achieve an array wrapped in a (single) struct as the generated code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix the CreateAndAddParmDsc instead, a fixme here won't be fixable without being an ABI break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a way in clang to represent a function argument that is a whole array. That's why a wrap in a struct is necessary.

Arrays of accessors are supported in a manner similar to that of a plain
Accessor. For each accessor array element, the four values required to
call its init function are passed as separate arguments to the kernel.
Reassembly within the kernel caller is serialized by accessor array element.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what it means - 'serialized by accessor array element'. Apparently, 'performed individually for each accessor array element', but some may interpret it as 'not done in parallel'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init functions of accessors in an array are called one after the other, in the order of the array elements.


This document describes the changes to support passing arrays to SYCL kernels
and special treatment of Accessor arrays.
The following cases are handled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if user code does fit into the supported cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An compiler diagnostic is issued.

range<1> inAccR2_1,
id<1> inI_1,
)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, what are the restrictions on how big the array or array of accessors can be? Maybe worth adding typical value here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present there is no check on array size, just as there are no checks on the sizes of any other arguments such as structs.

<h3>Fix 3: Accessor Arrays within Structs</h3>

*Individual* Accessors within structs were already supported.
Struct parameters of kernels that are structs are traversed member
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Struct parameters of kernels that are structs are traversed member
Kernel parameters that are structs are traversed member

Struct parameters of kernels that are structs are traversed member
by member, recursively, to enumerate member structs that are one of
the SYCL special types: Accessors and Samplers. For each special
struct encountered in the scan, arguments of their init functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct encountered in the scan, arguments of their init functions
struct encountered in the traversal, arguments of their init functions

@erichkeane
Copy link
Contributor

So a few comments as I'm reviewing this function, as I haven't dealt with it at all yet:

1: When using 'auto' to represent a pointer, coding standard requires making it "auto *". This seems to mess that up a couple of times.

2: This patch makes getExprForSpecialSYCLObj and getExprForWrappedAccessorInit even HUGER. They need to be their own functions, not lambdas (please fix). We have quite a few lambdas used in this code that are too large, so I'd prefer that any that this patch touches that are more than ~5-10 lines need to be their own function (copyableToKernel should likely just be a function as well).

3: Doing #2 above makes SpecialObjME an overloadable parameter (basically a 'fixup' from one to the other), so that likely could be simplified by becomign 2 functions.

4: This patch makes the comment on getExprForWrappedAccessorInit a lie. It isn't searching for accessor fields, its searching for a ton of things.

5: This patch seems to do quite a bit of repeatative code to create the indexes into the array. It seems to me that this needs to be made its own function.

@erichkeane
Copy link
Contributor

I spent a bit of time going for a walk and thinking about it, and I think we're pretty solidly at a point (even before this patch) where much of the code this patch touches is in desperate need of a significant refactor. The code is disorganized, filled with repetition, and quite unreadable.

This patch shows we're even at the point where we're making significant architecture decisions based on hacks required (array in a struct in a struct!) to work within this code.

I don't consider myself the code owner of this bit, since I haven't had much/any involvement with it so far (I believe @kbobrovs is the most familiar), but if I WAS, I would likely not allow any patches to touch it until it had gone through a sizable rework.

I'll leave the rest of the review up to @kbobrovs , but I'd suggest strongly working with the author of this patch (@rdeodhar ) to come up with a clean, fluid, and readable version of the OpenCL kernel code and integration header generation code that wouldn't require the array-in-struct-instruct hack above, and is much more readable.

@rdeodhar
Copy link
Contributor Author

rdeodhar commented Apr 7, 2020

I'll wait on Konst's opinion on the need for rewriting the entire SYCL kernel code generation piece.
A full rewrite is beyond the scope of this change.

@erichkeane
Copy link
Contributor

erichkeane commented Apr 7, 2020

I'll wait on Konst's opinion on the need for rewriting the entire SYCL kernel code generation piece.
A full rewrite is beyond the scope of this change.

Beyond scope, yes, but blocking in my opinion. Necessitating the array-in-a-struct-in-a-struct seems like a long term decision we're making for short-term reasons. I wouldn't want us locked into that implementation without a better reason than, "it was too much work".

@elizabethandrews
Copy link
Contributor

A refactor was also suggested by @Fznamznon when reviewing another patch I was working on to replace the current field based copy with a single memcopy from kernel object. If I understand all of this correctly, if I could get that patch to work, it would remove the struct within struct wrapping happening here since the memcopy would handle it. Currently that patch still has a couple of run time and performance issues - possibly due to mishandling of pointers which I am currently investigating.

@rdeodhar
Copy link
Contributor Author

rdeodhar commented Apr 7, 2020

I'll wait on Konst's opinion on the need for rewriting the entire SYCL kernel code generation piece.
A full rewrite is beyond the scope of this change.

Beyond scope, yes, but blocking in my opinion. Necessitating the array-in-a-struct-in-a-struct seems like a long term decision we're making for short-term reasons. I wouldn't want us locked into that implementation without a better reason than, "it was too much work".

I can offer the array-processing code as a preliminary implementation for the clang team to incorporate into a full rewrite of SYCL kernel processing.

@erichkeane
Copy link
Contributor

I'll wait on Konst's opinion on the need for rewriting the entire SYCL kernel code generation piece.
A full rewrite is beyond the scope of this change.

Beyond scope, yes, but blocking in my opinion. Necessitating the array-in-a-struct-in-a-struct seems like a long term decision we're making for short-term reasons. I wouldn't want us locked into that implementation without a better reason than, "it was too much work".

I can offer the array-processing code as a preliminary implementation for the clang team to incorporate into a full rewrite of SYCL kernel processing.

It seems that the right approach is very much to just refactor this patch on top of @elizabethandrews patch above and at least fix the double-struct problem.

It would be one thing to accept a patch against code that needs a refactor. Its a complete different problem that you had to do a sub-optimal hack implementation because of it. At that point, the refactor becomes blocking.

@kbobrovs
Copy link
Contributor

kbobrovs commented Apr 7, 2020

@rdeodhar, @erichkeane, I agree some refactoring is really needed, as the code seems to be approaching non-maintainable state. And it was needed even before this patch, as the giant lambdas have existed before, maybe this patch just made this more obvious and triggered this discussion.
Some stats: CreateOpenCLKernelBody function is now 360 lines, buildArgTys approaching 200.

so I'd prefer that any that this patch touches that are more than ~5-10 lines need to be their own function

I agree. I'd say lambdas in this source are being misused, not only in this patch.

So TODOs that I see:

  • (Design) we need to decide if double wrapping is really needed or not. In my opinion, this boils down to 2 questions
    • whether it is possible to create AST for an argument of bare (unwrapped) array type
    • whether array type kernel argument can be represented in SPIRV
      If the answer is yes to both, then we don't need to wrap array. Otherwise we do. Opinions?
  • refactor the pre-patch state to replace huge lambdas with functions
  • refactor Rajiv's patch atop the above
  • (documentation) move details describing existing design into the existing design doc

Overall supporting arrays seems to be a great usability feature also making SYCL more consistent with C++.

@kbobrovs
Copy link
Contributor

kbobrovs commented Apr 7, 2020

@rdeodhar , please remember not to force-push updates as this will destroy discussion history. Multiple commits atop are OK - they will be squashed before merge anyway.

@erichkeane
Copy link
Contributor

@rdeodhar, @erichkeane, I agree some refactoring is really needed, as the code seems to be approaching non-maintainable state. And it was needed even before this patch, as the giant lambdas have existed before, maybe this patch just made this more obvious and triggered this discussion.

I'd nix "approaching" :)

Some stats: CreateOpenCLKernelBody function is now 360 lines, buildArgTys approaching 200.

so I'd prefer that any that this patch touches that are more than ~5-10 lines need to be their own function

I agree. I'd say lambdas in this source are being misused, not only in this patch.

So TODOs that I see:

* (Design) we need to decide if double wrapping is really needed or not. In my opinion, this boils down to 2 questions
  
  * whether it is possible to create AST for an argument of bare (unwrapped) array type

The doc justifies this as " This was done simply to preserve
the interface to an existing function (CreateAndAddPrmDsc) which
processes each kernel caller parameter as a capture object member. "

If thats our only justification, it seems improper.

  * whether array type kernel argument can be represented in SPIRV
    If the answer is yes to both, then we don't need to wrap array. Otherwise we do. Opinions?

* refactor the pre-patch state to replace huge lambdas with functions

There is more to it than just the lambdas I think, so the refactor should probably be somewhat intense. But moving the lambdas is a good start. IMO, there are 3 or so re-implementations of revisiting types and doing almost exactly the same thing, I would hope we could come up with a reasonable approach to maximize code reuse here.

* refactor Rajiv's patch atop the above

* (documentation) move details describing existing design into the existing design doc

Overall supporting arrays seems to be a great usability feature also making SYCL more consistent with C++.

Otherwise, I agree as you've written.

@intel intel deleted a comment from rdeodhar Apr 7, 2020
@rdeodhar
Copy link
Contributor Author

rdeodhar commented Apr 8, 2020

I have incorporated all of Konst's comments in the design document.
On the implementation I will wait for the proposed restructuring of the kernel generation code to be completed and will then apply my patch to the new code. So no changes are being made to SemaSYCL.cpp for now.

@rdeodhar rdeodhar closed this Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ICE when array used in Functor
10 participants