[SYCL] Implement SYCL 2020 spec functionality: no propagation from functions to the caller #3836

Closed
wants to merge 8 commits into from

Contributor

@smanna12 smanna12 commented May 27, 2021

In the SYCL 1.2.1 spec, attributes are propagated from device functions to the calling kernel.
The SYCL 2020 spec mandates that kernel attributes applied to a function are not propagated to the caller.

To match the SYCL 2020 spec, the following attributes should no longer be propagated from device functions to a kernel:
1. scheduler_target_fmax_mhz
2. kernel_args_restrict
3. no_global_work_offset
4. max_work_group_size
5. max_global_work_dim
6. num_simd_work_items
7. reqd_sub_group_size
8. reqd_work_group_size
9. named_sub_group_size
10. sycl_explicit_simd

This patch
i. keeps the SYCL 1.2.1 behavior and propagates the attributes in the older SYCL mode (-sycl-std=2017);
ii. adds an ignored-attribute diagnostic for the [[intel::named_sub_group_size()]] spelling in the older
SYCL mode (-sycl-std=2017), since this attribute follows the SYCL 2020 attribute rules;
iii. adds or updates tests to validate the propagation behavior in SYCL 2020 and SYCL 2017 modes.

Signed-off-by: Soumi Manna [email protected]
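
To make the difference between the two modes concrete, here is a minimal plain C++ sketch of the behavior described above. It is a toy model, not the Sema code: `SyclStd`, `Func`, `CallGraph`, and `kernelAttrs` are hypothetical names invented for illustration, and attributes are represented as plain strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the value selected by -sycl-std.
enum class SyclStd { V2017, V2020 };

// A toy function: the attributes written on it plus the functions it calls.
struct Func {
  std::set<std::string> attrs;
  std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, Func>;

// Gather the attributes of every function reachable from `name`.
static void collectReachable(const CallGraph &g, const std::string &name,
                             std::set<std::string> &seen,
                             std::set<std::string> &out) {
  if (!seen.insert(name).second)
    return; // already visited; this also guards against recursion
  const Func &f = g.at(name);
  out.insert(f.attrs.begin(), f.attrs.end());
  for (const std::string &callee : f.callees)
    collectReachable(g, callee, seen, out);
}

// Attributes the kernel ends up with in the given mode.
std::set<std::string> kernelAttrs(const CallGraph &g,
                                  const std::string &kernel, SyclStd mode) {
  if (mode == SyclStd::V2020)
    return g.at(kernel).attrs; // only what was written on the kernel itself
  std::set<std::string> seen, out;
  collectReachable(g, kernel, seen, out);
  return out; // SYCL 1.2.1 model: union over the whole call graph
}
```

In this model, a kernel that calls a helper marked `reqd_work_group_size` inherits the attribute in the 2017 mode and ends up with no attributes in the 2020 mode.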

smanna12 added 2 commits May 27, 2021 09:48
@smanna12 smanna12 changed the title [SYCL] feature [SYCL] Implement SYCL 2020 spec functionality: no propagation from functions to the caller May 27, 2021
@smanna12 smanna12 marked this pull request as ready for review May 27, 2021 17:16
Contributor Author

smanna12 commented May 27, 2021

PR has been created for Pre-commit test failure: intel/llvm-test-suite#297

Failed Tests (1):
SYCL :: Basic/parallel_for_range.cpp

Signed-off-by: Soumi Manna <[email protected]>
Contributor

@AaronBallman AaronBallman left a comment

I see in SYCL 1.2.1 6.7 that the attributes are propagated to the kernel. I don't see any mention of propagating any attributes to kernels in SYCL 2020. So am I correct that in SYCL 2020 mode, no attribute should be propagated to the kernel?

If that's correct, do we still need to maintain the explicit list of attributes to not propagate in SYCL 2020 mode? It'd be nice to drop that entire giant isa<> if possible.

Comment on lines +364 to +378
} else {
  // Attributes that should not be propagated from device functions to a
  // kernel in SYCL 2020.
  if (DirectlyCalled) {
    llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) {
      return isa<
          SYCLIntelFPGAMaxConcurrencyAttr,
          SYCLIntelFPGADisableLoopPipeliningAttr, SYCLSimdAttr,
          SYCLIntelKernelArgsRestrictAttr, ReqdWorkGroupSizeAttr,
          SYCLIntelNumSimdWorkItemsAttr, SYCLIntelSchedulerTargetFmaxMhzAttr,
          SYCLIntelNoGlobalWorkOffsetAttr, SYCLIntelMaxWorkGroupSizeAttr,
          IntelReqdSubGroupSizeAttr, SYCLIntelMaxGlobalWorkDimAttr,
          IntelNamedSubGroupSizeAttr, SYCLIntelFPGAInitiationIntervalAttr>(A);
    });
  }
Contributor

I'm wondering if we can get rid of this entire branch because in 2020 mode, it seems like none of the attributes propagate anyway (or did I get that wrong)?

Contributor

@elizabethandrews elizabethandrews Jun 1, 2021

I suspect we can simplify this code somehow but this branch collects attributes to be applied to device functions and kernel itself. The attributes applied to kernel still need to be collected and applied irrespective of SYCL version.

Contributor Author

Sorry @AaronBallman, I was wrong. We need this entire else branch in SYCL 2020 mode since we still need to copy the attributes when DirectlyCalled is true. I don't have a better idea for how to avoid duplicating the code here.

Contributor

@AaronBallman AaronBallman Jun 2, 2021

I am starting to think more and more that it's past time to table generate this logic rather than continuing to struggle to maintain these lists manually. Do I understand correctly that we really have one collection need with three (maybe four) modes: never propagate, always propagate, only propagate when directly called (possibly with a diagnostic), but the mode may differ based on the language options? If so, we could perhaps try to design a tablegen feature to do this. e.g., something along the lines of

let SYCLKernelPropagationBehavior = [SYCLKernelPropMode<SYCL2020, NeverPropagate>, // SYCL 2020 behavior
                                     SYCLKernelPropMode<SYCL2017, AlwaysPropagate>] // SYCL 2017 behavior

(Where SYCL2020 and SYCL2017 are new LangOpt definitions we add to Attr.td and the *Propagate are enumerations we define.)

Thinking out loud: we'd generate a function named static bool isAttributeCollected(const Attr *A, const LanguageOptions &LangOpts, bool IsDirectlyCalled); that returns whether an attribute should be collected or not. We'd have to collect all of the SYCLKernelPropagationBehavior objects in Attr.td so that we could group the language mode checks together in the resulting generated file. What we'd generate would effectively look like:

if (LangOpts.getSYCLVersion() == SYCL_2017) {
  if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2017, propagate if directly called
    return true;
  if (isa<large generated list of attributes here>(A)) // SYCL 2017, always propagate
    return true;
}
if (LangOpts.getSYCLVersion() >= SYCL_2020) {
  if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2020, propagate if directly called
    return true;
  if (isa<large generated list of attributes here>(A)) // SYCL 2020, always propagate
    return true;
}
return false;

where each of the generated lists of attributes in the isa<> checks are based off the propagation enumeration from Attr.td.
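
As a rough illustration of what such a generated predicate could reduce to, here is a self-contained C++ sketch driven by a small table. Everything in it is hypothetical: `AttrKind`, `SyclVersion`, `PropMode`, and `kTable` stand in for the real `Attr` classes, `LangOptions`, and the TableGen output, and the modes assigned to each attribute are assumptions chosen only to exercise each branch.

```cpp
#include <cassert>

// Hypothetical stand-ins; real code would use clang::Attr and LangOptions.
enum class AttrKind { ReqdWorkGroupSize, KernelArgsRestrict, LoopFuse };
enum class SyclVersion { SYCL_2017, SYCL_2020 };
enum class PropMode { NeverPropagate, AlwaysPropagate, PropagateIfDirectlyCalled };

// One row per attribute: its propagation mode in each language mode.
struct PropRow {
  AttrKind kind;
  PropMode in2017;
  PropMode in2020;
};

// Illustrative table of the kind a TableGen backend might emit.
// The modes below are assumptions, not the PR's final behavior.
constexpr PropRow kTable[] = {
    {AttrKind::ReqdWorkGroupSize, PropMode::AlwaysPropagate,
     PropMode::PropagateIfDirectlyCalled},
    {AttrKind::KernelArgsRestrict, PropMode::AlwaysPropagate,
     PropMode::PropagateIfDirectlyCalled},
    {AttrKind::LoopFuse, PropMode::PropagateIfDirectlyCalled,
     PropMode::PropagateIfDirectlyCalled},
};

// Returns whether an attribute of `kind` should be collected for the kernel.
bool isAttributeCollected(AttrKind kind, SyclVersion version,
                          bool isDirectlyCalled) {
  for (const PropRow &row : kTable) {
    if (row.kind != kind)
      continue;
    PropMode mode =
        version == SyclVersion::SYCL_2017 ? row.in2017 : row.in2020;
    switch (mode) {
    case PropMode::AlwaysPropagate:
      return true;
    case PropMode::PropagateIfDirectlyCalled:
      return isDirectlyCalled;
    case PropMode::NeverPropagate:
      return false;
    }
  }
  return false; // attributes missing from the table are never collected
}
```

A table lookup keeps the per-mode decision in one place, which is the maintenance benefit the comment is after; the real generated code would more likely be the grouped `isa<>` checks shown above.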

If we want to include diagnostics in the logic, I think we'd return an enumeration rather than a boolean and let the caller figure out what diagnostic to emit, whether to drop the attribute, etc. But given that we only have one of those, we may just want to handle that case specially.
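
A sketch of the enumeration-returning variant, under the same caveats: the type and function names (`CollectAction`, `classifyAttribute`) are made up for illustration, and the only diagnostic case modeled is the one this PR describes, where the `[[intel::named_sub_group_size]]` spelling is ignored in SYCL 2017 mode. The 2017 propagation rule for other attributes is a simplifying assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the real clang types.
enum class AttrKind { ReqdWorkGroupSize, NamedSubGroupSize };
enum class SyclVersion { SYCL_2017, SYCL_2020 };

// What the caller should do with an attribute found on a function.
enum class CollectAction { Drop, Collect, IgnoreWithDiagnostic };

CollectAction classifyAttribute(AttrKind kind, SyclVersion version,
                                bool isDirectlyCalled) {
  // The one special case from this PR: the intel::named_sub_group_size
  // spelling is ignored, with a diagnostic, in SYCL 2017 mode.
  if (kind == AttrKind::NamedSubGroupSize &&
      version == SyclVersion::SYCL_2017)
    return CollectAction::IgnoreWithDiagnostic;
  // SYCL 2017 (assumed here): propagate from any function in the call graph.
  if (version == SyclVersion::SYCL_2017)
    return CollectAction::Collect;
  // SYCL 2020: keep only attributes written on the kernel itself.
  return isDirectlyCalled ? CollectAction::Collect : CollectAction::Drop;
}
```

With this shape the caller decides which diagnostic to emit, so the classification function stays free of `Sema` dependencies.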

Then, collectSYCLAttributes() will defer most of the logic to the generated isAttributeCollected(), but can still house any custom logic we need (like for diagnostics).

WDYT? (Note, there may be tweaks needed to the idea -- this was designed somewhat off-the-cuff, so if you have a better idea of how to express this in Attr.td, we should definitely explore it.)

As for whether this is a separate task or done as part of this one... I'm on the fence. It's a bit separable, but at the same time, it'd implement the main point to this review so it seems reasonable to just do it here.

Thanks @AaronBallman for the tablegen design. I agree with you that it seems reasonable to do this here. I did not have a chance to look into the new design yesterday. I will take a look at it today and follow up with you if I have any questions.

Comment on lines 71 to 72
// CHECK-LABEL: FunctionDecl {{.*}}test_kernel9
// CHECK-NOT: SYCLIntelNoGlobalWorkOffsetAttr {{.*}}
Contributor

Nothing is calling FileCheck in the RUN lines, so this isn't actually being tested.

Note: I'd like to see the AST node tests start being separated from the diagnostic tests. There's an AST directory where those should typically live, and it means we don't have to work around diagnostic tests that also check errors.
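
For reference, a RUN line only exercises the CHECK directives when its output is piped into FileCheck. The fragment below is a sketch of that shape for an AST test. The exact cc1 flags are an assumption (the real tests may also need include paths or warning flags), and the kernel source itself is omitted.

```cpp
// Sketch of a lit test header; the flags shown are illustrative assumptions.
// RUN: %clang_cc1 -fsycl-is-device -fsyntax-only -ast-dump -sycl-std=2020 %s | FileCheck %s

// With FileCheck in the pipeline, these directives are actually verified:
// CHECK-LABEL: FunctionDecl {{.*}}test_kernel9
// CHECK-NOT: SYCLIntelNoGlobalWorkOffsetAttr {{.*}}
```

A second RUN line with `-sycl-std=2017` and its own `--check-prefix` would cover the older mode in the same file.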

Contributor

The same issue seems to apply in other tests as well.

Contributor Author

I have updated the Sema tests so that the RUN lines invoke FileCheck in both SYCL 2020 and SYCL 2017 modes.

Contributor Author

smanna12 commented Jun 1, 2021

> I see in SYCL 1.2.1 6.7 that the attributes are propagated to the kernel. I don't see any mention of propagating any attributes to kernels in SYCL 2020. So am I correct that in SYCL 2020 mode, no attribute should be propagated to the kernel?

Thanks @AaronBallman for the reviews.

https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:what-changed-between
According to SYCL 2020 spec:
Kernel attributes have been better described and are now applied to the function type of a kernel function, which allows them to be applied directly to lambdas. This means that propagation of the attribute from a function to the calling kernel is no longer required, and attributes are instead applied directly to the kernel function that they impact.

> If that's correct, do we still need to maintain the explicit list of attributes to not propagate in SYCL 2020 mode? It'd be nice to drop that entire giant isa<> if possible.

Yes, I think we can drop this entirely.

@AaronBallman
Contributor

> If that's correct, do we still need to maintain the explicit list of attributes to not propagate in SYCL 2020 mode? It'd be nice to drop that entire giant isa<> if possible.
>
> Yes, I think we can drop this entirely.

Thank you for verifying! I hope this will simplify the code nicely.

Contributor

@elizabethandrews elizabethandrews left a comment

I haven't checked the tests yet.

// Attributes that should not be propagated from device functions to a kernel.
if (DirectlyCalled) {
  llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) {
    return isa<SYCLIntelLoopFuseAttr, SYCLIntelFPGAMaxConcurrencyAttr,
Contributor

Shouldn't these be added to the DirectlyCalled list in L362?

Contributor Author

@smanna12 smanna12 Jun 1, 2021

These attributes apply directly to the kernel functor/lambda in SYCL 2020 mode, so I did not add them in L362:
SYCLIntelFPGAMaxConcurrencyAttr,
SYCLIntelFPGADisableLoopPipeliningAttr,
SYCLIntelFPGAInitiationIntervalAttr

Contributor

These attributes are available prior to SYCL2020 right? Shouldn't they apply for earlier versions as well? I think this patch changes existing behavior for these attributes.

Contributor Author

> These attributes are available prior to SYCL2020 right? Shouldn't they apply for earlier versions as well? I think this patch changes existing behavior for these attributes.

All of these attributes were added recently, after the SYCL 2020 spec release. I think they should not apply in SYCL 2017 mode.

#3388
#3441

Contributor

I'm not sure how it's expected to work, but unless the extensions these attributes support are limited to SYCL 2020, or these attributes are documented to work only in SYCL 2020, we probably should not be changing this behavior for earlier versions of SYCL. @AaronBallman please let us know your thoughts here.

Contributor Author

> Also, if these attributes are not supported for SYCL 2017, shouldn't we be diagnosing it?

The diagnostic seems reasonable to me.

Contributor

@elizabethandrews elizabethandrews Jun 1, 2021

Please wait for @AaronBallman's input. In my opinion it is confusing/strange to have individual attributes behave differently in different versions of the SYCL spec, but I guess we are doing that with this change anyway. I guess the question is more: should we change existing behavior for these attributes in SYCL 2017?

Contributor Author

> In my opinion it is confusing/strange to have individual attributes behave differently in different versions of SYCL spec,

I agree with you, @elizabethandrews.

Contributor

Sorry about the delayed response -- having power issues at the house.

> I'm not sure how it's expected to work but unless the extensions these attributes support are limited to SYCL2020, or these attributes are documented to work only in SYCL 2020, we probably should not be changing this behavior for earlier versions of SYCL. @AaronBallman please let us know your thoughts here.

Agreed. I think the route we want to go is:

  • When a new attribute is added to the intel namespace, we support it in all the language modes where it has valid semantics. The semantics of the attribute can do whatever is most sensible for that given language mode, but once the semantics are set and the attribute has been released in the wild, the semantics should not change except to become more permissive (e.g., we shouldn't change the semantics such that code breaks, but we should be fine to allow the attribute to be used in ways that used to produce an error).
  • When a new attribute is added to the sycl namespace, we support it in all the language modes where it has valid semantics, but we diagnose use of a new feature in the older modes as an extension. The semantics of the attribute have to follow what's specified by the SYCL spec. If we think some semantics are going to cause implementation concerns for us, we need to talk to the SYCL spec authors about how to resolve it on a case-by-case basis.
    ** Note: community typically also adds a compatibility warning in the newer mode so people who want their code to remain compatible with older language standards can do so. If we didn't support SYCL 1.2.1, we could skip this diagnostic, but from talks with @kbsmith-intel, it sounds like SYCL 1.2.1 support is still mandatory and so the future compat warning should also be added.
  • When the attribute does not have valid semantics in a given mode (regardless of what vendor namespace the attribute is in), we should ignore the attribute with a diagnostic to let the user know it's being ignored. The only exception to this rule are SYCL attributes that are ignored in host mode but not device mode.

> I guess the question is more - should we change existing behavior for these attributes in SYCL 2017

I don't think we should change existing behaviors -- that runs too much risk of silently breaking user code. However, it's also not clear to me just how much implementation effort it is to retain the old behavior in each case and whether the old behavior was sensible or not in SYCL 1.2.1. My reading of the 1.2.1 spec suggests that only the vec_type_hint, work_group_size_hint, and reqd_work_group_size attributes are propagated and none of the rest of them are. I get this from (emphasis added by me for clarity):

The vec_type_hint, work_group_size_hint and reqd_work_group_size kernel attributes in OpenCL C
apply to kernel functions, but this is not syntactically possible in SYCL. In SYCL, these attributes are legal on
device functions and their specification is propagated down to any caller of those device functions, such that the
kernel attributes are the sum of all the kernel attributes of all device functions called.

That said, I have no idea if this is an accurate understanding of the SYCL spec.

Contributor Author

@smanna12 smanna12 Jun 2, 2021

Thanks @AaronBallman and @elizabethandrews. I have added the attributes below in SYCL 1.2.1 mode. The semantics are the same in all modes.

SYCLIntelFPGAMaxConcurrencyAttr,
SYCLIntelFPGADisableLoopPipeliningAttr,
SYCLIntelFPGAInitiationIntervalAttr

@@ -2443,8 +2447,9 @@ lambda capture, or function object member, of the callable to which the
attribute was applied. This effect is equivalent to annotating restrict on
**all** kernel pointer arguments in an OpenCL or SPIR-V kernel.

If ``intel::kernel_args_restrict`` is applied to a function called from a device
kernel, the attribute is not ignored and it is propagated to the kernel.
In SYCL 1.2.1 mode, the ``intel::kernel_args_restrict`` attribute is propagated
Contributor

Does this risk breaking code, or was the old documentation wrong in SYCL 2020 mode? Same question applies to the other documentation instances where we go from always propagating to sometimes propagating.

Contributor Author

> Does this risk breaking code, or was the old documentation wrong in SYCL 2020 mode? Same question applies to the other documentation instances where we go from always propagating to sometimes propagating.

I'm not sure I understand your question correctly.

The current PR changes the existing behavior. The only breaking change is this: an attribute applied to a function called from a device kernel is propagated in SYCL 1.2.1 mode and not propagated in SYCL 2020 mode. So the old documentation was wrong for SYCL 2020 mode.

Comment on lines +3373 to +3374
// If the [[intel::named_sub_group_size]] attribute spelling is used in
// SYCL 2017 mode, we want to diagnose it as being an ignored attribute.
Contributor

I think the docs may need to be updated for this attribute, as they currently imply there's a mode other than SYCL 2020 mode: https://github.com/intel/llvm/blob/sycl/clang/include/clang/Basic/AttrDocs.td#L4613

Also, this should be expressed in Attr.td in a LangOpts clause.

@elizabethandrews
Contributor

> Do I understand correctly that we really have one collection need with three (maybe four) modes: never propagate, always propagate, only propagate when directly called (possibly with a diagnostic), but the mode may differ based on the language options?

My understanding is we have 2 broad classifications in SYCL2017-

  1. Attributes which need to be applied to kernel if applied to any device function in call graph
  2. Attributes which don't need to be propagated from device functions to kernel. These attributes can however be applied to kernel itself if required (i.e. DirectlyCalled).

In SYCL2020 point 1. is moot. Point 2 however is still relevant.

The concept of DirectlyCalled confuses the issue a bit right now because it gets confused with 'propagation'. We're not really propagating DirectlyCalled attributes from device functions to kernel. These technically are attributes which are applied to kernel itself
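
The two classifications, and the role of DirectlyCalled, can be sketched in plain C++ as follows. This is a simplified model, not the `collectSYCLAttributes()` implementation: `Callee`, `kPropagating`, `kKernelOnly`, and `collectForKernel` are hypothetical names, and the attribute assigned to each class is only an example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A function visited while walking the kernel's call graph.
struct Callee {
  std::vector<std::string> attrs;
  bool directlyCalled; // true for the kernel's own operator() / lambda body
};

// Class 1 (illustrative): applied to the kernel if found anywhere in the graph.
static const std::set<std::string> kPropagating = {"reqd_work_group_size"};
// Class 2 (illustrative): applied only when written on the kernel itself.
static const std::set<std::string> kKernelOnly = {"loop_fuse"};

std::vector<std::string> collectForKernel(const std::vector<Callee> &callees,
                                          bool sycl2020) {
  std::vector<std::string> out;
  for (const Callee &c : callees) {
    for (const std::string &a : c.attrs) {
      bool class1 = kPropagating.count(a) != 0;
      bool class2 = kKernelOnly.count(a) != 0;
      if (class2 && c.directlyCalled)
        out.push_back(a); // kernel-level attribute, kept in every mode
      else if (class1 && (c.directlyCalled || !sycl2020))
        out.push_back(a); // call-graph propagation only applies before 2020
    }
  }
  return out;
}
```

In SYCL 2020 mode the class 1 branch degenerates to the same DirectlyCalled check as class 2, which is the sense in which point 1 becomes moot.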

@smanna12
Contributor Author

smanna12 commented Jun 2, 2021

> My understanding is we have 2 broad classifications in SYCL2017-
>
>   1. Attributes which need to be applied to kernel if applied to any device function in call graph
>   2. Attributes which don't need to be propagated from device functions to kernel. These attributes can however be applied to kernel itself if required (i.e. DirectlyCalled).

Agreed.

> In SYCL2020 point 1. is moot. Point 2 however is still relevant.

if (LangOpts.getSYCLVersion() >= SYCL_2020) {
  if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2020, propagate if directly called
    return true; // --> This is important.

  if (isa<large generated list of attributes here>(A)) // SYCL 2020, always propagate
    return true; // --> Do we need Point 2? SYCL 2020 does not allow any propagation from device function to kernel
}
return false;

> The concept of DirectlyCalled confuses the issue a bit right now because it gets confused with 'propagation'. We're not really propagating DirectlyCalled attributes from device functions to kernel. These technically are attributes which are applied to kernel itself

Yes, agreed.

@AaronBallman
Contributor

> > Do I understand correctly that we really have one collection need with three (maybe four) modes: never propagate, always propagate, only propagate when directly called (possibly with a diagnostic), but the mode may differ based on the language options?
>
> My understanding is we have 2 broad classifications in SYCL2017-
>
>   1. Attributes which need to be applied to kernel if applied to any device function in call graph
>   2. Attributes which don't need to be propagated from device functions to kernel. These attributes can however be applied to kernel itself if required (i.e. DirectlyCalled).
>
> In SYCL2020 point 1. is moot. Point 2 however is still relevant.
>
> The concept of DirectlyCalled confuses the issue a bit right now because it gets confused with 'propagation'. We're not really propagating DirectlyCalled attributes from device functions to kernel. These technically are attributes which are applied to kernel itself

Excuse my being stupid, but I still don't understand the point to DirectlyCalled. If the attributes have been applied to the kernel itself, why do they need to be collected at all (they're already collected on the kernel AST node, are they not)? The code in VisitCallNode() calls collectSYCLAttributes() with DirectlyCalled being both true and false and collects into the same container, so I'm still rather confused.

@elizabethandrews
Contributor

> If the attributes have been applied to the kernel itself, why do they need to be collected at all (they're already collected on the kernel AST node, are they not)?

I don't think so.

E.g.

User Defined SYCL kernel -

template <int N>
class KernelFunctor2 {
public:
  [[intel::loop_fuse(N)]] void operator()() const {
  }
};

SYCL Kernel Call -

void foo() {
    KernelFunctor2<3> f2;
    kernel_single_task<class kernel_name_2>(f2);
}

The AST node for this kernel, however, does not contain the attribute. I do not know why. Maybe because it is applied to the operator() method?


FunctionDecl 0xf8b5e28 <test.cpp:9:7, <invalid sloc>> col:7 _ZTSZ3foovE13kernel_name_2 'void ()'
|-CompoundStmt 0xf8ba9e0 <<invalid sloc>>
| |-DeclStmt 0xf8b6048 <line:4:6>
| | `-VarDecl 0xf8b5fa0 <line:9:7, line:4:6> line:9:7 used KernelFunctor2 'KernelFunctor2<3>' cinit
| |   `-InitListExpr 0xf8b6008 <line:4:6> 'KernelFunctor2<3>'
| `-CompoundStmt 0xf8b6130 <line:5:1, line:6:1>
|   `-CXXOperatorCallExpr 0xf8b6100 <line:5:3, col:14> 'void' '()'
|     |-ImplicitCastExpr 0xf8b60d0 <col:13, col:14> 'void (*)() const' <FunctionToPointerDecay>
|     | `-DeclRefExpr 0xf8b60b0 <col:13, col:14> 'void () const' lvalue CXXMethod 0xf8af088 'operator()' 'void () const'
|     `-ImplicitCastExpr 0xf8b60e8 <col:3> 'const KernelFunctor2<3>' lvalue <NoOp>
|       `-DeclRefExpr 0xf8b6090 <col:3> 'KernelFunctor2<3>' lvalue Var 0xf8b5fa0 'KernelFunctor2' 'KernelFunctor2<3>'
|-OpenCLKernelAttr 0xf8b5ec8 <<invalid sloc>> Implicit
|-AsmLabelAttr 0xf8b5f20 <<invalid sloc>> Implicit "_ZTSZ3foovE13kernel_name_2"
|-ArtificialAttr 0xf8b5f78 <<invalid sloc>> Implicit
`-SYCLKernelAttr 0xf8baa00 <<invalid sloc>> Implicit

AST after we collect attributes -


`-FunctionDecl 0xf8b5e28 <line:9:7, <invalid sloc>> col:7 _ZTSZ3foovE13kernel_name_2 'void ()'
  |-CompoundStmt 0xf8ba9e0 <<invalid sloc>>
  | |-DeclStmt 0xf8b6048 <line:4:6>
  | | `-VarDecl 0xf8b5fa0 <line:9:7, line:4:6> line:9:7 used KernelFunctor2 'KernelFunctor2<3>' cinit
  | |   `-InitListExpr 0xf8b6008 <line:4:6> 'KernelFunctor2<3>'
  | `-CompoundStmt 0xf8b6130 <line:5:1, line:6:1>
  |   `-CXXOperatorCallExpr 0xf8b6100 <line:5:3, col:14> 'void' '()'
  |     |-ImplicitCastExpr 0xf8b60d0 <col:13, col:14> 'void (*)() const' <FunctionToPointerDecay>
  |     | `-DeclRefExpr 0xf8b60b0 <col:13, col:14> 'void () const' lvalue CXXMethod 0xf8af088 'operator()' 'void () const'
  |     `-ImplicitCastExpr 0xf8b60e8 <col:3> 'const KernelFunctor2<3>' lvalue <NoOp>
  |       `-DeclRefExpr 0xf8b6090 <col:3> 'KernelFunctor2<3>' lvalue Var 0xf8b5fa0 'KernelFunctor2' 'KernelFunctor2<3>'
  |-OpenCLKernelAttr 0xf8b5ec8 <<invalid sloc>> Implicit
  |-AsmLabelAttr 0xf8b5f20 <<invalid sloc>> Implicit "_ZTSZ3foovE13kernel_name_2"
  |-ArtificialAttr 0xf8b5f78 <<invalid sloc>> Implicit
  |-SYCLKernelAttr 0xf8baa00 <<invalid sloc>> Implicit
  `-SYCLIntelLoopFuseAttr 0xf8af198 <line:11:5, col:23> loop_fuse
    `-ConstantExpr 0xf8af178 <col:22> 'int'
      |-value: Int 3
      `-SubstNonTypeTemplateParmExpr 0xf8af158 <col:22> 'int'
        |-NonTypeTemplateParmDecl 0xf8ae630 <line:8:11, col:15> col:15 referenced 'int' depth 0 index 0 N
        `-IntegerLiteral 0xf8af138 <line:11:22> 'int' 3

@AaronBallman
Contributor

If the attributes have been applied to the kernel itself, why do they need to be collected at all (they're already collected on the kernel AST node, are they not)?

I don't think so.

E.g.

User Defined SYCL kernel -

template <int N>
class KernelFunctor2 {
public:
  [[intel::loop_fuse(N)]] void operator()() const {
  }
};

SYCL Kernel Call -

void foo() {
    KernelFunctor2<3> f2;
    kernel_single_task<class kernel_name_2>(f2);
}

The AST node for this kernel, however, does not contain the attribute. I do not know why. Maybe because it is applied to the operator() method?


FunctionDecl 0xf8b5e28 <test.cpp:9:7, <invalid sloc>> col:7 _ZTSZ3foovE13kernel_name_2 'void ()'
|-CompoundStmt 0xf8ba9e0 <<invalid sloc>>
| |-DeclStmt 0xf8b6048 <line:4:6>
| | `-VarDecl 0xf8b5fa0 <line:9:7, line:4:6> line:9:7 used KernelFunctor2 'KernelFunctor2<3>' cinit
| |   `-InitListExpr 0xf8b6008 <line:4:6> 'KernelFunctor2<3>'
| `-CompoundStmt 0xf8b6130 <line:5:1, line:6:1>
|   `-CXXOperatorCallExpr 0xf8b6100 <line:5:3, col:14> 'void' '()'
|     |-ImplicitCastExpr 0xf8b60d0 <col:13, col:14> 'void (*)() const' <FunctionToPointerDecay>
|     | `-DeclRefExpr 0xf8b60b0 <col:13, col:14> 'void () const' lvalue CXXMethod 0xf8af088 'operator()' 'void () const'
|     `-ImplicitCastExpr 0xf8b60e8 <col:3> 'const KernelFunctor2<3>' lvalue <NoOp>
|       `-DeclRefExpr 0xf8b6090 <col:3> 'KernelFunctor2<3>' lvalue Var 0xf8b5fa0 'KernelFunctor2' 'KernelFunctor2<3>'
|-OpenCLKernelAttr 0xf8b5ec8 <<invalid sloc>> Implicit
|-AsmLabelAttr 0xf8b5f20 <<invalid sloc>> Implicit "_ZTSZ3foovE13kernel_name_2"
|-ArtificialAttr 0xf8b5f78 <<invalid sloc>> Implicit
`-SYCLKernelAttr 0xf8baa00 <<invalid sloc>> Implicit

AST after we collect attributes -


`-FunctionDecl 0xf8b5e28 <line:9:7, <invalid sloc>> col:7 _ZTSZ3foovE13kernel_name_2 'void ()'
  |-CompoundStmt 0xf8ba9e0 <<invalid sloc>>
  | |-DeclStmt 0xf8b6048 <line:4:6>
  | | `-VarDecl 0xf8b5fa0 <line:9:7, line:4:6> line:9:7 used KernelFunctor2 'KernelFunctor2<3>' cinit
  | |   `-InitListExpr 0xf8b6008 <line:4:6> 'KernelFunctor2<3>'
  | `-CompoundStmt 0xf8b6130 <line:5:1, line:6:1>
  |   `-CXXOperatorCallExpr 0xf8b6100 <line:5:3, col:14> 'void' '()'
  |     |-ImplicitCastExpr 0xf8b60d0 <col:13, col:14> 'void (*)() const' <FunctionToPointerDecay>
  |     | `-DeclRefExpr 0xf8b60b0 <col:13, col:14> 'void () const' lvalue CXXMethod 0xf8af088 'operator()' 'void () const'
  |     `-ImplicitCastExpr 0xf8b60e8 <col:3> 'const KernelFunctor2<3>' lvalue <NoOp>
  |       `-DeclRefExpr 0xf8b6090 <col:3> 'KernelFunctor2<3>' lvalue Var 0xf8b5fa0 'KernelFunctor2' 'KernelFunctor2<3>'
  |-OpenCLKernelAttr 0xf8b5ec8 <<invalid sloc>> Implicit
  |-AsmLabelAttr 0xf8b5f20 <<invalid sloc>> Implicit "_ZTSZ3foovE13kernel_name_2"
  |-ArtificialAttr 0xf8b5f78 <<invalid sloc>> Implicit
  |-SYCLKernelAttr 0xf8baa00 <<invalid sloc>> Implicit
  `-SYCLIntelLoopFuseAttr 0xf8af198 <line:11:5, col:23> loop_fuse
    `-ConstantExpr 0xf8af178 <col:22> 'int'
      |-value: Int 3
      `-SubstNonTypeTemplateParmExpr 0xf8af158 <col:22> 'int'
        |-NonTypeTemplateParmDecl 0xf8ae630 <line:8:11, col:15> col:15 referenced 'int' depth 0 index 0 N
        `-IntegerLiteral 0xf8af138 <line:11:22> 'int' 3

I had a good, long conversation on the phone with @erichkeane about this and I think I have my head wrapped around it a little bit better. Thank you to everyone for the discussion!

What I understand now is that this really is propagating the attribute to the opencl-kernel in the DirectlyCalled case. It's taking the attributes from KernelFunctor2<3> (because that's the type of the object passed to kernel_single_task()) and from KernelFunctor2<3>::operator() (because that's the code being executed on the device) and adding them onto the generated _ZTSZ3foovE13kernel_name_2 function that runs on the device.
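To make the collection step above concrete, here is a toy model of it. This is only a sketch under stated assumptions: `ToyDecl` and `collectKernelAttrs` are hypothetical names invented for illustration and are not Clang's Sema types or functions; attributes are modeled as plain strings.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration that carries attributes.
struct ToyDecl {
  std::string Name;
  std::vector<std::string> Attrs;
};

// Copy attributes from the functor type and from its operator() onto the
// generated kernel, skipping any attribute the kernel already has.
void collectKernelAttrs(const ToyDecl &FunctorType, const ToyDecl &CallOperator,
                        ToyDecl &Kernel) {
  for (const ToyDecl *Src : {&FunctorType, &CallOperator})
    for (const std::string &A : Src->Attrs)
      if (std::find(Kernel.Attrs.begin(), Kernel.Attrs.end(), A) ==
          Kernel.Attrs.end())
        Kernel.Attrs.push_back(A);
}

// Usage mirroring the AST dumps: loop_fuse sits on operator() and ends up
// on the generated kernel only after collection.
void exampleCollect() {
  ToyDecl Functor{"KernelFunctor2<3>", {}};
  ToyDecl Op{"operator()", {"loop_fuse"}};
  ToyDecl Kernel{"_ZTSZ3foovE13kernel_name_2", {"OpenCLKernel", "SYCLKernel"}};
  collectKernelAttrs(Functor, Op, Kernel);
  assert(Kernel.Attrs.size() == 3);
  assert(Kernel.Attrs.back() == "loop_fuse");
}
```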

It also sounds like, at least in theory for SYCL 2020, we might want all attributes to be propagated in the DirectlyCalled case because if the user writes an attribute on the function running on the device, they likely expect that to have an impact. For example, if the user adds an optnone attribute to operator() they may rightfully expect no optimizations to be enabled for that function. However, we might not want to do this for two reasons: many attributes have only semantic impact and that will be meaningless because of how late in Sema the device function is generated, and some attributes may have really bizarre codegen behaviors (like, what would a multiversioned device function even mean?).

If this is reasonably accurate, then I think the design works, except that we would never mark anything in Attr.td as SYCLKernelPropMode<SYCL2020, AlwaysPropagate>. The code generation could still produce that behavior if someone wrote this in Attr.td, but the expectation is that no attribute needs it; attributes should only ever use DirectlyPropagate or NeverPropagate.

WDYT?
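As a rough illustration of the three modes named above, the decision could be modeled like this. It is a sketch, not the patch's code: the enumerator names follow the spellings used in this comment, while `CallKind` and `shouldPropagate` are hypothetical names introduced here.

```cpp
// Hypothetical model of the propagation modes discussed in this thread.
enum class PropMode { NeverPropagate, DirectlyPropagate, AlwaysPropagate };

// How the attributed function relates to the generated kernel (assumed split).
enum class CallKind { DirectlyCalled, IndirectlyCalled };

// Decide whether an attribute on a function reaches the kernel.
constexpr bool shouldPropagate(PropMode Mode, CallKind Kind) {
  switch (Mode) {
  case PropMode::AlwaysPropagate:
    return true; // SYCL 1.2.1 style: any device function in the call graph
  case PropMode::DirectlyPropagate:
    return Kind == CallKind::DirectlyCalled; // only the kernel's own operator()
  case PropMode::NeverPropagate:
    return false;
  }
  return false;
}

// Compile-time checks of the behavior described above.
static_assert(shouldPropagate(PropMode::DirectlyPropagate,
                              CallKind::DirectlyCalled));
static_assert(!shouldPropagate(PropMode::DirectlyPropagate,
                               CallKind::IndirectlyCalled));
static_assert(shouldPropagate(PropMode::AlwaysPropagate,
                              CallKind::IndirectlyCalled));
```

Under this model, the expectation stated above amounts to no SYCL 2020 attribute ever being mapped to `AlwaysPropagate`.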

@elizabethandrews
Contributor


> I had a good, long conversation on the phone with @erichkeane about this and I think I have my head wrapped around it a little bit better. Thank you to everyone for the discussion!
>
> What I understand now is that this really is propagating the attribute to the opencl-kernel in the DirectlyCalled case. It's taking the attributes from KernelFunctor2<3> (because that's the type of the object passed to kernel_single_task()) and from KernelFunctor::operator() (because that's the code being executed on the device) and adding them onto the generated _ZTSZ3foovE13kernel_name_2 function that runs on the device.

Yep! Thank you for very succinctly summarizing this!

> It also sounds like, at least in theory for SYCL 2020, we might want all attributes to be propagated in the DirectlyCalled case because if the user writes an attribute on the function running on the device, they likely expect that to have impact. e.g., if the user adds an optnone attribute to operator() they may rightfully expect no optimizations to be enabled for that function. However, we might not want to do this for two reasons: many attributes have only semantic impact and that will be meaningless because of how late in Sema the device function is generated, and some attributes may have really bizarre codegen behaviors (like, what would a multiversioned device function even mean?).

This is an interesting point. I would assume only a subset of function attributes should apply to kernel/device functions. We seem to do this for kernel functions by only allowing a certain subset to propagate. For device functions I am not sure. Maybe CodeGen only handles 'allowed attributes' for device functions? I haven't verified this.

> If this is reasonably accurate, then I think the design basically works, except we basically would never mark anything in Attr.td as SYCLKernelPropMode<SYCL2020, AlwaysPropagate>. The code generation could still produce that behavior if someone wrote this in Attr.td, but the expectation is that we don't have any attributes that need this, they should hopefully only ever use DirectlyPropagate or NeverPropagate.
>
> WDYT?

I've never actually tinkered with the inner workings of TableGen, so I apologize in advance if what I'm about to suggest/ask makes no sense :) Is there a reason why we can't just mark attributes as DirectPropagate and AlwaysPropagate and then generate the list based on this? I don't see the point of NeverPropagate. Any attribute not marked as DirectPropagate or AlwaysPropagate should be 'never propagate', right?

@AaronBallman
Contributor

> > It also sounds like, at least in theory for SYCL 2020, we might want all attributes to be propagated in the DirectlyCalled case because if the user writes an attribute on the function running on the device, they likely expect that to have impact. e.g., if the user adds an optnone attribute to operator() they may rightfully expect no optimizations to be enabled for that function. However, we might not want to do this for two reasons: many attributes have only semantic impact and that will be meaningless because of how late in Sema the device function is generated, and some attributes may have really bizarre codegen behaviors (like, what would a multiversioned device function even mean?).
>
> This is an interesting point. I would assume only a subset of function attributes should apply to kernel/device functions. We seem to do this for kernel functions by only allowing a certain subset to propagate. For device functions I am not sure. Maybe CodeGen only handles 'allowed attributes' for device functions? I haven't verified this.

I'm not certain, to be honest. I agree that we only want a subset of function attributes to apply to device functions, but I have no clue how we would decide what that list is. I brought up multiversioning, but what about things like calling convention attributes? Sanitizers? Constructor/Cleanup? I think someone needs to audit what attributes are available for functions, pare out the ones that are only semantic checks we can't utilize, pare out the ones that don't make sense in a kernel, and see what's left. But this adds a new maintenance burden whenever we do a pulldown: the community isn't going to maintain the list of what's valid on a kernel, yet it will keep adding new attributes that may be useful for kernels just the same.

> > If this is reasonably accurate, then I think the design basically works, except we basically would never mark anything in Attr.td as SYCLKernelPropMode<SYCL2020, AlwaysPropagate>. The code generation could still produce that behavior if someone wrote this in Attr.td, but the expectation is that we don't have any attributes that need this, they should hopefully only ever use DirectlyPropagate or NeverPropagate.
> > WDYT?
>
> I've never actually tinkered with inner workings of TableGen and so I apologize in advance if what I'm about to suggest/ask makes no sense :) Is there a reason why we can't just mark attributes as DirectPropagate and AlwaysPropagate and then generate the list based on this? I don't see the point of NeverPropagate. Any attribute not marked as DirectPropagate or AlwaysPropagate should be 'never propagate' right?

Definitely not a silly question! I'm on the fence about whether NeverPropagate is actually needed. I was thinking we'd need it for a "catch-all" situation: e.g., always propagate in SYCL 1.2.1, directly propagate in SYCL 2020 and later, and then when SYCL 202X comes out we may need to never propagate if some particular new compiler flag is used. But I think it may be possible to express this sort of thing without needing a never-propagate flag.

BTW, I'm sort of ignoring/hand-waving at the issue where the language options conflict. I figure the predicates in the generated file will be listed in source order from Attr.td, so if a later language option conflicts with an earlier one, oh well.
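One way to picture the "predicates in source order" idea, together with the suggestion that an unmarked attribute simply means "never propagate", is the sketch below. Everything in it is an assumption made for illustration: `SyclStd`, `PropRule`, and `lookupMode` are hypothetical names, and letting the first matching rule win is just one possible way to resolve conflicting entries, not something this thread settles.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical language modes and propagation modes.
enum class SyclStd { SYCL2017, SYCL2020 };
enum class PropMode { DirectlyPropagate, AlwaysPropagate };

// One entry per (language mode, propagation mode) pair, kept in source order.
struct PropRule {
  SyclStd Std;
  PropMode Mode;
};

// Assumption: the first rule matching the active language mode wins.
// An empty result plays the role of "never propagate", so no explicit
// NeverPropagate enumerator is needed in this model.
std::optional<PropMode> lookupMode(const std::vector<PropRule> &Rules,
                                   SyclStd Std) {
  for (const PropRule &R : Rules)
    if (R.Std == Std)
      return R.Mode;
  return std::nullopt;
}

void exampleRules() {
  const std::vector<PropRule> Rules = {
      {SyclStd::SYCL2017, PropMode::AlwaysPropagate},
      {SyclStd::SYCL2020, PropMode::DirectlyPropagate},
      {SyclStd::SYCL2020, PropMode::AlwaysPropagate}, // conflicting, shadowed
  };
  assert(lookupMode(Rules, SyclStd::SYCL2017) == PropMode::AlwaysPropagate);
  assert(lookupMode(Rules, SyclStd::SYCL2020) == PropMode::DirectlyPropagate);
  assert(!lookupMode({}, SyclStd::SYCL2020).has_value()); // unmarked attribute
}
```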

@smanna12 smanna12 marked this pull request as draft June 17, 2021 20:36
@smanna12
Contributor Author

smanna12 commented Jul 12, 2021

I have created PR #4084 for this.

TableGen support will be addressed separately; it is tracked in #4094.
