-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Implement SYCL 2020 spec functionality: no propagation from functions to the caller #3836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In SYCL 1.2.1 spec, the attributes get propagated from device functions to a kernel. The SYCL 2020 requirement mandating the avoidance of the propagation of all kernel attributes to the caller when used on a function. Attributes that should not be propagated from device functions to a kernel to match with new SYCL 2020 spec. 1. scheduler_target_fmax_mhz 2. kernel_args_restrict 3. no_global_work_offset 4. max-work-group-size 5. max-global-work-dim 6. num-simd-work-items 7. reqd-sub-group-size 8. reqd-work-group-size 9. named_sub_group_size 10. sycl_explicit_simd This patch i. keeps the SYCL 1.2.1 spec functionality and propagates the attributes with the older SYCL mode(-sycl-std=2017) ii. adds diagnostic for ignored attribute for attribute spelling [[intel:: named_sub_group_size()]] with earlier version of SYCL mode (-sycl-std=2017) since this attribute follows the SYCL 2020 Attribute Rules. iii. adds or updates tests to validate the propagating behavior with SYCL 2020 and SYCL 2017 modes. Signed-off-by: Soumi Manna <[email protected]>
PR has been created for Pre-commit test failure: intel/llvm-test-suite#297 Failed Tests (1): |
Signed-off-by: Soumi Manna <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in SYCL 1.2.1 6.7 that the attributes are propagated to the kernel. I don't see any mention of propagating any attributes to kernels in SYCL 2020. So am I correct that in SYCL 2020 mode, no attribute should be propagated to the kernel?
If that's correct, do we still need to maintain the explicit list of attributes to not propagate in SYCL 2020 mode? It'd be nice to drop that entire giant isa<>
if possible.
} else { | ||
// Attributes that should not be propagated from device functions to a | ||
// kernel in SYCL 2020. | ||
if (DirectlyCalled) { | ||
llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) { | ||
return isa< | ||
SYCLIntelFPGAMaxConcurrencyAttr, | ||
SYCLIntelFPGADisableLoopPipeliningAttr, SYCLSimdAttr, | ||
SYCLIntelKernelArgsRestrictAttr, ReqdWorkGroupSizeAttr, | ||
SYCLIntelNumSimdWorkItemsAttr, SYCLIntelSchedulerTargetFmaxMhzAttr, | ||
SYCLIntelNoGlobalWorkOffsetAttr, SYCLIntelMaxWorkGroupSizeAttr, | ||
IntelReqdSubGroupSizeAttr, SYCLIntelMaxGlobalWorkDimAttr, | ||
IntelNamedSubGroupSizeAttr, SYCLIntelFPGAInitiationIntervalAttr>(A); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can get rid of this entire branch because in 2020 mode, it seems like none of the attributes propagate anyway (or did I get that wrong)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we can simplify this code somehow but this branch collects attributes to be applied to device functions and kernel itself. The attributes applied to kernel still need to be collected and applied irrespective of SYCL version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @AaronBallman. i was wrong. We need this entire else branch in SYCL2020 mode since we still need to copy the attributes for DirectlyCalled = TRUE. I have no better idea about how we can avoid duplicating the codes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to think more and more that it's past time to table generate this logic rather than continuing to struggle to maintain these lists manually. Do I understand correctly that we really have one collection need with three (maybe four) modes: never propagate, always propagate, only propagate when directly called (possibly with a diagnostic), but the mode may differ based on the language options? If so, we could perhaps try to design a tablegen feature to do this. e.g., something along the lines of
>>let SYCLKernelPropagationBehavior = [SYCLKernelPropMode<SYCL2020, NeverPropagate>, // SYCL 2020 behavior
SYCLKernelPropMode<SYCL2017, AlwaysPropagate>] // SYCL 2017 behavior
(Where
SYCL2020
andSYCL2017
are newLangOpt
definitions we add to Attr.td and the*Propagate
are enumerations we define.)
Thinking out loud: we'd generate a function named
static bool isAttributeCollected(const Attr *A, const LanguageOptions &LangOpts, bool IsDirectlyCalled);
that returns whether an attribute should be collected or not. We'd have to collect all of theSYCLKernelPropagationBehavior
objects in Attr.td so that we could group the language mode checks together in the resulting generated file. What we'd generate would effectively look like:
>>if (LangOpts.getSYCLVersion() == SYCL_2017) {
if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2017, propagate if directly called
return true;
if (isa<large generated list of attributes here>(A)) // SYCL 2017, always propagate
return true;
>>}
>>if (LangOpts.getSYCLVersion() >= SYCL_2020) {
if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2020, propagate if directly called
return true;
if (isa<large generated list of attributes here>(A)) // SYCL 2020, always propagate
return true;
}
return false;
where each of the generated lists of attributes in the isa<>
checks are based off the propagation enumeration from Attr.td.
If we want to include diagnostics in the logic, I think we'd return an enumeration rather than a boolean and let the caller figure out what diagnostic to emit, whether to drop the attribute, etc. But given that we only have one of those, we may just want to handle that case specially.
Then,
collectSYCLAttributes()
will defer most of the logic to the generatedisAttributeCollected()
, but can still house any custom logic we need (like for diagnostics).
WDYT? (Note, there may be tweaks needed to the idea -- this was designed somewhat off-the-cuff, so if you have a better idea of how to express this in Attr.td, we should definitely explore it.)
As for whether this is a separate task or done as part of this one... I'm on the fence. It's a bit separable, but at the same time, it'd implement the main point to this review so it seems reasonable to just do it here.
Thanks @AaronBallman for the tablegen design. I agree with you that it seems reasonable to do this here. I did not have a chance to look into the new design yesterday. I will take a look at this today and will follow up with you about this for any question.
// CHECK-LABEL: FunctionDecl {{.*}}test_kernel9 | ||
// CHECK-NOT: SYCLIntelNoGlobalWorkOffsetAttr {{.*}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is calling FileCheck
in the RUN lines, so this isn't actually being tested.
Note: I'd like to see the AST node tests start being separated from the diagnostic tests. There's an AST
directory where those should typically live, and it means we don't have to work around diagnostic tests that also check errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue seems to apply in other tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have updated Sema tests so that FileCheck can be tested in the RUN lines with both SYCL 2020 and SYCl 2017 modes.
Thanks @AaronBallman for the reviews. https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:what-changed-between
Yes, i think we can drop this entirely. |
Thank you for verifying! I hope this will simplify the code nicely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked the tests yet.
} else { | ||
// Attributes that should not be propagated from device functions to a | ||
// kernel in SYCL 2020. | ||
if (DirectlyCalled) { | ||
llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) { | ||
return isa< | ||
SYCLIntelFPGAMaxConcurrencyAttr, | ||
SYCLIntelFPGADisableLoopPipeliningAttr, SYCLSimdAttr, | ||
SYCLIntelKernelArgsRestrictAttr, ReqdWorkGroupSizeAttr, | ||
SYCLIntelNumSimdWorkItemsAttr, SYCLIntelSchedulerTargetFmaxMhzAttr, | ||
SYCLIntelNoGlobalWorkOffsetAttr, SYCLIntelMaxWorkGroupSizeAttr, | ||
IntelReqdSubGroupSizeAttr, SYCLIntelMaxGlobalWorkDimAttr, | ||
IntelNamedSubGroupSizeAttr, SYCLIntelFPGAInitiationIntervalAttr>(A); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we can simplify this code somehow but this branch collects attributes to be applied to device functions and kernel itself. The attributes applied to kernel still need to be collected and applied irrespective of SYCL version.
// Attributes that should not be propagated from device functions to a kernel. | ||
if (DirectlyCalled) { | ||
llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) { | ||
return isa<SYCLIntelLoopFuseAttr, SYCLIntelFPGAMaxConcurrencyAttr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be added to the DirectlyCalled list in L362?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attributes directly apply on kernel functor/lambda in SYCL2020 modes, so i did not add them in L362.
SYCLIntelFPGAMaxConcurrencyAttr,
SYCLIntelFPGADisableLoopPipeliningAttr,
SYCLIntelFPGAInitiationIntervalAttr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These attributes are available prior to SYCL2020 right? Shouldn't they apply for earlier versions as well? I think this patch changes existing behavior for these attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These attributes are available prior to SYCL2020 right? Shouldn't they apply for earlier versions as well? I think this patch changes existing behavior for these attributes.
All attributes were added recently. they were added after SYCl2020 spec release. I think they should not apply in SYCL2017 modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how its expected to work but unless the extensions these attributes support are limited to SYCL2020, or these attributes are documented to work only in SYCL 2020, we probably should not be changing this behavior for earlier versions of SYCL. @AaronBallman please let us know your thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if these attributes are not supported for SYCL 2017, shouldn't we be diagnosing it?
The diagnostic seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please wait for @AaronBallman's input. In my opinion it is confusing/strange to have individual attributes behave differently in different versions of SYCL spec, but I guess we are doing that with this change anyway. I guess the question is more - should we change existing behavior for these attributes in SYCL 2017
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion it is confusing/strange to have individual attributes behave differently in different versions of SYCL spec,
I agree with you, @elizabethandrews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delayed response -- having power issues at the house.
I'm not sure how its expected to work but unless the extensions these attributes support are limited to SYCL2020, or these attributes are documented to work only in SYCL 2020, we probably should not be changing this behavior for earlier versions of SYCL. @AaronBallman please let us know your thoughts here.
Agreed. I think the route we want to go is:
- When a new attribute is added to the
intel
namespace, we support it in all the language modes where it has valid semantics. The semantics of the attribute can do whatever is most sensible for that given language mode, but once the semantics are set and the attribute has been released in the wild, the semantics should not change except to become more permissive (e.g., we shouldn't change the semantics such that code breaks, but we should be fine to allow the attribute to be used in ways that used to produce an error). - When a new attribute is added to the
sycl
namespace, we support it in all the language modes where it has valid semantics, but we diagnose use of a new feature in the older modes as an extension. The semantics of the attribute have to follow what's specified by the SYCL spec. If we think some semantics are going to cause implementation concerns for us, we need to talk to the SYCL spec authors about how to resolve it on a case-by-case basis.
** Note: community typically also adds a compatibility warning in the newer mode so people who want their code to remain compatible with older language standards can do so. If we didn't support SYCL 1.2.1, we could skip this diagnostic, but from talks with @kbsmith-intel, it sounds like SYCL 1.2.1 support is still mandatory and so the future compat warning should also be added. - When the attribute does not have valid semantics in a given mode (regardless of what vendor namespace the attribute is in), we should ignore the attribute with a diagnostic to let the user know it's being ignored. The only exception to this rule are SYCL attributes that are ignored in host mode but not device mode.
I guess the question is more - should we change existing behavior for these attributes in SYCL 2017
I don't think we should change existing behaviors -- that runs too much risk of silently breaking user code. However, it's also not clear to much just how much implementation effort it is to retain the old behavior in each case and whether the old behavior was sensible or not in SYCL 1.2.1. My reading of the 1.2.1 spec suggests that only the vec_type_hint
, work_group_size_hint
, and reqd_work_group_size
attributes are propagated and none of the rest of them are. I get this from (emphasis added by me for clarity):
The vec_type_hint, work_group_size_hint and reqd_work_group_size kernel attributes in OpenCL C
apply to kernel functions, but this is not syntactically possible in SYCL. In SYCL, these attributes are legal on
device functions and their specification is propagated down to any caller of those device functions, such that the
kernel attributes are the sum of all the kernel attributes of all device functions called.
That said, I have no idea if this is an accurate understanding of the SYCL spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AaronBallman and @elizabethandrews. I have added the attributes below in SYCL 1.2.1 mode. The semantic is same in all modes.
SYCLIntelFPGAMaxConcurrencyAttr,
SYCLIntelFPGADisableLoopPipeliningAttr,
SYCLIntelFPGAInitiationIntervalAttr
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
@@ -2443,8 +2447,9 @@ lambda capture, or function object member, of the callable to which the | |||
attribute was applied. This effect is equivalent to annotating restrict on | |||
**all** kernel pointer arguments in an OpenCL or SPIR-V kernel. | |||
|
|||
If ``intel::kernel_args_restrict`` is applied to a function called from a device | |||
kernel, the attribute is not ignored and it is propagated to the kernel. | |||
In SYCL 1.2.1 mode, the ``intel::kernel_args_restrict`` attribute is propagated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this risk breaking code, or was the old documentation wrong in SYCL 2020 mode? Same question applies to the other documentation instances where we go from always propagating to sometimes propagating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this risk breaking code, or was the old documentation wrong in SYCL 2020 mode? Same question applies to the other documentation instances where we go from always propagating to sometimes propagating.
Not sure i understand your question correctly.
The current PR changes the existing behavior. Only breaking part happens here - propagation with SYCL 1.2.1 and no propagation with SYCL 2020 mode when the attribute is applied to a function from a device kernel. so old documentation was wrong in SYCL 2020 mode.
// If the [[intel::named_sub_group_size]] attribute spelling is used in | ||
// SYCL 2017 mode, we want to diagnose it as being an ignored attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs may need to be updated for this attribute, as they currently imply there's a mode other than SYCL 2020 mode: https://github.com/intel/llvm/blob/sycl/clang/include/clang/Basic/AttrDocs.td#L4613
Also, this should be expressed in Attr.td in a LangOpts
clause.
} else { | ||
// Attributes that should not be propagated from device functions to a | ||
// kernel in SYCL 2020. | ||
if (DirectlyCalled) { | ||
llvm::copy_if(FD->getAttrs(), std::back_inserter(Attrs), [](Attr *A) { | ||
return isa< | ||
SYCLIntelFPGAMaxConcurrencyAttr, | ||
SYCLIntelFPGADisableLoopPipeliningAttr, SYCLSimdAttr, | ||
SYCLIntelKernelArgsRestrictAttr, ReqdWorkGroupSizeAttr, | ||
SYCLIntelNumSimdWorkItemsAttr, SYCLIntelSchedulerTargetFmaxMhzAttr, | ||
SYCLIntelNoGlobalWorkOffsetAttr, SYCLIntelMaxWorkGroupSizeAttr, | ||
IntelReqdSubGroupSizeAttr, SYCLIntelMaxGlobalWorkDimAttr, | ||
IntelNamedSubGroupSizeAttr, SYCLIntelFPGAInitiationIntervalAttr>(A); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to think more and more that it's past time to table generate this logic rather than continuing to struggle to maintain these lists manually. Do I understand correctly that we really have one collection need with three (maybe four) modes: never propagate, always propagate, only propagate when directly called (possibly with a diagnostic), but the mode may differ based on the language options? If so, we could perhaps try to design a tablegen feature to do this. e.g., something along the lines of
>>let SYCLKernelPropagationBehavior = [SYCLKernelPropMode<SYCL2020, NeverPropagate>, // SYCL 2020 behavior
SYCLKernelPropMode<SYCL2017, AlwaysPropagate>] // SYCL 2017 behavior
(Where
SYCL2020
andSYCL2017
are newLangOpt
definitions we add to Attr.td and the*Propagate
are enumerations we define.)
Thinking out loud: we'd generate a function named
static bool isAttributeCollected(const Attr *A, const LanguageOptions &LangOpts, bool IsDirectlyCalled);
that returns whether an attribute should be collected or not. We'd have to collect all of theSYCLKernelPropagationBehavior
objects in Attr.td so that we could group the language mode checks together in the resulting generated file. What we'd generate would effectively look like:
>>if (LangOpts.getSYCLVersion() == SYCL_2017) {
if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2017, propagate if directly called
return true;
if (isa<large generated list of attributes here>(A)) // SYCL 2017, always propagate
return true;
>>}
>>if (LangOpts.getSYCLVersion() >= SYCL_2020) {
if (isa<large generated list of attributes here>(A) && IsDirectlyCalled) // SYCL 2020, propagate if directly called
return true;
if (isa<large generated list of attributes here>(A)) // SYCL 2020, always propagate
return true;
}
return false;
where each of the generated lists of attributes in the isa<>
checks are based off the propagation enumeration from Attr.td.
If we want to include diagnostics in the logic, I think we'd return an enumeration rather than a boolean and let the caller figure out what diagnostic to emit, whether to drop the attribute, etc. But given that we only have one of those, we may just want to handle that case specially.
Then,
collectSYCLAttributes()
will defer most of the logic to the generatedisAttributeCollected()
, but can still house any custom logic we need (like for diagnostics).
WDYT? (Note, there may be tweaks needed to the idea -- this was designed somewhat off-the-cuff, so if you have a better idea of how to express this in Attr.td, we should definitely explore it.)
As for whether this is a separate task or done as part of this one... I'm on the fence. It's a bit separable, but at the same time, it'd implement the main point to this review so it seems reasonable to just do it here.
Thanks @AaronBallman for the tablegen design. I agree with you that it seems reasonable to do this here. I did not have a chance to look into the new design yesterday. I will take a look at this today and will follow up with you about this for any question.
My understanding is we have 2 broad classifications in SYCL2017-
In SYCL2020 point 1. is moot. Point 2 however is still relevant. The concept of |
Agreed.
Yes, Agreed. |
Excuse my being stupid, but I still don't understand the point to |
I don't think so. E.g. User Defined SYCL kernel -
SYCL Kernel Call -
AST node for this kernel however does not contain the attribute. I do not know why. Maybe because it is applied to operator method( ) ?
AST after we collect attributes -
|
I had a good, long conversation on the phone with @erichkeane about this and I think I have my head wrapped around it a little bit better. Thank you to everyone for the discussion! What I understand now is that this really is propagating the attribute to the opencl-kernel in the It also sounds like, at least in theory for SYCL 2020, we might want all attributes to be propagated in the If this is reasonably accurate, then I think the design basically works, except we basically would never mark anything in Attr.td as WDYT? |
Yep! Thank you for very succinctly summarizing this!
This is an interesting point. I would assume only a subset of function attributes should apply to kernel/device functions. We seem to do this for kernel functions by only allowing a certain subset to propagate. For device functions I am not sure. Maybe CodeGen only handles 'allowed attributes' for device functions? I haven't verified this.
I've never actually tinkered with inner workings of TableGen and so I apologize in advance if what I'm about to suggest/ask makes no sense :) Is there a reason why we can't just mark attributes as |
I'm not certain, to be honest. I agree that I think we only want a subset of function attributes to apply to device functions though, but how we decide what that list is.. no clue. I brought up multiversioning, but what about things like calling convention attributes? Sanitizers? Constructor/Cleanup? I think someone needs to audit what attributes are available for functions, pare out the ones that are only semantic checks we can't utilize, pare out the ones that don't make sense in a kernel, and see what's left. But this adds a new maintenance burden whenever we do a pulldown -- community isn't going to maintain the list of what's valid on a kernel or not, but they add new attributes that may be useful for kernels just the same.
Definitely not a silly question! I'm kind of on the fence about whether BTW, I'm sort of ignoring/hand-waving at the issue where the language options conflict. I figure the predicates in the generated file will be listed in source order from Attr.td, so if a later language option conflicts with an earlier one, oh well. |
In SYCL 1.2.1 spec, the attributes get propagated from device functions to a kernel.
The SYCL 2020 requirement mandating the avoidance of the propagation of all kernel attributes to the caller when used on a function.
Attributes that should not be propagated from device functions to a kernel to match with new SYCL 2020 spec.
1. scheduler_target_fmax_mhz
2. kernel_args_restrict
3. no_global_work_offset
4. max-work-group-size
5. max-global-work-dim
6. num-simd-work-items
7. reqd-sub-group-size
8. reqd-work-group-size
9. named_sub_group_size
10. sycl_explicit_simd
This patch
i. keeps the SYCL 1.2.1 spec functionality and propagates the attributes with the older SYCL mode(-sycl-std=2017)
ii. adds diagnostic for ignored attribute for attribute spelling [[intel:: named_sub_group_size()]] with earlier
version of SYCL mode (-sycl-std=2017) since this attribute follows the SYCL 2020 Attribute Rules.
iii. adds or updates tests to validate the propagating behavior with SYCL 2020 and SYCL 2017 modes.
Signed-off-by: Soumi Manna [email protected]