Skip to content

[SYCL] Add Clang support for FPGA loop fusion function attributes #2877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 22, 2020

Conversation

elizabethandrews
Copy link
Contributor

This patch adds support for FPGA function attributes loop_fuse and
loop_fuse_independent. [[intel::loop_fuse(N)]] is a strong request,
to the extent possible, to fuse loops within the function, that are
contained in at most N-1 other loops within the function. If the
optional parameter N is omitted, it is a strong request, to the extent
possible, to fuse loops within the function that are not contained in
any other loop within the function. [[intel::loop_fuse_independent(N)]]
is used to guarantee that fusion safety analysis can ignore
negative-distance dependences between these loops.

FrontEnd Specifications:

The attributes take one optional parameter, a constant integral
expression between 0 and 1024*1024. The paramter may be a template
parameter.

The same function definition can have atmost one of these two attributes.

The attributes can be applied explictly to kernel. However, attributes
should not be propagated to callers i.e it should not be propagated from
device functions to kernel.

LLVM IR is function metadata as follows:

define i32 @foo() !loop_fuse !0
!0 = !{i32 N, i32 D}

where N is the value specified by the optional attribute argument. If
the optional argument is omitted, N is set to 1. D is equal to 0 for
[[intel::loop_fuse]] and 1 for [[intel::loop_fuse_independent]].

Signed-off-by: Elizabeth Andrews [email protected]

This patch adds support for FPGA function attributes loop_fuse and
loop_fuse_independent. [[intel::loop_fuse(N)]] is a strong request,
to the extent possible, to fuse loops within the function, that are
contained in at most N-1 other loops within the function. If the
optional parameter N is omitted, it is a strong request, to the extent
possible, to fuse loops within the function that are not contained in
any other loop within the function. [[intel::loop_fuse_independent(N)]]
is used to guarantee that fusion safety analysis can ignore
negative-distance dependences between these loops.

FrontEnd Specifications:

The attributes take one optional parameter, a constant integral
expression between 0 and 1024*1024. The paramter may be a template
parameter.

The same function definition can have atmost one of these two attributes.

The attributes can be applied explictly to kernel. However, attributes
should not be propagated to callers i.e it should not be propagated from
device functions to kernel.

LLVM IR is function metadata as follows:

define i32 @foo() !loop_fuse !0
!0 = !{i32 N, i32 D}

where N is the value specified by the optional attribute argument. If
the optional argument is omitted, N is set to 1. D is equal to 0 for
[[intel::loop_fuse]] and 1 for [[intel::loop_fuse_independent]].

Signed-off-by: Elizabeth Andrews <[email protected]>
@elizabethandrews
Copy link
Contributor Author

I was just told this attribute should be disabled for host. I will be pushing a new patch with host support removed. I apologize for the confusion.

// Handle optional attribute argument.
if (Attr.isArgExpr(0))
// Attribute argument specified.
S.AddOneConstantValueAttr<AttrType>(D, Attr, Attr.getArgAsExpr(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out where the implementation for that is! FWIW, I would have concerns about upstreaming that function. Nothing for you to solve as part of this review though.

Attr))
return nullptr;

if (D->hasAttr<SYCLIntelLoopFuseAttr>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual pattern is: the mergeFooAttr() function checks all of the merging constraints and diagnoses problematic situations like mutually exclusive attributes, duplicate attributes with different arguments, etc, and the handleFooAttr() function checks all of the basic constraints and diagnoses problematic situations like incorrect number of attribute arguments, wrong argument type, etc. and if everything looks okay, calls mergeFooAttr() and only adds the resulting attribute from the merge function if it's nonnull.

The availability attribute is an example of one that has somewhat complicated checking in both handle and merge. What you have in this patch is somewhat close, but the merge function is not diagnosing the duplicate attribute before dropping it, and the handle function is doing a bit of manual work rather than calling the merge function.

Signed-off-by: Elizabeth Andrews <[email protected]>
@keryell
Copy link
Contributor

keryell commented Dec 10, 2020

I am curious about this D parameter between 0 and 1024*1024...
Why? What is this?

1. Changed attribute definitions to define only 1 semantic attribute
2. Added a couple of tests
3. Minor refactoring/code formatting

Signed-off-by: Elizabeth Andrews <[email protected]>
@elizabethandrews
Copy link
Contributor Author

I am curious about this D parameter between 0 and 1024*1024...
Why? What is this?

N indicates nesting. So it needs to be 0 or more. The upper limit is just an arbitrary high value.

Signed-off-by: Elizabeth Andrews <[email protected]>
during host compilation.

Signed-off-by: Elizabeth Andrews <[email protected]>
AaronBallman
AaronBallman previously approved these changes Dec 18, 2020
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from some minor nits with comments.

Fznamznon
Fznamznon previously approved these changes Dec 18, 2020
@elizabethandrews
Copy link
Contributor Author

Are the buildbots down? @romanovvlad

@elizabethandrews
Copy link
Contributor Author

Thanks for the reviews everyone! The final commit just adds missing fullstops.

Fznamznon
Fznamznon previously approved these changes Dec 18, 2020
premanandrao
premanandrao previously approved these changes Dec 18, 2020
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (thanks to a thorough review by @AaronBallman), I just had one question below.

romanovvlad
romanovvlad previously approved these changes Dec 21, 2020
@romanovvlad
Copy link
Contributor

Are the buildbots down? @romanovvlad

They should be restored now.

@elizabethandrews
Copy link
Contributor Author

Hmm... That's strange... The test passed locally for me even after I used -triple spir64-unknown-windows-sycldevice

Signed-off-by: Elizabeth Andrews <[email protected]>
@elizabethandrews
Copy link
Contributor Author

I think the test fail might have been a name mangling issue. I don't have a Windows workspace at the moment to confirm and I was unable to reproduce the issue on Linux even after using windows triple. Hopefully the latest commit fixes the test failure. If It doesn't I guess I'll make a Windows workspace to try and reproduce the fail.

@romanovvlad romanovvlad merged commit 23926b0 into intel:sycl Dec 22, 2020
jsji pushed a commit that referenced this pull request Dec 14, 2024
Semantically we can translate `OpIAddCarry` and `OpISubBorrow` back into `@llvm.uadd.with.overflow` and `@llvm.usub.with.overflow` respectively.
It require small transformation as there is a difference between return
types in SPIR-V and LLVM IR - e.g., SPIR-V instructions return {i32, i32} struct, but LLVM intrinsics return {i32, i1}.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@44bd69ee70b79ed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants