-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL][Fusion] API for kernel fusion library #7465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
be9e0fa
to
0ff42c4
Compare
The design document for the overall implementation of kernel fusion can be found here. |
deebaf4
to
523c851
Compare
Since this is a new library we may need to determine who would be the best reviewers going forward for these. I'm going to add a few people. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple initial comments. I think I'll need a good cup of coffee for this one, so I will return in a couple of days.
/// | ||
/// Enumerate possible kinds of parameters. | ||
/// 1:1 correspondence with the definition in kernel_desc.hpp in the DPC++ SYCL | ||
/// runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to stay in sync with the definition in kernel_desc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, only Accessor
(for internalization) and StdLayout
(for constant propagation) receive special treatment by the JIT compilation process. The interface logic to this library in the SYCL runtime (part of a later PR) uses the SYCL RT internal ArgDesc
directly, so additional members for this enum in kernel_desc
would not cause errors in the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a brief review, I haven't been able to grasp it in details yet
sycl-fusion/jit-compiler/lib/translation/SPIRVLLVMTranslation.cpp
Outdated
Show resolved
Hide resolved
if (KernelFunction->hasMetadata(REQD_WORK_GROUP_SIZE_ATTR)) { | ||
auto *MD = KernelFunction->getMetadata(REQD_WORK_GROUP_SIZE_ATTR); | ||
SYCLKernelAttribute ReqdAttr{REQD_WORK_GROUP_SIZE_ATTR}; | ||
getAttributeValues(ReqdAttr.Values, MD, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that there is an ongoing effort to allow less than 3 operands in reqd_work_group_size
metadata, see #7450 (@steffenlarsen can link PRs to other components)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, I wasn't aware of that. We should be able to adapt our processing of the attributes once that PR is merged, to match the new behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this particular case KhronosGroup/SPIRV-LLVM-Translator#1726 is also relevant. Note that in the reverse-translator case you will always get 3 operands back, but I believe you also care about generated LLVM IR that hasn't gone through the translator, in which case you may need to adjust for optional parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint!
Right now, we only care about consuming SPIR-V. Untranslated LLVM IR will become relevant for targets that the DPC++ frontend currently does not compile to SPIR-V, e.g., targeting CUDA or AMD GPUs. Will keep this in mind for when we work on that.
sycl-fusion/jit-compiler/lib/translation/SPIRVLLVMTranslation.cpp
Outdated
Show resolved
Hide resolved
sycl-fusion/jit-compiler/lib/translation/SPIRVLLVMTranslation.cpp
Outdated
Show resolved
Hide resolved
First components for the JIT compiler used for SYCL kernel fusion: * API definitions * Input translation from SPIR-V to LLVM IR * Insertion of fused kernel function stub and metadata * Supporting infrastructure such as compiler options. Co-authored-by: Lukas Sommer <[email protected]> Co-authored-by: Victor Perez <[email protected]> Signed-off-by: Lukas Sommer <[email protected]>
523c851
to
962c922
Compare
Thank you for the first reviews @steffenlarsen and @AlexeySachkov!
@pvchupin: I've added the Codeplay team that has been working on this implementation/library to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple more comments, but I think it looks good overall.
Note that the comment about the use of auto
should probably also apply to a few other files. It doesn't have to be all uses of auto
, for example X::create
should be clear enough about the type being X
, but for those that aren't necessarily clear about exactly what type it uses would be nice to have explicit.
Also, what are the plans for testing this? I realize much of this is just the skeleton with more to come, but is there any of this that might make sense to unit-test now or does it make more sense to wait?
if (KernelFunction->hasMetadata(REQD_WORK_GROUP_SIZE_ATTR)) { | ||
auto *MD = KernelFunction->getMetadata(REQD_WORK_GROUP_SIZE_ATTR); | ||
SYCLKernelAttribute ReqdAttr{REQD_WORK_GROUP_SIZE_ATTR}; | ||
getAttributeValues(ReqdAttr.Values, MD, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this particular case KhronosGroup/SPIRV-LLVM-Translator#1726 is also relevant. Note that in the reverse-translator case you will always get 3 operands back, but I believe you also care about generated LLVM IR that hasn't gone through the translator, in which case you may need to adjust for optional parameters.
Signed-off-by: Lukas Sommer <[email protected]>
Thanks again for the feedback @steffenlarsen.
The
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sommerlukas! LGTM!
Thanks for the approvals! Can either of you please merge this PR, I'm not authorized to merge this, @steffenlarsen @pvchupin? |
This is the third patch in a series of patches to add an implementation of the [kernel fusion extension](#7098). We have split the implementation into multiple patches to make them more easy to review. This patch integrates the kernel fusion extension into the SYCL runtime scheduler. Next to collecting the kernels submitted while in fusion mode in the fusion list associated with the queue, the integration into the scheduler is also responsible for detecting the synchronization scenarios. Various scenarios, such as buffer destruction or event wait, require fusion to be aborted early. The full list of scenarios is available in the [extension proposal](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc#synchronization-in-the-sycl-application). A high-level description of the integration into the scheduler can be found in the [design document](#7204). This PR can be reviewed and merged independently of #7465. Signed-off-by: Lukas Sommer <[email protected]> Signed-off-by: Lukas Sommer <[email protected]>
SPIRVBinary &emplaceSPIRVBinary(std::string Binary); | ||
|
||
private: | ||
// FIXME: Change this to std::shared_mutex after switching to C++17. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have switched to C++17 as a minimal required version mid 2022.
This is the second patch in a series of patches to add an implementation of thekernel fusion extension. We have split the implementation into multiple patches to make them more easy to review.
This patch can be reviewed and merged independently of #7416.
This patch adds the first components for the JIT compiler used for implementation of kernel fusion at runtime , concretely:
CMake logic to link and code to invoke this JIT from the SYCL runtime will follow in a later patch.
Co-authored-by: Lukas Sommer [email protected]
Co-authored-by: Victor Perez [email protected]
Signed-off-by: Lukas Sommer [email protected]