-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL]Link Fallback Device Libraries On Demand #1787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, all. This patch implements the fallback device library(SPIRV files) on-demand loading.
|
38dfccb
to
c6af296
Compare
Signed-off-by: gejin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Hi @jinge90, you are right, there is no interaction with OpenMP currently. Just curious, have you considered running dead code elimination in Can you please add header comments in |
Why this is uint32_t bitmask rather than a list of symbolic library ids like in ELF? If this is an optimization, I feel that it won't bring tangible speedup, but complicate code. UPDATE: I withdraw this comment, as integer identification have existed before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments in the code
Hi, @vzakhari , I have considered dead code elimination and drafted a simple patch in sycl-post-link. I will send the patch to you later. |
Signed-off-by: Ge Jin <[email protected]>
0dec3ab
to
698b13d
Compare
… sycl runtime Signed-off-by: gejin <[email protected]>
Signed-off-by: gejin <[email protected]>
@@ -0,0 +1,23 @@ | |||
//=- SYCLRTShared.h - Shared definition between llvm tools and SYCL runtime -=// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@bader, @andykaylor, @romanovvlad - does this approach looks good to you? Is llvm::util::sycl a good namespace?
@jinge90 , I suggest to add a comments what this file is
- shared type definitions between llvm tools and SYCL RT library
- definitions which introduce SYCL RT - LLVM linkage dependencies should not be added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems OK to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs
I have added the comments in SYCLRTShared.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbobrovs, thanks for pinging me and sorry for late feedback.
I don't fully understand the design of this "linking on demand" solution (it would be great to extend https://github.com/intel/llvm/blob/sycl/sycl/doc/CompilerAndRuntimeDesign.md document), but this concerns me.
It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions. @andykaylor, do you think this design can be accepted by the LLVM community? I think community prefers scaleable solutions (e.g. framework for linking arbitrary libraries), whereas current implementation is able to address very specific SYCL use case.
It would be great to get a community feedback on this approach through RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions.
@bader, under "Intel specific OpenCL extensions" do you mean SYCL as a whole or just the "linking on demand part"?
If the former (SYCL), then in my view we already have baked a lot of SYCL into LLVM :) (for good reason) - take clang FE, for example, offload tools - bundler, wrapper, SYCL specific passes in LLVM itself. Maybe the location "llvm/Support/SYCLRTShared.h" is not ideal and there could be new folder with "SYCL" in it to underline SYCL specifics?
If the latter - then SYCLRTShared.h is not specific to this extension, it serves for sharing some type and string literal definitions between LLVM tools and SYCL runtime.
It would be great to get a community feedback on this approach through RFC.
Do you think this should be done before this PR can be merged or can be done in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions.
@bader, under "Intel specific OpenCL extensions" do you mean SYCL as a whole or just the "linking on demand part"?
cl_intel_devicelib_assert, cl_intel_devicelib_math, cl_intel_devicelib_math_fp64, cl_intel_devicelib_complex, cl_intel_devicelib_complex_fp64 - are Intel specific OpenCL extensions.
If the former (SYCL), then in my view we already have baked a lot of SYCL into LLVM :) (for good reason) - take clang FE, for example, offload tools - bundler, wrapper, SYCL specific passes in LLVM itself. Maybe the location "llvm/Support/SYCLRTShared.h" is not ideal and there could be new folder with "SYCL" in it to underline SYCL specifics?
If the latter - then SYCLRTShared.h is not specific to this extension, it serves for sharing some type and string literal definitions between LLVM tools and SYCL runtime.
Can we apply the same approach as regular C++ compiler does? I.e. link with the whole C++ standard library (using LLVM linker) + provide an compiler option to disable SYCL extension and skip linking with this libraries? This should much easier to implement and maintain as it should not require extending LLVM functionality.
It would be great to get a community feedback on this approach through RFC.
Do you think this should be done before this PR can be merged or can be done in parallel?
Ideally the review should happen before the merge, but if there are any time constraints we can merge and re-do later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs @andykaylor @bader
This patch is used to fix some perf regression for cases which has "small" kernel execution time and we are required to fix it ASAP recently, could we merge it and continue to work on the RFC to collect community's feedback?
And I think it is valuable to add a header to share information between sycl runtime and llvm-tools. Currently, the header is located in llvm/Support/SYCLRTShared.h and following things are included: Property constant string such as "SYCL/specialization constants"; DeviceLibExt enum definition such as "cl_intel_devicelib_***"
We need a careful analysis and discussion on following points:
- The header's filename and location?
- The namespace used in this header?
- Which definitions or constants can be placed in this file?
- Can intel-specific things be placed in this header?
Thank you very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this is the right approach to "fix performance regression". Moving some runtime overhead to compile time (e.g. #1398) seems to be a better option.
What is the root cause of the regression? Please, add a problem statement to the PR description.
@@ -5,15 +5,12 @@ | |||
// | |||
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out | |||
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out | |||
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out | |||
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs
Currently, all program's kernel code will be linked with our devicelib as default, so sycl runtime will use piProgramLink to link all spv images. However, after we enabled devicelib "on-demand" loading, sycl runtime will use piProgramBuild to build users' device image. For OCL backend, piProgramBuild will be mapped to OCL's clBuildProgram API. According to OCL Spec, we can only pass "Compiler options" to this API and "Link Options" is not used.
To this test, the source code doesn't depend on devicelib, so I remove all test commands using "SYCL_PROGRAM_LINK_OPTIONS".
However, current status is a little tricky, we have to keep loading "assert" fallback spv as default to workaround another separate issue, so sycl runtime will still use piProgramCompile + piProgramLink for all program's device code. Do you think we need to keep those test commands for now?
Thank you very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problems you describe are implementation detail and should be fixed. User setting SYCL_PROGRAM_LINK_OPTIONS=
should not lead to crashes if link options are good. If that's the case, the implementation should be fixed, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs
For this case, those link options will lead to crash in OCL CPU runtime, the problem is in "-enable-link-options". If no device libraries will be loaded and linked with user's kernel code, sycl runtime will use piProgramBuild for jit compilation which will be mapped to clBuildProgram. According to OCL Spec, "-enable-link-options" must be specified with "--create-library" option. I think we have 2 ways to deal with the case:
- If sycl runtime uses piProgramBuild, the SYCL_PROGRAM_LINK_OPTIONS env variable value will be ignored, we only pass SYCL_PROGRAM_COMPILER_OPTIONS value to underlying OCL runtime.
- We keep passing "SYCL_PROGRAM_LINK_OPTIONS" to underlying OCL runtime even if piProgramBuild is used, it's user's duty to set "good" SYCL_PROGRAM_LINK_OPTIONS. We need to fix this case to remove "-enable-link-options".
Which one do you prefer or do you have any other idea?
Thank you very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to OCL Spec, "-enable-link-options" must be specified with "--create-library" option.
I see, thanks. This makes the test invalid, so removal is OK then.
2) seems better solution. But for this PR just removal of the two invalid test cases (as you did before) should be enough
Signed-off-by: gejin <[email protected]>
using llvm::util::sycl::cl_intel_devicelib_assert; | ||
using llvm::util::sycl::cl_intel_devicelib_complex; | ||
using llvm::util::sycl::cl_intel_devicelib_complex_fp64; | ||
using llvm::util::sycl::cl_intel_devicelib_math; | ||
using llvm::util::sycl::cl_intel_devicelib_math_fp64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow my previous comment on this disappeared. Please don't using
all the individual enum elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I see that you add merge commits when you implement review comments:
Merge remote-tracking branch 'upstream/sycl' into link_fallback_devic… …
This does not seem right to me. Maybe @bader can comment more. I think this can lead to missing review comments. You should only use git push/git pull when working with the review, so that jinge90:link_fallback_devicelib_on_demand
does not have merge commits. Only if you have conflicts, you should rebase your feature branch. I do it like this:
git rebase <hash> --onto intel_llvm/sycl
where is the commit right below the first feature branch commit in git log
output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs
Thank you for the remind. I updated patch to use c++11 scoped enum for DeviceLibExt, the definition looks like:
enum class DeviceLibExt : std::uint32_t {cl_intel_devicelib_assert,.....}
By doing this, we don't need to "using" all individual enum elements, we only need to "using llvm::util::sycl::DeviceLibExt;" in source code and use "DeviceLibExt::cl_intel_devicelib_*" to refer to all DeviceLibExt enum elements. Is this OK to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you.
using llvm::util::sycl::cl_intel_devicelib_assert; | ||
using llvm::util::sycl::cl_intel_devicelib_complex; | ||
using llvm::util::sycl::cl_intel_devicelib_complex_fp64; | ||
using llvm::util::sycl::cl_intel_devicelib_math; | ||
using llvm::util::sycl::cl_intel_devicelib_math_fp64; | ||
using llvm::util::sycl::DeviceLibExt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't using
individual enum elements
@@ -5,15 +5,12 @@ | |||
// | |||
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out | |||
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out | |||
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out | |||
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problems you describe are implementation detail and should be fixed. User setting SYCL_PROGRAM_LINK_OPTIONS=
should not lead to crashes if link options are good. If that's the case, the implementation should be fixed, I think.
Signed-off-by: gejin <[email protected]>
Signed-off-by: gejin <[email protected]>
Signed-off-by: gejin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for devicelib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay to merge if someone is waiting for this fix. I'd like to have separate discussion to better understand the problem this PR is supposed to solve, because I have serious concerns that we won't be able to upstream this solution.
@smaslov-intel, @intel/llvm-reviewers-runtime, ping. |
Hi, @bader @kbobrovs @andykaylor |
@jinge90, I'm ok to revert the shared header and return to code duplication if there is no consensus on the solution. If you do this, please restore cross-reference in the comments between the duplicated parts and match naming. |
Hi, @kbobrovs |
Signed-off-by: gejin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pi changes look OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for devicelib.
Signed-off-by: gejin [email protected]