Skip to content

[SYCL]Link Fallback Device Libraries On Demand #1787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 14, 2020

Conversation

jinge90
Copy link
Contributor

@jinge90 jinge90 commented May 30, 2020

Signed-off-by: gejin [email protected]

@jinge90
Copy link
Contributor Author

jinge90 commented May 30, 2020

Hi, all. This patch implements the fallback device library(SPIRV files) on-demand loading.
The idea is:

  1. After llvm-mc linking all kernel code into final device image, sycl-post-link tool will go through all modules to find all undefined "_devicelib*" function calls.
  2. For each undefined "_devicelib*" functional call detected, we will check which fallback device library is required to support it.
  3. After collecting all device libraries required, we will create an uint32_t "device library require mask" whose each bit corresponds to one fallback device library.
  4. We store the "device library require mask" into "SYCL/devicelib req mask" property via sycl toolchain's "property" mechanism.
  5. When sycl runtime loads device images, it will read the "device library require mask" and only load/link necessary fallback device libraries. If program doesn't use any device library functions, no fallback SPIRV file will be linked.
    Hi, @vzakhari
    Could you help evaluate the potential risk in OMP's perspective, I expect no impact as sycl runtime is not used in OMP offloading.
    Thank you very much.

@jinge90 jinge90 force-pushed the link_fallback_devicelib_on_demand branch 2 times, most recently from 38dfccb to c6af296 Compare May 31, 2020 07:15
Copy link
Contributor

@asavonic asavonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vzakhari
Copy link
Contributor

vzakhari commented Jun 1, 2020

Hi @jinge90, you are right, there is no interaction with OpenMP currently.

Just curious, have you considered running dead code elimination in sycl-post-link so that it removes unused devicelib wrappers before propagating the property?

Can you please add header comments in devicelib files mentioning internal structures of sycl-post-linlk that need to be updated, if new __devicelib_ APIs are added? It is probably also worth updating devicelib documentation.

@kbobrovs
Copy link
Contributor

kbobrovs commented Jun 1, 2020

After collecting all device libraries required, we will create an uint32_t "device library require mask" whose each bit corresponds to one fallback device library.

Why this is uint32_t bitmask rather than a list of symbolic library ids like in ELF? If this is an optimization, I feel that it won't bring tangible speedup, but complicate code.

UPDATE: I withdraw this comment, as integer identification have existed before.

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments in the code

@jinge90
Copy link
Contributor Author

jinge90 commented Jun 2, 2020

Hi @jinge90, you are right, there is no interaction with OpenMP currently.

Just curious, have you considered running dead code elimination in sycl-post-link so that it removes unused devicelib wrappers before propagating the property?

Can you please add header comments in devicelib files mentioning internal structures of sycl-post-linlk that need to be updated, if new __devicelib_ APIs are added? It is probably also worth updating devicelib documentation.

Hi, @vzakhari , I have considered dead code elimination and drafted a simple patch in sycl-post-link. I will send the patch to you later.

@jinge90 jinge90 force-pushed the link_fallback_devicelib_on_demand branch from 0dec3ab to 698b13d Compare June 4, 2020 14:11
@jinge90 jinge90 requested a review from a team as a code owner June 9, 2020 16:04
@@ -0,0 +1,23 @@
//=- SYCLRTShared.h - Shared definition between llvm tools and SYCL runtime -=//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
@bader, @andykaylor, @romanovvlad - does this approach looks good to you? Is llvm::util::sycl a good namespace?

@jinge90 , I suggest to add a comments what this file is

  • shared type definitions between llvm tools and SYCL RT library
  • definitions which introduce SYCL RT - LLVM linkage dependencies should not be added here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs
I have added the comments in SYCLRTShared.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbobrovs, thanks for pinging me and sorry for late feedback.

I don't fully understand the design of this "linking on demand" solution (it would be great to extend https://github.com/intel/llvm/blob/sycl/sycl/doc/CompilerAndRuntimeDesign.md document), but this concerns me.
It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions. @andykaylor, do you think this design can be accepted by the LLVM community? I think community prefers scaleable solutions (e.g. framework for linking arbitrary libraries), whereas current implementation is able to address very specific SYCL use case.

It would be great to get a community feedback on this approach through RFC.

Copy link
Contributor

@kbobrovs kbobrovs Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions.

@bader, under "Intel specific OpenCL extensions" do you mean SYCL as a whole or just the "linking on demand part"?
If the former (SYCL), then in my view we already have baked a lot of SYCL into LLVM :) (for good reason) - take clang FE, for example, offload tools - bundler, wrapper, SYCL specific passes in LLVM itself. Maybe the location "llvm/Support/SYCLRTShared.h" is not ideal and there could be new folder with "SYCL" in it to underline SYCL specifics?

If the latter - then SYCLRTShared.h is not specific to this extension, it serves for sharing some type and string literal definitions between LLVM tools and SYCL runtime.

It would be great to get a community feedback on this approach through RFC.

Do you think this should be done before this PR can be merged or can be done in parallel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we bake into LLVM compiler framework the information about Intel specific OpenCL extensions.

@bader, under "Intel specific OpenCL extensions" do you mean SYCL as a whole or just the "linking on demand part"?

cl_intel_devicelib_assert, cl_intel_devicelib_math, cl_intel_devicelib_math_fp64, cl_intel_devicelib_complex, cl_intel_devicelib_complex_fp64 - are Intel specific OpenCL extensions.

If the former (SYCL), then in my view we already have baked a lot of SYCL into LLVM :) (for good reason) - take clang FE, for example, offload tools - bundler, wrapper, SYCL specific passes in LLVM itself. Maybe the location "llvm/Support/SYCLRTShared.h" is not ideal and there could be new folder with "SYCL" in it to underline SYCL specifics?

If the latter - then SYCLRTShared.h is not specific to this extension, it serves for sharing some type and string literal definitions between LLVM tools and SYCL runtime.

Can we apply the same approach as regular C++ compiler does? I.e. link with the whole C++ standard library (using LLVM linker) + provide an compiler option to disable SYCL extension and skip linking with this libraries? This should much easier to implement and maintain as it should not require extending LLVM functionality.

It would be great to get a community feedback on this approach through RFC.

Do you think this should be done before this PR can be merged or can be done in parallel?

Ideally the review should happen before the merge, but if there are any time constraints we can merge and re-do later if needed.

Copy link
Contributor Author

@jinge90 jinge90 Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs @andykaylor @bader
This patch is used to fix some perf regression for cases which has "small" kernel execution time and we are required to fix it ASAP recently, could we merge it and continue to work on the RFC to collect community's feedback?
And I think it is valuable to add a header to share information between sycl runtime and llvm-tools. Currently, the header is located in llvm/Support/SYCLRTShared.h and following things are included: Property constant string such as "SYCL/specialization constants"; DeviceLibExt enum definition such as "cl_intel_devicelib_***"
We need a careful analysis and discussion on following points:

  1. The header's filename and location?
  2. The namespace used in this header?
  3. Which definitions or constants can be placed in this file?
  4. Can intel-specific things be placed in this header?
    Thank you very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that this is the right approach to "fix performance regression". Moving some runtime overhead to compile time (e.g. #1398) seems to be a better option.
What is the root cause of the regression? Please, add a problem statement to the PR description.

@@ -5,15 +5,12 @@
//
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs
Currently, all program's kernel code will be linked with our devicelib as default, so sycl runtime will use piProgramLink to link all spv images. However, after we enabled devicelib "on-demand" loading, sycl runtime will use piProgramBuild to build users' device image. For OCL backend, piProgramBuild will be mapped to OCL's clBuildProgram API. According to OCL Spec, we can only pass "Compiler options" to this API and "Link Options" is not used.
To this test, the source code doesn't depend on devicelib, so I remove all test commands using "SYCL_PROGRAM_LINK_OPTIONS".
However, current status is a little tricky, we have to keep loading "assert" fallback spv as default to workaround another separate issue, so sycl runtime will still use piProgramCompile + piProgramLink for all program's device code. Do you think we need to keep those test commands for now?
Thank you very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problems you describe are implementation detail and should be fixed. User setting SYCL_PROGRAM_LINK_OPTIONS= should not lead to crashes if link options are good. If that's the case, the implementation should be fixed, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs
For this case, those link options will lead to crash in OCL CPU runtime, the problem is in "-enable-link-options". If no device libraries will be loaded and linked with user's kernel code, sycl runtime will use piProgramBuild for jit compilation which will be mapped to clBuildProgram. According to OCL Spec, "-enable-link-options" must be specified with "--create-library" option. I think we have 2 ways to deal with the case:

  1. If sycl runtime uses piProgramBuild, the SYCL_PROGRAM_LINK_OPTIONS env variable value will be ignored, we only pass SYCL_PROGRAM_COMPILER_OPTIONS value to underlying OCL runtime.
  2. We keep passing "SYCL_PROGRAM_LINK_OPTIONS" to underlying OCL runtime even if piProgramBuild is used, it's user's duty to set "good" SYCL_PROGRAM_LINK_OPTIONS. We need to fix this case to remove "-enable-link-options".
    Which one do you prefer or do you have any other idea?
    Thank you very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to OCL Spec, "-enable-link-options" must be specified with "--create-library" option.

I see, thanks. This makes the test invalid, so removal is OK then.
2) seems better solution. But for this PR just removal of the two invalid test cases (as you did before) should be enough

Comment on lines 40 to 44
using llvm::util::sycl::cl_intel_devicelib_assert;
using llvm::util::sycl::cl_intel_devicelib_complex;
using llvm::util::sycl::cl_intel_devicelib_complex_fp64;
using llvm::util::sycl::cl_intel_devicelib_math;
using llvm::util::sycl::cl_intel_devicelib_math_fp64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow my previous comment on this disappeared. Please don't using all the individual enum elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I see that you add merge commits when you implement review comments:

Merge remote-tracking branch 'upstream/sycl' into link_fallback_devic… …

This does not seem right to me. Maybe @bader can comment more. I think this can lead to missing review comments. You should only use git push/git pull when working with the review, so that jinge90:link_fallback_devicelib_on_demand does not have merge commits. Only if you have conflicts, you should rebase your feature branch. I do it like this:

git rebase <hash> --onto intel_llvm/sycl

where is the commit right below the first feature branch commit in git log output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs
Thank you for the remind. I updated patch to use c++11 scoped enum for DeviceLibExt, the definition looks like:
enum class DeviceLibExt : std::uint32_t {cl_intel_devicelib_assert,.....}
By doing this, we don't need to "using" all individual enum elements, we only need to "using llvm::util::sycl::DeviceLibExt;" in source code and use "DeviceLibExt::cl_intel_devicelib_*" to refer to all DeviceLibExt enum elements. Is this OK to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you.

Comment on lines 47 to 52
using llvm::util::sycl::cl_intel_devicelib_assert;
using llvm::util::sycl::cl_intel_devicelib_complex;
using llvm::util::sycl::cl_intel_devicelib_complex_fp64;
using llvm::util::sycl::cl_intel_devicelib_math;
using llvm::util::sycl::cl_intel_devicelib_math_fp64;
using llvm::util::sycl::DeviceLibExt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't using individual enum elements

@@ -5,15 +5,12 @@
//
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_COMPILE_OPTIONS="-g" %t.out
// RUN: %CPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out
// RUN: %GPU_RUN_PLACEHOLDER SYCL_PROGRAM_LINK_OPTIONS="-enable-link-options -cl-denorms-are-zero" %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problems you describe are implementation detail and should be fixed. User setting SYCL_PROGRAM_LINK_OPTIONS= should not lead to crashes if link options are good. If that's the case, the implementation should be fixed, I think.

kbobrovs
kbobrovs previously approved these changes Jul 1, 2020
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

vzakhari
vzakhari previously approved these changes Jul 2, 2020
Copy link
Contributor

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for devicelib

bader
bader previously approved these changes Jul 7, 2020
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay to merge if someone is waiting for this fix. I'd like to have separate discussion to better understand the problem this PR is supposed to solve, because I have serious concerns that we won't be able to upstream this solution.

@bader
Copy link
Contributor

bader commented Jul 7, 2020

@smaslov-intel, @intel/llvm-reviewers-runtime, ping.

@jinge90
Copy link
Contributor Author

jinge90 commented Jul 7, 2020

I'm okay to merge if someone is waiting for this fix. I'd like to have separate discussion to better understand the problem this PR is supposed to solve, because I have serious concerns that we won't be able to upstream this solution.

Hi, @bader @kbobrovs @andykaylor
"devicelib on-demand loading" is an optimization for current devicelib infra. And we can implement it without introducing header file to llvm/Support/...(We didn't touch any thing in llvm/ in the original patch) In fact, introducing some mechanism or header files to LLVM framework to share information between SYCL runtime and llvm tools is another separate problem and this problem is much more general than this patch. I agree with Alexey that we need a separate discussion on this problem.
Thank you very much,

@kbobrovs
Copy link
Contributor

kbobrovs commented Jul 7, 2020

@jinge90, I'm ok to revert the shared header and return to code duplication if there is no consensus on the solution. If you do this, please restore cross-reference in the comments between the duplicated parts and match naming.

@jinge90 jinge90 dismissed stale reviews from bader, vzakhari, and kbobrovs via 5445247 July 8, 2020 06:34
@jinge90
Copy link
Contributor Author

jinge90 commented Jul 8, 2020

@jinge90, I'm ok to revert the shared header and return to code duplication if there is no consensus on the solution. If you do this, please restore cross-reference in the comments between the duplicated parts and match naming.

Hi, @kbobrovs
I have reverted the shared header and added the comments to remind sync between sycl runtime and sycl-post-link tool.
Thank you very much.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pi changes look OK

Copy link
Contributor

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for devicelib.

@jinge90
Copy link
Contributor Author

jinge90 commented Jul 10, 2020

Hi, @kbobrovs @bader @asavonic
I have removed the SYCLRTShared.h to avoid touching LLVM framework, could you help approve again.
Thank you very much.

@bader bader merged commit 9a8864c into intel:sycl Jul 14, 2020
@jinge90 jinge90 deleted the link_fallback_devicelib_on_demand branch January 26, 2021 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants