Skip to content

[SYCL] Embed bfloat16 devicelib into executable if necessary #16729

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 63 commits into from
Mar 12, 2025

Conversation

jinge90
Copy link
Contributor

@jinge90 jinge90 commented Jan 22, 2025

Currently, sycl bfloat16 conversion functions are implemented in 2 devicelib spvs(fallback version and native version).
The native version targets for any platform which supports "cl_intel_bfloat16_conversions" extension and fallback version is used for all other platforms.
SYCL runtime will select the bfloat16 spvs during execution time by checking bfloat16 extension.
The design requires us to ship 2 spv files together with sycl runtime which some users may dislike.
The PR uses sycl dynamic library mechanism to re-implement this behavior. These 2 bfloat16 lib files are regarded as dynamic library and embedded to final executable, so we don't need to ship any bfloat16 spv libs.
The PR consists following changes:

  1. Driver: pass the devicelib file location to sycl-post-link tool.
  2. sycl-post-link: analyze user's device image to see whether bfloat16 devicelib functions are used. If yes, add the 2 bfloat16 devicelib files as "required" dynamic library. All required bfloat16 devicelib functions are treated as "imported" symbols in user's device image and all functions in bfloat16 devicelib are "exported" symbols.
  3. Sycl runtime will load and link the required bfloat16 devicelib image and resolve the imported symbols.

Fallback and native version of bfloat16 devicelib files have exactly same exported functions, we add a new metadata("SYCL_DEVICELIB_BF16_TYPE") to indicate the version in them. SYCL runtime will check cl_intel_bfloat16_conversions extension and this metadata to decide which version will be linked.

@jinge90 jinge90 requested review from a team as code owners January 22, 2025 08:58
@jinge90 jinge90 marked this pull request as draft January 22, 2025 08:58
Signed-off-by: jinge90 <[email protected]>
@jinge90 jinge90 temporarily deployed to WindowsCILock March 7, 2025 07:01 — with GitHub Actions Inactive
@jinge90
Copy link
Contributor Author

jinge90 commented Mar 7, 2025

Hi, @mdtoguchi
Could you help review the driver part again?
Thanks very much!

@jinge90
Copy link
Contributor Author

jinge90 commented Mar 7, 2025

Hi, @maksimsab
Do you have any more concern or comments?
Thanks very much.

@jinge90
Copy link
Contributor Author

jinge90 commented Mar 10, 2025

Hi, @maksimsab Do you have any more concern or comments? Thanks very much.

Hi, @maksimsab
Kind ping~

; CHECK-BF16: [SYCL/imported symbols]
; CHECK-BF16-NEXT: __devicelib_ConvertFToBF16INTEL

%"class.sycl::_V1::id" = type { %"class.sycl::_V1::detail::array" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, remove all unnecessary instructions, attributes and metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @maksimsab
Done.
Thanks very much!

@@ -751,7 +794,7 @@ bool isTargetCompatibleWithModule(const std::string &Target,
}

std::vector<std::unique_ptr<util::SimpleTable>>
processInputModule(std::unique_ptr<Module> M) {
processInputModule(std::unique_ptr<Module> M, LLVMContext &Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for LLVMContext here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @maksimsab
Done.
Thanks for the careful review.

@@ -775,3 +752,44 @@ uint32_t llvm::getSYCLDeviceLibReqMask(const Module &M) {
}
return ReqMask;
}

static llvm::SmallVector<const char *, 14> BF16DeviceLibFuncs = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use static llvm::SmallVector<StringRef, 14>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @maksimsab
Done.
Thanks very much!

Copy link
Contributor

@maksimsab maksimsab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jinge90 jinge90 requested a review from a team March 12, 2025 14:03
@jinge90
Copy link
Contributor Author

jinge90 commented Mar 12, 2025

Hi, @intel/llvm-gatekeepers
Could you help merge this PR?
Thanks very much.

@sommerlukas sommerlukas merged commit e2eaf58 into intel:sycl Mar 12, 2025
23 checks passed
sommerlukas pushed a commit that referenced this pull request Mar 14, 2025
Incorporates recent changes to `sycl-post-link` into the RTC-specific
version:
- #16729
- #16236
- #17211

---------

Signed-off-by: Julian Oppermann <[email protected]>
martygrant pushed a commit that referenced this pull request Apr 16, 2025
…ion (#18023)

As agreement in #16729 , In general we
discourage the use of the extension strings as it is a remnant of the
OpenCL days. Instead we want to use urDeviceGetInfo when possible. This
is a follow-up pr to replace checking OpenCL extension string with
urGetDeviceInfo for native bfloat16 conversion extension.

---------

Signed-off-by: jinge90 <[email protected]>
Co-authored-by: Steffen Larsen <[email protected]>
uditagarwal97 added a commit that referenced this pull request Apr 22, 2025
…18108)

#16729 added support to embed BF16
device lib in executable using dynamic linking feature. However, it does
not work with `--offload-compress`. This PR fixes that.
See CMPLRLLVM-66723
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants