Skip to content

[SYCL] Keep multiple copies for bf16 device library image #17461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 3, 2025

Conversation

jinge90
Copy link
Contributor

@jinge90 jinge90 commented Mar 14, 2025

SYCL RT addImages function may be invoked multiple times for different sycl binary images, more than 1 of these sycl binary images may depend on bfloat16 device library. These bfloat16 device library images are provided by compiler and the implementation are stable now, so we only keep single copy for native and fallback version bfloat16 device library in program manager, these images will not be removed unless program manager is destroyed.

@jinge90 jinge90 requested a review from a team as a code owner March 14, 2025 07:43
@jinge90 jinge90 requested a review from againull March 14, 2025 07:43
@jinge90 jinge90 marked this pull request as draft March 14, 2025 07:43
@jinge90 jinge90 requested a review from jopperm March 18, 2025 09:06
Comment on lines 1891 to 1915
// For bfloat16 device library image, it doesn't include any kernel, device
// global, virtual function, so just skip adding it to any related maps.
// We only need to: 1). add exported symbols to m_ExportedSymbolImages. 2).
// add the device image to m_DeviceImages used for future clean up when
// removeImage is called. RefCount is used to keep how many user device
// images are depending on native/fallback bfloat16 device library image,
// the corresponding image will be added to m_ExportedSymbolImages and
// m_DeviceImages only when RefCount is 0. These RefCount are used when
// KernelIDsGuard is acquired by current thread.
{
auto Bfloat16DeviceLibProp = Img->getDeviceLibMetadata();
if (Bfloat16DeviceLibProp.isAvailable()) {
uint32_t IsNative =
DeviceBinaryProperty(*(Bfloat16DeviceLibProp.begin())).asUint32();
if (!m_Bfloat16DeviceLibRefCount[IsNative]) {
for (const sycl_device_binary_property &ESProp :
Img->getExportedSymbols()) {
m_ExportedSymbolImages.insert({ESProp->Name, Img.get()});
}
m_DeviceImages.insert({RawImg, std::move(Img)});
}
m_Bfloat16DeviceLibRefCount[IsNative] += 1;
continue;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the special handling explicit (and only doing the nececssary things) is a good idea 👍

m_ExportedSymbolImages.erase(ESProp->Name);
}

m_DeviceImages.erase(DevImgIt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not erasing the device image here when the refcount > 0 won't keep the underlying device binaries alive. Consider the following situation:

  • bundle A loaded, contributes bfloat dev lib into runtime -> refcount = 1
  • bundle B loaded, uses bfloat dev lib -> refcount = 2
  • bundle A is freed -> refcount = 1
  • bundle C loaded, uses bfloat dev lib -> refcount = 2, and crash when PM tries to link the kernels because m_ExportedSymbols points to image coming from bundle A, which has been destroyed

@jopperm
Copy link
Contributor

jopperm commented Mar 19, 2025

As I commented inline, I don't think the refcounting alone fixes the underlying issue; rather the program manager would also need to take ownership of the bfloat device library image until the refcount drops to 0.

Given these complications, I think it would also be fair to just accept the presence of multiple copies of these special images. m_ExportedSymbols is already a multimap, so if we erase just the entries referencing the current image in removeImages (instead of blanket erase'ing of all entries with a given symbol name), we should be good.

@jopperm
Copy link
Contributor

jopperm commented Mar 19, 2025

CC @steffenlarsen for potential relation to #17442.

@jinge90
Copy link
Contributor Author

jinge90 commented Mar 19, 2025

As I commented inline, I don't think the refcounting alone fixes the underlying issue; rather the program manager would also need to take ownership of the bfloat device library image until the refcount drops to 0.

Given these complications, I think it would also be fair to just accept the presence of multiple copies of these special images. m_ExportedSymbols is already a multimap, so if we erase just the entries referencing the current image in removeImages (instead of blanket erase'ing of all entries with a given symbol name), we should be good.

Hi, @jopperm
I agree that keeping multiple copies in m_ExportedSymbols is acceptable. I worked out a way to move these special device images in a separate container and maintain it carefully to add/removeImages to guarantee only 1 copy exists but the code is too complicated and seems not to be the right direction.
The updated patch has following behavior:

  1. Each time addImages is called to add a user sycl binary, the included bf16 devicelib image will be registered into m_ExportedSymbols map.
  2. Each time removeImages is called to remove a user sycl binary, for any bf16 device image included, we check m_ExportedSymbols map and only remove the entries whose value is current bf16 device image ptr. When current sycl binary is removed, all entries from the removed sycl binary is cleared from m_ExportedSymbols map.

Thanks very much.

@jinge90 jinge90 requested a review from steffenlarsen March 19, 2025 07:34
Signed-off-by: jinge90 <[email protected]>
@jinge90
Copy link
Contributor Author

jinge90 commented Apr 1, 2025

Hi, @jopperm
Could you help review again?
Thanks very much.

@steffenlarsen steffenlarsen requested a review from jopperm April 1, 2025 10:38
Copy link
Contributor

@jopperm jopperm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor feedback, approach LGTM. Please also add a description for the PR.

int Bfloat16DeviceLibVersion = -1;
if (m_Bfloat16DeviceLibImages[0].get() == BinImage)
Bfloat16DeviceLibVersion = 0;
if (m_Bfloat16DeviceLibImages[1].get() == BinImage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (m_Bfloat16DeviceLibImages[1].get() == BinImage)
else if (m_Bfloat16DeviceLibImages[1].get() == BinImage)

if (!LibVersion)
return true;

*LibVersion = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to use 0 (= a valid lib version) here? Should that be ~0U or 2 or so?

Comment on lines 538 to 539
test_device_libraries(q) || test_device_libraries(q) ||
test_device_libraries(q) || test_unsupported_options(q) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test_esimd here (also, 3x test_device_libraries).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.
Thanks very much.

// and 1 for native version. These bfloat16 device library images are
// provided by compiler long time ago, we expect no further update, so
// keeping 1 copy should be OK.
std::unordered_map<uint32_t, DynRTDeviceBinaryImageUPtr>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be just a std::array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @jopperm
Yes, already updated the PR.
Thanks very much.

@jinge90 jinge90 temporarily deployed to WindowsCILock April 2, 2025 03:11 — with GitHub Actions Inactive
@jinge90 jinge90 marked this pull request as ready for review April 2, 2025 03:12
Copy link
Contributor

@jopperm jopperm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating this, LGTM now. Please update the PR title and add a description.

@jinge90 jinge90 temporarily deployed to WindowsCILock April 3, 2025 01:53 — with GitHub Actions Inactive
@jinge90 jinge90 temporarily deployed to WindowsCILock April 3, 2025 01:53 — with GitHub Actions Inactive
@steffenlarsen steffenlarsen merged commit 80fd665 into intel:sycl Apr 3, 2025
34 of 35 checks passed
ggojska pushed a commit to ggojska/llvm that referenced this pull request Apr 7, 2025
SYCL RT addImages function may be invoked multiple times for different
sycl binary images, more than 1 of these sycl binary images may depend
on bfloat16 device library. These bfloat16 device library images are
provided by compiler and the implementation are stable now, so we only
keep single copy for native and fallback version bfloat16 device library
in program manager, these images will not be removed unless program
manager is destroyed.

---------

Signed-off-by: jinge90 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants