Skip to content

[SYCL] Reduce heap allocations from sycl::local_accessor #17147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 27, 2025

Conversation

aelovikov-intel
Copy link
Contributor

Ideally, there should be zero but we can't fix that without ABI break (both present, and potential future one for unanticipated future changes).

Reduce what we can now:

  • don't allocate memory on host (no host device, APIs accessing memory must be called from inside "command")
  • Use std::make_shared to fold two allocation into one for pImpl idiom

Ideally, there should be zero but we can't fix that without ABI break (both
present, and potential future one for unanticipated future changes).

Reduce what we can now:
- don't allocate memory on host (no host device, APIs accessing memory
  must be called from inside "command")
- Use `std::make_shared` to fold two allocation into one for pImpl idiom
@@ -119,19 +119,12 @@ const sycl::range<3> &LocalAccessorBaseHost::getSize() const {
return impl->MSize;
}
void *LocalAccessorBaseHost::getPtr() {
// Const cast this in order to call the const getPtr.
return const_cast<const LocalAccessorBaseHost *>(this)->getPtr();
// Must not be called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of debugging, in case this doesn't hold, could we have an assert here that always fails?

Copy link
Contributor Author

@aelovikov-intel aelovikov-intel Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have error handling in the derived classes before/instead "delegating" to here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. The reason I would like an assert here is to avoid us unintentionally adding code to the runtime that call these functions, be it directly or through other functions (like getQualifiedPtr().)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding an assertion that can't be reached costs approximately nothing and can save our bacon some time!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts won't be useful, because library is built in release mode. Added abort, even though I still don't see much value here. Ideally, these shouldn't exist at all, but I can't remove them as that would remove a symbol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree, these should be removed ASAP, but the assert would be there to ensure us developers don't do anything we shouldn't do. It may not be a huge amount of value, but it's practically free and would make it easier for the runtime dev to understand why things are crashing over a potential nullptr-deref. I can live with abort too though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abort was hit by some MKL tests, when running on Windows, but not on Linux?? See CMPLRLLVM-65833

@mabraham
Copy link

@rolandschulz FYI

@aelovikov-intel aelovikov-intel marked this pull request as ready for review February 27, 2025 15:45
@aelovikov-intel
Copy link
Contributor Author

HostInteropTask/interop-task-cuda-buffer-migrate.cpp is unrelated: #17026.

@aelovikov-intel aelovikov-intel merged commit 78613dc into intel:sycl Feb 27, 2025
19 of 20 checks passed
@aelovikov-intel aelovikov-intel deleted the reduce-local-acc-allocs branch February 27, 2025 18:50
@KornevNikita
Copy link
Contributor

@aelovikov-intel this patch broke sycl-cts/test_accessor_legacy - https://github.com/intel/llvm/actions/runs/13580511533

@aelovikov-intel
Copy link
Contributor Author

aelovikov-intel commented Feb 28, 2025

@aelovikov-intel this patch broke sycl-cts/test_accessor_legacy - https://github.com/intel/llvm/actions/runs/13580511533

I believe this is a bug in CTS. local/constant_buffer accessor restrict get_pointer() to only be called from inside "command". I'll upload a CTS PR shortly.

KhronosGroup/SYCL-CTS#1061

aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Mar 18, 2025
aelovikov-intel added a commit that referenced this pull request Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants