Skip to content

[SYCL] Store stream buffers in the scheduler #2416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 4, 2020

Conversation

againull
Copy link
Contributor

@againull againull commented Sep 3, 2020

Stream buffers need to be alive after submitting a kernel because it is
executed by the scheduler asynchronously. For this reason currently
stream buffers are stored in an associated stream object. This stream
object is passed to the handler and then forwarded further to a command
group to keep stream buffers alive for the scheduler.

But there is a problem with this approach. A command group cannot be
destroyed while stream buffers (which are accessed in this command
group) are alive. Stream buffers are destroyed only if the stream
is destroyed. Stream object is destroyed only if command group is
destroyed. So, there is a loop dependency. Which results in memory leaks.

Solution is to store stream buffers in the scheduler for each stream.
With this approach resources are released properly.

Stream buffers need to be alive after submitting a kernel because it is
executed by the scheduler asynchronosly. For this reason currently
stream buffers are stored in an associated stream object. This stream
object is passed to the handler and then forwarded further to a commandi
group to keep streamm buffers alive for the scheduler.

But there is a problem with this approach. A command group cannot be
destroyed while stream buffers (which are accessed in this command
group) are alive. Stream buffers are destroyed only if the stream
is destroyed. Stream object is destrtoyed only if command group is
destroyed. So, there is a loop dependcy. Which results in memory leaks.

Solution is to store stream buffers in the scheduler for each stream.
With this approach resources are released properly.
@againull againull requested a review from a team as a code owner September 3, 2020 08:59
@againull againull requested a review from s-kanaev September 3, 2020 08:59
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.
A couple of comments only.
Some LIT tests failed also.

@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need at least some info on this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added info in the style of this file

@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all public. Why is it class then? Suggest switching to struct here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, switched to structure

Comment on lines 6 to 12
//==----------------------- release_resources_test.cpp ---------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part isn't needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Contributor Author

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed CI fails (hopefully testing will pass). Problem is that scheduler is a static object. And looks like some runtimes (for example, cuda, cpu) are unloaded before destructor of the scheduler is called. When buffers are deleted we at least call wait() for events in the scheduler, so this is a problem if device runtime is unloaded.
That is why I added changes to remove buffers for a stream object as soon as stream is flushed. We can do this because flushing is a blocking operation at this moment but this is going to be changed in the future.

@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added info in the style of this file

@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, switched to structure

Comment on lines 6 to 12
//==----------------------- release_resources_test.cpp ---------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@againull againull requested a review from s-kanaev September 4, 2020 06:36
s-kanaev
s-kanaev previously approved these changes Sep 4, 2020
@bader
Copy link
Contributor

bader commented Sep 4, 2020

@againull, please, take a look at CI regressions.

@bader
Copy link
Contributor

bader commented Sep 4, 2020

/summary:run

@againull
Copy link
Contributor Author

againull commented Sep 4, 2020

@againull, please, take a look at CI regressions.

@bader , I am quite sure it is caused by #2409

@bader
Copy link
Contributor

bader commented Sep 4, 2020

@againull, please, take a look at CI regressions.

@bader , I am quite sure it is caused by #2409

@againull, why do you think so? There are no CI failures in this PR pre-commit.
@yanfeng3721, could you confirm that #2409 introduced this regression?

@againull
Copy link
Contributor Author

againull commented Sep 4, 2020

@againull, please, take a look at CI regressions.

@bader , I am quite sure it is caused by #2409

@againull, why do you think so? There are no CI failures in this PR pre-commit.
@yanfeng3721, could you confirm that #2409 introduced this regression?

This test FAIL: SYCL::image_accessor_readsampler.cpp failed in precommit for #2409, but adding some new changes to #2409 triggered new testing and it passed, I believe it is a flaky failure. And this test is completely unrelated to changes of this PR.

@bader
Copy link
Contributor

bader commented Sep 4, 2020

@againull, @yanfeng3721, please, either revert the #2409 caused the regression or temporarily disable failing test.
Please, keep CI system sane.

@yanfeng3721
Copy link
Contributor

Hi , I have create a PR: #2422 to disable the flaky test temporarily .

@againull
Copy link
Contributor Author

againull commented Sep 4, 2020

/summary:run

@bader bader requested a review from s-kanaev September 4, 2020 10:59
@bader bader merged commit 841e1e7 into intel:sycl Sep 4, 2020
@againull againull deleted the stream_release_resources branch December 3, 2022 00:03
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
[L0] Disabling Driver In Order Lists by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants