[SYCL] Store stream buffers in the scheduler #2416
Stream buffers must stay alive after a kernel is submitted because the scheduler executes it asynchronously. For this reason, stream buffers are currently stored in the associated stream object. This stream object is passed to the handler and then forwarded to the command group to keep the stream buffers alive for the scheduler. But there is a problem with this approach. A command group cannot be destroyed while the stream buffers accessed in it are alive. Stream buffers are destroyed only when the stream is destroyed. The stream object is destroyed only when the command group is destroyed. So there is a circular dependency, which results in memory leaks. The solution is to store the stream buffers for each stream in the scheduler. With this approach, resources are released properly.
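To make the ownership cycle described above concrete, here is a minimal sketch using `std::shared_ptr`. All type and function names (`Buffer`, `Stream`, `CommandGroup`, `SchedulerSketch`, `oldSchemeLeaks`, `newSchemeLeaks`) are hypothetical stand-ins invented for illustration; they are not the SYCL runtime's classes and do not reproduce the code in this PR. The sketch assumes the buffer keeps a reference to the command group that uses it, which is one way to model "a command group cannot be destroyed while stream buffers are alive".

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-ins for illustration only.
struct CommandGroup;

struct Buffer {
  std::shared_ptr<CommandGroup> LastUser; // buffer pins the command group using it
};

struct Stream {
  std::shared_ptr<Buffer> Buf; // used only in the old scheme: stream owns its buffer
};

struct CommandGroup {
  std::shared_ptr<Stream> KeepAlive; // used only in the old scheme: pins the stream
};

// New scheme: a scheduler-like registry owns the buffers, keyed by stream.
struct SchedulerSketch {
  std::unordered_map<const Stream *, std::shared_ptr<Buffer>> StreamBuffers;

  std::shared_ptr<Buffer> allocate(const Stream *S) {
    assert(S && "stream key must be non-null");
    auto B = std::make_shared<Buffer>();
    StreamBuffers[S] = B;
    return B;
  }
  void deallocate(const Stream *S) { StreamBuffers.erase(S); }
};

// Old scheme: CommandGroup -> Stream -> Buffer -> CommandGroup forms a cycle.
bool oldSchemeLeaks() {
  auto S = std::make_shared<Stream>();
  S->Buf = std::make_shared<Buffer>();
  auto CG = std::make_shared<CommandGroup>();
  CG->KeepAlive = S;
  S->Buf->LastUser = CG;

  std::weak_ptr<CommandGroup> Observer = CG;
  CG.reset();
  S.reset();
  bool Leaked = !Observer.expired(); // still alive: nothing can break the cycle

  // Break the cycle by hand so the demo itself does not leak memory.
  if (auto P = Observer.lock())
    P->KeepAlive.reset();
  return Leaked;
}

// New scheme: the registry is the only owner, so erasing the entry frees everything.
bool newSchemeLeaks() {
  SchedulerSketch Sched;
  auto S = std::make_shared<Stream>();
  auto B = Sched.allocate(S.get());
  auto CG = std::make_shared<CommandGroup>(); // does not pin the stream
  B->LastUser = CG;

  std::weak_ptr<CommandGroup> CGObserver = CG;
  std::weak_ptr<Buffer> BufObserver = B;
  B.reset();
  CG.reset();
  Sched.deallocate(S.get()); // releases the buffer, which releases the command group
  S.reset();
  return !CGObserver.expired() || !BufObserver.expired();
}
```

In this model `oldSchemeLeaks()` reports that the command group outlives every external reference, while `newSchemeLeaks()` reports that both the buffer and the command group are gone once the registry entry is erased.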
Looks good overall.
A couple of comments only.
Some LIT tests failed also.
@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
We need at least some info on this class.
Added info in the style of this file
@@ -715,6 +715,38 @@ class Scheduler {

friend class Command;
friend class DispatchHostTask;

class StreamBuffers {
It's all public. Why is it class then? Suggest switching to struct here.
Agree, switched to structure
//==----------------------- release_resources_test.cpp ---------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
I think this part isn't needed here.
Removed
I also fixed the CI failures (hopefully testing will pass now). The problem is that the scheduler is a static object, and it looks like some runtimes (for example, CUDA and CPU) are unloaded before the scheduler's destructor is called. When buffers are deleted, we at least call wait() on events in the scheduler, so this is a problem if the device runtime has already been unloaded.
That is why I added changes that remove the buffers for a stream object as soon as the stream is flushed. We can do this because flushing is currently a blocking operation, but this is going to change in the future.
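The release-on-flush idea can be sketched as follows. This is only an illustration under stated assumptions: `BufferRegistry`, `write`, `flushAndRelease`, and `liveStreams` are hypothetical names, streams are identified by a plain integer, and the buffer is modelled as a `std::string`. The point is that a blocking flush can drop the registry entry immediately, so nothing is left for a static object's destructor to clean up after a device runtime has been unloaded.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry standing in for the scheduler's per-stream storage.
class BufferRegistry {
public:
  // Append output to the buffer of the given stream, creating it on first use.
  void write(int StreamId, const std::string &Text) { Pending[StreamId] += Text; }

  // Blocking flush: return the buffered text and erase the entry right away,
  // so no buffer for this stream survives until the registry is destroyed.
  std::string flushAndRelease(int StreamId) {
    auto It = Pending.find(StreamId);
    if (It == Pending.end())
      return {};
    std::string Out = std::move(It->second);
    Pending.erase(It);
    return Out;
  }

  // Number of streams that still have a buffer held by the registry.
  std::size_t liveStreams() const { return Pending.size(); }

private:
  std::unordered_map<int, std::string> Pending;
};
```

If flushing later becomes non-blocking, as the comment above anticipates, the erase step would presumably have to wait until the flush has completed; this sketch does not model that case.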
@againull, please take a look at the CI regressions.
/summary:run
This test (FAIL: SYCL::image_accessor_readsampler.cpp) failed in precommit for #2409, but adding some new changes to #2409 triggered new testing and it passed, so I believe it is a flaky failure. This test is also completely unrelated to the changes in this PR.
@againull, @yanfeng3721, please either revert #2409, which caused the regression, or temporarily disable the failing test.
Hi, I have created a PR, #2422, to disable the flaky test temporarily.
/summary:run
[L0] Disabling Driver In Order Lists by default
Stream buffers need to be alive after submitting a kernel because it is
executed by the scheduler asynchronously. For this reason currently
stream buffers are stored in an associated stream object. This stream
object is passed to the handler and then forwarded further to a command
group to keep stream buffers alive for the scheduler.
But there is a problem with this approach. A command group cannot be
destroyed while stream buffers (which are accessed in this command
group) are alive. Stream buffers are destroyed only if the stream
is destroyed. The stream object is destroyed only if the command group
is destroyed. So there is a circular dependency, which results in
memory leaks. The solution is to store stream buffers in the scheduler
for each stream. With this approach resources are released properly.