This issue was moved to a discussion. You can continue the conversation there.

Move GC to a separate thread #58

Closed · iritkatriel opened this issue Jun 10, 2021 · 33 comments

@iritkatriel (Collaborator)

Antoine started working on this in PEP 556.
@pitrou, can you update us on where you are with this idea?

@gvanrossum mentioned this issue Jun 10, 2021
@pitrou commented Jun 11, 2021

I abandoned work on it. The implementation worked, but there was an obscure crash that I didn't manage to diagnose.
I believe @pablogsal was interested at some point, but he probably got distracted :-)
You should feel free to take it up.

@pitrou commented Jun 11, 2021

Note that the challenge is to run the test suite with threaded GC enabled:

diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
index 1467527da3..d6e8129849 100644
--- a/Python/pylifecycle.c
+++ b/Python/pylifecycle.c
@@ -901,7 +901,7 @@ _Py_InitializeMainInterpreter(PyInterpreterState *interp,
     }
 
     /* XXX allow setting GC mode with an env var? */
-    if (_PyGC_SetThreaded(0)) {
+    if (_PyGC_SetThreaded(1)) {
         Py_FatalError("_PyGC_SetThreaded failed");
     }
 

@iritkatriel (Collaborator, Author)

Thanks. I see the crash in test__xxsubinterpreters. Also a few other tests are failing due to what looks like finalizers not being called. I'll try to understand what you did.

@pablogsal (Collaborator)

> Thanks. I see the crash in test__xxsubinterpreters. Also a few other tests are failing due to what looks like finalizers not being called. I'll try to understand what you did.

I actually ended up fixing some of the finalizer bugs, but I need to find where I stored that work 😅

In any case, it is interesting to note that my benchmarks showed a slight slowdown in general (although not much). I suspect this is due to the extra context switching. In any case, I don't think this will actually end up giving us any speedups, although it could help with the problems described in PEP 556.

@pitrou commented Jun 12, 2021

Indeed, speedups were not the goal; the goal was to make the GC execution context more predictable (instead of GC-induced code such as finalizers running at potentially any point). Even latency would probably not improve very much, because of the GIL (I don't know how workable it would be to release the GIL at some points of garbage collection).

@pitrou commented Jun 12, 2021

Note that this GC execution context issue is actually extremely serious for real-world code (anything with complex finalizers with side effects). It also motivated the creation of the SimpleQueue object, which can help alleviate the problem, but at the cost of additional complexity (and potential bugs or performance issues as well).
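
To make the pattern concrete, here is a hedged sketch (not from the thread; Resource and the handle-closing logic are invented for illustration) of how SimpleQueue can move finalizer side effects to a predictable point: __del__ only records work, relying on the fact that SimpleQueue.put is implemented in C and is reentrant-safe, and the application drains the queue when it chooses.

import queue

# SimpleQueue.put is reentrant-safe, so it can be called from __del__
# even when the finalizer fires at an arbitrary point in the program.
_pending_cleanups = queue.SimpleQueue()

class Resource:
    def __init__(self, handle):
        self.handle = handle

    def __del__(self):
        # No real side effects here; just record what needs doing.
        _pending_cleanups.put(self.handle)

def run_pending_cleanups():
    # Called explicitly at a safe, predictable point in the program.
    while not _pending_cleanups.empty():
        print("closing handle", _pending_cleanups.get_nowait())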

@pablogsal (Collaborator)

> Note that this GC execution context issue is actually extremely serious for real-world code (anything with complex finalizers with side effects).

Yep, that's why I was interested in pushing it in the first place, but I got side-tracked by the parser work and by extinguishing the fires of the 3.9 release :(

@gvanrossum (Collaborator)

OTOH, in the context of "Faster CPython", the idea was to make things faster by letting a second core handle (most of) the GC work -- this is what some other languages do. But it would require something like the Gilectomy first.

@pitrou commented Jun 13, 2021

Right, but those "other languages" are generally fully GC-based. In CPython, much of the cost of reclaiming memory is borne by regular reference counting.

@markshannon (Member)

Regardless of where we run the "GC", i.e. the cycle finder and breaker, it would make sense to improve the general decref/dealloc code.

We want to:

  • maintain the invariant that all live objects have refcount > 0
  • not blow the C stack
  • make dealloc as efficient as we reasonably can.

To keep dealloc efficient we want to free memory for simple objects without having to chase function pointers, and not to have to worry about finalizers and weakrefs when clearing internal references.
We also want to handle the "trashcan" mechanism in dealloc, so that we don't need to handle it repeatedly in an ad-hoc fashion in many different places.

Something like this (in C flavoured Python):

@inline
def Py_DECREF(obj):
    # Refcount never reaches 0 unless object is genuinely unreachable.
    if obj.refcount == 1:
        dealloc(obj)
    else:
        obj.refcount -= 1

def dealloc(obj):
    if has_weakrefs(obj) or needs_finalizing(obj):
        finalizer_list.append(obj)
    elif has_internal_references(obj):
        recursive_clear_and_free(obj)
    else:
        # Objects that cannot refer to other objects e.g. ints, strs.
        # Implicitly reduce refcount to zero.
        free(obj)

def recursive_clear_and_free(obj):
    # obj cannot be "resurrected" as there are no references to it, strong or weak.
    if tstate.recursive_dealloc > TRASHCAN_LIMIT: # Currently the limit is 50?
        trashcan_list.append(obj)
        return
    tstate.recursive_dealloc += 1
    for offset in obj.__class__.layout:
        item = obj[offset]
        obj[offset] = NULL
        Py_DECREF(item)
    assert obj.refcount == 1
    # Implicitly reduce refcount to zero.
    free(obj)
    tstate.recursive_dealloc -= 1
    if tstate.recursive_dealloc == 0:
        while trashcan_list:
            obj = trashcan_list.pop()
            Py_DECREF(obj)
        if finalizer_list and not RUN_FINALIZERS_IN_THREAD:
             clear_finalizer_list()

def clear_finalizer_list():
    while finalizer_list:
        item = finalizer_list.pop()
        if has_weak_refs(item):
            do_weak_ref_callbacks(item)
        if has_finalizer(item):
            finalize(item)
        Py_DECREF(item)
    

if RUN_FINALIZERS_IN_THREAD:
    def thread_run_clear_finalizer_list():
        while True:
            clear_finalizer_list()
            # sleep until there is something to do

I have done absolutely no testing on the above, so it will be incorrect. Hopefully the general idea is clear though.

@iritkatriel (Collaborator, Author)

@markshannon Arranging GC like that would make it more predictable but could slow it down, right? (I'm assuming that the trashcan only kicks in after 50 levels because it's slower, so it's not worth it for small chains.)

@markshannon (Member)

I think it should be quicker, as type-specific dealloc code doesn't need to worry about trashcans, finalizers or resurrection.

Maintaining a stack of objects and making deallocation iterative rather than recursive should be even quicker. Pushing a single pointer is going to be faster than making a C call.
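
As an illustration of that point (a toy, pure-Python model, not CPython code: the refcount field and helpers stand in for the C machinery), deferred decrefs go on an explicit list, so arbitrarily long chains are torn down at constant stack depth and no trashcan limit is needed:

class Obj:
    def __init__(self, *children):
        self.refcount = 1
        self.children = list(children)  # models strong references

def decref(obj, stack):
    obj.refcount -= 1
    if obj.refcount == 0:
        stack.append(obj)  # queue for deallocation instead of recursing

def dealloc_iterative(obj):
    # Called when obj's last reference has just gone away.
    freed = 0
    stack = [obj]
    while stack:
        dead = stack.pop()
        # Clear internal references; children that hit refcount 0 are
        # pushed on the stack rather than deallocated by a nested call.
        children, dead.children = dead.children, []
        for child in children:
            decref(child, stack)
        freed += 1  # "free(dead)" in the C version
    return freed

# A chain this long would risk blowing the C stack if deallocation
# were recursive; here the stack depth stays constant.
head = Obj()
for _ in range(100_000):
    head = Obj(head)
assert dealloc_iterative(head) == 100_001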

@iritkatriel (Collaborator, Author)

I'm assuming you meant that the last line of `clear_finalizer_list` is `obj.refcount -= 1` rather than the recursive `Py_DECREF(item)`.

I think this can get complicated because we need to make sure that we don't add the same object twice to the `finalizer_list` (right?), and I don't see how we do that here - for instance, if it gets resurrected and then decreffed twice during `do_weak_ref_callbacks`/`finalize`.

Another question - if we consume the `finalizer_list` on a separate thread, does it live on the interpreter state rather than the thread state? Or does the separate thread look at the other threads' lists? In either case, doesn't this require locking?
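
For concreteness, one hypothetical shape for that cross-thread handoff (a sketch, not code from any branch; queue_for_finalization and obj.finalize are invented names): a single shared finalizer list guarded by a lock, with a condition variable so the finalizer thread can "sleep until there is something to do", as in Mark's pseudocode above.

import threading
from collections import deque

_finalizer_lock = threading.Lock()
_finalizer_nonempty = threading.Condition(_finalizer_lock)
_finalizer_list = deque()

def queue_for_finalization(obj):
    # Called by whichever thread's decref discovers a finalizable object.
    with _finalizer_lock:
        _finalizer_list.append(obj)
        _finalizer_nonempty.notify()

def finalizer_thread_main():
    while True:
        with _finalizer_lock:
            while not _finalizer_list:
                _finalizer_nonempty.wait()
            obj = _finalizer_list.popleft()
        # Run finalizers outside the lock so they cannot deadlock
        # with queue_for_finalization.
        obj.finalize()  # invented stand-in for "run weakref callbacks / __del__"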

@pablogsal (Collaborator)

> I'm assuming you meant that the last line of `clear_finalizer_list` is `obj.refcount -= 1` rather than the recursive `Py_DECREF(item)`.
>
> I think this can get complicated because we need to make sure that we don't add the same object twice to the `finalizer_list` (right?), and I don't see how we do that here - for instance, if it gets resurrected and then decreffed twice during `do_weak_ref_callbacks`/`finalize`.

This is indeed an extremely delicate area of the GC where very obscure bugs have been found over the years. I would be very careful about introducing any changes to that part of the GC machinery without a very good reason, because causing obscure regressions is super easy, and it gets even more dangerous when you involve weak references with types that have `tp_clear`.

@markshannon (Member)

Why is this fragile? It shouldn't be. I believe the reason it is so fragile is that we break the most important invariant of refcounting: that refcount != 0 for any object that is live. Weakrefs and finalizers keep objects live until they are cleared. The concept of "resurrection" is fundamentally broken.

The algorithm above (#58 (comment)) should maintain that invariant, so weakref callbacks and finalizers can do as they please.
An object should be deallocated iff its reference count is about to hit 0 and there are no remaining finalizers or weakrefs.

> I think this can get complicated because we need to make sure that we don't add the same object twice to the `finalizer_list`

Yes. The code should be:

def dealloc(obj):
    if has_weakrefs(obj) or obj.needs_finalizing:
        obj.needs_finalizing = False
        finalizer_list.append(obj)
    ...

@iritkatriel (Collaborator, Author)

Do you mean `and` rather than `or`?

(I don't see how this takes into account everything that can happen in the weakref callback.)

> The concept of "resurrection" is fundamentally broken.

I agree.

@markshannon (Member)

Yes, that is an `or`, although something like

def dealloc(obj):
    if has_weakrefs(obj):
        weakref_list.append(obj)
    elif obj.needs_finalizing:
        obj.needs_finalizing = False
        finalizer_list.append(obj)

might be better.

Weakref callback functions are just functions. There is nothing magical about them.

@iritkatriel (Collaborator, Author)

> Weakref callback functions are just functions. There is nothing magical about them.

Sure, but with most functions we don't have resurrection concerns. You're suggesting replacing the GC algorithm with a state-transition algorithm that is currently not fully specified, so it's hard to see how we prevent objects from going into the finalizing stage more than once.

@markshannon (Member)

So let's not have resurrection. Once an object is dead, it is dead.
Preventing an object from being finalized more than once is trivial: have a flag on the object.

The tricky part is getting there from where we are now 🙁

@iritkatriel (Collaborator, Author)

Oh, I see. But that's a change in the language.

@pablogsal (Collaborator) commented Aug 6, 2021

Removing resurrection is not possible: there is a lot of code out there that relies on it.

We have already explored that route many times when improving the GC, and the conclusion is that it would break many things in very obscure ways, so it is a very backwards-incompatible change.

@benjaminp
We already prevent GC objects from being finalized multiple times as a consequence of PEP 442. That's what `_PyGCHead_FINALIZED` is for.
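
A small runnable demonstration of that guarantee (the class is invented; the run-once behavior is CPython's, per PEP 442):

class Phoenix:
    survivor = None

    def __del__(self):
        print("finalizer ran")    # printed exactly once
        Phoenix.survivor = self   # resurrection: create a new strong reference

p = Phoenix()
del p                     # refcount hits 0: finalizer runs, object resurrected
Phoenix.survivor = None   # object dies for good; the finalized flag is set,
                          # so __del__ is not called a second time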

@nanjekyejoannah
I am personally interested in making some headway to realize PEP 556. Other than @pitrou's progress, has anyone made any more changes? If so, is there a fork I can access? I plan to work on this during the sprint, which would also advance other things related to the Python GC research I do. @pablogsal, do you have any additional progress on top of @pitrou's work?

@pablogsal (Collaborator) commented Sep 14, 2021

> @pablogsal, do you have any additional progress on top of @pitrou's work?

(Sorry for the delay, I just returned from vacation.)

Yeah, I did some work on top of it (see #58 (comment)), mainly fixing some obscure edge cases and bugs, but I need to search for it. Otherwise, I think I mostly remember where the problems were, so I will be happy to collaborate in the sprint :)

@nanjekyejoannah
Good, @pablogsal; when your search is successful, please let me know. @pitrou, where is your fork? I can lurk at that too. I don't plan to work on it much before the sprint anyway, so let's plan to work on it then.

@pablogsal (Collaborator)

> @pitrou, where is your fork?

The initial work by @pitrou can be found here:

https://github.com/pitrou/cpython/tree/threaded_gc

@nanjekyejoannah
Noted, and thanks @pablogsal.

@iritkatriel (Collaborator, Author)

Mark:

> So let's not have resurrection. Once an object is dead, it is dead.

Pablo:

> Removing resurrection is not possible: there is a lot of code out there that relies on it.

What if resurrection was not done in `__del__` but in a new dunder method that only decides whether to resurrect, and doesn't do any finalization? Then we would still support resurrection, and it would be clear when an object is alive or dead. Could we then simplify GC along the lines of Mark's proposal?
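
A sketch of what that split could look like; `__resurrect__` is a purely hypothetical name invented here, not an existing or proposed API:

_pool = []
POOL_LIMIT = 8

class PooledBuffer:
    def __resurrect__(self):
        # Hypothetical dunder, called when the refcount would hit zero.
        # It only decides liveness; no finalization side effects allowed.
        if len(_pool) < POOL_LIMIT:
            _pool.append(self)  # new strong reference keeps the object alive
            return True         # resurrected; finalization is skipped for now
        return False            # genuinely dead; finalize-only __del__ runs next

    def __del__(self):
        # Under this scheme __del__ may finalize but can never resurrect.
        print("releasing buffer")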

@nanjekyejoannah - I'd like to join you in the sprint as well.

@pablogsal (Collaborator)

> What if resurrection was not done in `__del__` but in a new dunder method that only decides whether to resurrect, and doesn't do any finalization? Then we would still support resurrection, and it would be clear when an object is alive or dead. Could we then simplify GC along the lines of Mark's proposal?

That is unfortunately still backwards incompatible, because current code that resurrects on purpose (or not) is doing it in `__del__`, and we cannot change that.

Notice that currently it is clear when an object is alive or dead, and also when it has been finalized or resurrected. The problem is that resurrection affects the GC and finalizers, and those semantics are guaranteed.

@iritkatriel (Collaborator, Author)

> That is unfortunately still backwards incompatible, because current code that resurrects on purpose (or not) is doing it in `__del__`, and we cannot change that.

Sure, it would take several versions to deprecate resurrection in `__del__`. But would it help simplify GC?

IIUC the complication with resurrection is that you can't call finalise twice, so you can have an object which is still alive after being finalised.

@pablogsal (Collaborator) commented Sep 18, 2021

> > That is unfortunately still backwards incompatible, because current code that resurrects on purpose (or not) is doing it in `__del__`, and we cannot change that.
>
> Sure, it would take several versions to deprecate resurrection in `__del__`. But would it help simplify GC?

Not by a lot. There is some complexity in resurrection, but it is not one of the biggest pain points of the GC, unlike, for instance, weak references and `tp_clear`. The code that handles resurrection is quite contained, and since some changes I worked on in Python 3.9, it doesn't even abort the full collection anymore.

> IIUC the complication with resurrection is that you can't call finalise twice, so you can have an object which is still alive after being finalised.

Yeah, but those effects are almost non-problematic. We just set a flag that says "this object has already been finalized", and for all purposes the resurrected object is alive like any other object.

@tiran commented Oct 18, 2021

If you move GC to a separate thread, then please include an API to run functions in that thread, too. Language bindings like JCC or PythonNet must be able to register the thread with the Java JVM and the .NET CLR. Otherwise, GC of a Python object that holds a Java reference can lead to a segfault.
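
The requested hook might look something like the sketch below; every name here is hypothetical, invented purely to illustrate the request:

# Hypothetical: callbacks the GC thread would run once at startup, so
# bindings can attach the thread to foreign runtimes (JVM, CLR) before
# any finalizer that touches foreign objects runs on it.
_gc_thread_initializers = []

def register_gc_thread_initializer(fn):
    _gc_thread_initializers.append(fn)

def _gc_thread_main():
    for fn in _gc_thread_initializers:
        fn()  # e.g. attach this thread to the JVM
    # ... then enter the garbage collection loop ...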

@njsmith commented Oct 18, 2021

Async generators are built-in objects whose semantics fundamentally rely on resurrection. Their `__del__` just resurrects the object and passes it to asyncio (or whatever async loop is running) to run the real finalizer. I don't love this fact; it definitely makes other things awkward. But I don't see any realistic alternatives.
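
A minimal runnable illustration of the mechanism @njsmith describes (CPython plus asyncio, which installs the finalizer hook via sys.set_asyncgen_hooks):

import asyncio

async def agen():
    try:
        yield 1
    finally:
        # Cleanup that may need to await, which is why plain GC
        # cannot finish this generator by itself.
        print("cleaned up by the event loop")

async def main():
    g = agen()
    await g.__anext__()       # start the generator; it suspends inside try
    del g                     # refcount hits zero; the finalizer resurrects
                              # the generator and hands it to the loop's
                              # asyncgen finalizer hook, which schedules aclose()
    await asyncio.sleep(0.1)  # give the scheduled aclose() a chance to run

asyncio.run(main())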
