Skip to content

gh-71936: Fix race condition in multiprocessing.Pool #98274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

hattya
Copy link
Contributor

@hattya hattya commented Oct 15, 2022

Proxes of shared objects register a Finalizer in BaseProxy._incref(), and it will call BaseProxy._decref() when it is GCed. This may cause a race condition with Pool(maxtasksperchild=None) on Windows.

A connection will be closed and raised TypeError when a GC occurs between _ConnectionBase._check_writable() and _ConnectionBase._send_bytes() in _ConnectionBase.send() in the second or later task.

BaseProxy does not count references well.

Proxes of shared objects register a Finalizer in BaseProxy._incref(), and it
will call BaseProxy._decref() when it is GCed. This may cause a race condition
with Pool(maxtasksperchild=None) on Windows.

A connection will be closed and raised TypeError when a GC occurs between
_ConnectionBase._check_writable() and _ConnectionBase._send_bytes() in
_ConnectionBase.send() in the second or later task.

BaseProxy does not count references well.
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ghost
Copy link

ghost commented Oct 15, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@hattya
Copy link
Contributor Author

hattya commented Oct 15, 2022

Steps to reproduce:

> xcopy /I "%ProgramFiles%\Python310\Lib\multiprocessing" multiprocessing
> type a.patch
--- a/multiprocessing/connection.py
+++ b/multiprocessing/connection.py
@@ -282,6 +282,8 @@
             _CloseHandle(self._handle)

         def _send_bytes(self, buf):
+            if (10, 1) in util._finalizer_registry:
+                import gc; gc.collect()
             ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
             try:
                 if err == _winapi.ERROR_IO_PENDING:
> patch -p1 < a.patch
> type a.py
#!/usr/bin/env python
import multiprocessing as mp

def foo(lock, i):
    def func(x):
        # for circular reference
        with lock:
            pass
        if x > 0:
            return func(x - 1)

    func(i)

if __name__ == '__main__':
    m = mp.Manager()
    lock = m.Lock()
    with mp.Pool(2) as pool:
        for _ in pool.starmap(foo, ((lock, i) for i in range(10))):
            pass
> a.py

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@basnijholt
Copy link

basnijholt commented Jun 12, 2024

Like I mentioned here #71936 (comment), this patch resolves the problem I am running into on Python 3.12.3.

@hattya, could you add a news entry like the bot suggests? I see you already did that.

@basnijholt
Copy link

basnijholt commented Sep 11, 2024

I just ran into this exact problem again in a slightly different context than my previous comment.

@encukou, @gpshead, @vstinner, and @AlexWaygood (most recent reviewers of this module – apologies for direct tag), would any of you be able to take a look at this?

This PR solves the race condition I keep running into. For example, see this build of the documentation.

@encukou
Copy link
Member

encukou commented Oct 4, 2024

My review led to a “competing” PR: #124973
@basnijholt, could you check if that also solves the issue for you?

@encukou
Copy link
Member

encukou commented Nov 13, 2024

I have merged the PR that builds on this one.
Thank you for the fix, @hattya!

@encukou encukou closed this Nov 13, 2024
@basnijholt
Copy link

basnijholt commented Nov 15, 2024

@encukou, I also replied in #71936 (comment) and it indeed solves my problem!

With this fix hopefully Readthedocs builds for https://github.com/pipefunc/pipefunc will not randomly fail now 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants