Fix hang on testsuite completion: avoid forking with threads
We have observed that a multiprocessing worker sometimes fails to
terminate properly, getting stuck somewhere in the Python
multiprocessing internals after the whole of `process_with_threads` has
completed. The process join then never returns, so the entire test
suite hangs at 99% completion.
This appears to be caused by starting the `responses_processor` thread
before starting the worker processes. The default multiprocessing start
method on POSIX (except macOS, where it is `spawn`) is `fork`, which
forks the Python interpreter directly without execing. This is generally
unsafe in a multithreaded process: the fork may happen while another
thread of the parent holds arbitrary mutexes or similar, and only the
forking thread exists in the child. Those locks are then already locked
in the child with no thread left to ever unlock them, so the child
deadlocks if it ever tries to take one of them itself.
In fact, the default is changing to `forkserver` in Python 3.14
precisely because of subtle issues like this (see
python/cpython#84559). Rather than making that
same change here now, move the thread creation after the process
creation to remain compatible with both `fork` and `forkserver`. There
is no need to start the thread that early anyway; the worst that could
happen is a few responses piling up in the meantime.
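
The reordering can be sketched as follows. This is an illustrative
outline under stated assumptions, not the actual `process_with_threads`
code: `_worker`, `run`, and the queue layout are hypothetical, and the
`fork` context is selected explicitly only to exercise the case that
matters.

```python
import multiprocessing
import threading


def _worker(worker_id: int, count: int, responses) -> None:
    # Hypothetical worker: emit a few responses and exit.
    for i in range(count):
        responses.put((worker_id, i))


def run(num_workers: int = 2, items_per_worker: int = 3) -> list:
    ctx = multiprocessing.get_context("fork")
    responses = ctx.Queue()

    # 1. Start every worker while the parent is still single-threaded.
    workers = [
        ctx.Process(target=_worker, args=(n, items_per_worker, responses))
        for n in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    # 2. Only now start the thread that consumes responses. Anything the
    #    workers produced in the meantime simply waits in the queue.
    collected = []

    def responses_processor() -> None:
        for _ in range(num_workers * items_per_worker):
            collected.append(responses.get())

    thread = threading.Thread(target=responses_processor)
    thread.start()

    # 3. Drain the queue before joining the workers.
    thread.join()
    for worker in workers:
        worker.join()
    return sorted(collected)
```

Because no thread exists in the parent at fork time, the children cannot
inherit a lock held by a thread that was not copied.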
This appears to fix the hang, as it has not reproduced with this patch
in several days of continuous runs (where previously it reproduced
within a few minutes).
It is possible that the macOS-specific logic at the top of the file that
"[forces] forking behavior at the expense of safety" should be revisited
too, since the docs suggest that system libraries could create threads
without our knowledge, but this is deferred to future work as no
specific problems have been observed yet, and the docs suggest that
problems here would lead to crashes rather than hangs.
Co-authored-by: Andrea Segalini <[email protected]>