Skip to content

Commit 46561a2

Browse files
Fix hang on testsuite completion: avoid forking with threads
We have observed that sometimes a multiprocessing worker fails to properly terminate, getting stuck somewhere in the python multiprocessing internals after the whole of `process_with_threads` has completed. This results in the entire test suite hanging at 99% completion, as the process join never completes. This appears to be due to starting the `responses_processor` thread before starting the worker processes - the default multiprocessing start method on POSIX is `fork` which directly forks the python interpreter without execing. This is generally unsafe in a multithreaded environment as the child process may fork while another thread of the parent has locked arbitrary mutexes or similar, meaning they are already-locked in the child without any thread to ever unlock them, leading to deadlocks if the child ever tries to lock them itself. In fact, the default is changing to `forkserver` in Python 3.14 precisely because of subtle issues like this (see python/cpython#84559). Rather than making that same change here now, move the thread creation after the process creation to remain compatible with both `fork` and `forkserver`. There is no need to start the thread that early anyway; the worst that could happen is a few responses piling up in the meantime. This appears to fix the hang, as it has not reproduced with this patch in several days of continuous runs (where previously it reproduced within a few minutes). It is possible that the macOS-specific logic at the top of the file that "[forces] forking behavior at the expense of safety" should be revisited too, since the docs suggest that system libraries could create threads without our knowledge, but this is deferred to future work as no specific problems have been observed yet, and the docs suggest that problems here would lead to crashes rather than hangs. Co-authored-by: Andrea Segalini <[email protected]>
1 parent 4f27012 commit 46561a2

File tree

1 file changed

+13
-13
lines changed

1 file changed

+13
-13
lines changed

pytest_parallel/__init__.py

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -304,17 +304,6 @@ def pytest_runtestloop(self, session):
304304
for i in range(self.workers * tests_per_worker):
305305
queue.put('stop')
306306

307-
responses_processor = threading.Thread(
308-
target=self.process_responses,
309-
args=(self.responses_queue,),
310-
)
311-
responses_processor.daemon = True
312-
responses_processor.start()
313-
314-
def wait_for_responses_processor():
315-
self.responses_queue.put(('quit', {}))
316-
responses_processor.join()
317-
318307
processes = []
319308

320309
# Current process is not a worker.
@@ -327,9 +316,20 @@ def wait_for_responses_processor():
327316
process.start()
328317
processes.append(process)
329318

330-
[p.join() for p in processes]
319+
# Start the response processing thread. This must happen AFTER spawning
320+
# the subprocesses as forking a multithreaded application is not safe
321+
# and can result in deadlocks.
322+
responses_processor = threading.Thread(
323+
target=self.process_responses,
324+
args=(self.responses_queue,),
325+
)
326+
responses_processor.daemon = True
327+
responses_processor.start()
331328

332-
wait_for_responses_processor()
329+
# Wait for completion.
330+
[p.join() for p in processes]
331+
self.responses_queue.put(('quit', {}))
332+
responses_processor.join()
333333

334334
if not errors.empty():
335335
import six

0 commit comments

Comments
 (0)