Process never ends when sending tensors through multiprocessing queues in Python 3.12+ on macOS #153050
Comments
Any chance you could share a stack trace of the hang? On the two macOS machines I could run it on, it either errors out or finishes just fine...
When I press Ctrl+C, the process just ends without printing anything:
I tried attaching with GDB installed from Homebrew, but I get a "Don't know how to attach." error. I did get something from the Activity Monitor "Sample Process" function, though. Main process:
Resource tracker process:
Does it help? Edit: after adding some prints I determined that this
I cannot reproduce the hang on my end, though to be fair I'm using a local build rather than 2.7.0
@rafalh do you mind sharing output of
This is basically how I test it:
A friend from my team at work reproduced it as well, so it is not limited to my MacBook.
We are experiencing the same issue on macOS, on both x86 and ARM architectures. The problem is specific to CPython 3.12.10; everything works correctly with CPython 3.12.9. The issue doesn't occur on other Unix-based systems or with different Python versions. We haven't determined whether the root cause lies in PyTorch or in Python itself.
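For anyone who wants to guard a test suite or script against this combination, here is a minimal sketch of a check based only on what is reported in the comment above (macOS with CPython 3.12.10). The function name `is_reported_affected` is hypothetical, and whether other patch releases are affected is not established in this thread.

```python
import sys


def is_reported_affected(version_info=None, platform=None):
    """Return True for the combination reported as hanging in this thread.

    ASSUMPTION: only macOS ("darwin") with CPython 3.12.10 is flagged,
    because that is the only combination the comment above names.
    """
    version = tuple((version_info or sys.version_info)[:3])
    platform = platform or sys.platform
    return platform == "darwin" and version == (3, 12, 10)
```

Calling `is_reported_affected()` with no arguments checks the running interpreter; the parameters exist only so the check can be exercised with other values.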
This looks suspicious (from Python 3.12.10 changelog):
Also this:
I haven't tested with an older Python version yet. I'll try tomorrow. Edit:
CPython 3.12.10 caused hanging issues in MacOS as it is unable to cleanup multiprocessor resource tracker processes. See PyTorch issue #153050: pytorch/pytorch#153050
🐛 Describe the bug
If a tensor is sent through a multiprocessing queue, something blocks the process from exiting after the end of the script is reached (I have to press Ctrl+C to end the program).
It seems to be related to the resource tracker (`multiprocessing.resource_tracker.ResourceTracker`) process started automatically by Python: when the process should end, I can see a resource tracker child process in the process tree, and if I kill it the main process ends successfully.

The problem occurs in Python 3.12. It doesn't occur in Python 3.11. I am using macOS Sequoia. I tried running the examples in an Ubuntu container and couldn't reproduce the problem there, so it may be macOS-specific. Multiple Torch versions are affected: I tested 2.2.0 (the oldest one that installs successfully in Python 3.12) and 2.7.0 (the latest).
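To check the resource tracker observation without digging through the process tree, something like the following sketch may help. It relies on `multiprocessing.resource_tracker.ensure_running()`, which is part of the module, but the `_resource_tracker` singleton and its `_pid` attribute are private CPython details and are an assumption here; they may change between releases. The helper name `resource_tracker_pid` is hypothetical.

```python
from multiprocessing import resource_tracker


def resource_tracker_pid():
    """Start the resource tracker if needed and return its PID, or None."""
    resource_tracker.ensure_running()
    # ASSUMPTION: `_resource_tracker` and `_pid` are private CPython
    # attributes, not a documented API; fall back to None if missing.
    tracker = getattr(resource_tracker, "_resource_tracker", None)
    return getattr(tracker, "_pid", None)
```

The returned PID can be compared with the child process shown in the process tree for the hanging script.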
Calling `multiprocessing.set_start_method("fork")` fixes the issue (the default start method is `spawn`), but it is not recommended according to the Python docs. The start methods `spawn` and `forkserver` do not work.

Example using `DataLoader`:

Example using just a tensor and a queue:
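The original snippets did not survive on this page. As a stand-in, here is a minimal sketch of a queue round trip with a selectable start method. It is not the reporter's script: the names `_producer` and `roundtrip` are hypothetical, and sending from a child to the parent is an assumption. It uses only the standard library, so the payload is whatever the caller passes in.

```python
import multiprocessing as mp


def _producer(queue, payload):
    # Runs in the child process: put one object on the queue and exit.
    queue.put(payload)


def roundtrip(payload, start_method="spawn"):
    """Send `payload` from a child process to the parent through a queue."""
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    child = ctx.Process(target=_producer, args=(queue, payload))
    child.start()
    # Read before joining so the child is never blocked on a full pipe.
    received = queue.get(timeout=30)
    child.join(timeout=30)
    return received
```

To approximate the report, call `roundtrip(torch.ones(2))` from a script under an `if __name__ == "__main__":` guard, print "DONE?", and see whether the interpreter exits. Passing `start_method="fork"` corresponds to the workaround described above, and passing an `int` corresponds to the non-tensor case.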
In both cases, after printing "DONE?" the program does not end (unless interrupted with Ctrl+C), and the process tree looks like this:
The second example works fine when sending non-tensor values, e.g. `int`.

Versions
((venv_py312) ) ~/tmp$ python collect_env.py
/Users/rafal.harabien/tmp/venv_py312/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Collecting environment information...
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.4.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.3)
CMake version: version 4.0.1
Libc version: N/A
Python version: 3.12.10 (main, Apr 8 2025, 11:35:47) [Clang 16.0.0 (clang-1600.0.26.6)] (64-bit runtime)
Python platform: macOS-15.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M3 Pro
Versions of relevant libraries:
[pip3] torch==2.7.0
[conda] No relevant packages
cc @VitalyFedyunin @albanD @malfet