
Non-blocking call in _safe_divide leads to race condition #3095


Open
wsascha opened this issue May 14, 2025 · 1 comment
Labels
bug / fix (Something isn't working), help wanted (Extra attention is needed)

Comments


wsascha commented May 14, 2025

🐛 Bug

There's an apparent race condition here:

zero_division_tensor = torch.tensor(zero_division, dtype=num.dtype).to(num.device, non_blocking=True)
return torch.where(denom != 0, num / denom, zero_division_tensor)

When moving the tensor to the target device (MPS in my case), I sometimes get the correct default (0.0) but other times uninitialized values, which corrupts all downstream metric results.
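
A minimal sketch of one possible workaround (my assumption, not necessarily the fix the maintainers will choose): construct the scalar tensor directly on the target device, so there is no in-flight host-to-device copy for torch.where to race against.

import torch

def _safe_divide(num: torch.Tensor, denom: torch.Tensor, zero_division: float = 0.0) -> torch.Tensor:
    # Allocate the fill value on num's device up front instead of copying it
    # there with non_blocking=True; a tensor created directly on the device
    # cannot be read before its contents exist.
    zero_division_tensor = torch.tensor(zero_division, dtype=num.dtype, device=num.device)
    return torch.where(denom != 0, num / denom, zero_division_tensor)

num = torch.tensor([1.0, 2.0], device="mps")
denom = torch.tensor([0.0, 4.0], device="mps")
print(_safe_divide(num, denom))  # expected: tensor([0.0000, 0.5000], device='mps:0')

Simply dropping non_blocking=True (blocking is the default for .to()) should also avoid the race, at the cost of a synchronous copy.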

Environment
  • TorchMetrics version (if built from source, add commit SHA): 1.7.1
  • Python & PyTorch version: Python 3.12.10, PyTorch 2.7.0
  • OS and other relevant information: macOS, Darwin Kernel Version 24.4.0
wsascha added the bug / fix and help wanted labels on May 14, 2025

Hi! Thanks for your contribution! Great first issue!
