torchmetrics.Accuracy doesn't support inference mode with distributed backend. #9431
Comments
@tangbinh @ananthsub @SkafteNicki Mind looking into this?
I assume it does not like these four lines (as they are the only in-place operations):
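For reference, a paraphrased sketch of the kind of in-place state updates a torchmetrics-style `Accuracy` performs; the class and attribute names here are illustrative stand-ins, not the exact torchmetrics source:

```python
import torch


class SketchAccuracy:
    """Illustrative stand-in for a torchmetrics-style metric (not the real API)."""

    def __init__(self):
        self.correct = torch.tensor(0)
        self.total = torch.tensor(0)

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # Both lines mutate the state tensors in place, which is the kind of
        # operation inference mode restricts.
        self.correct += (preds == target).sum()
        self.total += target.numel()
```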
@WeichenXu123 Thank you for providing the code; it's very helpful. @SkafteNicki I tried your suggestions, but it looks like the problem remains. The problem also doesn't go away when I replace all the logic in …
@tchaton If it's safer, we can revert the inference mode change to unbreak these use cases in the meantime.
In my test, this issue only happens with a distributed backend, so I guess it's related to the metric "aggregation" stage.
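If that guess is right, the mechanism could be the core inference-mode restriction shown below. This is a standalone illustration of the assumption, not a traceback from the issue:

```python
import torch

with torch.inference_mode():
    state = torch.zeros(1)  # created under inference mode -> an inference tensor

# Any later in-place update outside inference mode (e.g. during a
# sync/aggregation step) fails:
state += 1  # RuntimeError: Inplace update to inference tensor outside InferenceMode
```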
@WeichenXu123 - were you running ddp_spawn on CPU? On CUDA/NCCL, this simple example works for me, so it looks like an issue with the gloo backend.
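To confirm which backend a given run actually uses (assuming `torch.distributed` has been initialized; ddp_spawn on CPU defaults to gloo, CUDA to NCCL):

```python
import torch.distributed as dist

if dist.is_initialized():
    # "gloo" for CPU ddp_spawn, "nccl" for CUDA in the default setup
    print(dist.get_backend())
```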
🐛 Bug
torchmetrics.Accuracy doesn't support inference mode (introduced in #8813) with distributed backend.
To Reproduce
This line raises an error:
`trainer.test(model, dm)`
Expected behavior
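A minimal sketch of the kind of setup that should trigger this. The module, data, and Trainer flags below are assumptions (a BoringModel-style classifier), not the reporter's exact code; `accelerator="ddp_cpu"` and the argument-free `torchmetrics.Accuracy()` assume the 1.4-era Lightning and 0.5-era torchmetrics APIs:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl
import torchmetrics


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        self.acc = torchmetrics.Accuracy()

    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = self.layer(x).argmax(dim=-1)
        self.acc(preds, y)              # in-place metric state update
        self.log("test_acc", self.acc)  # synced across processes at epoch end


if __name__ == "__main__":
    data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    loader = DataLoader(data, batch_size=16)
    # gloo-backed DDP spawn on CPU; the failure was not observed on CUDA/NCCL
    trainer = pl.Trainer(accelerator="ddp_cpu", num_processes=2)
    trainer.test(LitClassifier(), loader)
```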
Environment
- How you installed PyTorch (`conda`, `pip`, source): `pip`
- `torch.__config__.show()`: N/A
Additional context