
Multiclass accuracy with micro and top-k does not work as expected. #3068


Open
mmcdermott opened this issue Apr 17, 2025 · 4 comments · May be fixed by #3078
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed), v1.7.x

Comments

@mmcdermott

🐛 Bug

I expect top_k to affect accuracy even in micro mode: a prediction should count as correct if the true class's logit is among the top_k highest scores, and those corrects are then averaged as usual. That is not the behavior I see with torchmetrics now. Instead, I see this:

>>> import torch
>>> from torchmetrics import Accuracy
>>> logits = torch.tensor([[[0.0, 0.1, 0.5, 0.4],
...                         [0.0, 0.2, 0.7, 0.1]],
...                        [[0.0, 0.4, 0.3, 0.3],
...                         [1.0, 0.0, 0.0, 0.0]]])
>>> code = torch.tensor([[3, 2], [1, 0]])
>>> logits.shape
torch.Size([2, 2, 4])
>>> code.shape
torch.Size([2, 2])
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=4)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=3)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=2)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=1)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)

Expected behavior

I would expect all accuracies with top_k >= 2 to be 1.0.
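
For reference, here is a minimal sketch of the computation I would expect, using a hypothetical topk_micro_accuracy helper (not a torchmetrics API) over the same logits and code tensors, flattening all non-class dimensions and dropping ignore_index positions:

import torch

def topk_micro_accuracy(logits, target, k, ignore_index=0):
    # Flatten batch/sequence dims so every position is one sample.
    flat_logits = logits.reshape(-1, logits.shape[-1])  # [N, C]
    flat_target = target.reshape(-1)                    # [N]
    keep = flat_target != ignore_index
    flat_logits, flat_target = flat_logits[keep], flat_target[keep]
    # Correct if the true class appears among the k highest scores.
    topk = flat_logits.topk(k, dim=-1).indices          # [N, k]
    correct = (topk == flat_target.unsqueeze(-1)).any(dim=-1)
    return correct.float().mean()

for k in (1, 2, 3, 4):
    print(k, topk_micro_accuracy(logits, code, k))
# 1 tensor(0.6667)
# 2 tensor(1.)
# 3 tensor(1.)
# 4 tensor(1.)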

Additional context

This line https://github.com/Lightning-AI/torchmetrics/blob/master/src/torchmetrics/functional/classification/accuracy.py#L85 ignores top_k in the micro case.

I think the fix for this issue may be related to #3037.

@mmcdermott added the bug / fix and help wanted labels on Apr 17, 2025
@mmcdermott (Author)

Also related: #2418

@mmcdermott (Author)

I think it actually doesn't work as expected in any multiclass setting. It may be working as the developers intended, and this may just be an issue with my expectations, but I would expect top-k in multiclass to count a prediction as correct if any of the top-k predictions match the target, then average those corrects. I'm not really sure what it is doing now.
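
Concretely, for a plain [N, C] multiclass batch, this is the semantics I'd expect (a hypothetical sketch of my expectation, not the current torchmetrics behavior):

import torch

def expected_topk_accuracy(preds, target, k):
    # A prediction is correct if the true class index is among
    # the k largest per-class scores.
    topk = preds.topk(k, dim=1).indices              # [N, k]
    hits = (topk == target.unsqueeze(1)).any(dim=1)  # [N]
    return hits.float().mean()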

@rittik9 (Contributor) commented Apr 17, 2025

Hi @mmcdermott, thank you for raising this issue. Can you please also specify the torchmetrics version?

@mmcdermott (Author)

1.7.1

@ved1beta linked pull request #3078 on Apr 30, 2025 that will close this issue
@Borda added the v1.7.x label on May 9, 2025