
Multiclass accuracy with micro and top-k does not work as expected. #3068


Open
mmcdermott opened this issue Apr 17, 2025 · 4 comments · May be fixed by #3078
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed), v1.7.x

Comments

@mmcdermott

🐛 Bug

I expect top_k to affect accuracy even in micro mode: a prediction should count as correct if the true class's logit is among the top_k highest scores, and those corrects are then averaged as usual. That is not the behavior I see with torchmetrics now. Instead, I see this:

>>> import torch
>>> from torchmetrics import Accuracy
>>> logits = torch.tensor([[[0.0, 0.1, 0.5, 0.4],
...                         [0.0, 0.2, 0.7, 0.1]],
...                        [[0.0, 0.4, 0.3, 0.3],
...                         [1.0, 0.0, 0.0, 0.0]]])
>>> code = torch.tensor([[3, 2], [1, 0]])
>>> logits.shape
torch.Size([2, 2, 4])
>>> code.shape
torch.Size([2, 2])
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=4)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=3)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=2)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)
>>> acc = Accuracy(task="multiclass", ignore_index=0, num_classes=4, multidim_average="global", average="micro", top_k=1)
>>> acc(logits.transpose(2, 1), code)
tensor(0.6667)

Expected behavior

I would expect all accuracies with top_k >= 2 to be 1.0.
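
For reference, here is a minimal sketch of the computation I would expect, using a hypothetical topk_micro_accuracy helper (not a torchmetrics API) over the same logits and code tensors, flattening all non-class dimensions and dropping ignore_index positions:

import torch

def topk_micro_accuracy(logits, target, k, ignore_index=0):
    # Flatten batch/sequence dims so every position is one sample.
    flat_logits = logits.reshape(-1, logits.shape[-1])  # [N, C]
    flat_target = target.reshape(-1)                    # [N]
    keep = flat_target != ignore_index
    flat_logits, flat_target = flat_logits[keep], flat_target[keep]
    # Correct if the true class appears among the k highest scores.
    topk = flat_logits.topk(k, dim=-1).indices          # [N, k]
    correct = (topk == flat_target.unsqueeze(-1)).any(dim=-1)
    return correct.float().mean()

for k in (1, 2, 3, 4):
    print(k, topk_micro_accuracy(logits, code, k))
# 1 tensor(0.6667)
# 2 tensor(1.)
# 3 tensor(1.)
# 4 tensor(1.)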

Additional context

This line https://github.com/Lightning-AI/torchmetrics/blob/master/src/torchmetrics/functional/classification/accuracy.py#L85 ignores top_k in the micro case.

I think the fix for this issue may be related to #3037.

@mmcdermott added the bug / fix and help wanted labels on Apr 17, 2025
@mmcdermott (Author)

Also related: #2418

@mmcdermott (Author)

I think it actually doesn't work as expected in any multiclass setting. It may be working as the developers intended, and this may just be an issue with my expectations, but I would expect top-k in multiclass to count a prediction as correct if any of the top-k predictions match the target, then average those corrects. I'm not really sure what it is doing now.
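
Concretely, for a plain [N, C] multiclass batch, this is the semantics I'd expect (a hypothetical sketch of my expectation, not the current torchmetrics behavior):

import torch

def expected_topk_accuracy(preds, target, k):
    # A prediction is correct if the true class index is among
    # the k largest per-class scores.
    topk = preds.topk(k, dim=1).indices              # [N, k]
    hits = (topk == target.unsqueeze(1)).any(dim=1)  # [N]
    return hits.float().mean()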

@rittik9 (Contributor) commented Apr 17, 2025

Hi @mmcdermott, thank you for raising this issue. Can you please also specify the torchmetrics version?

@mmcdermott (Author)

1.7.1

@ved1beta linked pull request #3078 on Apr 30, 2025 that will close this issue
@Borda added the v1.7.x label on May 9, 2025