[ML] AucRoc gives misleading results when num_top_classes is set too low. #63306

Closed
przemekwitek opened this issue Oct 6, 2020 · 4 comments
Labels
>bug :ml Machine learning

Comments

@przemekwitek
Contributor

przemekwitek commented Oct 6, 2020

In multiclass classification, computing AucRoc for a class requires that the class appear in every document's top classes array, so that we know its probability for every document.
Otherwise the results are incorrect or, as @wwang500 pointed out, in some cases the evaluation request fails outright because it cannot find a single document that lists the class in its top classes.
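
To illustrate the problem, here is a minimal Python sketch with three classes and only the two most probable classes kept per document. The document layout (a `top_classes` list of entries with `class_name` and `class_probability`) and the helper names are assumptions made for this example, not the actual evaluation code.

```python
from typing import Dict, List, Optional

# Toy documents: three classes exist, but only the top two are kept per document.
DOCS: List[Dict] = [
    {"actual": "cat", "top_classes": [
        {"class_name": "cat", "class_probability": 0.70},
        {"class_name": "dog", "class_probability": 0.20},
    ]},
    {"actual": "dog", "top_classes": [
        {"class_name": "dog", "class_probability": 0.60},
        {"class_name": "bird", "class_probability": 0.30},
    ]},
    {"actual": "bird", "top_classes": [
        {"class_name": "bird", "class_probability": 0.80},
        {"class_name": "dog", "class_probability": 0.15},
    ]},
]


def class_probability(top_classes: List[Dict], class_name: str) -> Optional[float]:
    """Return the probability of class_name, or None if it was truncated away."""
    for entry in top_classes:
        if entry["class_name"] == class_name:
            return entry["class_probability"]
    return None


def docs_missing_class(docs: List[Dict], class_name: str) -> List[int]:
    """Indices of documents that carry no probability for class_name."""
    return [
        i for i, doc in enumerate(docs)
        if class_probability(doc["top_classes"], class_name) is None
    ]


# "dog" is present everywhere, so an ROC curve for "dog" is well defined.
# "cat" is missing from two of three documents, so its curve cannot be trusted.
missing_dog = docs_missing_class(DOCS, "dog")
missing_cat = docs_missing_class(DOCS, "cat")
```

Any document in the returned list has no usable score for the class, so it either drops out of the ROC computation or has to be assigned an arbitrary probability.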

The solution is to set num_top_classes to a value greater than or equal to the total number of classes. We should minimize the surprise for users, though, and possibly apply a sensible default ourselves.
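
As a rough sketch of that rule, the check below derives the total number of classes from the training labels and compares it against the configured value. The function names are hypothetical and exist only for this illustration.

```python
from typing import Iterable


def required_num_top_classes(labels: Iterable[str]) -> int:
    """Smallest num_top_classes that keeps every class in every document."""
    return len(set(labels))


def is_safe_for_auc_roc(num_top_classes: int, labels: Iterable[str]) -> bool:
    """True when num_top_classes is at least the total number of classes."""
    return num_top_classes >= required_num_top_classes(labels)


labels = ["cat", "dog", "bird", "cat", "dog"]
needed = required_num_top_classes(labels)    # three distinct classes
too_low = is_safe_for_auc_roc(2, labels)     # one class is dropped per document
sufficient = is_safe_for_auc_roc(3, labels)  # every class is reported
```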

przemekwitek added the >bug and :ml Machine learning labels Oct 6, 2020
przemekwitek self-assigned this Oct 6, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent
Member

I think having a special value of -1 meaning "all classes" is probably a good idea. Otherwise the default value would have to be some "large value" (which I think would be anything 30 or more, as that is our current limit on the number of classes).

But if we ever increase the number of classes we allow, this default value would have to change. I think decoupling them is prudent.
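
A minimal Python sketch of how such a sentinel could be resolved, assuming the total number of classes is known when results are written. The constant and function names are made up for this illustration and do not reflect the actual implementation.

```python
ALL_CLASSES = -1  # hypothetical sentinel meaning "report every class"


def effective_num_top_classes(num_top_classes: int, total_classes: int) -> int:
    """Resolve the configured value into the number of classes to report."""
    if num_top_classes == ALL_CLASSES:
        # Decoupled from any class limit: always tracks the real class count.
        return total_classes
    if num_top_classes < 0:
        raise ValueError("num_top_classes must be -1 or a non-negative integer")
    # Never report more classes than actually exist.
    return min(num_top_classes, total_classes)


all_of_them = effective_num_top_classes(ALL_CLASSES, 30)
truncated = effective_num_top_classes(2, 5)
capped = effective_num_top_classes(10, 5)
```

Because the sentinel resolves against the real class count, raising the class limit later would not require changing the default.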

@przemekwitek
Contributor Author

I think having a special value of -1 meaning "all classes" is probably a good idea.

I think so too. I've implemented this idea on the C++ side in elastic/ml-cpp#1526

@przemekwitek
Contributor Author

All the changes are in so I consider this issue solved.
