[ML] Add audit message when categorization detects too many categories #50319

droberts195 · 2019-12-18T15:49:59Z

If data that is not suitable for categorization is categorized then it is possible for an excessive number of categories to be created, each with a very small number of messages.

The resulting categories are not very useful and are also resource hungry, both in terms of the results documents that have to be processed and because they increase the cardinality of the chained anomaly detection.

To make it clearer that such a situation has occurred, and to encourage the user to stop the affected job, we should write an audit message when a job has an excessive number of categories.

The condition for doing this could be as simple as "number of categories > 1000".

Or we could go for something more advanced like "number of input documents > 1000 and number of categories > number of input documents / 10" or "number of categories > 3 * √number of input documents".
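For comparison, the three candidate conditions above can be written as simple predicates. This is only a minimal sketch: the function names and parameters are hypothetical, and it is not the check that was eventually implemented.

```python
import math


# Hypothetical helper names; these only restate the conditions proposed above.
def exceeds_fixed_limit(num_categories: int) -> bool:
    """Simple option: number of categories > 1000."""
    return num_categories > 1000


def exceeds_ratio_limit(num_categories: int, num_docs: int) -> bool:
    """Number of input documents > 1000 and categories > documents / 10."""
    return num_docs > 1000 and num_categories > num_docs / 10


def exceeds_sqrt_limit(num_categories: int, num_docs: int) -> bool:
    """Number of categories > 3 * sqrt(number of input documents)."""
    return num_categories > 3 * math.sqrt(num_docs)
```

The fixed limit ignores how much data has been seen, whereas the other two scale the threshold with the number of input documents, so a job with 10,000 documents would be flagged above 1,000 categories by the ratio rule and above 300 by the square-root rule.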

elasticmachine · 2019-12-18T15:50:04Z

Pinging @elastic/ml-core (:ml)

droberts195 · 2020-02-11T12:29:45Z

A rudimentary check was added in 7.6 in #51146

This was replaced with a better check for 7.7 and above in #52195

droberts195 added >enhancement :ml Machine learning labels Dec 18, 2019

droberts195 self-assigned this Feb 11, 2020

droberts195 closed this as completed Feb 11, 2020
