Categorizing data that is not suitable for categorization can create an excessive number of categories, each containing very few messages.
The resulting categories are not very useful and are also resource hungry, both because there are more results documents to process and because they increase the cardinality of the chained anomaly detection.
To make it clearer that this has happened, and to encourage the user to stop the affected job, we should write an audit message when a job has a very large number of categories.
The condition for doing this could be as simple as "number of categories > 1000".
Or we could go for something more advanced like "number of input documents > 1000 and number of categories > number of input documents / 10" or "number of categories > 3 * √number of input documents".
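To make the three candidate conditions easier to compare, here is a minimal sketch of each as a standalone predicate. It is illustrative only and not existing code from this project: the function names, parameter names, and default values are assumptions lifted directly from the numbers quoted above.

```python
import math


def exceeds_fixed_threshold(num_categories: int, threshold: int = 1000) -> bool:
    """Simple condition: number of categories > 1000."""
    return num_categories > threshold


def exceeds_ratio(num_categories: int, num_input_docs: int,
                  min_docs: int = 1000, divisor: int = 10) -> bool:
    """Ratio condition: only applies once more than min_docs documents
    have been seen, then warns if categories > input documents / divisor."""
    return num_input_docs > min_docs and num_categories > num_input_docs / divisor


def exceeds_sqrt(num_categories: int, num_input_docs: int,
                 factor: float = 3.0) -> bool:
    """Square-root condition: categories > 3 * sqrt(input documents)."""
    return num_categories > factor * math.sqrt(num_input_docs)
```

One difference worth noting: the fixed threshold never fires for small jobs, the ratio condition is explicitly gated on having seen more than 1000 documents, while the square-root condition has no such gate and so can trigger very early (for example 4 categories after 1 document), which may argue for combining it with a minimum document count.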