Description
This is a placeholder for a number of analytics runtime improvements we have identified based on profiling high cardinality population analysis (see oprofile_report.400k.txt, oprofile_report.1m.txt and oprofile_report.2m.txt for more details):
Fixed Size Weights Array and Direct Indexing
Rework weight styles to use an array whose size equals the number of possible weights and then index it directly by the weight enum. The fact that methods like maths_t::countForUpdate are cropping up near the top of the profiles means this is an easy win.
Cache Probability Calculation Results for Multivariate Features
Caching probabilities was a big win for univariate features. The caching strategy is harder for multivariate features such as lat/long, but should still be feasible.
Investigate Approaches for Speeding Up Multimodal Probability Calculation
Investigate ways of speeding up ml::maths::CTools::CMixtureProbabilityOfLessLikelySamples, possibly using a different strategy at the expense of reducing accuracy where we don't need it.
Reduce the Time Spent Computing Logarithms
std::log continues to show up high on the profiles. We have an approximate fast log implementation, ml::maths::CTools::fastLog, which we should be able to roll out to more places in the code. This can't be done indiscriminately, because not all our uses can suffer the loss in precision, but there are definitely more places where this would be possible. There are also approaches for using SSE instructions to accelerate logs on collections of values, and we could perhaps refactor parts of the code to exploit these. There is quite a lot of relevant material online: of particular interest are the references in this post and this repository of fast approximate special functions.
Investigate Alternative Unordered Container Implementations
We have had situations in the past where data gathering has taken a significant proportion of the runtime. In these cases various specific hash map lookups were the bottleneck. We should investigate alternative hash map implementations. Both std::unordered_map and boost::unordered_map use chaining for collision resolution. Because of locality of reference it is generally better to use linear probing from a speed perspective. Key caching can also help in some cases. In terms of off-the-shelf implementations, Google's sparse and dense hash maps generally beat these implementations on various axes, depending on what the specific requirements are.