Description
This is a placeholder for a number of analytics runtime improvements we have identified based on profiling high cardinality population analysis (see oprofile_report.400k.txt, oprofile_report.1m.txt and oprofile_report.2m.txt for more details):
Fixed Size Weights Array and Direct Indexing
Rework weight styles to use an array whose size equals the number of possible weights and then index it directly by the weight enum. The fact that methods like maths_t::countForUpdate are cropping up near the top of the profiles means this is an easy win.
Cache Probability Calculation Results for Multivariate Features
Caching probabilities was a big win for univariate features. The caching strategy is harder for multivariate features such as lat/long, but should still be feasible.
Investigate Approaches for Speeding Up Multimodal Probability Calculation
Investigate ways of speeding up ml::maths::CTools::CMixtureProbabilityOfLessLikelySamples, possibly using a different strategy at the expense of reducing accuracy where we don't need it.
Reduce the Time Spent Computing Logarithms
std::log continues to show up high on the profiles. We have an approximate fast log implementation, ml::maths::CTools::fastLog, which we should be able to roll out to more places in the code. This can't be done indiscriminately, because not all our uses can suffer the loss in precision, but there are definitely more places where this would be possible. There are also approaches for using SSE instructions to accelerate logs on collections of values, and we could perhaps refactor parts of the code to exploit these. There is quite a lot of relevant material online: of particular interest are the references in this post and this repository of fast approximate special functions.
Investigate Alternative Unordered Container Implementations
We have had situations in the past where data gathering has taken a significant proportion of the runtime. In these cases various specific hash map lookups were the bottleneck. We should investigate alternative hash map implementations. Both std::unordered_map and boost::unordered_map use chaining for collision resolution. Because of locality of reference it is generally better to use linear probing from a speed perspective. Key caching can also help in some cases. In terms of off-the-shelf implementations, Google's sparse and dense hash maps generally beat these implementations on various axes, depending on what the specific requirements are.