[ML] Eagerly discard node statistics for leaves which we will never split #1125
Conversation
- counter_t::E_DFTPMEstimatedPeakMemoryUsage) < 6000000);
- BOOST_TEST_REQUIRE(core::CProgramCounters::counter(counter_t::E_DFTPMPeakMemoryUsage) < 1500000);
+ counter_t::E_DFTPMEstimatedPeakMemoryUsage) < 4500000);
+ BOOST_TEST_REQUIRE(core::CProgramCounters::counter(counter_t::E_DFTPMPeakMemoryUsage) < 1600000);
We're not using more memory; however, we now properly account for the memory used by the container holding the leaf statistics, because we resize it before estimating its memory usage.
LGTM
This is a simple memory optimisation for regression and classification model training.
Since we limit the maximum tree size, we only need retain statistics for the maximum gain "max tree size" - "current tree size" leaves. Since we add one leaf statistic for each node we add to the tree we only ever need at most "maximum tree size" / 2 node statistics in total.
We use an upper bound for the leaf statistics memory estimates, so this also indirectly improves our memory estimate accuracy and thus helps with #1106. For example, in testing this reduced estimates by around 25%.