Currently, our upfront memory usage estimate is an upper bound and a significant overestimate in most cases. This issue covers a few quick wins for improving the estimate:
- Pass the training percentage: we need to know the number of rows actually used to train.
- Pass the number of distinct values for each feature: this would let us better estimate how much memory we'll use for aggregate loss derivatives (as sketched below).
- Account for the maximum number of features we will select.
- SHAP memory usage is not concurrent with the leaf statistics memory usage, so we should not simply sum the two.
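To make the shape of such a refinement concrete, here is a minimal C++ sketch that folds the training fraction, per-feature distinct value counts and the feature selection cap into a single estimate. All type names, constants and byte costs are hypothetical placeholders for illustration, not the actual estimation code.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct EstimateInputs {
    std::size_t numberRows = 0;              // total rows in the data frame
    double trainingFraction = 1.0;           // fraction of rows used for training
    std::vector<std::size_t> distinctValues; // distinct values per feature
    std::size_t maxSelectedFeatures = 0;     // cap on the number of features we will select
};

std::size_t estimateTrainingMemory(const EstimateInputs& in) {
    // Count only the rows actually used to train, not the whole data frame.
    auto trainingRows = static_cast<std::size_t>(
        in.trainingFraction * static_cast<double>(in.numberRows));

    // Aggregate loss derivatives are accumulated per candidate split value, so
    // the cost scales with the distinct values of the features we can select.
    std::vector<std::size_t> values{in.distinctValues};
    std::sort(values.begin(), values.end(), std::greater<>());
    std::size_t selected = std::min(in.maxSelectedFeatures, values.size());
    std::size_t splitCandidates = std::accumulate(
        values.begin(), values.begin() + static_cast<std::ptrdiff_t>(selected),
        std::size_t{0});

    // Illustrative byte costs: gradient, curvature and count per derivative
    // bucket, plus a fixed per-row overhead for the packed training rows.
    constexpr std::size_t BYTES_PER_DERIVATIVE_BUCKET = 3 * sizeof(double);
    constexpr std::size_t BYTES_PER_ROW = 16;

    return trainingRows * BYTES_PER_ROW +
           splitCandidates * BYTES_PER_DERIVATIVE_BUCKET;
}
```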
A better strategy (longer term) would be, rather than estimating a memory upper bound, to estimate a value which training is very unlikely to exceed. This would require support for circuit breaking during training. Since we snapshot state periodically, the user would still be able to retrospectively increase the memory limit and restart the analysis.
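As an illustration of that longer-term direction, the following is a hedged C++ sketch of a training-time circuit breaker combined with periodic snapshots. The class, callbacks and training loop are invented for this example and do not reflect the real training code.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

class TrainingCircuitBreaker {
public:
    TrainingCircuitBreaker(std::size_t softLimitBytes,
                           std::function<std::size_t()> currentUsageBytes)
        : m_SoftLimit{softLimitBytes}, m_CurrentUsage{std::move(currentUsageBytes)} {}

    // Throwing aborts the current round; a previously written snapshot lets the
    // user raise the memory limit and resume rather than losing all progress.
    void check() const {
        if (m_CurrentUsage() > m_SoftLimit) {
            throw std::runtime_error{
                "memory limit exceeded: increase the limit and restart from the last snapshot"};
        }
    }

private:
    std::size_t m_SoftLimit;
    std::function<std::size_t()> m_CurrentUsage;
};

// Sketch of how the breaker might slot into a boosting loop.
template <typename Forest, typename TrainOneTree, typename Snapshot>
void trainWithCircuitBreaker(Forest& forest, std::size_t rounds,
                             TrainOneTree trainOneTree, Snapshot snapshot,
                             const TrainingCircuitBreaker& breaker) {
    for (std::size_t round = 0; round < rounds; ++round) {
        breaker.check();       // trip before committing to another tree
        trainOneTree(forest);  // grow one more tree
        if (round % 10 == 9) { // periodic snapshot of training state
            snapshot(forest);
        }
    }
}
```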
We've done quite a lot of work on memory usage since this issue was created. Whilst it would be possible to refine estimates if we knew certain features had relatively few distinct values, this is not a priority at present. We can revisit if we decide we need better memory estimates in the future.