Remove auto-cache mechanism #1780

Merged
merged 16 commits into from
Dec 6, 2018
15 changes: 5 additions & 10 deletions src/Microsoft.ML.Data/Training/TrainerEstimatorBase.cs
@@ -132,21 +132,16 @@ protected virtual void CheckLabelCompatible(SchemaShape.Column labelCol)
         protected TTransformer TrainTransformer(IDataView trainSet,
             IDataView validationSet = null, IPredictor initPredictor = null)
         {
-            var cachedTrain = Info.WantCaching ? new CacheDataView(Host, trainSet, prefetch: null) : trainSet;
Contributor

As requested by @GalOshri in the issue, can we add documentation?

Currently the user has no way of knowing whether a specific learner already does its own form of caching or won't benefit from caching.

In line with @GalOshri's request, I think this documentation should be required before making this change.

Contributor

Let's change the appropriate cookbook samples to illustrate the new pattern with this little caching checkpoint thing.


In reply to: 237951473 [](ancestors = 237951473)

Contributor

@justinormont Nov 30, 2018

Yes, updating the example code is a good first step. We should also create a list of the components that benefit from caching, along with when they benefit; for instance, "a LinearSVM when the number of iterations is greater than 1".

Another route is perhaps a VS checker that looks at Info.WantCaching and makes recommendations from there? #WontFix

Member Author

@wschin Dec 4, 2018

A sample and some tests have been modified to use the caching functions. Every caching function now has at least one test. #Resolved

Member Author

@wschin Dec 5, 2018

I don't think having a list is a small task. We need another PR and issue.


In reply to: 237953879 [](ancestors = 237953879)

Member Author

@wschin Dec 5, 2018

OK. I will do it in the next iteration.

[Update] Done. Please take a look again. Thank you.


In reply to: 237951764 [](ancestors = 237951764,237951473)

Contributor

Generally, I like to see documentation in the PR. This is even more true when the change may surprise users who won't understand what's different.

Member Author

Many usages have been added to the Cookbook.


In reply to: 238949665 [](ancestors = 238949665)

+            var trainRoleMapped = MakeRoles(trainSet);

-            var trainRoles = MakeRoles(cachedTrain);
-
-            RoleMappedData validRoles;
+            RoleMappedData validRoleMapped;

             if (validationSet == null)
-                validRoles = null;
+                validRoleMapped = null;
             else
-            {
-                var cachedValid = Info.WantCaching ? new CacheDataView(Host, validationSet, prefetch: null) : validationSet;
-                validRoles = MakeRoles(cachedValid);
-            }
+                validRoleMapped = MakeRoles(validationSet);
Contributor

@Ivanidzo4ka Dec 4, 2018

Just set it to null by default and change it only if the validation set is != null. #Resolved


-            var pred = TrainModelCore(new TrainContext(trainRoles, validRoles, null, initPredictor));
+            var pred = TrainModelCore(new TrainContext(trainRoleMapped, validRoleMapped, null, initPredictor));
             return MakeTransformer(pred, trainSet.Schema);
         }
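
The review thread's point that only multi-pass learners gain from caching (for instance, a LinearSVM with more than one iteration) can be illustrated without ML.NET at all. What follows is a minimal, self-contained C# sketch, not the library's implementation: `LazySource`, `Train`, and `RowsRead` are hypothetical stand-ins for an uncached data view, a multi-pass trainer, and a read counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CachingSketch
{
    // Counts how many rows were pulled from the simulated source.
    public static int RowsRead;

    // Hypothetical stand-in for an uncached data view:
    // rows are regenerated on every enumeration.
    public static IEnumerable<float> LazySource(int rowCount)
    {
        for (int i = 0; i < rowCount; i++)
        {
            RowsRead++;
            yield return i;
        }
    }

    // Hypothetical stand-in for a multi-pass trainer:
    // it scans the whole data set once per iteration.
    public static float Train(IEnumerable<float> data, int iterations)
    {
        float weight = 0f;
        for (int it = 0; it < iterations; it++)
        {
            foreach (float x in data)
                weight += 0.01f * x;
        }
        return weight;
    }

    public static void Main()
    {
        // Without caching, every iteration re-reads the source.
        RowsRead = 0;
        Train(LazySource(100), iterations: 5);
        Console.WriteLine($"uncached rows read: {RowsRead}"); // 500

        // With explicit caching, the source is read once and reused.
        RowsRead = 0;
        List<float> cached = LazySource(100).ToList();
        Train(cached, iterations: 5);
        Console.WriteLine($"cached rows read: {RowsRead}"); // 100
    }
}
```

With the automatic caching removed from `TrainTransformer`, this decision moves to the caller: materializing the data once pays off only when the trainer scans it more than once, and a single-pass trainer would read the same number of rows either way.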
