Remove auto-cache mechanism #1780
Conversation
@@ -132,21 +132,16 @@ protected virtual void CheckLabelCompatible(SchemaShape.Column labelCol)
protected TTransformer TrainTransformer(IDataView trainSet,
    IDataView validationSet = null, IPredictor initPredictor = null)
{
    var cachedTrain = Info.WantCaching ? new CacheDataView(Host, trainSet, prefetch: null) : trainSet;
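For readers skimming the diff, the auto-caching being removed followed a gate-on-trainer-preference pattern. A minimal sketch (simplified, not the actual ML.NET source; `Info`, `Host`, `CacheDataView`, and `MakeRoles` are the members shown in the diff):

```csharp
// Illustrative sketch only. The trainer advertises whether it benefits
// from caching via Info.WantCaching; the framework then transparently
// wraps the training data in an in-memory CacheDataView.
var cachedTrain = Info.WantCaching
    ? new CacheDataView(Host, trainSet, prefetch: null) // lazy, on-demand cache
    : trainSet;                                         // stream straight from source
var trainRoles = MakeRoles(cachedTrain);
```

This PR removes that implicit wrapping, so callers must opt in to caching explicitly.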
As requested by @GalOshri in the issue, can we add documentation?
Currently the user has no way of knowing whether a specific learner already does its own form of caching, or won't benefit from caching.
In line with @GalOshri's request, I think this documentation should be required before making this change.
Let's update the appropriate cookbook samples to illustrate the new pattern with this caching checkpoint.
In reply to: 237951473
Yes, updating the example code is a good first step. We should also create a definitive list of the components that benefit from caching, along with when they benefit, for instance, "a LinearSVM when the number of iterations is greater than 1".
Another route is perhaps a VS checker that looks at Info.WantCaching and makes recommendations from there? #WontFix
A sample and some tests have been modified to use these caching functions. Every caching function now has at least one test. #Resolved
I don't think producing such a list is a small task. It needs its own issue and PR.
In reply to: 237953879
OK, I will do it in the next iteration.
[Update] Done. Please take a look again. Thank you.
In reply to: 237951764
Generally, I like to see documentation in the PR. This is especially true when users can be surprised by the change and not understand what's different.
.Append(ML.Transforms.Normalize("Features"));
var data = pipeline.Fit(srcDV).Transform(srcDV);
var model = ML.Regression.Trainers.OnlineGradientDescent().Fit(data);
Because caching changes the behavior of the batch size, we replaced OnlineGradientDescent with another linear regressor. This should not affect the goal of this test. #Resolved
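A hedged sketch of the swap described above (the replacement trainer name is illustrative; trainer names follow the static API used elsewhere in this PR and may differ in later releases):

```csharp
// Before: OnlineGradientDescent, whose effective batching behavior
// depends on whether its input is cached.
// var model = ML.Regression.Trainers.OnlineGradientDescent().Fit(data);

// After (illustrative): any other linear regressor serves the test's
// purpose of fitting a linear model, e.g. SDCA.
var model = ML.Regression.Trainers.StochasticDualCoordinateAscent().Fit(data);
```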
expectedValues.Add(new float[4] { 0.297142357F, 1, 0.2855884F, 0.193529665F });
expectedValues.Add(new float[4] { 0.45465675F, 0.8805887F, 0.4031663F, 1 });
expectedValues.Add(new float[4] { 0.0595234372F, 0.99999994F, 0.349647522F, 0.137912869F });
expectedValues.Add(new float[4] { 0.06319684F, 1, 0.1386623F, 4.46209469E-06F });
This new baseline was produced using the master branch. #Resolved
// Pipeline
- var pipeline = new Ova(mlContext, new LinearSvm(mlContext), useProbabilities: false);
+ var pipeline = new Ova(mlContext, new LinearSvm(mlContext, numIterations: 100), useProbabilities: false);
Since the shuffled sequence changed, it now takes more iterations to converge to the previous solution. #Resolved
Co-Authored-By: wschin <[email protected]>
@@ -16,7 +16,7 @@ public void SdcaWorkout()
var dataPath = GetDataPath("breast-cancer.txt");

var data = TextLoader.CreateReader(Env, ctx => (Label: ctx.LoadFloat(0), Features: ctx.LoadFloat(1, 10)))
-    .Read(dataPath);
+    .Read(dataPath).Cache();
We need to cache here; otherwise, SDCA can't shuffle. #Resolved
What would the user experience be if you don't shuffle? An exception?
In reply to: 238761329
It can be 100x slower in the worst case, plus an exception.
In reply to: 238883768
…ding time becomes much shorter.
var cachedValid = Info.WantCaching ? new CacheDataView(Host, validationSet, prefetch: null) : validationSet;
validRoles = MakeRoles(cachedValid);
}
validRoleMapped = MakeRoles(validationSet);
Just set it to null as the default, and change it only if the validation set is not null. #Resolved
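The suggestion above could look roughly like this (an illustrative sketch; the variable names follow the diff, and `RoleMappedData` as the declared type is an assumption):

```csharp
// Illustrative restructuring per the review comment: default to null,
// and build the validation roles only when a validation set exists.
RoleMappedData validRoleMapped = null;
if (validationSet != null)
{
    var cachedValid = Info.WantCaching
        ? new CacheDataView(Host, validationSet, prefetch: null)
        : validationSet;
    validRoleMapped = MakeRoles(cachedValid);
}
```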
@@ -562,6 +562,9 @@ private void CrossValidationOn(string dataPath)
Label: r.Label.ToKey(),
// Concatenate all the features together into one column 'Features'.
Features: r.SepalLength.ConcatWith(r.SepalWidth, r.PetalLength, r.PetalWidth)))
// Add a step for caching data in memory so that the downstream iterative training
// algorithm can efficiently scan through the data multiple times.
.AppendCacheCheckpoint()
AppendCacheCheckpoint
You need to update https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md with these changes (here and in SamplesDynamicApi).
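For the cookbook, a hedged sketch of the opt-in pattern (using the static pipeline API shown in this PR; `reader` and the column names are placeholders, not the actual cookbook sample):

```csharp
// Illustrative cookbook-style snippet: with auto-caching removed,
// callers place an explicit cache checkpoint before iterative trainers.
var pipeline = reader.MakeNewEstimator()
    .Append(r => (Label: r.Label.ToKey(), r.Features))
    // Cache the data in memory so the downstream iterative trainer
    // can scan it multiple times without re-reading the source.
    .AppendCacheCheckpoint();
```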
Thank you @wschin !!
Fixes #1604.