Skip to content

API scenarios implementation with Estimators #688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 21, 2018

Conversation

Zruty0
Copy link
Contributor

@Zruty0 Zruty0 commented Aug 17, 2018

First version of a couple scenarios over Estimators

commit 5409376
Author: Pete Luferenko <[email protected]>
Date:   Fri Aug 17 09:59:08 2018 -0700

    Scenarios implementation

commit d7a53ad
Merge: 759dafb 11ff554
Author: Pete Luferenko <[email protected]>
Date:   Fri Aug 17 09:59:00 2018 -0700

    Scenarios implementation: merge

commit 11ff554
Author: Pete Luferenko <[email protected]>
Date:   Fri Aug 17 08:25:41 2018 -0700

    CrossValidator class

commit 7a66e19
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 15 08:55:11 2018 -0700

    Reconfigurable prediction

commit 231516a
Author: Pete Luferenko <[email protected]>
Date:   Tue Aug 14 09:11:01 2018 -0700

    Cross-validation

commit adc4e4b
Merge: 660ebeb e77f24e
Author: Pete Luferenko <[email protected]>
Date:   Tue Aug 14 08:30:37 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit 660ebeb
Author: Pete Luferenko <[email protected]>
Date:   Mon Aug 13 10:01:09 2018 -0700

    Added transformer scope enum for decomposability

commit b82f4c6
Author: Pete Luferenko <[email protected]>
Date:   Fri Aug 10 14:14:29 2018 -0700

    Saving/loading data

commit bf097ab
Author: Pete Luferenko <[email protected]>
Date:   Thu Aug 9 16:59:57 2018 -0700

    Added auto-normalization to everything

commit ef416f7
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 8 18:12:57 2018 -0700

    No need to give schema to prediction engine

commit 41df05a
Merge: 215856d d0664c1
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 8 17:59:33 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit 215856d
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 8 09:45:53 2018 -0700

    Added evaluation

commit 184027b
Merge: f85cd9c d932807
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 8 08:40:12 2018 -0700

    Merge branch 'feature/api-proposal' into feature/estimators

commit d932807
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 8 08:33:51 2018 -0700

    Another rework of first example

commit f85cd9c
Author: Ivan Matantsev <[email protected]>
Date:   Mon Aug 6 14:41:55 2018 -0700

    ReconfigurablePrediction
    and shorter execution

commit fffa4eb
Merge: 4ee52f8 77d42a5
Author: Ivan Matantsev <[email protected]>
Date:   Mon Aug 6 13:51:24 2018 -0700

    Merge branch 'feature/estimators' of https://github.com/Zruty0/machinelearning into feature/estimators

commit 4ee52f8
Author: Ivan Matantsev <[email protected]>
Date:   Mon Aug 6 13:51:08 2018 -0700

    cross validation

commit 77d42a5
Author: Pete Luferenko <[email protected]>
Date:   Mon Aug 6 13:32:24 2018 -0700

    Small edits on top of Ivan's code

commit 528a270
Merge: bf344a3 b468056
Author: Pete Luferenko <[email protected]>
Date:   Mon Aug 6 13:27:35 2018 -0700

    Merge branch 'feature/estimators' of github.com:Zruty0/machinelearning into feature/estimators

commit bf344a3
Author: Pete Luferenko <[email protected]>
Date:   Mon Aug 6 13:27:18 2018 -0700

    Lowered execution times on some tests.
    Implemented new API for initial predictor and validation sets

commit 1e90463
Author: Pete Luferenko <[email protected]>
Date:   Mon Aug 6 08:29:10 2018 -0700

    Updated for PR comments

commit b468056
Author: Ivan Matantsev <[email protected]>
Date:   Fri Aug 3 15:35:02 2018 -0700

    idv and decomposableTrain

commit d051138
Author: Pete Luferenko <[email protected]>
Date:   Fri Aug 3 09:08:50 2018 -0700

    Minor tweaks to Ivan's tests

commit 6c1d437
Merge: be90030 e3aea3b
Author: Pete Luferenko <[email protected]>
Date:   Thu Aug 2 17:57:20 2018 -0700

    Merge branch 'feature/estimators' of github.com:Zruty0/machinelearning into feature/estimators

commit be90030
Merge: 9fd480e 96661bb
Author: Pete Luferenko <[email protected]>
Date:   Thu Aug 2 17:56:46 2018 -0700

    Merged with examples

commit 96661bb
Author: Pete Luferenko <[email protected]>
Date:   Thu Aug 2 17:47:43 2018 -0700

    Rework the example with Tom

commit 13bbc43
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 1 14:25:06 2018 -0700

    Adding 'aspirational examples' and some more baseline scenarios

commit e3aea3b
Merge: 8b65829 9fd480e
Author: Ivan Matantsev <[email protected]>
Date:   Wed Aug 1 15:12:28 2018 -0700

    Merge branch 'feature/estimators' of https://github.com/Zruty0/machinelearning into feature/estimators

commit 8b65829
Author: Ivan Matantsev <[email protected]>
Date:   Wed Aug 1 15:12:24 2018 -0700

    add evaluation and normalizationAndCaching example

commit 9fd480e
Merge: 70d3fb4 89dfc82
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 1 14:38:33 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit 70d3fb4
Author: Pete Luferenko <[email protected]>
Date:   Wed Aug 1 14:25:06 2018 -0700

    Adding 'aspirational examples' and some more baseline scenarios

commit 1999de8
Merge: a8dabb3 bdb742d
Author: Pete Luferenko <[email protected]>
Date:   Tue Jul 31 16:48:25 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit a8dabb3
Author: Pete Luferenko <[email protected]>
Date:   Tue Jul 31 16:47:38 2018 -0700

    Added second test

commit 75f09b6
Merge: 598e174 b727d10
Author: Pete Luferenko <[email protected]>
Date:   Mon Jul 30 18:15:01 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit 598e174
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 20:32:23 2018 -0700

    Converted one scenario to estimators

commit 6299a1f
Merge: 6aeb7cc 20e59a2
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 19:15:03 2018 -0700

    Merge branch 'feature/api-examples' into feature/estimators

commit 20e59a2
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 19:14:31 2018 -0700

    Added the first two scenarios from the list.

commit 6aeb7cc
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 14:04:16 2018 -0700

    Fixed whitespaces

commit 5088ab4
Merge: f84e67a 0f94a3b
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 13:59:46 2018 -0700

    Merge remote-tracking branch 'upstream/master' into feature/estimators

commit f84e67a
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 13:56:13 2018 -0700

    Some renaming to interfaces
    Removed non-typed estimator
    Fixed collections in ad-hoc tests

commit 49730cb
Author: Pete Luferenko <[email protected]>
Date:   Thu Jul 26 10:49:09 2018 -0700

    Renamed and changed comments on data readers.

commit 5a819d3
Author: Pete Luferenko <[email protected]>
Date:   Wed Jul 25 17:53:02 2018 -0700

    Typed estimators

commit 91dc0f2
Author: Pete Luferenko <[email protected]>
Date:   Tue Jul 24 20:00:57 2018 -0700

    Added prediction engine to playground

commit 3826648
Author: Pete Luferenko <[email protected]>
Date:   Tue Jul 24 19:21:04 2018 -0700

    Added an ad hoc test playground

commit 3b2edab
Author: Pete Luferenko <[email protected]>
Date:   Tue Jul 24 17:44:46 2018 -0700

    Initial take on IEstimator interfaces
@Zruty0
Copy link
Contributor Author

Zruty0 commented Aug 17, 2018

using System;

headers #Resolved


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/AspirationalExamples.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False)

@Zruty0
Copy link
Contributor Author

Zruty0 commented Aug 17, 2018

using Microsoft.ML.Core.Data;

I think I'll promote these container classes to the core assembly: it seems that they are actually useful for pipelining as well as saving. Probably there is no way to not have them in the core. #Resolved


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/EstimatorPipe.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False)

@Zruty0
Copy link
Contributor Author

Zruty0 commented Aug 17, 2018

using Microsoft.ML.Core.Data;

Everything is this file is temporary. I deliberately omit the Contracts.Check and flout other norms here. Please do not review this file in isolation: start with the tests, and then you can look up individual MyThis / MyThat implementations to see what I am doing #ByDesign


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/Wrappers.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False)

@@ -2,6 +2,9 @@
<PropertyGroup>
<TargetFramework>netcoreapp2.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<Compile Remove="Scenarios\Api\AspirationalExamples.cs" />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AspirationalExamples [](start = 35, length = 20)

why? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because right now it doesn't compile, and may never compile, since it uses my own imaginary version of Pigsty, which may differ from the real one, when it appears.


In reply to: 210986324 [](ancestors = 210986324)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we even add this file? What's the point?


In reply to: 211030708 [](ancestors = 211030708,210986324)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, as an aspirational example? It's a way to document what we want to reach at the end. Another way would be to put it into Markdown somewhere, but I think I like this way somewhat better.


In reply to: 211688654 [](ancestors = 211688654,211030708,210986324)

@@ -26,4 +29,8 @@
<NativeAssemblyReference Include="SymSgdNative" />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 0, length = 1)

since you editing this file...
TAB! #Resolved


internal TransformerChain(ITransformer[] transformers, TransformerScope[] scopes)
{
_transformers = transformers.ToArray();
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.ToArray() [](start = 40, length = 10)

why? #Resolved

{
_transformers = transformers.ToArray();
_scopes = scopes.ToArray();
LastTransformer = transformers.Last() as TLastTransformer;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last [](start = 43, length = 4)

LastOrDefeault? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we actually require at least one here


In reply to: 210996903 [](ancestors = 210996903)

internal TransformerChain(ITransformer[] transformers, TransformerScope[] scopes)
{
_transformers = transformers.ToArray();
_scopes = scopes.ToArray();
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.ToArray(); [](start = 28, length = 11)

I assume it was IEnumerable parameters before #Resolved

None = 0,
Training = 1 << 0,
Testing = 1 << 1,
Scoring = 1 << 2,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scoring [](start = 8, length = 7)

considering what usually testing is scoring + get metrics, can we put scoring above of testing? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking more chronologically at it: first, you need to train the model. Then, you are going to test it, and then, finally, when you are satisfied with what you've got, you package it and use it for prediction / scoring.


In reply to: 210998340 [](ancestors = 210998340)

public sealed class TransformerChain<TLastTransformer> : ITransformer, ICanSaveModel
where TLastTransformer : class, ITransformer
{
private readonly ITransformer[] _transformers;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] [](start = 37, length = 2)

any reason why it's array? #Resolved

Copy link
Contributor Author

@Zruty0 Zruty0 Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a fixed-size sequence of objects that have a common type. I think array is a natural choice for such a structure.


In reply to: 210998941 [](ancestors = 210998941)

model.SavePipeline(env, fs);

// Load model.
loadedModel = CompositeReader.LoadPipeline(env, file.OpenReadStream());
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadedModel [](start = 20, length = 11)

should we use loadedModel for predictions? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup


In reply to: 211000998 [](ancestors = 211000998)


// Train model with validation set.
var trainer = new MySdca(env, new Runtime.Learners.LinearClassificationTrainer.Arguments(), "Features", "Label");
var transformer = trainer.Train(trainData, validData);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformer [](start = 20, length = 11)

why it's transformer? maybe predictor, or model? #Resolved

var model = trainer.Fit(trainData);

var scoredTest = model.Transform(pipeline.Transform(testData));
var metrics = new MyBinaryClassifierEvaluator(env, new BinaryClassifierEvaluator.Arguments()).Evaluate(scoredTest, "Label", "Probability");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Label", "Probability" [](start = 131, length = 22)

any reason why they are not default values for evaluate method? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No


In reply to: 211001825 [](ancestors = 211001825)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe yes: to make it somewhat more obvious that you need to provide these columns for correct evaluation.


In reply to: 211031542 [](ancestors = 211031542,211001825)

using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.CommandLine;
using Microsoft.ML.Runtime.Data;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we moved this files into api/estimators folder?
also header :) #Resolved

@Zruty0
Copy link
Contributor Author

Zruty0 commented Aug 17, 2018

using Microsoft.ML.Core.Data;

Nah, could not be done. But I moved them to Data.


In reply to: 413936797 [](ancestors = 413936797)


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/EstimatorPipe.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Zruty0 ... the tests for now look horrifying but I suppose that's the point is to iteratively change the code so that it serves the scenarios of #584 better. 😄

/// <summary>
/// Exception class for schema validation errors.
/// </summary>
public class SchemaException : Exception
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaException [](start = 17, length = 15)

Since it's not part of Contracts, who will ever throw it? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We all shall, in time.


In reply to: 211687059 [](ancestors = 211687059)

/// For an empty chain, <typeparamref name="TLastTransformer"/> is always <see cref="ITransformer"/>.
/// </summary>
public sealed class TransformerChain<TLastTransformer> : ITransformer, ICanSaveModel, IEnumerable<ITransformer>
where TLastTransformer : class, ITransformer
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where TLastTransformer : class [](start = 4, length = 30)

can we put it on separate line for esthetic purposes? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you meant. It's already on a separate line: TLastTransformer needs to be a class, and implement ITransformer


In reply to: 211688165 [](ancestors = 211688165)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that ITransformer actually are requirement for TLastTransformer.
at least add one tab in front of 'where', otherwise my brain read this as continuation of interfaces for class, and in general across our code we always have
private class some
where T:
not
private class some
where T


In reply to: 211693191 [](ancestors = 211693191,211688165)


public IEnumerator<ITransformer> GetEnumerator() => ((IEnumerable<ITransformer>)_transformers).GetEnumerator();

IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); [](start = 8, length = 59)

can you add implicit private? #ByDesign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's not private. Nor is it public of protected :)


In reply to: 211688343 [](ancestors = 211688343)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Zruty0 Zruty0 merged commit 73b0308 into dotnet:master Aug 21, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants