-
Notifications
You must be signed in to change notification settings - Fork 1.9k
API scenarios implementation with Estimators #688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
commit 5409376 Author: Pete Luferenko <[email protected]> Date: Fri Aug 17 09:59:08 2018 -0700 Scenarios implementation commit d7a53ad Merge: 759dafb 11ff554 Author: Pete Luferenko <[email protected]> Date: Fri Aug 17 09:59:00 2018 -0700 Scenarios implementation: merge commit 11ff554 Author: Pete Luferenko <[email protected]> Date: Fri Aug 17 08:25:41 2018 -0700 CrossValidator class commit 7a66e19 Author: Pete Luferenko <[email protected]> Date: Wed Aug 15 08:55:11 2018 -0700 Reconfigurable prediction commit 231516a Author: Pete Luferenko <[email protected]> Date: Tue Aug 14 09:11:01 2018 -0700 Cross-validation commit adc4e4b Merge: 660ebeb e77f24e Author: Pete Luferenko <[email protected]> Date: Tue Aug 14 08:30:37 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit 660ebeb Author: Pete Luferenko <[email protected]> Date: Mon Aug 13 10:01:09 2018 -0700 Added transformer scope enum for decomposability commit b82f4c6 Author: Pete Luferenko <[email protected]> Date: Fri Aug 10 14:14:29 2018 -0700 Saving/loading data commit bf097ab Author: Pete Luferenko <[email protected]> Date: Thu Aug 9 16:59:57 2018 -0700 Added auto-normalization to everything commit ef416f7 Author: Pete Luferenko <[email protected]> Date: Wed Aug 8 18:12:57 2018 -0700 No need to give schema to prediction engine commit 41df05a Merge: 215856d d0664c1 Author: Pete Luferenko <[email protected]> Date: Wed Aug 8 17:59:33 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit 215856d Author: Pete Luferenko <[email protected]> Date: Wed Aug 8 09:45:53 2018 -0700 Added evaluation commit 184027b Merge: f85cd9c d932807 Author: Pete Luferenko <[email protected]> Date: Wed Aug 8 08:40:12 2018 -0700 Merge branch 'feature/api-proposal' into feature/estimators commit d932807 Author: Pete Luferenko <[email protected]> Date: Wed Aug 8 08:33:51 2018 -0700 Another rework of first example commit f85cd9c Author: Ivan Matantsev <[email protected]> Date: Mon Aug 6 14:41:55 2018 -0700 ReconfigurablePrediction and shorter execution commit fffa4eb Merge: 4ee52f8 77d42a5 Author: Ivan Matantsev <[email protected]> Date: Mon Aug 6 13:51:24 2018 -0700 Merge branch 'feature/estimators' of https://github.com/Zruty0/machinelearning into feature/estimators commit 4ee52f8 Author: Ivan Matantsev <[email protected]> Date: Mon Aug 6 13:51:08 2018 -0700 cross validation commit 77d42a5 Author: Pete Luferenko <[email protected]> Date: Mon Aug 6 13:32:24 2018 -0700 Small edits on top of Ivan's code commit 528a270 Merge: bf344a3 b468056 Author: Pete Luferenko <[email protected]> Date: Mon Aug 6 13:27:35 2018 -0700 Merge branch 'feature/estimators' of github.com:Zruty0/machinelearning into feature/estimators commit bf344a3 Author: Pete Luferenko <[email protected]> Date: Mon Aug 6 13:27:18 2018 -0700 Lowered execution times on some tests. Implemented new API for initial predictor and validation sets commit 1e90463 Author: Pete Luferenko <[email protected]> Date: Mon Aug 6 08:29:10 2018 -0700 Updated for PR comments commit b468056 Author: Ivan Matantsev <[email protected]> Date: Fri Aug 3 15:35:02 2018 -0700 idv and decomposableTrain commit d051138 Author: Pete Luferenko <[email protected]> Date: Fri Aug 3 09:08:50 2018 -0700 Minor tweaks to Ivan's tests commit 6c1d437 Merge: be90030 e3aea3b Author: Pete Luferenko <[email protected]> Date: Thu Aug 2 17:57:20 2018 -0700 Merge branch 'feature/estimators' of github.com:Zruty0/machinelearning into feature/estimators commit be90030 Merge: 9fd480e 96661bb Author: Pete Luferenko <[email protected]> Date: Thu Aug 2 17:56:46 2018 -0700 Merged with examples commit 96661bb Author: Pete Luferenko <[email protected]> Date: Thu Aug 2 17:47:43 2018 -0700 Rework the example with Tom commit 13bbc43 Author: Pete Luferenko <[email protected]> Date: Wed Aug 1 14:25:06 2018 -0700 Adding 'aspirational examples' and some more baseline scenarios commit e3aea3b Merge: 8b65829 9fd480e Author: Ivan Matantsev <[email protected]> Date: Wed Aug 1 15:12:28 2018 -0700 Merge branch 'feature/estimators' of https://github.com/Zruty0/machinelearning into feature/estimators commit 8b65829 Author: Ivan Matantsev <[email protected]> Date: Wed Aug 1 15:12:24 2018 -0700 add evaluation and normalizationAndCaching example commit 9fd480e Merge: 70d3fb4 89dfc82 Author: Pete Luferenko <[email protected]> Date: Wed Aug 1 14:38:33 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit 70d3fb4 Author: Pete Luferenko <[email protected]> Date: Wed Aug 1 14:25:06 2018 -0700 Adding 'aspirational examples' and some more baseline scenarios commit 1999de8 Merge: a8dabb3 bdb742d Author: Pete Luferenko <[email protected]> Date: Tue Jul 31 16:48:25 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit a8dabb3 Author: Pete Luferenko <[email protected]> Date: Tue Jul 31 16:47:38 2018 -0700 Added second test commit 75f09b6 Merge: 598e174 b727d10 Author: Pete Luferenko <[email protected]> Date: Mon Jul 30 18:15:01 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit 598e174 Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 20:32:23 2018 -0700 Converted one scenario to estimators commit 6299a1f Merge: 6aeb7cc 20e59a2 Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 19:15:03 2018 -0700 Merge branch 'feature/api-examples' into feature/estimators commit 20e59a2 Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 19:14:31 2018 -0700 Added the first two scenarios from the list. commit 6aeb7cc Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 14:04:16 2018 -0700 Fixed whitespaces commit 5088ab4 Merge: f84e67a 0f94a3b Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 13:59:46 2018 -0700 Merge remote-tracking branch 'upstream/master' into feature/estimators commit f84e67a Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 13:56:13 2018 -0700 Some renaming to interfaces Removed non-typed estimator Fixed collections in ad-hoc tests commit 49730cb Author: Pete Luferenko <[email protected]> Date: Thu Jul 26 10:49:09 2018 -0700 Renamed and changed comments on data readers. commit 5a819d3 Author: Pete Luferenko <[email protected]> Date: Wed Jul 25 17:53:02 2018 -0700 Typed estimators commit 91dc0f2 Author: Pete Luferenko <[email protected]> Date: Tue Jul 24 20:00:57 2018 -0700 Added prediction engine to playground commit 3826648 Author: Pete Luferenko <[email protected]> Date: Tue Jul 24 19:21:04 2018 -0700 Added an ad hoc test playground commit 3b2edab Author: Pete Luferenko <[email protected]> Date: Tue Jul 24 17:44:46 2018 -0700 Initial take on IEstimator interfaces
I think I'll promote these container classes to the core assembly: it seems that they are actually useful for pipelining as well as saving. Probably there is no way to not have them in the core. #Resolved Refers to: test/Microsoft.ML.Tests/Scenarios/Api/EstimatorPipe.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False) |
Everything is this file is temporary. I deliberately omit the Refers to: test/Microsoft.ML.Tests/Scenarios/Api/Wrappers.cs:1 in e05ca2d. [](commit_id = e05ca2d, deletion_comment = False) |
@@ -2,6 +2,9 @@ | |||
<PropertyGroup> | |||
<TargetFramework>netcoreapp2.0</TargetFramework> | |||
</PropertyGroup> | |||
<ItemGroup> | |||
<Compile Remove="Scenarios\Api\AspirationalExamples.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AspirationalExamples [](start = 35, length = 20)
why? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because right now it doesn't compile, and may never compile, since it uses my own imaginary version of Pigsty, which may differ from the real one, when it appears.
In reply to: 210986324 [](ancestors = 210986324)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we even add this file? What's the point?
In reply to: 211030708 [](ancestors = 211030708,210986324)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, as an aspirational example? It's a way to document what we want to reach at the end. Another way would be to put it into Markdown somewhere, but I think I like this way somewhat better.
In reply to: 211688654 [](ancestors = 211688654,211030708,210986324)
@@ -26,4 +29,8 @@ | |||
<NativeAssemblyReference Include="SymSgdNative" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 0, length = 1)
since you editing this file...
TAB! #Resolved
|
||
internal TransformerChain(ITransformer[] transformers, TransformerScope[] scopes) | ||
{ | ||
_transformers = transformers.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.ToArray() [](start = 40, length = 10)
why? #Resolved
{ | ||
_transformers = transformers.ToArray(); | ||
_scopes = scopes.ToArray(); | ||
LastTransformer = transformers.Last() as TLastTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last [](start = 43, length = 4)
LastOrDefeault? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal TransformerChain(ITransformer[] transformers, TransformerScope[] scopes) | ||
{ | ||
_transformers = transformers.ToArray(); | ||
_scopes = scopes.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.ToArray(); [](start = 28, length = 11)
I assume it was IEnumerable parameters before #Resolved
None = 0, | ||
Training = 1 << 0, | ||
Testing = 1 << 1, | ||
Scoring = 1 << 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scoring [](start = 8, length = 7)
considering what usually testing is scoring + get metrics, can we put scoring above of testing? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking more chronologically at it: first, you need to train the model. Then, you are going to test it, and then, finally, when you are satisfied with what you've got, you package it and use it for prediction / scoring.
In reply to: 210998340 [](ancestors = 210998340)
public sealed class TransformerChain<TLastTransformer> : ITransformer, ICanSaveModel | ||
where TLastTransformer : class, ITransformer | ||
{ | ||
private readonly ITransformer[] _transformers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[] [](start = 37, length = 2)
any reason why it's array? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a fixed-size sequence of objects that have a common type. I think array is a natural choice for such a structure.
In reply to: 210998941 [](ancestors = 210998941)
model.SavePipeline(env, fs); | ||
|
||
// Load model. | ||
loadedModel = CompositeReader.LoadPipeline(env, file.OpenReadStream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadedModel [](start = 20, length = 11)
should we use loadedModel for predictions? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
// Train model with validation set. | ||
var trainer = new MySdca(env, new Runtime.Learners.LinearClassificationTrainer.Arguments(), "Features", "Label"); | ||
var transformer = trainer.Train(trainData, validData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transformer [](start = 20, length = 11)
why it's transformer? maybe predictor, or model? #Resolved
var model = trainer.Fit(trainData); | ||
|
||
var scoredTest = model.Transform(pipeline.Transform(testData)); | ||
var metrics = new MyBinaryClassifierEvaluator(env, new BinaryClassifierEvaluator.Arguments()).Evaluate(scoredTest, "Label", "Probability"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Label", "Probability" [](start = 131, length = 22)
any reason why they are not default values for evaluate method? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe yes: to make it somewhat more obvious that you need to provide these columns for correct evaluation.
In reply to: 211031542 [](ancestors = 211031542,211001825)
using Microsoft.ML.Runtime; | ||
using Microsoft.ML.Runtime.Api; | ||
using Microsoft.ML.Runtime.CommandLine; | ||
using Microsoft.ML.Runtime.Data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we moved this files into api/estimators folder?
also header :) #Resolved
+ mylambda transform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <summary> | ||
/// Exception class for schema validation errors. | ||
/// </summary> | ||
public class SchemaException : Exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaException [](start = 17, length = 15)
Since it's not part of Contracts, who will ever throw it? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// For an empty chain, <typeparamref name="TLastTransformer"/> is always <see cref="ITransformer"/>. | ||
/// </summary> | ||
public sealed class TransformerChain<TLastTransformer> : ITransformer, ICanSaveModel, IEnumerable<ITransformer> | ||
where TLastTransformer : class, ITransformer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where TLastTransformer : class [](start = 4, length = 30)
can we put it on separate line for esthetic purposes? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you meant. It's already on a separate line: TLastTransformer
needs to be a class, and implement ITransformer
In reply to: 211688165 [](ancestors = 211688165)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that ITransformer actually are requirement for TLastTransformer.
at least add one tab in front of 'where', otherwise my brain read this as continuation of interfaces for class, and in general across our code we always have
private class some
where T:
not
private class some
where T
In reply to: 211693191 [](ancestors = 211693191,211688165)
|
||
public IEnumerator<ITransformer> GetEnumerator() => ((IEnumerable<ITransformer>)_transformers).GetEnumerator(); | ||
|
||
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); [](start = 8, length = 59)
can you add implicit private? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's not private. Nor is it public of protected :)
In reply to: 211688343 [](ancestors = 211688343)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First version of a couple scenarios over Estimators