-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Conversion of ITrainer.Train returns predictor, accepts +TrainContext #522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// <summary> | ||
/// The training set. Cannot be <c>null</c>. | ||
/// </summary> | ||
public RoleMappedData Train { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad name. Probably TrainDataset
or TrainingDataset
would be better. #Closed
/// <summary> | ||
/// The initial | ||
/// </summary> | ||
public IPredictor InitialPredictor { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial predictor is no longer typed and we cannot through ITrainer
alone force the trainer to identify what it is typed as. An optional interface on trainers for this specifically may be a superior solution. This hypothetical IIncrementalTrainer<in TPred>
would not have a train method, but would be more or less a marker to indicate that a trainer might be able to incrementally train on something. Probably with a bool CanIncrementallyTrain(TPred initPredictor)
method?
If that comes to pass then I would remove ITrainerEx.SupportIncrementalTrain
, which is a very non-specific property anyway.
It may be that similar treatment of most (all?) things on ITrainerEx
may be appropriate. ITrainerEx
is just a funny little bag of stuff...
namespace Microsoft.ML.Runtime | ||
{ | ||
/// <summary> | ||
/// Instances of this class are meant to be constructed and passed to trainers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insufficient documentation. While the easy case (no validation, no init predictor) is "easy," how to deal with those complexities for those that find it necessary will be important... this includes both users and implementors of ITrainer
. #Closed
@@ -87,91 +81,60 @@ public interface ITrainer | |||
PredictionKind PredictionKind { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to get rid of this, in future. It seems like this test on ITrainer
for, say, binary classification could be just as easily done by doing something like trainer is ITrainer<IBinaryClassifierPredictor>
. (This hypothetical IBinaryClassifierPredictor
is a marker interface, akin to and possibly descending IPredictorProducing<float>
). I think we need to do that anyway for once we nuke the concept of Signature
.
@dotnet-bot test Windows_NT Debug |
|
||
/// <summary> | ||
/// The validation set. Can be <c>null</c>. Note that passing a non-<c>null</c> validation set into | ||
/// a trainer that does not support validation sets should not be considered an error condition. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an error condition [](start = 85, length = 18)
Can we encourage trainer implementer to show at least warning? if it make sense. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not. Warnings are already printed by the tooling utilizing it as we see here for validation sets and here for initial predictors. To decentralize the checking, in addition to the problems of having to insert that boilerplate somehow, also runs somewhat contrary to the philosophy behind TrainContext
which is that we can add new members to that without trainer implementors having to care. Under this proposal, suddenly they'd be encouraged to care, and I'd rather keep that object as simple as possible. Plus of course if we did that things that work through those aforementioned pathways would show two warnings.
In reply to: 202148448 [](ancestors = 202148448)
Contracts.CheckValueOrNull(initPredictor); | ||
|
||
// REVIEW: Should there be code here to ensure that the role mappings between the two are compatible? | ||
// That is, all the role mappings are the same and the columns between them have identical types? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know by this time what are the mappings going to be used by the trainer?
I doubt if its is clear at this point. So it will be difficult to do it here. Merely doing a brute force checking will not do a job because there could be additional mappings that trainer
won't utilize at all and if missing in valid
will throw error here.
That's what I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm OK. So the point is, there might be a distinction between roles, but we have no way of knowing whether that role is even relevant to the trainer anyway. That's fair I think.
In reply to: 202161615 [](ancestors = 202161615)
just a formatting note (not necessarily to be done in this PR). Refers to: src/Microsoft.ML.Core/Prediction/ITrainer.cs:32 in eb5d0ee. [](commit_id = eb5d0ee, deletion_comment = False) |
@@ -79,10 +79,9 @@ public void TrainAndPredictSentimentModelWithDirectionInstantiationTest() | |||
}); | |||
|
|||
var trainRoles = new RoleMappedData(trans, label: "Label", feature: "Features"); | |||
trainer.Train(trainRoles); | |||
var pred = trainer.Train(trainRoles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var pred = trainer.Train(trainRoles); [](start = 16, length = 37)
Looks good now in the test...:) #Closed
ch.Warning("Ignoring " + nameof(TrainCommand.Arguments.InputModelFile) + | ||
": Trainer does not support incremental training."); | ||
inpPredictor = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it gives me a feeling that its caller responsibility to check for this warning. However, it should be the trainer
that should warn the caller that wrong set of arguments were passed because its the trainer who know which arguments to use and which one to ignore.
Also, I think this path won't be followed in ML.Net so I don't think that this warning will be printed in ML.Net, right? (though I would prefer to see this warning in case I am expecting something else from the trainer) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't really think so. A caller that cares about whether the validation set or initialization predictor will be used really ought to check or know that. I don't want it to be the callee's responsilibity, for the reasons mentioned elsewhere.
We might provide a convenience say on ITrainerExtensions
that runs that check, but the idea that every single trainer implementation has to have boilerplate is not something I'm entirely enthusiastic about. Also remember that these are things written to the console or a logger, which won't be evident to someone calling this anyway. One way or the other people are going to have to check this somehow.
In reply to: 202174443 [](ancestors = 202174443)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
TextWriter StdOut { get; } | ||
TextWriter StdErr { get; } | ||
bool SupportsValidation { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want XML comments on these properties, like the other properties? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Interface implemented by the MetalinearLearners base class. | ||
/// Used to distinguish the MetaLinear Learners from the other learners | ||
/// </summary> | ||
public interface IMetaLinearTrainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on using a custom attribute instead of a marker interface here?
Check out:
https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/interface
and
https://docs.microsoft.com/en-us/visualstudio/code-quality/ca1040-avoid-empty-interfaces
Specifically:
X AVOID using marker interfaces (interfaces with no members).
If you need to mark a class as having a specific characteristic (marker), in general, use a custom attribute rather than an interface.
If your design includes empty interfaces that types are expected to implement, you are probably using an interface as a marker or a way to identify a group of types. If this identification will occur at run time, the correct way to accomplish this is to use a custom attribute. Use the presence or absence of the attribute, or the properties of the attribute, to identify the target types. If the identification must occur at compile time, then it is acceptable to use an empty interface.
Another interesting point is that I can't find any implementations of this interface.... yet.
(Obviously this shouldn't be changed in this PR, just starting the discussion with the thoughts of logging a new, separate issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface needs to go. Its entire reason for existing is apparently so that this code here can run this test:
.Where(cls => !cls.IsHidden && !typeof(IMetaLinearTrainer).IsAssignableFrom(cls.Type)); |
has a way to not use a specific type of trainer. But because we dislike that trainer so much we didn't bother to ship it as part of ML.NET, so to keep that check, apparently someone thought it was a brilliant idea to introduce that interface. (I guess?)... then of course put it right into ITrainer.cs
. So the entire reason why we have that interface and put it in such a central location, I suppose, is because we don't think the trainer it corresponds to is very good. That's twisted. :) However, I would prefer to solve it in the following way, by either migrating MetaLinearPredictor
to ML.NET (and putting the check directly in that pipeline inference code), or annihilating it altogether. But yeah, this marker interface is a bad idea. But I feel like I don't have a good way to do that in this PR.
Regarding marker interfaces in general being bad, yeah, they probably ought to be avoided... I'm a little wary though of any sort of blanket policy against them. The common complaint against them is that interfaces are contracts and an interface with no contract is inherently confusing. (And indeed you may have observed in this code-base, we certainly don't shy away from attributes either.) But like anything, you have to weight the confusion caused by that "solution" to a problem, to the alternative.
I am imagining something: let's imagine that we had IBinaryPredictor
as a marker interface on binary predictors. Then we get to, just in the type system, check whether a trainer is a binary trainer just by doing trainer is ITrainer<IBinaryPredictor>
. In one fell swoop:
-
we get to get rid of
PredictionKind
(I consider thatenum
a mistake since its existence discourages new types of predictors from being added, e.g., multilabel). We might still have it in some tooling (like a GUI), but as a central concept, it is out. -
that also means no more
PredictionKind
properties onIPredictor
andITrainer
, -
we can correspondingly remove
SignatureBinaryClassifierTrainer
since the little tricks -
I want to get rid of our internal depenency injection framework, alternate schemes definitely work over types, but they definitely won't work over our custom little attributes marking this or that as binary classification, regression, or whatever,
-
utilities that require specific types of trainers and predictors can ask for them. So for example, as we've discussed elsewhere, we have the idea of a
OneVersusAll
trainer, that would take a mechanism.
If we could turn this:
machinelearning/src/Microsoft.ML.StandardLearners/Standard/MultiClass/MetaMulticlassTrainer.cs
Lines 26 to 27 in 6503167
public SubComponent<TScalarTrainer, SignatureBinaryClassifierTrainer> PredictorType = | |
new SubComponent<TScalarTrainer, SignatureBinaryClassifierTrainer>(LinearSvm.LoadNameValue); |
Into something like this:
Func<IHostEnvironment, ITrainer<IBinaryPredictor>> TrainerFactor = env => new LinearSvm(env);
Then inserting the wrong sort of trainer would immediately result in a compile-time error, not a runtime error.
So are marker interfaces inherently bad, I'm not so convinced of this, since the alternative envisioned is a runtime failure, and one for which we have to write a non-trivial amount of support code to accomodate. But, eh, I might be wrong. Just not sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, I do definitely cringe whenever I see a marker interface (if you like that, you're going to love IPredictorProducing<...>
). So I think the advice is generally good. I'm just not 100% sold that they aren't sometimes the most elegant, straightforward answer to some problems, and indeed some problems that we actually immediately have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the rule does give an escape hatch for when these are valuable at compile time.
If the identification must occur at compile time, then it is acceptable to use an empty interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we can get rid of it, according to @justinormont thanks Justin.
inpPredictor = null; | ||
} | ||
ch.Assert(validData == null || CanUseValidationData(trainer)); | ||
var predictor = trainer.Train(new TrainContext(data, validData, inpPredictor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) is inpPredictor
a type-o? Should it just be inPredictor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intent from the caller was that this be inputPredictor
so maybe they were just being weird about abbreviating it? I'll change it to inputPredictor
since inp
as an abbreviation for input
seems odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have sworn I edited it, yet I see the file hasn't been changed... should be in next commit.
} | ||
|
||
private const int DefaultNumModels = 50; | ||
/// <summary> Command-line arguments </summary> | ||
protected readonly ArgumentsBase Args; | ||
protected readonly int NumModels; | ||
protected internal readonly ArgumentsBase Args; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these fields all marked as protected internal
now? Looking through the code, I don't see anyone outside of the class actually using them, and I would assume they would be private
.
If we want to expose them outside of the class, should they be properties instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @eerhardt ... blunder on my part, I got confused between that and private protected
, everywhere I introduced protected internal
I really meant for it to be private protected
. protected
.
Last night I somehow got confused and forgot that protected internal
is the one with the or in it, while private protected
is like protected and internal.
private readonly bool _needCalibration; | ||
|
||
internal EnsembleTrainerBase(ArgumentsBase args, IHostEnvironment env, string name) | ||
protected internal EnsembleTrainerBase(ArgumentsBase args, IHostEnvironment env, string name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this class is abstract
, there is no point in making this protected internal
. protected internal
means it can be called from either a subclass, or any other class in this current assembly (or friend assembly). Since it is abstract
, it can only be called from a sub-class - no other code can directly instantiate this class. So instead, I think this should just be marked protected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup yup... I know, I'm silly. I meant to actually make it so that no one outside the assembly could subclass it.
@@ -143,14 +133,15 @@ private void TrainCore(IChannel ch, RoleMappedData data) | |||
validationDataSetProportion = Math.Max(validationDataSetProportion, stackingTrainer.ValidationDatasetProportion); | |||
|
|||
var needMetrics = Args.ShowMetrics || Combiner is IWeightedAverager; | |||
var Models = new List<FeatureSubsetModel<IPredictorProducing<TOutput>>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) local variables should be camelCased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Thanks @eerhardt . My strategy for "de-statifying" the objects was sometimes to make the fields local variables, fix any problems, then rename them, but I forgot the rename step here. 😄
} | ||
|
||
protected internal abstract TPredictor CreatePredictor(List<FeatureSubsetModel<IPredictorProducing<TOutput>>> models); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar thoughts here - are we expecting other callers from within this assembly to call this method? I think it should just be marked protected
and not protected internal
.
NOTE: If you want the intersection between protected
and internal
(meaning only sub-classes that are in this assembly can see this method), then you are looking for a new C# 7.2 feature: private protected
. See https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/private-protected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I wanted, thanks @eerhardt !
@@ -1887,7 +1880,7 @@ private void MakeBoundariesAndCheckLabels(out long missingInstances, out long to | |||
missingInstances = cursor.BadFeaturesRowCount; | |||
} | |||
|
|||
ch.Check(totalInstances > 0, TrainerBase.NoTrainingInstancesMessage); | |||
ch.Check(totalInstances > 0, TrainerBase<IPredictor>.NoTrainingInstancesMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unfortunate usage of generics...
Some thoughts:
- Should this message really be public?
- Should this message be moved to a non-generic class, so it can be accessed without the generic type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this too. Not quite sure... this field has about three total usages, and the idea of having the message standardize "hey you have no instances" I feel might be of limited enough worth that I might want to just nuke it altogether. I think it causes more confusion than it saves.
/// </summary> | ||
public interface IMetaLinearTrainer | ||
public interface ITrainerEx : ITrainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry about the future proofing of this interface. What about exposing a capabilities class that could be extended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh... actually I like that idea a lot. Let me see if I understand this: There will be a property ITrainer.Capabilities
that is an instance of the sealed class TrainerCapabilities
, which contains information on this trainer. Is that so? Once that exists, we can get rid of ITrainerEx
. I can do that in a future commit @ericstj as soon as you tell me if I am right (or wrong, in which case feel free to share more details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was what I was thinking. Bonus that you can get rid of Ex. Seems like it can be reasonably lightweight to implement so it shouldn’t increase the burden of implementing the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. If though it contains information about whether it needs normalization, or desires caching, or whatever, that's less about capabilities. Hmmm, I don't like how generic the name is, but would TrainerInfo
be that bad of a name, compared to TrainerCapabilities
? I don't like how generic the name info is though... everything is info. 😛 Anyway I'll just whip something up, we can always change the name.
@@ -992,8 +985,7 @@ public StandardArrayDualsTable(int length) | |||
_duals = new Float[length]; | |||
} | |||
|
|||
public override Float this[long index] | |||
{ | |||
public override Float this[long index] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) inconsistent formatting here with the curly brace on the same line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird. My VS autoformatting had a strange setting about properties with newlines somehow, thanks Eric.
// The discretization step renders this trainer non-parametric, and therefore it does not need normalization. | ||
// Also since it builds its own internal discretized columnar structures, it cannot benefit from caching. | ||
// Finally, even the binary classifiers, being logitboost, tend to not benefit from external calibration. | ||
Info = new TrainerInfo(normalization: false, caching: false, calibration: this is FastForestClassification); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calibration: this is FastForestClassification
It feels like that concrete classes should be able to decide this. The base class shouldn't decide which of its concrete classes should have this true/false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
|
||
public MultiClassNaiveBayesTrainer(IHostEnvironment env, Arguments args) | ||
: base(env, LoadName) | ||
{ | ||
Host.CheckValue(args, nameof(args)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silly question, but why do we have the args
parameter? I don't see it being used. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the class itself is empty, since this has no settings, except in entry-point land. I guess for consistency?
/// <param name="train">Will set <see cref="TrainingSet"/> to this value. This must be specified</param> | ||
/// <param name="valid">Will set <see cref="ValidationSet"/> to this value if specified</param> | ||
/// <param name="initPredictor">Will set <see cref="InitialPredictor"/> to this value if specified</param> | ||
public TrainContext(RoleMappedData train, RoleMappedData valid = null, IPredictor initPredictor = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't abbreviate here
|
||
public MultiClassNaiveBayesTrainer(IHostEnvironment env, Arguments args) | ||
: base(env, LoadName) | ||
{ | ||
Host.CheckValue(args, nameof(args)); | ||
Info = new TrainerInfo(normalization: false, caching: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is immutable and the same for all instances you can use a static singleton.
319a2fe
to
f2096e9
Compare
// etc. This interface seems like the most natural conduit for that sort | ||
// of extra information. | ||
|
||
// REVIEW: Can we please have consistent naming here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to log issues for things like this? Once we ship the API we won't be able to change it. So this REVIEW comment has a lifetime. In my experience TODO comments in code get missed and rarely addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Opened #543, and will get rid of this comment.
@@ -207,8 +207,8 @@ public class TestHistory : Test | |||
protected IList<TestResult[]> History; | |||
protected int Iteration { get; private set; } | |||
|
|||
public TestResult BestResult { get; protected internal set; } | |||
public int BestIteration { get; protected internal set; } | |||
public TestResult BestResult { get; private protected set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these just be protected
? It seems odd that only inheriting classes from in this assembly can set these properties, but inheriting classes outside can't.
Maybe changing the constructor to internal
would be better if we don't want external derived classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... this entire assembly needs some serious work on what is public and internal. It isn't very consistent. But we can certainly make this small step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, @TomFinley. Thanks for taking care of this.
* `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method on the interface. * `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination of `ITrainer`, `IValidatingTrainer`, `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now captured in one `ITrainer.Train` method with differently configured contexts. * All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation. Most notably, LBFGS and FastTree based trainers.) * Utility code meant to deal with the complexity of the aforementioned `IT/IVT/IIT/IIVT` idiom reduced considerably. * Opportunistic improvements to `ITrainer` implementors where observed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this direction!
…dotnet#522) * Conversion of ITrainer.Train returns predictor, accepts TrainContext * `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method on the interface. * `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination of `ITrainer`, `IValidatingTrainer`, `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now captured in one `ITrainer.Train` method with differently configured contexts. * All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation. Most notably, LBFGS and FastTree based trainers.) * Utility code meant to deal with the complexity of the aforementioned `IT/IVT/IIT/IIVT` idiom reduced considerably. * Opportunistic improvements to `ITrainer` implementors where observed. * TrainerInfo introduction, ITrainerEx destruction * Remove `IMetaLinearTrainer`
Fixes #509.
ITrainer.Train
returns a predictor. There is noCreatePredictor
method on the interface.ITrainer.Train
always accepts aTrainContext
. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination ofITrainer
,IValidatingTrainer
,IIncrementalTrainer
, andIIncrementalValidatingTrainer
, which is now captured in oneITrainer.Train
method with differently configured contexts.All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation, most notably, LBFGS and FastTree based trainers.)
Utility code meant to deal with the complexity of the aforementioned
IT/IVT/IIT/IIVT
idiom reduced considerably.Opportunistic improvements to
ITrainer
implementors where observed.