Skip to content

Conversion of ITrainer.Train returns predictor, accepts +TrainContext #522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 18, 2018

Conversation

TomFinley
Copy link
Contributor

@TomFinley TomFinley commented Jul 12, 2018

Fixes #509.

  • ITrainer.Train returns a predictor. There is no CreatePredictor method on the interface.

  • ITrainer.Train always accepts a TrainContext. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination of ITrainer, IValidatingTrainer, IIncrementalTrainer, and IIncrementalValidatingTrainer, which is now captured in one ITrainer.Train method with differently configured contexts.

  • All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation, most notably, LBFGS and FastTree based trainers.)

  • Utility code meant to deal with the complexity of the aforementioned IT/IVT/IIT/IIVT idiom reduced considerably.

  • Opportunistic improvements to ITrainer implementors where observed.

/// <summary>
/// The training set. Cannot be <c>null</c>.
/// </summary>
public RoleMappedData Train { get; }
Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad name. Probably TrainDataset or TrainingDataset would be better. #Closed

/// <summary>
/// The initial
/// </summary>
public IPredictor InitialPredictor { get; }
Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial predictor is no longer typed and we cannot through ITrainer alone force the trainer to identify what it is typed as. An optional interface on trainers for this specifically may be a superior solution. This hypothetical IIncrementalTrainer<in TPred> would not have a train method, but would be more or less a marker to indicate that a trainer might be able to incrementally train on something. Probably with a bool CanIncrementallyTrain(TPred initPredictor) method?

If that comes to pass then I would remove ITrainerEx.SupportIncrementalTrain, which is a very non-specific property anyway.

It may be that similar treatment of most (all?) things on ITrainerEx may be appropriate. ITrainerEx is just a funny little bag of stuff...

namespace Microsoft.ML.Runtime
{
/// <summary>
/// Instances of this class are meant to be constructed and passed to trainers.
Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insufficient documentation. While the easy case (no validation, no init predictor) is "easy," how to deal with those complexities for those that find it necessary will be important... this includes both users and implementors of ITrainer. #Closed

@@ -87,91 +81,60 @@ public interface ITrainer
PredictionKind PredictionKind { get; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get rid of this, in future. It seems like this test on ITrainer for, say, binary classification could be just as easily done by doing something like trainer is ITrainer<IBinaryClassifierPredictor>. (This hypothetical IBinaryClassifierPredictor is a marker interface, akin to and possibly descending IPredictorProducing<float>). I think we need to do that anyway for once we nuke the concept of Signature.

@TomFinley
Copy link
Contributor Author

@dotnet-bot test Windows_NT Debug


/// <summary>
/// The validation set. Can be <c>null</c>. Note that passing a non-<c>null</c> validation set into
/// a trainer that does not support validation sets should not be considered an error condition. It
Copy link
Contributor

@zeahmed zeahmed Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an error condition [](start = 85, length = 18)

Can we encourage trainer implementer to show at least warning? if it make sense. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not. Warnings are already printed by the tooling utilizing it as we see here for validation sets and here for initial predictors. To decentralize the checking, in addition to the problems of having to insert that boilerplate somehow, also runs somewhat contrary to the philosophy behind TrainContext which is that we can add new members to that without trainer implementors having to care. Under this proposal, suddenly they'd be encouraged to care, and I'd rather keep that object as simple as possible. Plus of course if we did that things that work through those aforementioned pathways would show two warnings.


In reply to: 202148448 [](ancestors = 202148448)

Contracts.CheckValueOrNull(initPredictor);

// REVIEW: Should there be code here to ensure that the role mappings between the two are compatible?
// That is, all the role mappings are the same and the columns between them have identical types?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know by this time what are the mappings going to be used by the trainer?

I doubt if its is clear at this point. So it will be difficult to do it here. Merely doing a brute force checking will not do a job because there could be additional mappings that trainer won't utilize at all and if missing in valid will throw error here.

That's what I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm OK. So the point is, there might be a distinction between roles, but we have no way of knowing whether that role is even relevant to the trainer anyway. That's fair I think.


In reply to: 202161615 [](ancestors = 202161615)

@zeahmed
Copy link
Contributor

zeahmed commented Jul 12, 2018

public interface ITrainerEx : ITrainer

just a formatting note (not necessarily to be done in this PR). ITrainerEx comes before ITrainer in the file. I prefer it to be after ITrainer as it makes review easier. What do you think?


Refers to: src/Microsoft.ML.Core/Prediction/ITrainer.cs:32 in eb5d0ee. [](commit_id = eb5d0ee, deletion_comment = False)

@@ -79,10 +79,9 @@ public void TrainAndPredictSentimentModelWithDirectionInstantiationTest()
});

var trainRoles = new RoleMappedData(trans, label: "Label", feature: "Features");
trainer.Train(trainRoles);
var pred = trainer.Train(trainRoles);
Copy link
Contributor

@zeahmed zeahmed Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var pred = trainer.Train(trainRoles); [](start = 16, length = 37)

Looks good now in the test...:) #Closed

@TomFinley
Copy link
Contributor Author

public interface ITrainerEx : ITrainer

Makes sense I think.


In reply to: 404635227 [](ancestors = 404635227)


Refers to: src/Microsoft.ML.Core/Prediction/ITrainer.cs:32 in eb5d0ee. [](commit_id = eb5d0ee, deletion_comment = False)

ch.Warning("Ignoring " + nameof(TrainCommand.Arguments.InputModelFile) +
": Trainer does not support incremental training.");
inpPredictor = null;
}
Copy link
Contributor

@zeahmed zeahmed Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it gives me a feeling that its caller responsibility to check for this warning. However, it should be the trainer that should warn the caller that wrong set of arguments were passed because its the trainer who know which arguments to use and which one to ignore.

Also, I think this path won't be followed in ML.Net so I don't think that this warning will be printed in ML.Net, right? (though I would prefer to see this warning in case I am expecting something else from the trainer) #Closed

Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't really think so. A caller that cares about whether the validation set or initialization predictor will be used really ought to check or know that. I don't want it to be the callee's responsilibity, for the reasons mentioned elsewhere.

We might provide a convenience say on ITrainerExtensions that runs that check, but the idea that every single trainer implementation has to have boilerplate is not something I'm entirely enthusiastic about. Also remember that these are things written to the console or a logger, which won't be evident to someone calling this anyway. One way or the other people are going to have to check this somehow.


In reply to: 202174443 [](ancestors = 202174443)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, make sense.


In reply to: 202182888 [](ancestors = 202182888,202174443)


TextWriter StdOut { get; }
TextWriter StdErr { get; }
bool SupportsValidation { get; }
Copy link
Member

@eerhardt eerhardt Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want XML comments on these properties, like the other properties? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


In reply to: 202181399 [](ancestors = 202181399)

/// Interface implemented by the MetalinearLearners base class.
/// Used to distinguish the MetaLinear Learners from the other learners
/// </summary>
public interface IMetaLinearTrainer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on using a custom attribute instead of a marker interface here?

Check out:
https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/interface
and
https://docs.microsoft.com/en-us/visualstudio/code-quality/ca1040-avoid-empty-interfaces

Specifically:

X AVOID using marker interfaces (interfaces with no members).
If you need to mark a class as having a specific characteristic (marker), in general, use a custom attribute rather than an interface.

If your design includes empty interfaces that types are expected to implement, you are probably using an interface as a marker or a way to identify a group of types. If this identification will occur at run time, the correct way to accomplish this is to use a custom attribute. Use the presence or absence of the attribute, or the properties of the attribute, to identify the target types. If the identification must occur at compile time, then it is acceptable to use an empty interface.

Another interesting point is that I can't find any implementations of this interface.... yet.

(Obviously this shouldn't be changed in this PR, just starting the discussion with the thoughts of logging a new, separate issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface needs to go. Its entire reason for existing is apparently so that this code here can run this test:

.Where(cls => !cls.IsHidden && !typeof(IMetaLinearTrainer).IsAssignableFrom(cls.Type));

has a way to not use a specific type of trainer. But because we dislike that trainer so much we didn't bother to ship it as part of ML.NET, so to keep that check, apparently someone thought it was a brilliant idea to introduce that interface. (I guess?)... then of course put it right into ITrainer.cs. So the entire reason why we have that interface and put it in such a central location, I suppose, is because we don't think the trainer it corresponds to is very good. That's twisted. :) However, I would prefer to solve it in the following way, by either migrating MetaLinearPredictor to ML.NET (and putting the check directly in that pipeline inference code), or annihilating it altogether. But yeah, this marker interface is a bad idea. But I feel like I don't have a good way to do that in this PR.

Regarding marker interfaces in general being bad, yeah, they probably ought to be avoided... I'm a little wary though of any sort of blanket policy against them. The common complaint against them is that interfaces are contracts and an interface with no contract is inherently confusing. (And indeed you may have observed in this code-base, we certainly don't shy away from attributes either.) But like anything, you have to weight the confusion caused by that "solution" to a problem, to the alternative.

I am imagining something: let's imagine that we had IBinaryPredictor as a marker interface on binary predictors. Then we get to, just in the type system, check whether a trainer is a binary trainer just by doing trainer is ITrainer<IBinaryPredictor>. In one fell swoop:

  1. we get to get rid of PredictionKind (I consider that enum a mistake since its existence discourages new types of predictors from being added, e.g., multilabel). We might still have it in some tooling (like a GUI), but as a central concept, it is out.

  2. that also means no more PredictionKind properties on IPredictor and ITrainer,

  3. we can correspondingly remove SignatureBinaryClassifierTrainer since the little tricks

  4. I want to get rid of our internal depenency injection framework, alternate schemes definitely work over types, but they definitely won't work over our custom little attributes marking this or that as binary classification, regression, or whatever,

  5. utilities that require specific types of trainers and predictors can ask for them. So for example, as we've discussed elsewhere, we have the idea of a OneVersusAll trainer, that would take a mechanism.

If we could turn this:

public SubComponent<TScalarTrainer, SignatureBinaryClassifierTrainer> PredictorType =
new SubComponent<TScalarTrainer, SignatureBinaryClassifierTrainer>(LinearSvm.LoadNameValue);

Into something like this:

Func<IHostEnvironment, ITrainer<IBinaryPredictor>> TrainerFactor = env => new LinearSvm(env);

Then inserting the wrong sort of trainer would immediately result in a compile-time error, not a runtime error.

So are marker interfaces inherently bad, I'm not so convinced of this, since the alternative envisioned is a runtime failure, and one for which we have to write a non-trivial amount of support code to accomodate. But, eh, I might be wrong. Just not sure about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I do definitely cringe whenever I see a marker interface (if you like that, you're going to love IPredictorProducing<...>). So I think the advice is generally good. I'm just not 100% sold that they aren't sometimes the most elegant, straightforward answer to some problems, and indeed some problems that we actually immediately have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the rule does give an escape hatch for when these are valuable at compile time.

If the identification must occur at compile time, then it is acceptable to use an empty interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we can get rid of it, according to @justinormont thanks Justin.

inpPredictor = null;
}
ch.Assert(validData == null || CanUseValidationData(trainer));
var predictor = trainer.Train(new TrainContext(data, validData, inpPredictor));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) is inpPredictor a type-o? Should it just be inPredictor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent from the caller was that this be inputPredictor so maybe they were just being weird about abbreviating it? I'll change it to inputPredictor since inp as an abbreviation for input seems odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have sworn I edited it, yet I see the file hasn't been changed... should be in next commit.

}

private const int DefaultNumModels = 50;
/// <summary> Command-line arguments </summary>
protected readonly ArgumentsBase Args;
protected readonly int NumModels;
protected internal readonly ArgumentsBase Args;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these fields all marked as protected internal now? Looking through the code, I don't see anyone outside of the class actually using them, and I would assume they would be private.

If we want to expose them outside of the class, should they be properties instead?

Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @eerhardt ... blunder on my part, I got confused between that and private protected, everywhere I introduced protected internal I really meant for it to be private protected. ☹️ Except one or two places I now see that I should have just left as protected.

Last night I somehow got confused and forgot that protected internal is the one with the or in it, while private protected is like protected and internal.

private readonly bool _needCalibration;

internal EnsembleTrainerBase(ArgumentsBase args, IHostEnvironment env, string name)
protected internal EnsembleTrainerBase(ArgumentsBase args, IHostEnvironment env, string name)
Copy link
Member

@eerhardt eerhardt Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this class is abstract, there is no point in making this protected internal. protected internal means it can be called from either a subclass, or any other class in this current assembly (or friend assembly). Since it is abstract, it can only be called from a sub-class - no other code can directly instantiate this class. So instead, I think this should just be marked protected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup yup... I know, I'm silly. I meant to actually make it so that no one outside the assembly could subclass it.

@@ -143,14 +133,15 @@ private void TrainCore(IChannel ch, RoleMappedData data)
validationDataSetProportion = Math.Max(validationDataSetProportion, stackingTrainer.ValidationDatasetProportion);

var needMetrics = Args.ShowMetrics || Combiner is IWeightedAverager;
var Models = new List<FeatureSubsetModel<IPredictorProducing<TOutput>>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) local variables should be camelCased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Thanks @eerhardt . My strategy for "de-statifying" the objects was sometimes to make the fields local variables, fix any problems, then rename them, but I forgot the rename step here. 😄

}

protected internal abstract TPredictor CreatePredictor(List<FeatureSubsetModel<IPredictorProducing<TOutput>>> models);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thoughts here - are we expecting other callers from within this assembly to call this method? I think it should just be marked protected and not protected internal.

NOTE: If you want the intersection between protected and internal (meaning only sub-classes that are in this assembly can see this method), then you are looking for a new C# 7.2 feature: private protected. See https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/private-protected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I wanted, thanks @eerhardt !

@@ -1887,7 +1880,7 @@ private void MakeBoundariesAndCheckLabels(out long missingInstances, out long to
missingInstances = cursor.BadFeaturesRowCount;
}

ch.Check(totalInstances > 0, TrainerBase.NoTrainingInstancesMessage);
ch.Check(totalInstances > 0, TrainerBase<IPredictor>.NoTrainingInstancesMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unfortunate usage of generics...

Some thoughts:

  • Should this message really be public?
  • Should this message be moved to a non-generic class, so it can be accessed without the generic type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this too. Not quite sure... this field has about three total usages, and the idea of having the message standardize "hey you have no instances" I feel might be of limited enough worth that I might want to just nuke it altogether. I think it causes more confusion than it saves.

/// </summary>
public interface IMetaLinearTrainer
public interface ITrainerEx : ITrainer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about the future proofing of this interface. What about exposing a capabilities class that could be extended?

Copy link
Contributor Author

@TomFinley TomFinley Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh... actually I like that idea a lot. Let me see if I understand this: There will be a property ITrainer.Capabilities that is an instance of the sealed class TrainerCapabilities, which contains information on this trainer. Is that so? Once that exists, we can get rid of ITrainerEx. I can do that in a future commit @ericstj as soon as you tell me if I am right (or wrong, in which case feel free to share more details).

Copy link
Member

@ericstj ericstj Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was what I was thinking. Bonus that you can get rid of Ex. Seems like it can be reasonably lightweight to implement so it shouldn’t increase the burden of implementing the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. If though it contains information about whether it needs normalization, or desires caching, or whatever, that's less about capabilities. Hmmm, I don't like how generic the name is, but would TrainerInfo be that bad of a name, compared to TrainerCapabilities? I don't like how generic the name info is though... everything is info. 😛 Anyway I'll just whip something up, we can always change the name.

@@ -992,8 +985,7 @@ public StandardArrayDualsTable(int length)
_duals = new Float[length];
}

public override Float this[long index]
{
public override Float this[long index] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) inconsistent formatting here with the curly brace on the same line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. My VS autoformatting had a strange setting about properties with newlines somehow, thanks Eric.

// The discretization step renders this trainer non-parametric, and therefore it does not need normalization.
// Also since it builds its own internal discretized columnar structures, it cannot benefit from caching.
// Finally, even the binary classifiers, being logitboost, tend to not benefit from external calibration.
Info = new TrainerInfo(normalization: false, caching: false, calibration: this is FastForestClassification);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calibration: this is FastForestClassification

It feels like that concrete classes should be able to decide this. The base class shouldn't decide which of its concrete classes should have this true/false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for the GamTrainer.


In reply to: 202365055 [](ancestors = 202365055)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough


public MultiClassNaiveBayesTrainer(IHostEnvironment env, Arguments args)
: base(env, LoadName)
{
Host.CheckValue(args, nameof(args));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question, but why do we have the args parameter? I don't see it being used. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the class itself is empty, since this has no settings, except in entry-point land. I guess for consistency?

/// <param name="train">Will set <see cref="TrainingSet"/> to this value. This must be specified</param>
/// <param name="valid">Will set <see cref="ValidationSet"/> to this value if specified</param>
/// <param name="initPredictor">Will set <see cref="InitialPredictor"/> to this value if specified</param>
public TrainContext(RoleMappedData train, RoleMappedData valid = null, IPredictor initPredictor = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't abbreviate here


public MultiClassNaiveBayesTrainer(IHostEnvironment env, Arguments args)
: base(env, LoadName)
{
Host.CheckValue(args, nameof(args));
Info = new TrainerInfo(normalization: false, caching: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is immutable and the same for all instances you can use a static singleton.

@TomFinley TomFinley force-pushed the tfinley/Train branch 2 times, most recently from 319a2fe to f2096e9 Compare July 14, 2018 03:23
// etc. This interface seems like the most natural conduit for that sort
// of extra information.

// REVIEW: Can we please have consistent naming here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to log issues for things like this? Once we ship the API we won't be able to change it. So this REVIEW comment has a lifetime. In my experience TODO comments in code get missed and rarely addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Opened #543, and will get rid of this comment.

@@ -207,8 +207,8 @@ public class TestHistory : Test
protected IList<TestResult[]> History;
protected int Iteration { get; private set; }

public TestResult BestResult { get; protected internal set; }
public int BestIteration { get; protected internal set; }
public TestResult BestResult { get; private protected set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these just be protected? It seems odd that only inheriting classes from in this assembly can set these properties, but inheriting classes outside can't.

Maybe changing the constructor to internal would be better if we don't want external derived classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... this entire assembly needs some serious work on what is public and internal. It isn't very consistent. But we can certainly make this small step.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, @TomFinley. Thanks for taking care of this.

Tom Finley and others added 12 commits July 17, 2018 09:45
* `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method
  on the interface.

* `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer
  a generic parameter. This context object replaces the functionality
  previously offered by the combination of `ITrainer`, `IValidatingTrainer`,
  `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now
  captured in one `ITrainer.Train` method with differently configured
  contexts.

* All trainers updated to these two new idioms. Many trainers correspondingly
  improved to no longer be stateful objects. (The exceptions are those that
  are just too far gone to be done with less than herculean effort at
  refactoring them to no longer use instance fields for their computation.
  Most notably, LBFGS and FastTree based trainers.)

* Utility code meant to deal with the complexity of the aforementioned
  `IT/IVT/IIT/IIVT` idiom reduced considerably.

* Opportunistic improvements to `ITrainer` implementors where observed.
Copy link
Member

@ericstj ericstj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like this direction!

@TomFinley TomFinley merged commit 0e37508 into dotnet:master Jul 18, 2018
@TomFinley TomFinley deleted the tfinley/Train branch July 19, 2018 15:57
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
…dotnet#522)

* Conversion of ITrainer.Train returns predictor, accepts TrainContext

* `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method
  on the interface.

* `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer
  a generic parameter. This context object replaces the functionality
  previously offered by the combination of `ITrainer`, `IValidatingTrainer`,
  `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now
  captured in one `ITrainer.Train` method with differently configured
  contexts.

* All trainers updated to these two new idioms. Many trainers correspondingly
  improved to no longer be stateful objects. (The exceptions are those that
  are just too far gone to be done with less than herculean effort at
  refactoring them to no longer use instance fields for their computation.
  Most notably, LBFGS and FastTree based trainers.)

* Utility code meant to deal with the complexity of the aforementioned
  `IT/IVT/IIT/IIVT` idiom reduced considerably.

* Opportunistic improvements to `ITrainer` implementors where observed.

* TrainerInfo introduction, ITrainerEx destruction

* Remove `IMetaLinearTrainer`
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants