Public Interface of RegressionTree and TreeEnsemble #2243

Merged: 23 commits, Feb 1, 2019

Member

@wschin wschin commented Jan 25, 2019

This PR proposes changes that make RegressionTree and TreeEnsemble immutable to users. Our strategy is:

  1. Create wrapper classes RegressionTree and TreeEnsemble over their internal relatives. Those wrapper classes are not mutable.
  2. Internalize everything (around InternalRegressionTree and InternalTreeEnsemble) which should not be public.
  3. Internalize public constructors such as FastForestClassificationModelParameters, FastForestRegressionModelParameters, etc.
  4. Some cleanup, for example changing Float to float and removing unused using statements.

Hopefully this will fix #1960.
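
The wrapper strategy in the list above can be sketched roughly as follows. This is only an illustration: `InternalTreeSketch` and `TreeWrapperSketch` are hypothetical names standing in for the internal class and its public wrapper, and the real classes in this PR have many more members.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical stand-in for the internal, mutable tree representation.
internal sealed class InternalTreeSketch
{
    public int NumNodes;
    public int[] LteChild;
    public double[] LeafValues;
}

// Hypothetical public wrapper. It can only be constructed inside the
// assembly and exposes the tree through read-only members.
public sealed class TreeWrapperSketch
{
    private readonly ImmutableArray<int> _lteChild;
    private readonly ImmutableArray<double> _leafValues;

    internal TreeWrapperSketch(InternalTreeSketch tree)
    {
        // Snapshot only the first NumNodes entries of the child array.
        _lteChild = ImmutableArray.Create(tree.LteChild, 0, tree.NumNodes);
        // Snapshot every leaf value.
        _leafValues = ImmutableArray.Create(tree.LeafValues);
    }

    public IReadOnlyList<int> LteChild => _lteChild;
    public IReadOnlyList<double> LeafValues => _leafValues;
}
```

Because the constructor is internal and the properties have no setters, code outside the assembly can read the tree but has no way to build or modify one.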

@wschin wschin self-assigned this Jan 25, 2019

codecov bot commented Jan 25, 2019

Codecov Report

Merging #2243 into master will increase coverage by 0.06%.
The diff coverage is 94.67%.

@@            Coverage Diff             @@
##           master    #2243      +/-   ##
==========================================
+ Coverage   71.17%   71.24%   +0.06%     
==========================================
  Files         780      783       +3     
  Lines      140404   140721     +317     
  Branches    16053    16086      +33     
==========================================
+ Hits        99936   100259     +323     
+ Misses      36018    36008      -10     
- Partials     4450     4454       +4
Flag          Coverage Δ
#Debug        71.24% <94.67%> (+0.06%) ⬆️
#production   67.6% <89.55%> (+0.04%) ⬆️
#test         85.28% <100%> (+0.08%) ⬆️


namespace Microsoft.ML.Trainers.FastTree.Internal
{
-    public class RegressionTree
+    public class RegressionTreeView
Contributor

@TomFinley TomFinley Jan 26, 2019

RegressionTreeView [](start = 17, length = 18)

From an external perspective this name is problematic. Sure, you've architected it as a wrapper, but the name "view" invites people to wonder: a view of what? If the regression tree underneath were public you might have a case, but that is something we want to hide. I don't think you're going to get out of this without changing the name of the internal thing. #Resolved

Member Author

How about DecisionTreeRegressor for this public class (inspired by ONNX)?


In reply to: 251182801 [](ancestors = 251182801)

Contributor

@TomFinley TomFinley Jan 31, 2019

But it doesn't perform regression, so calling it a Regressor makes no sense? How about just DecisionTree? #Resolved

Member Author

It will be RegressionTree in iteration 21. For internal classes, we use InternalRegressionTree and InternalTreeEnsemble.


In reply to: 252497974 [](ancestors = 252497974)

@@ -15,7 +15,8 @@ namespace Microsoft.ML.Trainers.FastTree.Internal
/// https://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf
/// </summary>
/// <remarks>Author was Yasser Ganjisaffar during his internship.</remarks>
-    public class LassoBasedEnsembleCompressor : IEnsembleCompressor<short>
+    [BestFriend]
Contributor

@TomFinley TomFinley Jan 26, 2019

[BestFriend] [](start = 4, length = 12)

This should probably be internal rather than best friend? #Resolved

Member Author

Yes, you're right.


In reply to: 251186572 [](ancestors = 251186572)

@@ -16,7 +16,37 @@

namespace Microsoft.ML.Trainers.FastTree.Internal
{
-    public class TreeEnsemble
+    public class TreeEnsembleModel
Member Author

@wschin wschin Jan 28, 2019

TreeEnsembleModel [](start = 17, length = 17)

Would DecisionTreeRegressorCollection be a better name? #Resolved

Contributor

@TomFinley TomFinley Jan 31, 2019

I don't really like TreeEnsembleModel; it is just a TreeEnsemble. Everywhere else, the name "model" carries certain expectations that don't quite fit what this is. This is just an ensemble of trees, so it would be nice if it were named that. The names of the existing classes make sense to me; I just don't like their mutable behavior. So maybe rename the internal classes to whatever you like (since they're internal, I care less about that), and give the public surface the sensible name. #Resolved

Member Author

Sure. I will rename TreeEnsembleModel to TreeEnsemble and TreeRegressor to RegressionTree in iteration 21.


In reply to: 252498510 [](ancestors = 252498510)

@wschin wschin force-pushed the public-tree branch 2 times, most recently from b7563a2 to 403e019 Compare January 28, 2019 22:26
@wschin wschin requested review from codemzs and Ivanidzo4ka January 28, 2019 22:27
@wschin wschin changed the title [WIP] Public Interface of RegressionTree and TreeEnsemble Public Interface of RegressionTree and TreeEnsemble Jan 28, 2019
@@ -129,7 +129,7 @@ protected override ObjectiveFunctionBase ConstructObjFunc(IChannel ch)
return new ObjectiveImpl(TrainSet, Args);
}

-        protected override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch)
+        internal override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch)
Contributor

@TomFinley TomFinley Jan 29, 2019

internal [](start = 8, length = 8)

private protected is made for cases such as this. We want this to be internal and protected, and that is what that will do. #Resolved
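
As a rough illustration of the suggestion above: `private protected` (C# 7.2 and later) limits a member to derived classes declared in the same assembly, and unlike `protected` on a public class it may use internal types in its signature. The type names below are hypothetical stand-ins, not the actual FastTree classes.

```csharp
using System;

// Hypothetical internal type, standing in for something like OptimizationAlgorithm.
internal sealed class OptimizationAlgorithmSketch
{
    public string Name { get; }
    public OptimizationAlgorithmSketch(string name) => Name = name;
}

public abstract class TrainerBaseSketch
{
    // Visible only to derived classes inside this assembly. Declaring this
    // as "protected" would not compile, because the return type is internal.
    private protected abstract OptimizationAlgorithmSketch ConstructOptimizationAlgorithm();

    public string Describe() => ConstructOptimizationAlgorithm().Name;
}

public sealed class RegressionTrainerSketch : TrainerBaseSketch
{
    private protected override OptimizationAlgorithmSketch ConstructOptimizationAlgorithm()
        => new OptimizationAlgorithmSketch("gradient descent");
}
```

Plain `internal` would also hide the member from other assemblies, but it would let any class in the assembly call it, not just subclasses.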

Member Author

Thank you very much Tom! It works perfectly as you suggested.


In reply to: 251651933 [](ancestors = 251651933)

@@ -124,7 +124,7 @@ protected override ObjectiveFunctionBase ConstructObjFunc(IChannel ch)
return new ObjectiveImpl(TrainSet, Args);
}

-        protected override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch)
+        internal override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch)
Contributor

@TomFinley TomFinley Jan 29, 2019

internal [](start = 8, length = 8)

Another candidate for private protected. #Resolved

Member Author

All ConstructOptimizationAlgorithm overrides are private protected now, because we have

    public abstract class FastTreeTrainerBase<TArgs, TTransformer, TModel>
    ....
        private protected abstract OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch);
    ....

In reply to: 251652163 [](ancestors = 251652163)

@@ -494,7 +494,7 @@ private static FastTreeRegressionModelParameters Create(IHostEnvironment env, Mo
public override PredictionKind PredictionKind => PredictionKind.Regression;
}

-    public static partial class FastTree
+    public static partial class FastTreeEntryPoint
Contributor

@TomFinley TomFinley Jan 29, 2019

FastTreeEntryPoint [](start = 32, length = 18)

FYI I have a change to make this internal, so perhaps this will not be necessary. #Resolved

Member Author

OK. I will have to handle some merge conflicts, but let me keep this for now so that it builds on CI.


In reply to: 251652346 [](ancestors = 251652346)

Contributor

OK. If this is to be a general rule, which it might be, we may want to make this general policy... there are lots of classes "like" this that exist merely as holders for static methods like Sdca.

Member Author

It'd be nice to make it general. Having a class and a namespace with the same name doesn't look very good to me.


In reply to: 251949357 [](ancestors = 251949357)

@TomFinley
Contributor

Adding @sfilipi , @zeahmed, and @glebuk since they often have thoughts about APIs.

Add a test

Fix typo

Some docs

Seperate TreeEnsemble
{
_tree = tree;

_lteChild = ImmutableArray.Create(_tree.LteChild, 0, _tree.NumNodes);
Contributor

ImmutableArray [](start = 24, length = 14)

This can do for now, but FYI wrapping in an immutable array creates a copy. Not a deal breaker, just something you ought to keep in mind.
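
A small sketch of the copy behavior mentioned above, using a hypothetical helper named `Snapshot`: `ImmutableArray.Create(items, start, length)` allocates a new array and copies the requested range, so later writes to the source array do not show up in the immutable one. The cost is one allocation and a copy of `length` elements per call.

```csharp
using System;
using System.Collections.Immutable;

public static class ImmutableCopySketch
{
    // Hypothetical helper: snapshot the first numNodes entries of a mutable array.
    public static ImmutableArray<int> Snapshot(int[] lteChild, int numNodes)
        => ImmutableArray.Create(lteChild, 0, numNodes);

    // Returns the snapshot's first element after the source array has been
    // overwritten; the result is still 1 because the snapshot is a copy.
    public static int FirstAfterMutation()
    {
        int[] source = { 1, 2, -1, -2 };
        ImmutableArray<int> snapshot = Snapshot(source, 3);
        source[0] = 99; // mutate the original after taking the snapshot
        return snapshot[0];
    }
}
```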

Contributor

@TomFinley TomFinley left a comment

Thanks @wschin ! I wonder if there's a way to validate scenarios against which people were using this structure to see if they're still well served. I guess we can tell from the screaming later. Anyway, thank you again!!

Member Author

wschin commented Feb 1, 2019

@TomFinley, many thanks! I will refine the docs and switch to private protected in another PR today. As for scenarios, I'd like to use the FastTree binary classifier as an example (given its popularity), but I need to fix #2319 first.

@wschin wschin merged commit 3eccc93 into dotnet:master Feb 1, 2019
@wschin wschin deleted the public-tree branch February 1, 2019 17:11
wschin added a commit to wschin/machinelearning that referenced this pull request Feb 1, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Development

Successfully merging this pull request may close these issues.

Make FastTree/LightGBM learned model suitable for public consumption
6 participants