-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Public Interface of RegressionTree and TreeEnsemble #2243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2243 +/- ##
==========================================
+ Coverage 71.17% 71.24% +0.06%
==========================================
Files 780 783 +3
Lines 140404 140721 +317
Branches 16053 16086 +33
==========================================
+ Hits 99936 100259 +323
+ Misses 36018 36008 -10
- Partials 4450 4454 +4
|
|
||
namespace Microsoft.ML.Trainers.FastTree.Internal | ||
{ | ||
public class RegressionTree | ||
public class RegressionTreeView |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RegressionTreeView [](start = 17, length = 18)
RegressionTreeView [](start = 17, length = 18)
From an external perspective this name is problematic. Sure you've architectured it as a wrapper, but the name "view" asks people to wonder, a view of what? If that regression tree below were public you might have a case, but that is something we want to hide. I don't think you're going to get out of this without changing the name of the internal thing. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about DecisionTreeRegressor
for this public class (inspired by ONNX)?
In reply to: 251182801 [](ancestors = 251182801)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it doesn't perform regression, so calling it a Regressor
makes no sense? How about just DecisionTree
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be RegressionTree in iteration 21. For internal classes, we use InternalRegression and InternalTreeEnsemble.
In reply to: 252497974 [](ancestors = 252497974)
@@ -15,7 +15,8 @@ namespace Microsoft.ML.Trainers.FastTree.Internal | |||
/// https://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf | |||
/// </summary> | |||
/// <remarks>Author was Yasser Ganjisaffar during his internship.</remarks> | |||
public class LassoBasedEnsembleCompressor : IEnsembleCompressor<short> | |||
[BestFriend] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestFriend] [](start = 4, length = 12)
While this probably should be internal, not best friend probably? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -16,7 +16,37 @@ | |||
|
|||
namespace Microsoft.ML.Trainers.FastTree.Internal | |||
{ | |||
public class TreeEnsemble | |||
public class TreeEnsembleModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TreeEnsembleModel [](start = 17, length = 17)
TreeEnsembleModel [](start = 17, length = 17)
Would DecisionTreeRegressorCollection
be a better name? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like TreeEnsembleModel
, it is just a TreeEnsemble
. Everywhere else we use the name "model" that carries with it certain expectations that don't quite fit what this is. This is just an ensemble of trees, so it might be nice if it were named that. The names of the classes themselves make sense to me, I just don't like the mutable behavior on them. So maybe just rename the internal classes to whatever you like (since they're internal, I care less about that), and give the public surface the sensible name. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will rename TreeEnsembleModel to TreeEnsemble and TreeRegressor to RegressionTree in iteration 21.
In reply to: 252498510 [](ancestors = 252498510)
b7563a2
to
403e019
Compare
@@ -129,7 +129,7 @@ protected override ObjectiveFunctionBase ConstructObjFunc(IChannel ch) | |||
return new ObjectiveImpl(TrainSet, Args); | |||
} | |||
|
|||
protected override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch) | |||
internal override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
private protected
is made for cases such as this. We want this to be internal and protected, and that is what that will do. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much Tom! It works perfectly as you suggested.
In reply to: 251651933 [](ancestors = 251651933)
@@ -124,7 +124,7 @@ protected override ObjectiveFunctionBase ConstructObjFunc(IChannel ch) | |||
return new ObjectiveImpl(TrainSet, Args); | |||
} | |||
|
|||
protected override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch) | |||
internal override OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
Another candidate for private protected
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All ConstructOptimizationAlgorithm
are private protected
now because we have
public abstract class FastTreeTrainerBase<TArgs, TTransformer, TModel>
....
private protected abstract OptimizationAlgorithm ConstructOptimizationAlgorithm(IChannel ch);
....
In reply to: 251652163 [](ancestors = 251652163)
@@ -494,7 +494,7 @@ private static FastTreeRegressionModelParameters Create(IHostEnvironment env, Mo | |||
public override PredictionKind PredictionKind => PredictionKind.Regression; | |||
} | |||
|
|||
public static partial class FastTree | |||
public static partial class FastTreeEntryPoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastTreeEntryPoint [](start = 32, length = 18)
FYI I have a change to make this internal, so perhaps this will not be necessary. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will handle some merge conflicts but let me keep this for now for building it on CI.
In reply to: 251652346 [](ancestors = 251652346)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. If this is to be a general rule, which it might be, we may want to make this general policy... there are lots of classes "like" this that exist merely as holders for static methods like Sdca
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to make it general. Having the same name of class and namespace doesn't look very good to me.
In reply to: 251949357 [](ancestors = 251949357)
src/Microsoft.ML.FastTree/Representation/TreeRegressorCollection.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.FastTree/Training/EnsembleCompression/LassoBasedEnsembleCompressor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.FastTree/Training/OptimizationAlgorithms/OptimizationAlgorithm.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.FastTree/TreeEnsemble/QuantileRegressionTree.cs
Outdated
Show resolved
Hide resolved
1. Rename classes. 2. Use IReadOnlyList instead of ReadOnlySpan. 3. Remove new namespace, Representation.
Add a test Fix typo Some docs Seperate TreeEnsemble
{ | ||
_tree = tree; | ||
|
||
_lteChild = ImmutableArray.Create(_tree.LteChild, 0, _tree.NumNodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableArray [](start = 24, length = 14)
This can do for now, but FYI wrapping in an immutable array creates a copy. Not a deal breaker, just something you ought to keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wschin ! I wonder if there's a way to validate scenarios against which people were using this structure to see if they're still well served. I guess we can tell from the screaming later. Anyway, thank you again!!
@TomFinley, Many thanks! I will refine the doc and make |
This PR proposes some changes to make RegressionTree and TreeEnsemble not mutable to users. Our strategy is
RegressionTree
andTreeEnsemble
over their internal relatives. Those wrapper classes are not mutable.InternalRegressionTree
andInternalTreeEnsemble
) which should not be public.FastForestClassificationModelParameters
,FastForestRegressionModelParameters
, etc.Float
tofloat
and removing unusedusing
statements.Hopefully this will fix #1960.