Modify API for advanced settings (LightGBM) #2261
Codecov Report
@@ Coverage Diff @@
## master #2261 +/- ##
===========================================
- Coverage 80.82% 69.82% -11%
===========================================
Files 159 786 +627
Lines 28540 144185 +115645
Branches 1909 16617 +14708
===========================================
+ Hits 23068 100684 +77616
- Misses 5175 38954 +33779
- Partials 297 4547 +4250
Looks good! Just a few questions about API standardization and test coverage.
/// it is only a way for the caller to be informed about what was learnt.</param>
/// <returns>The Score output column indicating the predicted value.</returns>
public static Scalar<float> LightGbm(this RegressionCatalog.RegressionTrainers catalog,
    Scalar<float> label, Vector<float> features, Scalar<float> weights,
`FastTree` just has `catalog` and `options` for its input parameters when there is an explicit `options` field. Is there a standard for what the "`options` API" looks like? If not, let's make one. This needs to be consistent across the learners. #Resolved
These are the static extensions. For static extensions, `FastTree` follows the same convention:
machinelearning/src/Microsoft.ML.StaticPipe/TreeTrainersStatic.cs
Lines 87 to 90 in 620ca89
public static Scalar<float> FastTree(this RegressionCatalog.RegressionTrainers catalog,
    Scalar<float> label, Vector<float> features, Scalar<float> weights,
    FastTreeRegressionTrainer.Options options,
    Action<FastTreeRegressionModelParameters> onFit = null)
For the dynamic API extensions, the `FastTree` and `LightGBM` APIs will also be consistent, i.e. both take `catalog` and `options` as input parameters when there is an explicit options field.
In reply to: 251533791
/// <returns>The set of output columns including in order the predicted binary classification score (which will range
/// from negative to positive infinity), the calibrated prediction (from 0 to 1), and the predicted label.</returns>
public static (Scalar<float> score, Scalar<float> probability, Scalar<bool> predictedLabel) LightGbm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
    Scalar<bool> label, Vector<float> features, Scalar<float> weights,
Ditto "`options` API" comment. #Resolved
These are static extensions (details in the other comment).
In reply to: 251533977
/// <returns>The set of output columns including in order the predicted binary classification score (which will range
/// from negative to positive infinity), the calibrated prediction (from 0 to 1), and the predicted label.</returns>
public static Scalar<float> LightGbm<TVal>(this RankingCatalog.RankingTrainers catalog,
    Scalar<float> label, Vector<float> features, Key<uint, TVal> groupId, Scalar<float> weights,
Ditto "`options` API" comment. #Resolved
These are static extensions (details in the other comment).
In reply to: 251534206
    Scalar<float> weights,
    Options options,
    Action<OvaModelParameters> onFit = null)
{
Ditto "`options` API" comment. #Resolved
These are static extensions (details in the other comment).
In reply to: 251534372
    Vector<float> features,
    Scalar<float> weights,
    Options options,
    Action<OvaModelParameters> onFit = null)
Why `OVA`? Is this just one-versus-all under the hood? #Resolved
For multiclass, LightGBM uses `OvaModelParameters`. We can also see this in the class definition:
public sealed class LightGbmMulticlassTrainer : LightGbmTrainerBase<VBuffer<float>, MulticlassPredictionTransformer<OvaModelParameters>, OvaModelParameters>
{
    internal const string Summary = "LightGBM Multi Class Classifier";
In reply to: 251534510
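For readers unfamiliar with the term in the question above, one-versus-all prediction can be sketched as follows. This is a minimal illustration, not the ML.NET implementation: `OvaSketch` and `Predict` are hypothetical names, and the thread only confirms that the multiclass trainer exposes `OvaModelParameters`, not how the per-class models are trained.

```csharp
using System;

// Hypothetical illustration of one-versus-all (OVA) prediction.
// Not ML.NET code: OvaSketch and Predict are made-up names.
public static class OvaSketch
{
    // Each scorer is a binary model answering "is this example class k?".
    // The predicted class is the index of the highest-scoring binary model.
    public static int Predict(Func<float[], float>[] binaryScorers, float[] features)
    {
        if (binaryScorers == null || binaryScorers.Length == 0)
            throw new ArgumentException("At least one binary scorer is required.", nameof(binaryScorers));

        int best = 0;
        float bestScore = binaryScorers[0](features);
        for (int k = 1; k < binaryScorers.Length; k++)
        {
            float score = binaryScorers[k](features);
            if (score > bestScore)
            {
                best = k;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With K classes there are K binary scorers, so the model parameters are naturally a collection of per-class predictors.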

/// <summary>
/// Predict a target using a decision tree binary classification model trained with the <see cref="LightGbmRankingTrainer"/>.
binary classification => ranking #Resolved
{
    Contracts.CheckValue(catalog, nameof(catalog));
    var env = CatalogUtils.GetEnvironment(catalog);
    return new LightGbmBinaryTrainer(env, options);
}

/// <summary>
binary classification => ranking #Resolved
@@ -781,7 +781,7 @@ public void TestMultiClassEnsembleCombiner()

 var predictors = new PredictorModel[]
 {
-    LightGbm.TrainMultiClass(Env, new LightGbmArguments
+    LightGbm.TrainMultiClass(Env, new Options
Do you have tests for the catalog entries? I don't see them here — they didn't show up as a change, so it looks like they aren't being tested. #Resolved
There are some tests in FeatureContribution / Onnx that exercise the LightGBM extensions.
In any case, I have updated several tests in this file so that they exercise the public API extensions.
In reply to: 251537729
new FastForestRegression.Options {
var trainer = ML.Regression.Trainers.FastForest(
    new FastForestRegression.Options
    {
nit: Usually I'd pack cleanup for a different section in a different commit / PR. #Resolved
Makes sense. I think I did a Ctrl+K, Ctrl+D on this file, which applied some of the auto cleanups.
In reply to: 251538653
 /// if both are present and have different values.
 /// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
-public LightGbmBinaryTrainer(IHostEnvironment env,
+internal LightGbmBinaryTrainer(IHostEnvironment env,
     string labelColumn = DefaultColumnNames.Label,
Is it not possible to combine the two constructors?
Or have you added this work to the reconciliation issue #2100? #Resolved
 /// if both are present and have different values.
 /// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
-public LightGbmMulticlassTrainer(IHostEnvironment env,
+internal LightGbmMulticlassTrainer(IHostEnvironment env,
     string labelColumn = DefaultColumnNames.Label,
Same question here, regarding unifying the two constructors? #Resolved
 /// if both are present and have different values.
 /// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
-public LightGbmRankingTrainer(IHostEnvironment env,
+internal LightGbmRankingTrainer(IHostEnvironment env,
     string labelColumn = DefaultColumnNames.Label,
Here as well? #Resolved
 /// if both are present and have different values.
 /// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
-public LightGbmRegressorTrainer(IHostEnvironment env,
+internal LightGbmRegressorTrainer(IHostEnvironment env,
     string labelColumn = DefaultColumnNames.Label,
Here as well? #Resolved
/// <summary>
/// Predict a target using a decision tree binary classification model trained with the <see cref="LightGbmRankingTrainer"/>.
/// </summary>
/// <param name="catalog">The <see cref="RankingCatalog"/>.</param>
`RankingCatalog`
Should be MultiClass catalog I think. #Resolved
A few minor comments, otherwise looks good!
It would be great if you could fix it as part of the PR, as it's related.
In reply to: 458257429
Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:135 in 33fe723.
Towards #1798.
This PR addresses the following algos
The following changes have been made:
- Added `public` extension methods, one for simple arguments and the other for advanced options.
- Made the trainer constructors `internal`. Also a few other fields have been made `internal`.
- Renamed `LightGbmArguments` to `Options` for API consistency.
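The shape these changes describe (a simple overload, an advanced-options overload, and an internal constructor behind both) can be sketched roughly as below. Every type here (`SketchCatalog`, `SketchTrainer`, `SketchOptions`) is a hypothetical stand-in and the default values are made up; this is not the actual ML.NET code, only an illustration of the pattern.

```csharp
using System;

// Hypothetical stand-ins; none of these are the real ML.NET classes.
public sealed class SketchOptions
{
    public string LabelColumn = "Label";
    public string FeatureColumn = "Features";
    public int NumLeaves = 31;          // made-up default, for illustration only
    public double LearningRate = 0.1;   // made-up default, for illustration only
}

public sealed class SketchTrainer
{
    public SketchOptions Options { get; }

    // Internal constructor: callers are expected to go through the catalog extensions.
    internal SketchTrainer(SketchOptions options)
    {
        Options = options ?? throw new ArgumentNullException(nameof(options));
    }
}

public sealed class SketchCatalog { }

public static class SketchCatalogExtensions
{
    // Simple overload: only the most common arguments.
    public static SketchTrainer Sketch(this SketchCatalog catalog,
        string labelColumn = "Label", string featureColumn = "Features")
    {
        if (catalog == null) throw new ArgumentNullException(nameof(catalog));
        return new SketchTrainer(new SketchOptions { LabelColumn = labelColumn, FeatureColumn = featureColumn });
    }

    // Advanced overload: just the catalog and an explicit options object.
    public static SketchTrainer Sketch(this SketchCatalog catalog, SketchOptions options)
    {
        if (catalog == null) throw new ArgumentNullException(nameof(catalog));
        return new SketchTrainer(options);
    }
}
```

A caller would write `catalog.Sketch(labelColumn: "Target")` for the simple case and `catalog.Sketch(new SketchOptions { NumLeaves = 64 })` for the advanced one. Keeping the constructor internal leaves the two extension methods as the only public entry points.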