Skip to content

Modify API for advanced settings (LightGBM) #2261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 28, 2019

Conversation

abgoswam
Copy link
Member

Towards #1798 .

This PR addresses the following algos

  • LightGbmRegressorTrainer
  • LightGbmBinaryTrainer
  • LightGbmRankingTrainer
  • LightGbmMulticlassTrainer

The following changes have been made:

  1. Two public extension methods, one for simple arguments and the other for advanced options
  2. Make constructors internal . Also a few other fields have been made internal
  3. Pass Options objects as arguments instead of Action delegate. Also, added 3 fields to Options for API consistency.
  4. Rename Arguments to Options
  5. Rename Options objects as options (instead of args or advancedSettings used so far)

@codecov
Copy link

codecov bot commented Jan 28, 2019

Codecov Report

Merging #2261 into master will decrease coverage by 10.99%.
The diff coverage is 62.82%.

@@             Coverage Diff             @@
##           master    #2261       +/-   ##
===========================================
- Coverage   80.82%   69.82%      -11%     
===========================================
  Files         159      786      +627     
  Lines       28540   144185   +115645     
  Branches     1909    16617    +14708     
===========================================
+ Hits        23068   100684    +77616     
- Misses       5175    38954    +33779     
- Partials      297     4547     +4250
Flag Coverage Δ
#Debug 69.82% <62.82%> (-11%) ⬇️
#production 66.08% <52.84%> (?)
#test 85% <100%> (+4.17%) ⬆️

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few questions about API standardization and test coverage.

/// it is only a way for the caller to be informed about what was learnt.</param>
/// <returns>The Score output column indicating the predicted value.</returns>
public static Scalar<float> LightGbm(this RegressionCatalog.RegressionTrainers catalog,
Scalar<float> label, Vector<float> features, Scalar<float> weights,
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastTree just has catalog and options for its input parameters when there is an explicit options field. Is there a standard for what the "options API" looks like? If not, let's make one. This needs to be consistent across the learners. #Resolved

Copy link
Member Author

@abgoswam abgoswam Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the static extensions. For static extensions FastTree follows the same convention :

public static Scalar<float> FastTree(this RegressionCatalog.RegressionTrainers catalog,
Scalar<float> label, Vector<float> features, Scalar<float> weights,
FastTreeRegressionTrainer.Options options,
Action<FastTreeRegressionModelParameters> onFit = null)

For the dynamic API extensions, both FastTree and LightGBM API will also be consistent i.e. presence of catalog and options for its input parameters when there is an explicit options field


In reply to: 251533791 [](ancestors = 251533791)

/// <returns>The set of output columns including in order the predicted binary classification score (which will range
/// from negative to positive infinity), the calibrated prediction (from 0 to 1), and the predicted label.</returns>
public static (Scalar<float> score, Scalar<float> probability, Scalar<bool> predictedLabel) LightGbm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
Scalar<bool> label, Vector<float> features, Scalar<float> weights,
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto "options API" comment. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are static extensions (details in other comment)


In reply to: 251533977 [](ancestors = 251533977)

/// <returns>The set of output columns including in order the predicted binary classification score (which will range
/// from negative to positive infinity), the calibrated prediction (from 0 to 1), and the predicted label.</returns>
public static Scalar<float> LightGbm<TVal>(this RankingCatalog.RankingTrainers catalog,
Scalar<float> label, Vector<float> features, Key<uint, TVal> groupId, Scalar<float> weights,
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto "options API" comment. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are static extensions (details in other comment)


In reply to: 251534206 [](ancestors = 251534206)

Scalar<float> weights,
Options options,
Action<OvaModelParameters> onFit = null)
{
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto "options API" comment. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are static extensions (details in other comment)


In reply to: 251534372 [](ancestors = 251534372)

Vector<float> features,
Scalar<float> weights,
Options options,
Action<OvaModelParameters> onFit = null)
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why OVA? Is this just one-versus-all under the hood? #Resolved

Copy link
Member Author

@abgoswam abgoswam Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multiclass, LightGBM uses the OvaModelParameters

We can also see it in the class definition as well

public sealed class LightGbmMulticlassTrainer : LightGbmTrainerBase<VBuffer<float>, MulticlassPredictionTransformer<OvaModelParameters>, OvaModelParameters>
{
internal const string Summary = "LightGBM Multi Class Classifier";
#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multiclass, LightGBM uses the OvaModelParameters

We can also see it in the class definition as well

public sealed class LightGbmMulticlassTrainer : LightGbmTrainerBase<VBuffer<float>, MulticlassPredictionTransformer<OvaModelParameters>, OvaModelParameters>
{
internal const string Summary = "LightGBM Multi Class Classifier";


In reply to: 251534510 [](ancestors = 251534510)


/// <summary>
/// Predict a target using a decision tree binary classification model trained with the <see cref="LightGbmRankingTrainer"/>.
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary classification => ranking #Resolved


/// <summary>
/// Predict a target using a decision tree binary classification model trained with the <see cref="LightGbmRankingTrainer"/>.
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary classification => ranking #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MultiClass)


In reply to: 251536106 [](ancestors = 251536106)

{
Contracts.CheckValue(catalog, nameof(catalog));
var env = CatalogUtils.GetEnvironment(catalog);
return new LightGbmBinaryTrainer(env, options);
}

/// <summary>
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary classification => ranking #Resolved

@@ -781,7 +781,7 @@ public void TestMultiClassEnsembleCombiner()

var predictors = new PredictorModel[]
{
LightGbm.TrainMultiClass(Env, new LightGbmArguments
LightGbm.TrainMultiClass(Env, new Options
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have tests for the catalog entries? I don't see them here — they didn't show up as a change, so it looks like they aren't being tested. #Resolved

Copy link
Member Author

@abgoswam abgoswam Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some tests in FeatureContribution / Onnx which exercise the LightGBM extensions.

In any case, I have updated several tests in this file so they exercise the public API extensions


In reply to: 251537729 [](ancestors = 251537729)

new FastForestRegression.Options {
var trainer = ML.Regression.Trainers.FastForest(
new FastForestRegression.Options
{
Copy link
Contributor

@rogancarr rogancarr Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Usually I'd pack cleanup for a different section in a different commit / PR. #Resolved

Copy link
Member Author

@abgoswam abgoswam Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.. i think i did a control + k + d on this file which did some of the auto cleanups


In reply to: 251538653 [](ancestors = 251538653)

/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public LightGbmBinaryTrainer(IHostEnvironment env,
internal LightGbmBinaryTrainer(IHostEnvironment env,
string labelColumn = DefaultColumnNames.Label,
Copy link
Contributor

@artidoro artidoro Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to combine the two constructors?
Or have you added this work to the reconciliation issue #2100? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap. exactly. added to #2100

want to focus on cleanup of public surface first


In reply to: 251541952 [](ancestors = 251541952)

/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public LightGbmMulticlassTrainer(IHostEnvironment env,
internal LightGbmMulticlassTrainer(IHostEnvironment env,
string labelColumn = DefaultColumnNames.Label,
Copy link
Contributor

@artidoro artidoro Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, regarding unifying the two constructors? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100


In reply to: 251542646 [](ancestors = 251542646)

/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public LightGbmRankingTrainer(IHostEnvironment env,
internal LightGbmRankingTrainer(IHostEnvironment env,
string labelColumn = DefaultColumnNames.Label,
Copy link
Contributor

@artidoro artidoro Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100


In reply to: 251542949 [](ancestors = 251542949)

/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public LightGbmRegressorTrainer(IHostEnvironment env,
internal LightGbmRegressorTrainer(IHostEnvironment env,
string labelColumn = DefaultColumnNames.Label,
Copy link
Contributor

@artidoro artidoro Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100


In reply to: 251545552 [](ancestors = 251545552)

/// <summary>
/// Predict a target using a decision tree binary classification model trained with the <see cref="LightGbmRankingTrainer"/>.
/// </summary>
/// <param name="catalog">The <see cref="RankingCatalog"/>.</param>
Copy link
Contributor

@artidoro artidoro Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RankingCatalog [](start = 49, length = 14)

Should be MultiClass catalog I think. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 251548528 [](ancestors = 251548528)

@artidoro
Copy link
Contributor

artidoro commented Jan 28, 2019

    /// <param name="catalog">The <see cref="RankingCatalog"/>.</param>

I know you haven't written this section, but it should also say MultiClass instead of ranking. #Resolved


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:135 in 33fe723. [](commit_id = 33fe723, deletion_comment = False)

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, otherwise looks good!

@artidoro
Copy link
Contributor

    /// <param name="catalog">The <see cref="RankingCatalog"/>.</param>

It would be great if you could fix it as part of the PR, as it's related.


In reply to: 458257429 [](ancestors = 458257429)


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:135 in 33fe723. [](commit_id = 33fe723, deletion_comment = False)

@abgoswam
Copy link
Member Author

    /// <param name="catalog">The <see cref="RankingCatalog"/>.</param>

fixed


In reply to: 458258712 [](ancestors = 458258712,458257429)


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:135 in 33fe723. [](commit_id = 33fe723, deletion_comment = False)

@abgoswam abgoswam merged commit 2180cfb into dotnet:master Jan 28, 2019
@abgoswam abgoswam deleted the abgoswam/action_arguments_lightgbm branch February 20, 2019 16:59
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants