Skip to content

Lockdown of Microsoft.ML.LightGBM public surface. #2476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Feb 15, 2019

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Feb 8, 2019

This PR fixes #2271. Fixes #2534 fixes #2459

The changes included in this PR are as follows.

  1. Changed unnecessary public items to internal.
  2. Created samples in the samples project.
  3. Referenced the samples as an example in public API exposed through MLContext.
  4. Resolved a bug where one of the constructor of LightGbmMulticlassTrainer was using MakeBoolScalarLabel instead of MakeU4ScalarColumn.

@zeahmed
Copy link
Contributor Author

zeahmed commented Feb 8, 2019

@shmoradims, do you know which parameters you were talking about documenting in LightGBM? #Resolved

@zeahmed
Copy link
Contributor Author

zeahmed commented Feb 8, 2019

@sfilipi and @shmoradims, do we have standard explanation for the public Train methods in trainers? #Resolved

@zeahmed
Copy link
Contributor Author

zeahmed commented Feb 8, 2019

@TomFinley, these are the items I see exposed now after my changes.

image
#Resolved

@codecov
Copy link

codecov bot commented Feb 8, 2019

Codecov Report

Merging #2476 into master will decrease coverage by <.01%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master    #2476      +/-   ##
==========================================
- Coverage   71.43%   71.42%   -0.01%     
==========================================
  Files         801      801              
  Lines      141784   141793       +9     
  Branches    16135    16135              
==========================================
+ Hits       101278   101281       +3     
- Misses      36039    36045       +6     
  Partials     4467     4467
Flag Coverage Δ
#Debug 71.42% <50%> (-0.01%) ⬇️
#production 67.72% <50%> (-0.01%) ⬇️
#test 85.51% <ø> (ø) ⬆️

@@ -0,0 +1,103 @@
using System;
using Microsoft.ML.Data;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft [](start = 6, length = 9)

We already have Transforms folder, I'm creating Trainers folder with Recommendation subfolder in #2451
maybe it's worth to put them in Trainers/BinaryClassification, Trainers/MultiClassification, Trainers/Regression?

Overwise it start look like trash pile :) #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the Trainers folder.


In reply to: 255223584 [](ancestors = 255223584)


[TlcModule.ComponentKind("BoosterParameterFunction")]
public interface ISupportBoosterParameterFactory : IComponentFactory<IBoosterParameter>
{
}

public interface IBoosterParameter
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 4, length = 6)

hide this one as well. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do things like this right now:
new Options.DartBooster.Arguments { }.CreateComponent(ML).UpdateParameters(new Dictionary<string,object>);
does they make any sense for you?


In reply to: 255223941 [](ancestors = 255223941)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is ISupportBoosterParameterFactory is public. It is also used as a parameter of Options class which is public too. This does not allow us to make it internal.


In reply to: 255258378 [](ancestors = 255258378,255223941)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can mark public ISupportBoosterParameterFactory Booster = new TreeBooster.Arguments(); as internal and put in arguments attribute CmdLine only.
You can also create something like public BoosterParameter BoosterParameters = TreeBooster.Arguments() and mark it EntryPoint only
At least we can try to hide it.


In reply to: 255667107 [](ancestors = 255667107,255258378,255223941)

Copy link
Contributor Author

@zeahmed zeahmed Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateComponent(ML) and UpdateParameters are hidden now.

I am now using

public ISupportBoosterParameterFactory Booster = new TreeBooster.Arguments(); 

in the samples. So I think ISupportBoosterParameterFactory cannot be hidden.


In reply to: 256153739 [](ancestors = 256153739,255667107,255258378,255223941)

Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Ivan I think you should be able to do as he suggested: make public ISupportBoosterParameterFactory Booster internal, and add a new public field that is of type BoosterParameter, or even TreeBooster.Arguments.


In reply to: 256250752 [](ancestors = 256250752,256153739,255667107,255258378,255223941)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        internal const int NumBoostRound = 100;

can be public, class is internal anyway. #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:97 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        public virtual void UpdateParameters(Dictionary<string, object> res)

internal #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:58 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        public const string FriendlyName = "Tree Booster";

internal #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:103 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

            public virtual IBoosterParameter CreateComponent(IHostEnvironment env) => new TreeBooster(this);

hide it #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:168 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        public TreeBooster(Arguments args)

not sure it should be public #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:171 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        public const string FriendlyName = "Tree Dropout Tree Booster";

internal #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmArguments.cs:191 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

public delegate void ReduceScatterFunction([MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)]byte[] input, int inputSize, int typeSize,

any reason why it should be public? #Closed


Refers to: src/Microsoft.ML.LightGBM/Parallel/IParallel.cs:25 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

public delegate void AllgatherFunction([MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)]byte[] input, int inputSize,

any reason why it should be public? #Closed


Refers to: src/Microsoft.ML.LightGBM/Parallel/IParallel.cs:33 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

public interface IParallel

Definitely should be internal. #Closed


Refers to: src/Microsoft.ML.LightGBM/Parallel/IParallel.cs:37 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

public interface ISupportParallel : IComponentFactory<IParallel>

internal #Closed


Refers to: src/Microsoft.ML.LightGBM/Parallel/IParallel.cs:71 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelIndex", "Label")
.Append(mlContext.MulticlassClassification.Trainers.LightGbm(labelColumn: "LabelIndex"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("PredictedLabelIndex", "PredictedLabel"))
.Append(mlContext.Transforms.CopyColumns("Scores", "Score"));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Append(mlContext.Transforms.CopyColumns("Scores", "Score") [](start = 24, length = 59)

why this one is needed? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the example from


I see there are two scores in dataview after transformation 1) Scores 2) Score. If there is no copy from Score to Scores, all the probabilities are zero. Any reason why?


In reply to: 255259910 [](ancestors = 255259910)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound like a bug.


In reply to: 255679272 [](ancestors = 255679272,255259910)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2542 to track it


In reply to: 256220736 [](ancestors = 256220736,255679272,255259910)

"hours-per-week",
"native-country"))
.Append(mlContext.Transforms.Normalize("Features"))
.Append(mlContext.BinaryClassification.Trainers.LightGbm());
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LightGbm [](start = 60, length = 8)

it's line 83 and i can finally see example of how to use LightGBM! #Closed

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 8, 2019

        Options options)

I think this method deserves sample more than simple method.
Especially how to set specific booster.
Right now I'm creating Options class, type Booster inside it, and I have no idea what to do next.
Booster = new Options.GossBooster.Arguments { }
it should be something like this but how I have no idea how user would find out what he need to type Options first. And there is zero help from Intellisense. #Closed


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:99 in 59d1a08. [](commit_id = 59d1a08, deletion_comment = False)

{
public class LightGbmBinaryClassification
{
public static void LightGbmBinaryClassificationExample()
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LightGbmBinaryClassificationExample [](start = 27, length = 35)

think Shahab started calling the methods "Example" #Resolved

// Creating the ML.Net IHostEnvironment object, needed for the pipeline
var mlContext = new MLContext();

var reader = mlContext.Data.ReadFromTextFile(dataFilePath, new TextLoader.Arguments
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reader = mlContext.Data.ReadFromTextFile(dataFi [](start = 16, length = 47)

look at Shahab's PR on AP. He's creating a SamplesUtils for this one. #Resolved

Console.WriteLine($"Negative Precision: {metrics.NegativePrecision}"); // 0.88
Console.WriteLine($"Negative Recall: {metrics.NegativeRecall}"); // 0.91
Console.WriteLine($"Positive Precision: {metrics.PositivePrecision}"); // 0.67
Console.WriteLine($"Positive Recall: {metrics.PositiveRecall}"); // 0.58
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see : #2483 #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent, I am going to change it the way #2483 is using it.

However, I think that this sort of code reusability does not make these sample standalone which I think is more important for documentation samples. What do you think?


In reply to: 255272930 [](ancestors = 255272930)

var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

// Convert native C# class to IDataView, a consumble format to ML.NET functions.
var dataView = mlContext.Data.ReadFromEnumerable(examples);
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataView [](start = 16, length = 8)

preview #Resolved

var model = pipeline.Fit(trainingData);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(testingData);
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataWithPredictions [](start = 16, length = 19)

preview #Resolved

var metrics = mlContext.MulticlassClassification.Evaluate(dataWithPredictions, label: "LabelIndex");

// Check if metrics are resonable.
Console.WriteLine("Macro accuracy: {0}, Micro accuracy: {1}.", 0.863482146891263, 0.86309523809523814);
Copy link
Member

@sfilipi sfilipi Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

863482146891263 [](start = 77, length = 15)

metrics[0] ? #Resolved

@artidoro
Copy link
Contributor

artidoro commented Feb 13, 2019

        string labelColumn = DefaultColumnNames.Label,

Are we getting a sample for ranking too? #Resolved


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:133 in 49509c1. [](commit_id = 49509c1, deletion_comment = False)

public class LightGbmBinaryClassification
{
/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>
Copy link
Contributor

@artidoro artidoro Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require [](start = 25, length = 7)

Could you replace "requires" instead of "require". Here and in the other samples. #Resolved

@zeahmed
Copy link
Contributor Author

zeahmed commented Feb 13, 2019

        string labelColumn = DefaultColumnNames.Label,

As this PR is already getting very long, I have created an issue for this to address it in my next PR.
#2530


In reply to: 463116505 [](ancestors = 463116505)


Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:133 in 49509c1. [](commit_id = 49509c1, deletion_comment = False)

{
public class LightGbmBinaryClassification
{
/// <summary>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @sfilipi ask me to use regular comments instead of summary.

no need for xml style comments #Resolved

@@ -86,6 +86,38 @@ public static string DownloadSentimentDataset()
public static string DownloadAdultDataset()
=> Download("https://raw.githubusercontent.com/dotnet/machinelearning/244a8c2ac832657af282aa312d568211698790aa/test/data/adult.train", "adult.txt");

public static IDataView LoadAdultDataset(MLContext mlContext)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadAdultDataset [](start = 32, length = 16)

I prefere Shahab approach https://github.com/dotnet/machinelearning/pull/2517/files#diff-eb95ea0c54ebcf8d695d8d73d5849b0c #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually took it from his PR...:)
He then changed. I am updating.


In reply to: 256547096 [](ancestors = 256547096)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

}

public sealed class TreeBooster : BoosterParameter<TreeBooster.Arguments>
{
public const string Name = "gbdt";
public const string FriendlyName = "Tree Booster";
internal const string FriendlyName = "Tree Booster";

[TlcModule.Component(Name = Name, FriendlyName = FriendlyName, Desc = "Traditional Gradient Boosting Decision Tree.")]
public class Arguments : ISupportBoosterParameterFactory
Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments [](start = 25, length = 9)

Can you rename this to Options? you would also need to rename the local variable of type Arguments, they are usually called args and we usually rename to options. #Resolved

Copy link
Contributor Author

@zeahmed zeahmed Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a class named Options at line 43 above. I am not sure how its going to impact the cmdline.


In reply to: 256654714 [](ancestors = 256654714)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name it with something that's related to Options? Before the general renaming of Arguments to Options both this and the above at line 43 were named Arguments, and it wasn't an issue. But I understand this is a more specific Options class, that's why it could make sense to add a prefix to the name.


In reply to: 256671757 [](ancestors = 256671757,256654714)

@@ -187,7 +191,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class DartBooster : BoosterParameter<DartBooster.Arguments>
{
public const string Name = "dart";
public const string FriendlyName = "Tree Dropout Tree Booster";
internal const string FriendlyName = "Tree Dropout Tree Booster";
Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not making name internal as well? #Resolved

@@ -231,7 +235,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class GossBooster : BoosterParameter<GossBooster.Arguments>
{
public const string Name = "goss";
public const string FriendlyName = "Gradient-based One-Size Sampling";
internal const string FriendlyName = "Gradient-based One-Size Sampling";
Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too? #Resolved

@@ -24,12 +24,13 @@

namespace Microsoft.ML.LightGBM
{
public delegate void SignatureLightGBMBooster();
internal delegate void SignatureLightGBMBooster();

[TlcModule.ComponentKind("BoosterParameterFunction")]
public interface ISupportBoosterParameterFactory : IComponentFactory<IBoosterParameter>
Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 4, length = 6)

Can you make this interface internal? #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried making it internal but it requires more changes than I thought. So, I am not doing it in this PR. I opened an issue #2559


In reply to: 256657661 [](ancestors = 256657661)

@@ -92,14 +95,13 @@ private static string GetArgName(string name)
[BestFriend]
internal static class Defaults
{
[BestFriend]
internal const int NumBoostRound = 100;
public const int NumBoostRound = 100;
}

public sealed class TreeBooster : BoosterParameter<TreeBooster.Arguments>
{
public const string Name = "gbdt";
Copy link
Contributor

@artidoro artidoro Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 12, length = 6)

Why does the name need to be public? #Resolved

@@ -187,7 +191,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class DartBooster : BoosterParameter<DartBooster.Arguments>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DartBooster [](start = 21, length = 11)

would be nice to make all this boosters sealed classes. #Resolved

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed zeahmed merged commit 856c7e8 into dotnet:master Feb 15, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants