Lockdown of Microsoft.ML.LightGBM public surface. #2476
@shmoradims, do you know which parameters you were talking about documenting in LightGBM? #Resolved
@sfilipi and @shmoradims, do we have a standard explanation for the public
@TomFinley, these are the items I see exposed now after my changes.
Codecov Report
@@ Coverage Diff @@
## master #2476 +/- ##
==========================================
- Coverage 71.43% 71.42% -0.01%
==========================================
Files 801 801
Lines 141784 141793 +9
Branches 16135 16135
==========================================
+ Hits 101278 101281 +3
- Misses 36039 36045 +6
Partials 4467 4467
@@ -0,0 +1,103 @@
using System;
using Microsoft.ML.Data;
Microsoft
We already have a Transforms folder, and I'm creating a Trainers folder with a Recommendation subfolder in #2451.
Maybe it's worth putting them in Trainers/BinaryClassification, Trainers/MultiClassification, Trainers/Regression?
Otherwise it starts to look like a trash pile :) #Resolved
[TlcModule.ComponentKind("BoosterParameterFunction")]
public interface ISupportBoosterParameterFactory : IComponentFactory<IBoosterParameter>
{
}
public interface IBoosterParameter
public
Hide this one as well. #Resolved
I can do things like this right now:
new Options.DartBooster.Arguments { }.CreateComponent(ML).UpdateParameters(new Dictionary<string, object>());
Do they make any sense to you?
The problem is that ISupportBoosterParameterFactory is public. It is also used as the type of a field in the Options class, which is public too. This does not allow us to make it internal.
You can mark public ISupportBoosterParameterFactory Booster = new TreeBooster.Arguments(); as internal and set its Argument attribute to CmdLine only.
You can also create something like public BoosterParameter BoosterParameters = new TreeBooster.Arguments() and mark it EntryPoint only.
At least we can try to hide it.
CreateComponent(ML) and UpdateParameters are hidden now.
I am now using
public ISupportBoosterParameterFactory Booster = new TreeBooster.Arguments();
in the samples, so I think ISupportBoosterParameterFactory cannot be hidden.
I agree with Ivan. I think you should be able to do as he suggested: make public ISupportBoosterParameterFactory Booster internal, and add a new public field of type BoosterParameter, or even TreeBooster.Arguments.
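The accessibility problem in this thread can be reproduced in a few lines. C# rejects a public field whose type is less accessible than the field itself (compiler error CS0052), which is why a public Booster field of type ISupportBoosterParameterFactory keeps that interface public. The sketch below is a minimal, hypothetical model of the suggested fix: IBoosterFactory, TreeBoosterOptions and TrainerOptions are illustrative stand-ins, not the real ML.NET types. The interface-typed member becomes internal, and the public surface exposes only a concrete public type.

```csharp
namespace BoosterAccessibilitySketch
{
    // Hypothetical stand-in for ISupportBoosterParameterFactory. It can stay
    // internal because no public member exposes it.
    internal interface IBoosterFactory
    {
        string CreateBoosterName();
    }

    // Hypothetical stand-in for TreeBooster.Arguments: a public, concrete options type.
    // A public class is allowed to implement an internal interface.
    public sealed class TreeBoosterOptions : IBoosterFactory
    {
        public double LearningRate = 0.1;

        // Explicit implementation keeps the internal interface off the public surface.
        string IBoosterFactory.CreateBoosterName() => "gbdt";
    }

    // Hypothetical stand-in for the trainer's public Options class.
    public sealed class TrainerOptions
    {
        // Public surface: a concrete public type, so accessibility is consistent.
        public TreeBoosterOptions Booster = new TreeBoosterOptions();

        // Internal hook for command-line / entry-point code. Declaring this member
        // public would not compile, because its type (IBoosterFactory) is internal.
        internal IBoosterFactory BoosterFactory => Booster;
    }
}
```

Users still configure the booster through the public field (options.Booster.LearningRate = 0.05), while infrastructure code inside the assembly reads it through the internal interface.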
Any reason why it should be public? #Closed
Refers to: src/Microsoft.ML.LightGBM/Parallel/IParallel.cs:25 in 59d1a08.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelIndex", "Label")
.Append(mlContext.MulticlassClassification.Trainers.LightGbm(labelColumn: "LabelIndex"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("PredictedLabelIndex", "PredictedLabel"))
.Append(mlContext.Transforms.CopyColumns("Scores", "Score"));
.Append(mlContext.Transforms.CopyColumns("Scores", "Score")
Why is this one needed? #Closed
I took the example from machinelearning/docs/samples/Microsoft.ML.Samples/Static/LightGBMMulticlassWithInMemoryData.cs (line 11 in b863ac2, class LightGBMMulticlassWithInMemoryData).
I see there are two score columns in the data view after the transformation: 1) Scores and 2) Score. If Score is not copied to Scores, all the probabilities are zero. Any reason why?
"hours-per-week",
"native-country"))
.Append(mlContext.Transforms.Normalize("Features"))
.Append(mlContext.BinaryClassification.Trainers.LightGbm());
LightGbm
It's line 83 and I can finally see an example of how to use LightGBM! #Closed
I think this method deserves a sample more than the simple method does.
Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:99 in 59d1a08.
{
public class LightGbmBinaryClassification
{
public static void LightGbmBinaryClassificationExample()
LightGbmBinaryClassificationExample
I think Shahab started naming the methods "Example". #Resolved
// Create the MLContext, the ML.NET environment needed for the pipeline.
var mlContext = new MLContext();

var reader = mlContext.Data.ReadFromTextFile(dataFilePath, new TextLoader.Arguments
reader = mlContext.Data.ReadFromTextFile(dataFi
Look at Shahab's PR on AP. He's creating a SamplesUtils for this one. #Resolved
Console.WriteLine($"Negative Precision: {metrics.NegativePrecision}"); // 0.88
Console.WriteLine($"Negative Recall: {metrics.NegativeRecall}"); // 0.91
Console.WriteLine($"Positive Precision: {metrics.PositivePrecision}"); // 0.67
Console.WriteLine($"Positive Recall: {metrics.PositiveRecall}"); // 0.58
See #2483. #Resolved
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

// Convert the native C# class to an IDataView, a format consumable by ML.NET functions.
var dataView = mlContext.Data.ReadFromEnumerable(examples);
dataView
preview #Resolved
var model = pipeline.Fit(trainingData);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(testingData);
dataWithPredictions
preview #Resolved
var metrics = mlContext.MulticlassClassification.Evaluate(dataWithPredictions, label: "LabelIndex");

// Check that the metrics are reasonable.
Console.WriteLine("Macro accuracy: {0}, Micro accuracy: {1}.", 0.863482146891263, 0.86309523809523814);
863482146891263
metrics[0]? #Resolved
public class LightGbmBinaryClassification
{
/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>
require
Could you use "requires" instead of "require"? Here and in the other samples. #Resolved
As this PR is already getting very long, I have created an issue to address this in my next PR.
Refers to: src/Microsoft.ML.LightGBM/LightGbmCatalog.cs:133 in 49509c1.
{
public class LightGbmBinaryClassification
{
/// <summary>
I think @sfilipi asked me to use regular comments instead of summary.
No need for XML-style comments. #Resolved
@@ -86,6 +86,38 @@ public static string DownloadSentimentDataset()
public static string DownloadAdultDataset()
=> Download("https://raw.githubusercontent.com/dotnet/machinelearning/244a8c2ac832657af282aa312d568211698790aa/test/data/adult.train", "adult.txt");

public static IDataView LoadAdultDataset(MLContext mlContext)
LoadAdultDataset
I prefer Shahab's approach: https://github.com/dotnet/machinelearning/pull/2517/files#diff-eb95ea0c54ebcf8d695d8d73d5849b0c #Resolved
I actually took it from his PR... :)
He has since changed it. I am updating.
}

public sealed class TreeBooster : BoosterParameter<TreeBooster.Arguments>
{
public const string Name = "gbdt";
public const string FriendlyName = "Tree Booster";
internal const string FriendlyName = "Tree Booster";

[TlcModule.Component(Name = Name, FriendlyName = FriendlyName, Desc = "Traditional Gradient Boosting Decision Tree.")]
public class Arguments : ISupportBoosterParameterFactory
Arguments
Can you rename this to Options? You would also need to rename the local variables of type Arguments; they are usually called args, and we usually rename them to options. #Resolved
There is already a class named Options at line 43 above. I am not sure how it's going to impact the command line.
Could you give it a name related to Options? Before the general renaming of Arguments to Options, both this class and the one above at line 43 were named Arguments, and it wasn't an issue. But I understand this is a more specific Options class, which is why it could make sense to add a prefix to the name.
@@ -187,7 +191,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class DartBooster : BoosterParameter<DartBooster.Arguments>
{
public const string Name = "dart";
public const string FriendlyName = "Tree Dropout Tree Booster";
internal const string FriendlyName = "Tree Dropout Tree Booster";
Why not make Name internal as well? #Resolved
@@ -231,7 +235,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class GossBooster : BoosterParameter<GossBooster.Arguments>
{
public const string Name = "goss";
public const string FriendlyName = "Gradient-based One-Side Sampling";
internal const string FriendlyName = "Gradient-based One-Side Sampling";
Here too? #Resolved
@@ -24,12 +24,13 @@

namespace Microsoft.ML.LightGBM
{
public delegate void SignatureLightGBMBooster();
internal delegate void SignatureLightGBMBooster();

[TlcModule.ComponentKind("BoosterParameterFunction")]
public interface ISupportBoosterParameterFactory : IComponentFactory<IBoosterParameter>
public
Can you make this interface internal? #WontFix
@@ -92,14 +95,13 @@ private static string GetArgName(string name)
[BestFriend]
internal static class Defaults
{
[BestFriend]
internal const int NumBoostRound = 100;
public const int NumBoostRound = 100;
}

public sealed class TreeBooster : BoosterParameter<TreeBooster.Arguments>
{
public const string Name = "gbdt";
public
Why does the name need to be public? #Resolved
@@ -187,7 +191,7 @@ public override void UpdateParameters(Dictionary<string, object> res)
public class DartBooster : BoosterParameter<DartBooster.Arguments>
DartBooster
It would be nice to make all these boosters sealed classes. #Resolved
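The booster-class suggestions in these threads (internal Name and FriendlyName constants, sealed booster classes) can be combined in a short sketch. This is a minimal, hypothetical model: BoosterParameterBase, its BoosterOptions property and the DropRate option are illustrative assumptions, not the real ML.NET types. Only the "dart" name and the "Tree Dropout Tree Booster" friendly name come from the diff. The dictionary keys "boosting" and "drop_rate" are LightGBM parameter names, used here only for illustration.

```csharp
using System.Collections.Generic;

namespace BoosterSealingSketch
{
    // Hypothetical, simplified stand-in for BoosterParameter<TOptions>.
    public abstract class BoosterParameterBase<TOptions> where TOptions : class, new()
    {
        protected TOptions BoosterOptions { get; } = new TOptions();

        // Copies booster-specific settings into a LightGBM parameter dictionary.
        // Internal, so it does not appear on the public API surface.
        internal abstract void UpdateParameters(Dictionary<string, object> res);
    }

    // Sealed, with Name and FriendlyName internal, as the reviewers suggested.
    public sealed class DartBooster : BoosterParameterBase<DartBooster.Options>
    {
        internal const string Name = "dart";
        internal const string FriendlyName = "Tree Dropout Tree Booster";

        public sealed class Options
        {
            // Illustrative option only (hypothetical default value).
            public double DropRate = 0.1;
        }

        internal override void UpdateParameters(Dictionary<string, object> res)
        {
            // Illustrative keys modeled on LightGBM's "boosting" and "drop_rate" parameters.
            res["boosting"] = Name;
            res["drop_rate"] = BoosterOptions.DropRate;
        }
    }
}
```

Sealing the class prevents users from deriving new boosters the trainer does not know how to handle, and keeping the name strings internal leaves them available to command-line and entry-point code without committing to them as public API.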
…earning into LightGBM_refact
This PR fixes #2271, #2534, and #2459.
The changes included in this PR are as follows:
- Changed public items to internal.
- LightGbmMulticlassTrainer was using MakeBoolScalarLabel instead of MakeU4ScalarColumn.