-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Multiclass Classification Samples Update #3322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
40725ca
to
2bf1485
Compare
// Expected output: | ||
// Micro Accuracy: 0.91 | ||
// Macro Accuracy: 0.91 | ||
// Log Loss: 0.00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that there is no EvaluateNonCalibrated
method in the multiclass classification catalog.
The LogLoss
metric does not makes sense in this case.
I opened an issue #3323 to track this problem.
2bf1485
to
5975856
Compare
@@ -44,7 +51,12 @@ namespace Samples.Dynamic.Trainers.MulticlassClassification | |||
var options = new <#=TrainerOptions#>; | |||
|
|||
// Define the trainer. | |||
var pipeline = mlContext.MulticlassClassification.Trainers.<#=Trainer#>(options); | |||
var pipeline = | |||
// Convert the string labels into key types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a line not aligned. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done on purpose, so that we can comment before the two estimators added to the pipeline.
In reply to: 275097557 [](ancestors = 275097557)
Codecov Report
@@ Coverage Diff @@
## master #3322 +/- ##
=======================================
Coverage 72.69% 72.69%
=======================================
Files 807 807
Lines 145172 145172
Branches 16225 16225
=======================================
Hits 105537 105537
Misses 35220 35220
Partials 4415 4415
|
// Convert the string labels into key types. | ||
mlContext.Transforms.Conversion.MapValueToKey("Label") | ||
// Apply LightGbm multiclass trainer. | ||
.Append(mlContext.MulticlassClassification.Trainers.LightGbm()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LightGbm() [](start = 72, length = 10)
Better to specify column names using nameof. #Resolved
var model = pipeline.Fit(trainingData); | ||
|
||
// Create testing data. Use different random seed to make it different from training data. | ||
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a space here. #Resolved
// Define the trainer. | ||
var pipeline = | ||
// Convert the string labels into key types. | ||
mlContext.Transforms.Conversion.MapValueToKey("Label") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MapValueToKey [](start = 52, length = 13)
I'm not sure if this is an issue, there may be other samples for this, but would it make sense to pass the IDataView keyData
argument to this method to show how the user can avoid a pass over the data to get the labels in case the set of labels is known? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to show that use of the method, and I hope that there is a sample for that under MapValueToKey
. However, I don't think it would be the right place here to include such a sample.
What I could do instead is load a keyType directly, instead of using MapValueToKey. Would that be better?
In reply to: 275474158 [](ancestors = 275474158)
5975856
to
ec560ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ec560ed
to
7682d7f
Compare
7682d7f
to
41f062f
Compare
Tracked under #2522
This PR adds samples for
LbfgsMaximumEntropy
andSdcaNonCalibrated
trainers.This PR also removes dependency from Samples Utils in other multiclass classification samples and adds .tt files for all multiclass classification samples.
Notice that this PR does not take care of Naive Bayes as it is in progress in #3246.