Multiclass Classification Samples Update #3322

Merged (3 commits) on Apr 16, 2019

Conversation

artidoro
Contributor

@artidoro artidoro commented Apr 12, 2019

Tracked under #2522

This PR adds samples for LbfgsMaximumEntropy and SdcaNonCalibrated trainers.

This PR also removes dependency from Samples Utils in other multiclass classification samples and adds .tt files for all multiclass classification samples.

Notice that this PR does not take care of Naive Bayes as it is in progress in #3246.

@artidoro artidoro added the documentation Related to documentation of ML.NET label Apr 12, 2019
@artidoro artidoro self-assigned this Apr 12, 2019
// Expected output:
// Micro Accuracy: 0.91
// Macro Accuracy: 0.91
// Log Loss: 0.00
Contributor Author

Notice that there is no EvaluateNonCalibrated method in the multiclass classification catalog.
The LogLoss metric does not make sense in this case.

I opened an issue #3323 to track this problem.
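
To illustrate the concern above, here is a minimal, library-free sketch of why log loss is only meaningful on probabilities. The `LogLoss` and `Softmax` helpers, the clamping to `[epsilon, 1]`, and the sample scores are all assumptions made for illustration; this is not ML.NET's evaluator, and softmax is used only as a stand-in for a calibration step.

```csharp
using System;
using System.Linq;

public static class LogLossSketch
{
    // Mean negative log of the value assigned to the true class.
    // Clamping to [epsilon, 1] is an illustrative assumption, not ML.NET's documented behavior.
    public static double LogLoss(double[][] scores, int[] labels, double epsilon = 1e-15)
    {
        double total = 0;
        for (int i = 0; i < labels.Length; i++)
        {
            double p = Math.Min(1.0, Math.Max(epsilon, scores[i][labels[i]]));
            total -= Math.Log(p);
        }
        return total / labels.Length;
    }

    // Softmax maps raw scores to non-negative values that sum to 1.
    public static double[] Softmax(double[] raw)
    {
        double max = raw.Max();
        double[] exp = raw.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exp.Sum();
        return exp.Select(e => e / sum).ToArray();
    }

    public static void Main()
    {
        // Hypothetical raw (non-calibrated) scores for two examples and three classes.
        double[][] raw =
        {
            new[] { 2.5, -1.0, 0.3 },
            new[] { -0.7, 3.1, 0.2 }
        };
        int[] labels = { 0, 1 };

        // Raw scores above 1 are clamped to 1, so the reported loss collapses to zero.
        Console.WriteLine($"Log loss on raw scores: {LogLoss(raw, labels):F2}");

        // After mapping scores to probabilities the loss becomes informative again.
        double[][] probs = raw.Select(r => Softmax(r)).ToArray();
        Console.WriteLine($"Log loss on softmax output: {LogLoss(probs, labels):F2}");
    }
}
```

Under these assumptions, a perfect-looking log loss on raw scores says nothing about model quality, which is why a separate non-calibrated evaluation path that reports only accuracy metrics would be less misleading.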

@@ -44,7 +51,12 @@ namespace Samples.Dynamic.Trainers.MulticlassClassification
var options = new <#=TrainerOptions#>;

// Define the trainer.
var pipeline = mlContext.MulticlassClassification.Trainers.<#=Trainer#>(options);
var pipeline =
// Convert the string labels into key types.
Member

@wschin wschin Apr 13, 2019

This line is not aligned. #Resolved

Contributor Author

It's done on purpose, so that we can comment before the two estimators added to the pipeline.


In reply to: 275097557

@codecov

codecov bot commented Apr 13, 2019

Codecov Report

Merging #3322 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3322   +/-   ##
=======================================
  Coverage   72.69%   72.69%           
=======================================
  Files         807      807           
  Lines      145172   145172           
  Branches    16225    16225           
=======================================
  Hits       105537   105537           
  Misses      35220    35220           
  Partials     4415     4415

| Flag | Coverage Δ |
|---|---|
| #Debug | 72.69% <ø> (ø) ⬆️ |
| #production | 68.23% <ø> (ø) ⬆️ |
| #test | 88.97% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| ...oft.ML.StandardTrainers/StandardTrainersCatalog.cs | 92.34% <ø> (ø) ⬆️ |

@wschin
Member

wschin commented Apr 13, 2019

                mlContext.Transforms.Conversion.MapValueToKey("Label")

Use nameof(DataPoint.Label) instead of the string literal "Label". #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/MulticlassClassification.ttinclude:39 in 5975856.

// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply LightGbm multiclass trainer.
.Append(mlContext.MulticlassClassification.Trainers.LightGbm());
Member

@wschin wschin Apr 13, 2019

LightGbm()

Better to specify column names using nameof. #Resolved
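
The `nameof` suggestions in this thread can be sketched as follows. This is a sketch rather than the code that was merged: it assumes the Microsoft.ML and Microsoft.ML.LightGbm NuGet packages are referenced, and the `DataPoint` class, the `NameofSketch` wrapper, and the data generator are illustrative stand-ins modeled on the sample under review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Illustrative input schema: a numeric label and a fixed-size feature vector.
public class DataPoint
{
    public uint Label { get; set; }

    [VectorType(20)]
    public float[] Features { get; set; }
}

public static class NameofSketch
{
    // Generates synthetic points whose features are shifted by the label value.
    public static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
    {
        var random = new Random(seed);
        float RandomFloat() => (float)(random.NextDouble() - 0.5);
        for (int i = 0; i < count; i++)
        {
            int label = random.Next(1, 4); // labels 1, 2 or 3
            yield return new DataPoint
            {
                Label = (uint)label,
                Features = Enumerable.Repeat(label, 20)
                    .Select(x => RandomFloat() + label * 0.2f)
                    .ToArray()
            };
        }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var trainingData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(1000));

        var pipeline =
            // Convert the labels into key types; nameof ties the column name to the property.
            mlContext.Transforms.Conversion.MapValueToKey(nameof(DataPoint.Label))
            // Apply the LightGbm multiclass trainer with explicit column names.
            .Append(mlContext.MulticlassClassification.Trainers.LightGbm(
                labelColumnName: nameof(DataPoint.Label),
                featureColumnName: nameof(DataPoint.Features)));

        var model = pipeline.Fit(trainingData);
        Console.WriteLine("Model trained.");
    }
}
```

With `nameof`, renaming a property on `DataPoint` produces a compile error at every use instead of a run-time schema mismatch, which is the main benefit over string literals.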

var model = pipeline.Fit(trainingData);

// Create testing data. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));

@yaeldekel yaeldekel Apr 15, 2019

nit: add a space here. #Resolved

// Define the trainer.
var pipeline =
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")

@yaeldekel yaeldekel Apr 15, 2019

MapValueToKey

I'm not sure whether this is an issue, and there may be other samples for it, but would it make sense to pass the IDataView keyData argument to this method? That would show how the user can avoid a pass over the data to collect the labels when the set of labels is known in advance. #Resolved

Contributor Author

I think it's important to show that use of the method, and I hope there is a sample for it under MapValueToKey. However, I don't think this is the right place for such a sample.
What I could do instead is load a keyType directly, instead of using MapValueToKey. Would that be better?


In reply to: 275474158
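
For reference, the `keyData` usage raised in this thread might look roughly like the sketch below. It assumes the Microsoft.ML NuGet package is referenced and that the label set {1, 2, 3} is known up front; the `LabelLookup` and `DataPoint` classes, the `KeyDataSketch` wrapper, and the data generator are hypothetical and not part of this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class DataPoint
{
    public uint Label { get; set; }

    [VectorType(20)]
    public float[] Features { get; set; }
}

// Hypothetical single-column schema listing the label values known in advance.
public class LabelLookup
{
    public uint Label { get; set; }
}

public static class KeyDataSketch
{
    public static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
    {
        var random = new Random(seed);
        float RandomFloat() => (float)(random.NextDouble() - 0.5);
        for (int i = 0; i < count; i++)
        {
            int label = random.Next(1, 4); // labels 1, 2 or 3
            yield return new DataPoint
            {
                Label = (uint)label,
                Features = Enumerable.Repeat(label, 20)
                    .Select(x => RandomFloat() + label * 0.2f)
                    .ToArray()
            };
        }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var trainingData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(1000));

        // Single-column data view holding the known label values.
        var keyData = mlContext.Data.LoadFromEnumerable(new[]
        {
            new LabelLookup { Label = 1 },
            new LabelLookup { Label = 2 },
            new LabelLookup { Label = 3 }
        });

        var pipeline =
            // Take the key values from keyData rather than from the training data.
            mlContext.Transforms.Conversion.MapValueToKey(
                outputColumnName: nameof(DataPoint.Label),
                keyData: keyData)
            .Append(mlContext.MulticlassClassification.Trainers.SdcaNonCalibrated(
                labelColumnName: nameof(DataPoint.Label),
                featureColumnName: nameof(DataPoint.Features)));

        var model = pipeline.Fit(trainingData);
        Console.WriteLine("Model trained with a predefined label set.");
    }
}
```

The type of the `keyData` column is kept identical to the type of the input label column (`uint` here) so the two can be matched.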


@shmoradims shmoradims left a comment

:shipit:


@yaeldekel yaeldekel left a comment

:shipit:

@artidoro artidoro merged commit 8644b3b into dotnet:master Apr 16, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022