Fix missing ExampleWeightColumnName in the advanced Options for some trainers #3104


Merged
4 commits merged into dotnet:master on Mar 29, 2019

Conversation

abgoswam
Member

Fixes #2175

  • Added ExampleWeightColumnName to the advanced Options for the SDCA trainers {Regression, LogisticRegression, MaximumEntropy}; a usage sketch follows below
  • Added tests
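
For illustration only, here is a minimal sketch of setting the new option through the advanced Options path; the column names ("Label", "Features", "Weight"), the class name, and the input data view are placeholders rather than code from this PR:

using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class SdcaWeightOptionsExample
{
    public static ITransformer Train(MLContext mlContext, IDataView trainingData)
    {
        // With this change the SDCA Options classes derive from TrainerInputBaseWithWeight,
        // so the example weight column can be specified in the advanced Options as well.
        var options = new SdcaLogisticRegressionBinaryTrainer.Options
        {
            LabelColumnName = "Label",
            FeatureColumnName = "Features",
            ExampleWeightColumnName = "Weight",
            NumberOfThreads = 1
        };

        return mlContext.BinaryClassification.Trainers
            .SdcaLogisticRegression(options)
            .Fit(trainingData);
    }
}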

@abgoswam abgoswam changed the title to Fix missing ExampleWeightColumnName in the advanced Options for some trainers Mar 27, 2019
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@@ -154,7 +154,7 @@ public abstract class SdcaTrainerBase<TOptions, TTransformer, TModel> : Stochast
/// <summary>
/// Options for the SDCA-based trainers.
/// </summary>
-public abstract class OptionsBase : TrainerInputBaseWithLabel
+public abstract class OptionsBase : TrainerInputBaseWithWeight
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 27, 2019

TrainerInputBaseWithWeight

I assume you checked that all of the mlContext.*Catalog extensions for the SDCA trainers have exampleWeightColumnName? #Resolved

Member Author

@abgoswam abgoswam Mar 27, 2019

Yes, I verified that all of the "simple" SDCA trainer extensions have the exampleWeightColumnName parameter.


In reply to: 269666016
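
As a rough sketch of that kind of spot-check (mlContext is an MLContext, and the column names are assumed placeholders), each of the "simple" SDCA catalog extensions takes the weight column by name:

// Binary classification (SDCA logistic regression).
var binary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");

// Regression (SDCA).
var regression = mlContext.Regression.Trainers.Sdca(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");

// Multiclass classification (SDCA maximum entropy).
var multiclass = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");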

@codecov

codecov bot commented Mar 27, 2019

Codecov Report

Merging #3104 into master will increase coverage by 0.01%.
The diff coverage is 92.94%.

@@            Coverage Diff             @@
##           master    #3104      +/-   ##
==========================================
+ Coverage   72.52%   72.53%   +0.01%     
==========================================
  Files         808      808              
  Lines      144665   144740      +75     
  Branches    16198    16202       +4     
==========================================
+ Hits       104912   104982      +70     
- Misses      35342    35346       +4     
- Partials     4411     4412       +1
Flag Coverage Δ
#Debug 72.53% <92.94%> (+0.01%) ⬆️
#production 68.12% <85.71%> (ø) ⬆️
#test 88.82% <94.36%> (+0.01%) ⬆️
Impacted Files Coverage Δ
...oft.ML.StandardTrainers/Standard/SdcaMulticlass.cs 90.1% <100%> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.95% <100%> (+0.17%) ⬆️
...oft.ML.StandardTrainers/Standard/SdcaRegression.cs 95.83% <100%> (ø) ⬆️
...ts/TrainerEstimators/TreeEnsembleFeaturizerTest.cs 100% <100%> (ø) ⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 24.46% <100%> (+0.73%) ⬆️
...rc/Microsoft.ML.StaticPipe/SdcaStaticExtensions.cs 81.72% <60%> (-0.61%) ⬇️
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 97.26% <94.11%> (-2.74%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
... and 3 more

@@ -12,7 +12,7 @@ public static void Example()
var mlContext = new MLContext();

// Get a small dataset as an IEnumerable and them read it as ML.NET's data type.
-IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorSamples(5);
+IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorFloatWeightSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorFloatWeightSamples(5);
Member

@wschin wschin Mar 28, 2019

How do you examine the effect of having a weight column? Should it be checked in a test? I feel we need two trainers, with and without the weight column, and make sure the two trained models are different. #Resolved

Member Author

That's what I did in the test SdcaLogisticRegressionWithWeight: I added two trainers, with and without weights, and verified that they produce different metrics. Is that sufficient?


In reply to: 270204851
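
A rough sketch of that metric-level comparison, assuming prediction1 and prediction2 are the scored outputs of the pipelines trained without and with the weight column (this is not the PR's verbatim test code):

var metrics1 = mlContext.BinaryClassification.Evaluate(prediction1);
var metrics2 = mlContext.BinaryClassification.Evaluate(prediction2);

// If the weight column influenced training, the two evaluations should not
// produce identical quality metrics.
Assert.NotEqual(metrics1.LogLoss, metrics2.LogLoss);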

Member

Their scores are similar; I'd like to have a stricter criterion. As you heard from Zeeshan S, tiny changes induced a large SDCA regression this morning.


In reply to: 270205745

Member Author

Added checks in the other tests, where we test this thoroughly.


In reply to: 270207991

var sdcaWithWeightBinary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
new SdcaLogisticRegressionBinaryTrainer.Options { ExampleWeightColumnName = "Weight", NumberOfThreads = 1 });

var prediction1 = sdcaWithoutWeightBinary.Fit(data).Transform(data);
Member

@wschin wschin Mar 28, 2019

Could you check whether all (or most) of the results in prediction1 and prediction2 are different? #Resolved

Member Author

@abgoswam abgoswam Mar 29, 2019

Added checks to verify that the model parameters of the two models are different.


In reply to: 270206386
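
For illustration, a sketch of what such a parameter-level check could look like for the binary case; the method name, parameter names, and the exact assertion are assumptions, not the PR's verbatim test:

using System.Linq;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Xunit;

public static class SdcaWeightChecks
{
    public static void AssertWeightsDiffer(
        BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>> modelWithoutWeight,
        BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>> modelWithWeight)
    {
        // Pull out the learned linear coefficients of each calibrated model.
        var weights1 = modelWithoutWeight.Model.SubModel.Weights;
        var weights2 = modelWithWeight.Model.SubModel.Weights;

        // If the example weights influenced training, at least one coefficient should differ.
        Assert.False(weights1.SequenceEqual(weights2));
    }
}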

Assert.Equal(0.3591, metrics2.LogLoss, 4);

// Verify SdcaMaximumEntropy with and without weights.
var sdcaWithoutWeightMulticlass = mlContext.Transforms.Conversion.MapValueToKey("LabelIndex", "Label").
Member

@wschin wschin Mar 28, 2019

Could we move the multi-class test into a separate, independent test (to keep each test small)? #Resolved

@@ -88,11 +88,66 @@ public void SdcaLogisticRegression()
Assert.InRange(first.Probability, 0.8, 1);
}

[Fact]
public void SdcaLogisticRegressionWithWeight()
Member

@wschin wschin Mar 28, 2019

LogisticRegression

This is called LogisticRegression but contains MaximumEntropy trainers. #Resolved

Member Author

Will fix by separating into two tests, one each for binary and multiclass.


In reply to: 270208235
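
A sketch of the intended split, with hypothetical test names that may differ from what the PR actually uses:

// Hypothetical layout inside Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs:
[Fact]
public void SdcaLogisticRegressionWithWeight()
{
    // Binary case: train SdcaLogisticRegression with and without the weight
    // column, then assert that the learned parameters and metrics differ.
}

[Fact]
public void SdcaMaximumEntropyWithWeight()
{
    // Multiclass case: train SdcaMaximumEntropy with and without the weight
    // column, then assert that the learned parameters and metrics differ.
}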

@abgoswam abgoswam merged commit 0a2ec3a into dotnet:master Mar 29, 2019
shauheen pushed a commit to shauheen/machinelearning that referenced this pull request Apr 2, 2019
…trainers (dotnet#3104)

* fixed issue, added tests

* fix review comments

* updating equality checks for floats
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022