Fix missing ExampleWeightColumnName in the advanced Options for some trainers #3104


Merged
4 commits merged into dotnet:master on Mar 29, 2019

Conversation

abgoswam
Member

Fixes #2175

  • Added ExampleWeightColumnName to the advanced Options for the SDCA trainers {Regression, LogisticRegression, MaximumEntropy}; a usage sketch follows below
  • Added tests
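
For illustration only, here is a minimal sketch of setting the new option through the advanced Options path; the column names ("Label", "Features", "Weight"), the class name, and the input data view are placeholders rather than code from this PR:

using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class SdcaWeightOptionsExample
{
    public static ITransformer Train(MLContext mlContext, IDataView trainingData)
    {
        // With this change the SDCA Options classes derive from TrainerInputBaseWithWeight,
        // so the example weight column can be specified in the advanced Options as well.
        var options = new SdcaLogisticRegressionBinaryTrainer.Options
        {
            LabelColumnName = "Label",
            FeatureColumnName = "Features",
            ExampleWeightColumnName = "Weight",
            NumberOfThreads = 1
        };

        return mlContext.BinaryClassification.Trainers
            .SdcaLogisticRegression(options)
            .Fit(trainingData);
    }
}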

@abgoswam abgoswam changed the title to Fix missing ExampleWeightColumnName in the advanced Options for some trainers Mar 27, 2019
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@@ -154,7 +154,7 @@ public abstract class SdcaTrainerBase<TOptions, TTransformer, TModel> : Stochast
/// <summary>
/// Options for the SDCA-based trainers.
/// </summary>
-public abstract class OptionsBase : TrainerInputBaseWithLabel
+public abstract class OptionsBase : TrainerInputBaseWithWeight
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 27, 2019

TrainerInputBaseWithWeight

I assume you checked that all of the mlContext.*Catalog extensions for the SDCA trainers have exampleWeightColumnName? #Resolved

Member Author

@abgoswam abgoswam Mar 27, 2019

Yes, I verified that all of the "simple" SDCA trainer extensions have the exampleWeightColumnName parameter.


In reply to: 269666016
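
As a rough sketch of that kind of spot-check (mlContext is an MLContext, and the column names are assumed placeholders), each of the "simple" SDCA catalog extensions takes the weight column by name:

// Binary classification (SDCA logistic regression).
var binary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");

// Regression (SDCA).
var regression = mlContext.Regression.Trainers.Sdca(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");

// Multiclass classification (SDCA maximum entropy).
var multiclass = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
    labelColumnName: "Label", featureColumnName: "Features", exampleWeightColumnName: "Weight");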

@codecov

codecov bot commented Mar 27, 2019

Codecov Report

Merging #3104 into master will increase coverage by 0.01%.
The diff coverage is 92.94%.

@@            Coverage Diff             @@
##           master    #3104      +/-   ##
==========================================
+ Coverage   72.52%   72.53%   +0.01%     
==========================================
  Files         808      808              
  Lines      144665   144740      +75     
  Branches    16198    16202       +4     
==========================================
+ Hits       104912   104982      +70     
- Misses      35342    35346       +4     
- Partials     4411     4412       +1
Flag Coverage Δ
#Debug 72.53% <92.94%> (+0.01%) ⬆️
#production 68.12% <85.71%> (ø) ⬆️
#test 88.82% <94.36%> (+0.01%) ⬆️
Impacted Files Coverage Δ
...oft.ML.StandardTrainers/Standard/SdcaMulticlass.cs 90.1% <100%> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.95% <100%> (+0.17%) ⬆️
...oft.ML.StandardTrainers/Standard/SdcaRegression.cs 95.83% <100%> (ø) ⬆️
...ts/TrainerEstimators/TreeEnsembleFeaturizerTest.cs 100% <100%> (ø) ⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 24.46% <100%> (+0.73%) ⬆️
...rc/Microsoft.ML.StaticPipe/SdcaStaticExtensions.cs 81.72% <60%> (-0.61%) ⬇️
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 97.26% <94.11%> (-2.74%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
... and 3 more

@@ -12,7 +12,7 @@ public static void Example()
var mlContext = new MLContext();

// Get a small dataset as an IEnumerable and them read it as ML.NET's data type.
-IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorSamples(5);
+IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorFloatWeightSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorFloatWeightSamples(5);
Member

@wschin wschin Mar 28, 2019

How do you examine the effect of having a weight column? Should it be checked in a test? I feel we need two trainers, with and without the weight column, and make sure the two trained models are different. #Resolved

Member Author

That's what I did in the test SdcaLogisticRegressionWithWeight: I added two trainers, with and without weights, and verified that they produce different metrics. Is that sufficient?


In reply to: 270204851
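
A rough sketch of that metric-level comparison, assuming prediction1 and prediction2 are the scored outputs of the pipelines trained without and with the weight column (this is not the PR's verbatim test code):

var metrics1 = mlContext.BinaryClassification.Evaluate(prediction1);
var metrics2 = mlContext.BinaryClassification.Evaluate(prediction2);

// If the weight column influenced training, the two evaluations should not
// produce identical quality metrics.
Assert.NotEqual(metrics1.LogLoss, metrics2.LogLoss);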

Member

Their scores are similar; I'd like to have a stricter criterion. As you heard from Zeeshan S, tiny changes induced a large SDCA regression this morning.


In reply to: 270205745

Member Author

Added checks in the other tests, where we test this thoroughly.


In reply to: 270207991

var sdcaWithWeightBinary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
new SdcaLogisticRegressionBinaryTrainer.Options { ExampleWeightColumnName = "Weight", NumberOfThreads = 1 });

var prediction1 = sdcaWithoutWeightBinary.Fit(data).Transform(data);
Member

@wschin wschin Mar 28, 2019

Could you check whether all (or most) of the results in prediction1 and prediction2 are different? #Resolved

Member Author

@abgoswam abgoswam Mar 29, 2019

Added checks to verify that the model parameters of the two models are different.


In reply to: 270206386
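
For illustration, a sketch of what such a parameter-level check could look like for the binary case; the method name, parameter names, and the exact assertion are assumptions, not the PR's verbatim test:

using System.Linq;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Xunit;

public static class SdcaWeightChecks
{
    public static void AssertWeightsDiffer(
        BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>> modelWithoutWeight,
        BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>> modelWithWeight)
    {
        // Pull out the learned linear coefficients of each calibrated model.
        var weights1 = modelWithoutWeight.Model.SubModel.Weights;
        var weights2 = modelWithWeight.Model.SubModel.Weights;

        // If the example weights influenced training, at least one coefficient should differ.
        Assert.False(weights1.SequenceEqual(weights2));
    }
}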

Assert.Equal(0.3591, metrics2.LogLoss, 4);

// Verify SdcaMaximumEntropy with and without weights.
var sdcaWithoutWeightMulticlass = mlContext.Transforms.Conversion.MapValueToKey("LabelIndex", "Label").
Member

@wschin wschin Mar 28, 2019

Could we move the multi-class test into a separate, independent test (to keep each test small)? #Resolved

@@ -88,11 +88,66 @@ public void SdcaLogisticRegression()
Assert.InRange(first.Probability, 0.8, 1);
}

[Fact]
public void SdcaLogisticRegressionWithWeight()
Member

@wschin wschin Mar 28, 2019

LogisticRegression

This is called LogisticRegression but contains MaximumEntropy trainers. #Resolved

Member Author

Will fix by separating into two tests, one each for binary and multiclass.


In reply to: 270208235
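
A sketch of the intended split, with hypothetical test names that may differ from what the PR actually uses:

// Hypothetical layout inside Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs:
[Fact]
public void SdcaLogisticRegressionWithWeight()
{
    // Binary case: train SdcaLogisticRegression with and without the weight
    // column, then assert that the learned parameters and metrics differ.
}

[Fact]
public void SdcaMaximumEntropyWithWeight()
{
    // Multiclass case: train SdcaMaximumEntropy with and without the weight
    // column, then assert that the learned parameters and metrics differ.
}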

@abgoswam abgoswam merged commit 0a2ec3a into dotnet:master Mar 29, 2019
shauheen pushed a commit to shauheen/machinelearning that referenced this pull request Apr 2, 2019
…trainers (dotnet#3104)

* fixed issue, added tests

* fix review comments

* updating equality checks for floats
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022