-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix missing ExampleWeightColumnName in the advanced Options for some trainers #3104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ExampleWeightColumnName
in the advanced Options
for some trainersThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -154,7 +154,7 @@ public abstract class SdcaTrainerBase<TOptions, TTransformer, TModel> : Stochast | |||
/// <summary> | |||
/// Options for the SDCA-based trainers. | |||
/// </summary> | |||
public abstract class OptionsBase : TrainerInputBaseWithLabel | |||
public abstract class OptionsBase : TrainerInputBaseWithWeight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TrainerInputBaseWithWeight [](start = 44, length = 26)
I would assume you check all mlContext.*Catalog extensions for SDCA trainers to have exampleWeightColumnName in it? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap. i did verify all of the "simple" SDCA trainer extensions have the exampleWeightColumnName
parameter
In reply to: 269666016 [](ancestors = 269666016)
Codecov Report
@@ Coverage Diff @@
## master #3104 +/- ##
==========================================
+ Coverage 72.52% 72.53% +0.01%
==========================================
Files 808 808
Lines 144665 144740 +75
Branches 16198 16202 +4
==========================================
+ Hits 104912 104982 +70
- Misses 35342 35346 +4
- Partials 4411 4412 +1
|
@@ -12,7 +12,7 @@ public static void Example() | |||
var mlContext = new MLContext(); | |||
|
|||
// Get a small dataset as an IEnumerable and them read it as ML.NET's data type. | |||
IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorSamples(5); | |||
IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorFloatWeightSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorFloatWeightSamples(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you examine the effect of having weight? Should it be somehow checked in a test? I feel we need two trainers w/wo weight column and make sure the two trained models are different. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats what i did in the test SdcaLogisticRegressionWithWeight
.. Added two trainers w/wo weights and verified it produced different metrics. Is that sufficient ?
In reply to: 270204851 [](ancestors = 270204851)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Their scores are similar. I'd like to have a more strict criterion. As you heard from Zeeshan S, tiny changes induced large SDCA regression this morning.
In reply to: 270205745 [](ancestors = 270205745,270204851)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added checks in the other tests where we test thoroughly for this
In reply to: 270207991 [](ancestors = 270207991,270205745,270204851)
var sdcaWithWeightBinary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression( | ||
new SdcaLogisticRegressionBinaryTrainer.Options { ExampleWeightColumnName = "Weight", NumberOfThreads = 1 }); | ||
|
||
var prediction1 = sdcaWithoutWeightBinary.Fit(data).Transform(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check that if all (or most) results in prediction1
and prediction2
are different? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added checks to check that the model parameters for both models are different
In reply to: 270206386 [](ancestors = 270206386)
Assert.Equal(0.3591, metrics2.LogLoss, 4); | ||
|
||
// Verify SdcaMaximumEntropy with and without weights. | ||
var sdcaWithoutWeightMulticlass = mlContext.Transforms.Conversion.MapValueToKey("LabelIndex", "Label"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May we put multi-class test to another independent test (to make test small)? #Resolved
@@ -88,11 +88,66 @@ public void SdcaLogisticRegression() | |||
Assert.InRange(first.Probability, 0.8, 1); | |||
} | |||
|
|||
[Fact] | |||
public void SdcaLogisticRegressionWithWeight() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogisticRegression [](start = 24, length = 18)
This is called LogisticRegression
but contains MaximumEntropy
trainers. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIll fix by separating into 2 tests one each for binary and multiclass
In reply to: 270208235 [](ancestors = 270208235)
…trainers (dotnet#3104) * fixed issue, added tests * fix review comments * updating equality checks for floats
Fixes #2175
ExampleWeightColumnName
in the advancedOptions
for SDCA trainers {Regression, LogisticRegression, MaximumEntropy}