v.11. How to lower the threshold of the predictor #331

kdcllc · 2019-03-26T18:56:49Z

In version 0.10 we were able to adjust the threshold for StochasticDualCoordinateAscent like this. How can the same be done in version 0.11?

// The dataset we have is skewed, as there are many more non-spam messages than spam messages.
// While our model is relatively good at detecting the difference, this skewness leads it to always
// say the message is not spam. We deal with this by lowering the threshold of the predictor. In reality,
// it is useful to look at the precision-recall curve to identify the best possible threshold.
var inPipe = new TransformerChain<ITransformer>(model.Take(model.Count() - 1).ToArray());
var lastTransformer = new BinaryPredictionTransformer<IPredictorProducing<float>>(
    mlContext,
    model.LastTransformer.Model,
    inPipe.GetOutputSchema(data.Schema),
    model.LastTransformer.FeatureColumn,
    threshold: 0.15f,
    thresholdColumn: DefaultColumnNames.Probability);

// Swap the last transformer in the chain for the one with the lowered threshold.
ITransformer[] parts = model.ToArray();
parts[parts.Length - 1] = lastTransformer;
ITransformer newModel = new TransformerChain<ITransformer>(parts);
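
The comment in the snippet above recommends looking at the precision-recall curve to pick a threshold. As a minimal, library-independent sketch of that idea (this is not ML.NET API; the `ThresholdSweep` class, the `Evaluate` method, and the scores are made up for illustration), the following computes precision and recall for the spam class at a 0.5 cutoff and at the lowered 0.15 cutoff:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class ThresholdSweep
{
    // Precision and recall for the positive (spam) class when every example
    // whose probability is >= threshold is predicted as spam.
    public static (double Precision, double Recall) Evaluate(
        IReadOnlyList<(float Probability, bool IsSpam)> scored, float threshold)
    {
        int tp = scored.Count(s => s.Probability >= threshold && s.IsSpam);
        int fp = scored.Count(s => s.Probability >= threshold && !s.IsSpam);
        int fn = scored.Count(s => s.Probability < threshold && s.IsSpam);

        // Conventions for empty denominators: no positive predictions means
        // no false positives (precision 1); no spam at all means recall 0.
        double precision = tp + fp == 0 ? 1.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        return (precision, recall);
    }

    public static void Main()
    {
        // Made-up (probability, is-spam) pairs imitating a skewed dataset:
        // few spam messages, most of them scored well below 0.5.
        var scored = new List<(float, bool)>
        {
            (0.62f, true), (0.30f, true), (0.18f, true), (0.08f, true),
            (0.20f, false), (0.10f, false), (0.05f, false),
            (0.03f, false), (0.02f, false), (0.01f, false),
        };

        foreach (float threshold in new[] { 0.5f, 0.15f })
        {
            var (precision, recall) = Evaluate(scored, threshold);
            Console.WriteLine(
                $"threshold={threshold:F2} precision={precision:F2} recall={recall:F2}");
        }
    }
}
```

With these illustrative scores, lowering the cutoff trades some precision for a large gain in recall, which is the effect the lowered threshold is meant to have on a skewed dataset.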


prathyusha12345 · 2019-03-26T19:08:52Z

@kdcllc An issue has already been created for this; see the comment on dotnet/machinelearning#2645.

Adding @CESARDELATORRE for reference

prathyusha12345 · 2019-04-11T17:10:14Z

We have changed the pipeline as shown below, so we no longer need to set a threshold value.

// Data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
    .Append(mlContext.Transforms.Text.FeaturizeText("FeaturesText", new Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Options
    {
        WordFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 2, UseAllLengths = true },
        CharFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 3, UseAllLengths = false },
    }, "Message"))
    .Append(mlContext.Transforms.CopyColumns("Features", "FeaturesText"))
    .Append(mlContext.Transforms.NormalizeLpNorm("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);

// Set the training algorithm
var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "Label", numberOfIterations: 10, featureColumnName: "Features"),
        labelColumnName: "Label")
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
var trainingPipeLine = dataProcessPipeline.Append(trainer);

The scores are good when the model is evaluated with the above pipeline, so there is no need to set a threshold. We will try to create another sample that demonstrates label skew and threshold values.

Please find the sample here https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/BinaryClassification_SpamDetection for reference.

Closing this issue. Thanks 👍

kdcllc · 2019-04-11T17:14:09Z

@prathyusha12345 Thank you for following up on this issue!

kdcllc mentioned this issue Mar 28, 2019

Normilizaing data for Spam prediction for 0.11.0 version #338

Closed

prathyusha12345 closed this as completed Apr 11, 2019
