Multi classification - Probability #1881
You can access |
Yes, @wschin , I have the |
In most cases (99%), it's called
// Number of examples
private const int _rowNumber = 1000;
// Number of features
private const int _columnNumber = 5;
// Number of classes
private const int _classNumber = 3;

private class GbmExample
{
    [VectorType(_columnNumber)]
    public float[] Features;
    [KeyType(Contiguous = true, Count = _classNumber, Min = 0)]
    public uint Label;
    [VectorType(_classNumber)]
    public float[] Score;
}

[ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))] // LightGBM is 64-bit only
public void LightGbmMultiClassEstimatorCompare()
{
    // Training matrix. It contains all feature vectors.
    var dataMatrix = new float[_rowNumber * _columnNumber];
    // Labels for multi-class classification
    var labels = new uint[_rowNumber];
    // Training list, which is equivalent to the training matrix above.
    var dataList = new List<GbmExample>();
    for (/*row index*/ int i = 0; i < _rowNumber; ++i)
    {
        int featureSum = 0;
        var featureVector = new float[_columnNumber];
        for (/*column index*/ int j = 0; j < _columnNumber; ++j)
        {
            int featureValue = (j + i * _columnNumber) % 10;
            featureSum += featureValue;
            dataMatrix[j + i * _columnNumber] = featureValue;
            featureVector[j] = featureValue;
        }
        labels[i] = (uint)featureSum % _classNumber;
        dataList.Add(new GbmExample { Features = featureVector, Label = labels[i], Score = new float[_classNumber] });
    }

    var mlContext = new MLContext(seed: 0, conc: 1);
    var dataView = ComponentCreation.CreateDataView(mlContext, dataList);
    var gbmTrainer = new LightGbmMulticlassTrainer(mlContext, labelColumn: "Label", featureColumn: "Features", numBoostRound: 3,
        advancedSettings: s => { s.MinDataPerGroup = 1; s.MinDataPerLeaf = 1; });
    var gbm = gbmTrainer.Fit(dataView);
    var predicted = gbm.Transform(dataView);
    var predictions = new List<GbmExample>(predicted.AsEnumerable<GbmExample>(mlContext, false));
} |
@wschin Here is my sample code:
static void Main(string[] args)
{
    MLContext mlContext = new MLContext();
    string dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "all.csv");
    TextLoader textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
    {
        Separator = ",",
        HasHeader = false,
        Column = new[]
        {
            new TextLoader.Column("Description", DataKind.Text, 1),
            new TextLoader.Column("Type", DataKind.Text, 2),
            new TextLoader.Column("Category", DataKind.Text, 3),
        }
    });
    var data = textLoader.Read(dataPath);
    var (trainData, testData) = mlContext.MulticlassClassification.TrainTestSplit(data, testFraction: 0.2);
    var dataProcessingPipeline = mlContext.Transforms.Categorical.OneHotEncoding("Type", "TypeEncoded")
        .Append(mlContext.Transforms.Text.FeaturizeText("Description", "DescriptionEncoded"))
        .Append(mlContext.Transforms.Concatenate("Features", "TypeEncoded", "DescriptionEncoded"))
        .Append(mlContext.Transforms.Conversion.MapValueToKey("Category", "Label"))
        .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
    var model = dataProcessingPipeline.Fit(trainData);
    var metrics = mlContext.MulticlassClassification.Evaluate(model.Transform(testData));
    Console.WriteLine($"Accuracy Micro: {metrics.AccuracyMicro}");
    Console.WriteLine($"Accuracy Macro: {metrics.AccuracyMacro}");
    var predictionFunction = model.MakePredictionFunction<CategoryDetail, CategoryPrediction>(mlContext);
    var cat = predictionFunction.Predict(new CategoryDetail
    {
        Description = "this is a great jacket",
        Type = "Women"
    });
    // Now I need to output to the user the prediction cat.Prediction as well as the probability of that prediction
    Console.ReadKey();
}

public class CategoryDetail
{
    [Column("1")]
    public string Description;
    [Column("2")]
    public string Type;
    [Column("3")]
    public string Category;
}

public class CategoryPrediction
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }
    [ColumnName("Score")]
    public float[] Score { get; set; }
}
|
Can you access Score field in |
Yes @wschin, I can; it's just an array of floats. How do I get a percentage out of it? |
|
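On turning the Score array into a percentage, here is a minimal sketch, assuming each Score entry is the model's score for one class, in class-index order. If the trainer already emits probabilities that sum to 1 (worth verifying for the trainer in use), the largest entry can be formatted directly. If it emits raw scores, softmax is one common way to normalize them, though the result is only as well calibrated as the model. `ScoreHelpers` and its method names are hypothetical and not part of ML.NET.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not an ML.NET API.
public static class ScoreHelpers
{
    // Numerically stable softmax: maps arbitrary raw scores to values that sum to 1.
    public static float[] Softmax(float[] scores)
    {
        if (scores == null || scores.Length == 0)
            throw new ArgumentException("scores must be non-empty", nameof(scores));

        float max = scores.Max(); // subtract the max to avoid overflow in Exp
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    // Index of the largest entry; ties resolve to the lowest index.
    public static int ArgMax(float[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("values must be non-empty", nameof(values));

        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }

    // Formats a probability in [0, 1] as a percentage with one decimal place.
    public static string AsPercent(float probability) =>
        (probability * 100f).ToString("F1", CultureInfo.InvariantCulture) + "%";
}
```

With the `CategoryPrediction` class from the sample above, something like `ScoreHelpers.AsPercent(cat.Score[ScoreHelpers.ArgMax(cat.Score)])` would give the confidence of the top class, under the assumption that the scores are already probabilities.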
@wschin OK, how do I get the list of labels from the model? There used to be an extension method on the model in 0.7. Or is it safe to assume that the maximum score corresponds to the predicted label? |
I'd expect that the predicted name (column name: |
I would love to know how to do this too. |
IMO there should really be a public API to get the label names out of a model. Since
What I'm currently doing is this:
public static string[] GetKeyValues(this Schema.Column column)
{
    var metadata = column.Metadata;
    var labels = default(VBuffer<ReadOnlyMemory<char>>);
    metadata.GetValue(MetadataUtils.Kinds.KeyValues, ref labels);
    var names = new string[labels.Length];
    int index = 0;
    foreach (var label in labels.DenseValues())
        names[index++] = label.ToString();
    return names;
}
And then doing something like this to get the label names:
var encodedSample = encoder.Transform(ctx.CreateDataView(new[] { new YouInputInstanceClass() }));
var encodedSchema = encodedSample.Schema;
var labelNames = encodedSchema[LabelLabel].GetKeyValues();
Could we get a public API so that we can get the label names back out? Especially once you save the model to a file and no longer have a big training data set loaded, you'd also need to get the label names out of the model somehow. |
Thanks a lot. This sounds like a good idea, but we need to investigate whether it is doable. In your solution, the
in each classifier to attach label names to their scores. |
Just jumping in to agree that returning the top N predictions with matching score is vital, something we've lost with the removal of 'TryGetScoreLabelNames'. I'm able to do it when training the model and have all the data to hand (IDataView dataView parameter below) :
However, I haven't figured out how to do that when I'm loading the model later on, to make a single prediction. At that point, I don't have an IDataView object I can pass into the function, |
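For the "top N predictions with matching score" part, once the label names and the Score array are both in hand, the pairing itself can be sketched in plain C#. This assumes `labelNames[i]` is the name of the class whose score is `scores[i]`; how the names are obtained from a loaded model is the open question in this thread and is not addressed here. `TopPredictions.TopN` is a hypothetical helper, not an ML.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an ML.NET API.
public static class TopPredictions
{
    // Pairs each label with its score and returns the n highest-scoring pairs,
    // best first. Assumes labelNames[i] belongs to scores[i].
    public static List<KeyValuePair<string, float>> TopN(string[] labelNames, float[] scores, int n)
    {
        if (labelNames == null) throw new ArgumentNullException(nameof(labelNames));
        if (scores == null) throw new ArgumentNullException(nameof(scores));
        if (labelNames.Length != scores.Length)
            throw new ArgumentException("labelNames and scores must have the same length");

        return scores
            .Select((score, index) => new KeyValuePair<string, float>(labelNames[index], score))
            .OrderByDescending(pair => pair.Value) // stable sort: ties keep index order
            .Take(n)
            .ToList();
    }
}
```

For example, `TopPredictions.TopN(new[] { "Coats", "Shoes", "Hats" }, new[] { 0.2f, 0.7f, 0.1f }, 2)` yields "Shoes" followed by "Coats"; the label names here are made up for illustration.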
I just made an example for extracting label names from learned pipeline. Please take a look at #1953. |
Thanks, wschin - I couldn't translate your example into something that would work for me without a lot of retrofitting, but I did stumble on my own solution, I think. I keep the 'TryGetScoreLabelNames' function as is, but instead of using model.MakePredictionFunction to make my prediction, I broke that back down into its individual steps:
That then gives me the IDataView I can pass into TryGetScoreLabelNames to get the schema etc... |
Just a heads up. |
I looked at the sample code that supposedly closes this issue. It is extremely ugly code. This issue should not be closed. |
Thanks. But how do I know which category "i" is?
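On which category index "i" refers to: the i-th Score entry should belong to the i-th key value of the label column, so the mapping is only as reliable as the label-name array it is read from. The sketch below rests on an assumption that should be verified against the model's own key-value metadata (for example with the GetKeyValues extension posted earlier in this thread): that keys are assigned to labels in order of first appearance in the training data. `LabelKeys` and its methods are hypothetical and not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not an ML.NET API.
public static class LabelKeys
{
    // Distinct labels in order of first appearance. ASSUMPTION: this mirrors how
    // the label column was mapped to keys; confirm against the model's metadata.
    public static string[] InOrderOfFirstAppearance(IEnumerable<string> labels)
    {
        if (labels == null) throw new ArgumentNullException(nameof(labels));

        var seen = new HashSet<string>();
        var ordered = new List<string>();
        foreach (var label in labels)
            if (seen.Add(label)) // Add returns false for labels already seen
                ordered.Add(label);
        return ordered.ToArray();
    }

    // Name of the category behind score index i (0-based).
    public static string NameOf(string[] labelNames, int i)
    {
        if (labelNames == null) throw new ArgumentNullException(nameof(labelNames));
        if (i < 0 || i >= labelNames.Length)
            throw new ArgumentOutOfRangeException(nameof(i));
        return labelNames[i];
    }
}
```

Reading the names from the trained model's schema is preferable to recomputing them from the data, since it does not depend on the ordering assumption.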
Hello,
I have an application that uses the ML.NET multiclass classification trainer to predict a category. However, it seems that the TryGetScoreLabelNames() method has been removed from the library. The application also needs to output how confident the model is (as a percentage) in the predicted label. How can I achieve that in ML.NET 0.8?