Commit f0a9163

Squashed commit of the following:

commit a5e274ef8869576190bbb794360a5f56d998b470
Merge: b7db4fa d7f9996
Author: Keren Fuentes
Date:   Thu Nov 14 14:51:21 2019 -0800

    Merge branch 'onnx_bin_classifiers' of https://github.com/Lynx1820/machinelearning into onnx_bin_classifiers

commit b7db4fa
Author: Harish Kulkarni
Date:   Thu Nov 14 17:41:12 2019 +0000

    Added onnx export support for KeyToValueMappingTransformer (dotnet#4455)

commit f3e0f6b
Author: Eric Erhardt
Date:   Thu Nov 14 07:22:12 2019 -0600

    Fix a flaky Extensions.ML test. (dotnet#4458)

    * Fix a flaky Extensions.ML test.
      Make the reload model tests more resistant to timing changes.
    * PR feedback.

commit c1e190a
Author: Harish Kulkarni
Date:   Thu Nov 14 05:24:14 2019 +0000

    Added onnx export support for OptionalColumnTransform (dotnet#4454)

    * Initial work for adding onnx export support for OptionalColumnTransform
    * Implemented support for optional initializers in OnnxTranformer to support OptionalColumnTransform
    * Fixed handling of double values and non-long numeric types
    * Removed redundant line
    * Updated review comment

commit f96761b
Author: Harish Kulkarni
Date:   Thu Nov 14 03:17:12 2019 +0000

    Fixed model saving and loading of OneVersusAllTrainer to include SoftMax (dotnet#4472)

    * Fixed model saving and loading of OneVersusAllTrainer to include SoftMax
    * Modified existing test to include SoftMax option
    * Modified test to verify both cases: when UseSoftmax is true and false

commit d45cc8a
Author: Jake
Date:   Wed Nov 13 17:26:49 2019 -0800

    Add InternalsVisibleTo in AutoML and CodeGenerator for the assembly Microsoft.ML.ModelBuilder.AutoMLService.Gpu (dotnet#4474)

commit 5e83e23
Author: Eric Erhardt
Date:   Wed Nov 13 16:09:05 2019 -0600

    CpuMathNative assembly is not getting copied when using packages.config. (dotnet#4465)

    When we refactored CpuMath to support netcoreapp3.0, we broke the packages.config support to copy the native assembly. This fixes it again by copying the file from the correct location.

    Fix dotnet#93

commit 693250b
Author: Harish Kulkarni
Date:   Wed Nov 13 21:58:07 2019 +0000

    Added onnx export support for WordTokenizingTransformer and NgramExtractingTransformer (dotnet#4451)

    * Added onnx export support for string related transforms
    * Updated baseline test files
      A large portion of this commit is upgrading the baseline test files. The rest of the fixes deal with build breaks resulting from the upgrade of ORT version.
    * Fixed bugs in ValueToKeyMappingTransformer and added additional tests

commit 5910910
Author: Antonio Velázquez
Date:   Mon Nov 11 17:19:39 2019 -0800

    Fixes dotnet#4292 about using PFI with BPT and CMPB (dotnet#4306)

    * Changes in PredictionTransformer.cs and Calibrator.cs to fix the problem of the create methods not being called, to make CMP load its internal calibrator and predictor first so as to assign the correct parameter types and runtimes, and added a PredictionTransformerLoadTypeAttribute so that the binary prediction transformer knows what type to assign when loading a CMP as its internal model.
    * Added a working sample for using PFI with BPT and CMPB while loading a model from disk. This is based entirely on the original sample.
    * Added file CalibratedModelParametersTests.cs with tests that the CMPs modified in this PR are now being correctly loaded from disk.
    * Changed a couple of tests in LbfgsTests.cs that failed because they used casts that now return 'null'.

commit bcdac55
Author: Brian Stark
Date:   Mon Nov 11 13:42:42 2019 -0800

    Stabilize the LR test (dotnet#4446)

    * Stabilize the LR test
      Found issue with how we were using random for our ImageClassificationTrainer. This caused instability in our unit test, as we were not able to control the random seed. Modified the code to now use the same random object throughout the trainer, thus allowing us to control the seed and therefore have predictable output.

commit d7f9996
Author: Keren Fuentes
Date:   Mon Nov 11 11:33:17 2019 -0800

    workaround Scores

commit 7fba31c
Merge: 93388b6 c96d690
Author: Keren Fuentes
Date:   Mon Nov 11 11:25:28 2019 -0800

    merging changes

commit 93388b6
Author: Keren Fuentes
Date:   Mon Nov 11 11:19:59 2019 -0800

    Added extraction of score column before node creation

commit ea71828
Author: Keren Fuentes
Date:   Fri Nov 8 15:53:11 2019 -0800

    fix for binary classification trainers export to onnx

commit 6fad293
Author: Keren Fuentes
Date:   Thu Oct 31 15:26:43 2019 -0700

    Revert "draft regression test"

    This reverts commit 1ad45c995516b9d39fc05aca855ce2abe96c407b.

commit 83c1c80
Author: Keren Fuentes
Date:   Thu Oct 31 15:24:23 2019 -0700

    draft regression test

commit 8884161
Author: frank-dong-ms
Date:   Fri Nov 8 20:20:53 2019 -0800

    nightly build pipeline (dotnet#4444)

    * nightly build pipeline

commit c96d690
Author: Keren Fuentes
Date:   Fri Nov 8 15:53:11 2019 -0800

    fix for binary classification trainers export to onnx

commit 8100364
Author: Keren Fuentes
Date:   Thu Oct 31 15:26:43 2019 -0700

    Revert "draft regression test"

    This reverts commit 1ad45c995516b9d39fc05aca855ce2abe96c407b.

commit 81381e2
Author: Keren Fuentes
Date:   Thu Oct 31 15:24:23 2019 -0700

    draft regression test

1 parent b7db4fa commit f0a9163

4 files changed: +114 -11 lines changed

src/Microsoft.ML.Data/Scorers/BinaryClassifierScorer.cs

+23 -5

@@ -197,14 +197,32 @@ private protected override void SaveAsOnnxCore(OnnxContext ctx)
             for (int iinfo = 0; iinfo < Bindings.InfoCount; ++iinfo)
                 outColumnNames[iinfo] = Bindings.GetColumnName(Bindings.MapIinfoToCol(iinfo));

-            //Check if "Probability" column was generated by the base class, only then
-            //label can be predicted.
+            /* If the probability column was generated, then the classification threshold is set to 0.5. Otherwise,
+               the predicted label is based on the sign of the score.
+               REVIEW: Binarizer should always have at least two output columns?
+            */
+            string opType = "Binarizer";
+            var binarizerOutput = ctx.AddIntermediateVariable(null, "BinarizerOutput", true);
+
             if (Bindings.InfoCount >= 3 && ctx.ContainsColumn(outColumnNames[2]))
             {
-                string opType = "Binarizer";
-                var node = ctx.CreateNode(opType, new[] { ctx.GetVariableName(outColumnNames[2]) },
-                    new[] { ctx.GetVariableName(outColumnNames[0]) }, ctx.GetNodeName(opType));
+                var node = ctx.CreateNode(opType, ctx.GetVariableName(outColumnNames[2]), binarizerOutput, ctx.GetNodeName(opType));
                 node.AddAttribute("threshold", 0.5);
+
+                opType = "Cast";
+                node = ctx.CreateNode(opType, binarizerOutput, ctx.GetVariableName(outColumnNames[0]), ctx.GetNodeName(opType), "");
+                var t = InternalDataKindExtensions.ToInternalDataKind(DataKind.Boolean).ToType();
+                node.AddAttribute("to", t);
+            }
+            else if (Bindings.InfoCount == 2)
+            {
+                var node = ctx.CreateNode(opType, ctx.GetVariableName(outColumnNames[1]), binarizerOutput, ctx.GetNodeName(opType));
+                node.AddAttribute("threshold", 0.0);
+
+                opType = "Cast";
+                node = ctx.CreateNode(opType, binarizerOutput, ctx.GetVariableName(outColumnNames[0]), ctx.GetNodeName(opType), "");
+                var t = InternalDataKindExtensions.ToInternalDataKind(DataKind.Boolean).ToType();
+                node.AddAttribute("to", t);
             }
         }

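
For readers unfamiliar with the two ONNX nodes chained in this hunk, here is a minimal C# sketch of what the exported `Binarizer` followed by `Cast` is expected to compute for one row. It assumes the ONNX-ML `Binarizer` semantics (values strictly greater than `threshold` map to 1, all others to 0). The class `BinarizerSketch`, the helper `PredictLabel`, and the sample values are hypothetical and not part of the commit.

```csharp
using System;

public static class BinarizerSketch
{
    // Assumed ONNX-ML Binarizer semantics: 1 if value > threshold, else 0.
    public static float Binarize(float value, float threshold) =>
        value > threshold ? 1f : 0f;

    // Cast-to-bool step: any non-zero value becomes true.
    public static bool CastToBool(float value) => value != 0f;

    // Hypothetical helper mirroring the two branches in SaveAsOnnxCore:
    // threshold 0.5 on Probability when it exists, otherwise 0.0 on Score.
    public static bool PredictLabel(float score, float? probability)
    {
        return probability.HasValue
            ? CastToBool(Binarize(probability.Value, 0.5f))
            : CastToBool(Binarize(score, 0.0f));
    }

    public static void Main()
    {
        Console.WriteLine(PredictLabel(score: 1.2f, probability: 0.77f));  // True
        Console.WriteLine(PredictLabel(score: -0.3f, probability: null));  // False
        Console.WriteLine(PredictLabel(score: 0.0f, probability: null));   // False
    }
}
```

Under that assumption, a score of exactly 0 maps to a negative label in the uncalibrated branch, which is the case worth checking against the ML.NET scorer when comparing outputs.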

src/Microsoft.ML.FastTree/FastTree.cs

+2 -1

@@ -3111,7 +3111,8 @@ bool ISingleCanSaveOnnx.SaveAsOnnx(OnnxContext ctx, string[] outputNames, string
             }

             string opType = "TreeEnsembleRegressor";
-            var node = ctx.CreateNode(opType, new[] { featureColumn }, outputNames, ctx.GetNodeName(opType));
+            string scoreVarName = (Utils.Size(outputNames) == 2) ? outputNames[1] : outputNames[0]; // Get Score from PredictedLabel and/or Score columns
+            var node = ctx.CreateNode(opType, new[] { featureColumn }, new[] { scoreVarName }, ctx.GetNodeName(opType));

             node.AddAttribute("post_transform", PostTransform.None.GetDescription());
             node.AddAttribute("n_targets", 1);

src/Microsoft.ML.StandardTrainers/Standard/LinearModelParameters.cs

+3 -3

@@ -240,10 +240,10 @@ JToken ISingleCanSavePfa.SaveAsPfa(BoundPfaContext ctx, JToken input)
         bool ISingleCanSaveOnnx.SaveAsOnnx(OnnxContext ctx, string[] outputs, string featureColumn)
         {
             Host.CheckValue(ctx, nameof(ctx));
-            Host.Check(Utils.Size(outputs) == 1);
-
             string opType = "LinearRegressor";
-            var node = ctx.CreateNode(opType, new[] { featureColumn }, outputs, ctx.GetNodeName(opType));
+            string scoreVarName = (Utils.Size(outputs) == 2) ? outputs[1] : outputs[0]; // Get Score from PredictedLabel and/or Score columns
+
+            var node = ctx.CreateNode(opType, new[] { featureColumn }, new[] { scoreVarName }, ctx.GetNodeName(opType));
             // Selection of logit or probit output transform. enum {'NONE', 'LOGIT', 'PROBIT}
             node.AddAttribute("post_transform", "NONE");
             node.AddAttribute("targets", 1);
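
The same score-selection expression appears in both FastTree.cs and LinearModelParameters.cs. A minimal standalone sketch of that selection follows. The class `ScoreOutputSketch`, the method `SelectScoreVariable`, and the local `Size` helper are hypothetical; `Size` stands in for `Utils.Size`, which is assumed here to return 0 for a null array. The empty-input guard is an addition for illustration only, since the diff removes the `Host.Check` on the output count without replacing it.

```csharp
using System;

public static class ScoreOutputSketch
{
    // Hypothetical stand-in for Utils.Size: treats null as length 0 (assumption).
    private static int Size(string[] values) => values?.Length ?? 0;

    // Pick the variable that should receive the regressor's raw score.
    public static string SelectScoreVariable(string[] outputs)
    {
        int count = Size(outputs);
        if (count == 0)
            throw new ArgumentException("At least one output name is required.", nameof(outputs));

        // Two outputs are read as [PredictedLabel, Score]; otherwise the first entry is used.
        return count == 2 ? outputs[1] : outputs[0];
    }

    public static void Main()
    {
        Console.WriteLine(SelectScoreVariable(new[] { "Score" }));                   // Score
        Console.WriteLine(SelectScoreVariable(new[] { "PredictedLabel", "Score" })); // Score
    }
}
```

Note that any count other than 2 falls through to the first entry, so the expression relies on the caller's column ordering rather than on column names.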

test/Microsoft.ML.Tests/OnnxConversionTest.cs

+86 -2

@@ -132,6 +132,14 @@ private class BreastCancerMulticlassExample
             [LoadColumn(2, 9), VectorType(8)]
             public float[] Features;
         }
+        private class BreastCancerBinaryClassification
+        {
+            [LoadColumn(0)]
+            public bool Label;
+
+            [LoadColumn(2, 9), VectorType(8)]
+            public float[] Features;
+        }

         [LessThanNetCore30OrNotNetCoreFact("netcoreapp3.0 output differs from Baseline. Tracked by https://github.com/dotnet/machinelearning/issues/2087")]
         public void KmeansOnnxConversionTest()
@@ -188,6 +196,54 @@ public void KmeansOnnxConversionTest()
             Done();
         }

+        [Fact]
+        public void binaryClassificationTrainersOnnxConversionTest()
+        {
+            var mlContext = new MLContext(seed: 1);
+            string dataPath = GetDataPath("breast-cancer.txt");
+            // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
+            var dataView = mlContext.Data.LoadFromTextFile<BreastCancerBinaryClassification>(dataPath, separatorChar: '\t', hasHeader: true);
+            IEstimator<ITransformer>[] estimators = {
+                mlContext.BinaryClassification.Trainers.SymbolicSgdLogisticRegression(),
+                mlContext.BinaryClassification.Trainers.SgdCalibrated(),
+                mlContext.BinaryClassification.Trainers.AveragedPerceptron(),
+                mlContext.BinaryClassification.Trainers.FastForest(),
+                mlContext.BinaryClassification.Trainers.LinearSvm(),
+                mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(),
+                mlContext.BinaryClassification.Trainers.SgdNonCalibrated(),
+                mlContext.BinaryClassification.Trainers.FastTree(),
+                mlContext.BinaryClassification.Trainers.LbfgsLogisticRegression(),
+                mlContext.BinaryClassification.Trainers.LightGbm(),
+                mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(),
+                mlContext.BinaryClassification.Trainers.SgdCalibrated(),
+                mlContext.BinaryClassification.Trainers.SymbolicSgdLogisticRegression(),
+            };
+            var initialPipeline = mlContext.Transforms.ReplaceMissingValues("Features").
+                Append(mlContext.Transforms.NormalizeMinMax("Features"));
+            foreach (var estimator in estimators)
+            {
+                var pipeline = initialPipeline.Append(estimator);
+                var model = pipeline.Fit(dataView);
+                var transformedData = model.Transform(dataView);
+                var onnxModel = mlContext.Model.ConvertToOnnxProtobuf(model, dataView);
+                // Compare model scores produced by ML.NET and ONNX's runtime.
+                if (IsOnnxRuntimeSupported())
+                {
+                    var onnxFileName = $"{estimator.ToString()}.onnx";
+                    var onnxModelPath = GetOutputPath(onnxFileName);
+                    SaveOnnxModel(onnxModel, onnxModelPath, null);
+                    // Evaluate the saved ONNX model using the data used to train the ML.NET pipeline.
+                    string[] inputNames = onnxModel.Graph.Input.Select(valueInfoProto => valueInfoProto.Name).ToArray();
+                    string[] outputNames = onnxModel.Graph.Output.Select(valueInfoProto => valueInfoProto.Name).ToArray();
+                    var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(outputNames, inputNames, onnxModelPath);
+                    var onnxTransformer = onnxEstimator.Fit(dataView);
+                    var onnxResult = onnxTransformer.Transform(dataView);
+                    CompareSelectedR4ScalarColumns(transformedData.Schema[5].Name, outputNames[3], transformedData, onnxResult, 3);
+                    CompareSelectedScalarColumns<Boolean>(transformedData.Schema[4].Name, outputNames[2], transformedData, onnxResult);
+                }
+            }
+            Done();
+        }
         private class DataPoint
         {
             [VectorType(3)]
@@ -1081,7 +1137,8 @@ private void CreateDummyExamplesToMakeComplierHappy()
             var dummyExample = new BreastCancerFeatureVector() { Features = null };
             var dummyExample1 = new BreastCancerCatFeatureExample() { Label = false, F1 = 0, F2 = "Amy" };
             var dummyExample2 = new BreastCancerMulticlassExample() { Label = "Amy", Features = null };
-            var dummyExample3 = new SmallSentimentExample() { Tokens = null };
+            var dummyExample3 = new BreastCancerBinaryClassification() { Label = false, Features = null };
+            var dummyExample4 = new SmallSentimentExample() { Tokens = null };
         }

         private void CompareResults(string leftColumnName, string rightColumnName, IDataView left, IDataView right)
@@ -1243,7 +1300,34 @@ private void CompareSelectedR4ScalarColumns(string leftColumnName, string rightC

                     // Scalar such as R4 (float) is converted to [1, 1]-tensor in ONNX format for consitency of making batch prediction.
                     Assert.Equal(1, actual.Length);
-                    Assert.Equal(expected, actual.GetItemOrDefault(0), precision);
+                    CompareNumbersWithTolerance(expected, actual.GetItemOrDefault(0), null, precision);
+                }
+            }
+        }
+        private void CompareSelectedScalarColumns<T>(string leftColumnName, string rightColumnName, IDataView left, IDataView right)
+        {
+            var leftColumn = left.Schema[leftColumnName];
+            var rightColumn = right.Schema[rightColumnName];
+
+            using (var expectedCursor = left.GetRowCursor(leftColumn))
+            using (var actualCursor = right.GetRowCursor(rightColumn))
+            {
+                T expected = default;
+                VBuffer<T> actual = default;
+                var expectedGetter = expectedCursor.GetGetter<T>(leftColumn);
+                var actualGetter = actualCursor.GetGetter<VBuffer<T>>(rightColumn);
+                while (expectedCursor.MoveNext() && actualCursor.MoveNext())
+                {
+                    expectedGetter(ref expected);
+                    actualGetter(ref actual);
+                    var actualVal = actual.GetItemOrDefault(0);
+
+                    Assert.Equal(1, actual.Length);
+
+                    if (typeof(T) == typeof(ReadOnlyMemory<Char>))
+                        Assert.Equal(expected.ToString(), actualVal.ToString());
+                    else
+                        Assert.Equal(expected, actualVal);
                 }
             }
         }
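
The new `CompareSelectedScalarColumns<T>` helper walks two columns in lockstep, one holding scalars (ML.NET) and one holding length-1 vectors (ONNX output). The idea can be sketched without any ML.NET types as below, using plain lists in place of row cursors. All names here (`ColumnCompareSketch`, `AssertScalarMatchesSingletonVector`, the sample data) are hypothetical. Unlike the helper in the diff, whose `&&` loop stops silently at the shorter column, this sketch also rejects differing row counts; that check is an assumption about what a caller might want, not something the commit does.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnCompareSketch
{
    // Compares a scalar column against a column of length-1 vectors, row by row.
    public static void AssertScalarMatchesSingletonVector<T>(
        IReadOnlyList<T> expected, IReadOnlyList<T[]> actual)
    {
        if (expected.Count != actual.Count)
            throw new InvalidOperationException("Row counts differ.");

        var comparer = EqualityComparer<T>.Default;
        for (int row = 0; row < expected.Count; row++)
        {
            // Each ONNX-side row is expected to hold exactly one element.
            if (actual[row].Length != 1)
                throw new InvalidOperationException($"Row {row}: expected a length-1 vector.");
            if (!comparer.Equals(expected[row], actual[row][0]))
                throw new InvalidOperationException($"Row {row}: values differ.");
        }
    }

    public static void Main()
    {
        var mlnetLabels = new[] { true, false, true };
        var onnxLabels = new[] { new[] { true }, new[] { false }, new[] { true } };
        AssertScalarMatchesSingletonVector<bool>(mlnetLabels, onnxLabels);
        Console.WriteLine("columns match");
    }
}
```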
