-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add a project for functional tests without visibility into internals of ML.NET #2470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7557a83
e27e20f
22c2f1f
363b083
2843389
f9f03c8
c1b7ffc
56ae524
8dd776e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.SamplesUtils; | ||
using Microsoft.ML.Trainers.HalLearners; | ||
using Xunit; | ||
|
||
namespace Microsoft.ML.Functional.Tests | ||
{ | ||
internal static class Common | ||
{ | ||
public static void CheckMetrics(RegressionMetrics metrics) | ||
{ | ||
// Perform sanity checks on the metrics | ||
Assert.True(metrics.Rms >= 0); | ||
Assert.True(metrics.L1 >= 0); | ||
Assert.True(metrics.L2 >= 0); | ||
Assert.True(metrics.RSquared <= 1); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<!-- We are turning off strong naming to ensure we never add `InternalsVisibleTo` for these tests --> | ||
<SignAssembly>false</SignAssembly> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You probably need to set There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the tip! I couldn't figure out why those builds broke! In reply to: 255168732 [](ancestors = 255168732) |
||
<PublicSign>false</PublicSign> | ||
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.Data\Microsoft.ML.Data.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.EntryPoints\Microsoft.ML.EntryPoints.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.HalLearners\Microsoft.ML.HalLearners.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.ImageAnalytics\Microsoft.ML.ImageAnalytics.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.KMeansClustering\Microsoft.ML.KMeansClustering.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.SamplesUtils\Microsoft.ML.SamplesUtils.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.LightGBM\Microsoft.ML.LightGBM.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.OnnxTransform\Microsoft.ML.OnnxTransform.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.PCA\Microsoft.ML.PCA.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.KMeansClustering\Microsoft.ML.KMeansClustering.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.Recommender\Microsoft.ML.Recommender.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.Onnx\Microsoft.ML.Onnx.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.StaticPipe\Microsoft.ML.StaticPipe.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow.StaticPipe\Microsoft.ML.TensorFlow.StaticPipe.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" /> | ||
<ProjectReference Include="..\..\src\Microsoft.ML.TimeSeries\Microsoft.ML.TimeSeries.csproj" /> | ||
<ProjectReference Include="..\Microsoft.ML.TestFramework\Microsoft.ML.TestFramework.csproj" /> | ||
</ItemGroup> | ||
|
||
<ItemGroup> | ||
<NativeAssemblyReference Include="CpuMathNative" /> | ||
<NativeAssemblyReference Include="FastTreeNative" /> | ||
<NativeAssemblyReference Include="FactorizationMachineNative" /> | ||
<NativeAssemblyReference Include="MatrixFactorizationNative" /> | ||
<NativeAssemblyReference Include="LdaNative" /> | ||
<NativeAssemblyReference Include="SymSgdNative" /> | ||
<NativeAssemblyReference Include="MklProxyNative" /> | ||
<NativeAssemblyReference Include="MklImports" /> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are those all needed a this point in time? #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
</ItemGroup> | ||
|
||
<!-- TensorFlow is 64-bit only --> | ||
<ItemGroup Condition="'$(NativeTargetArchitecture)' == 'x64'"> | ||
<NativeAssemblyReference Include="tensorflow" /> | ||
<NativeAssemblyReference Condition="'$(OS)' != 'Windows_NT'" Include="tensorflow_framework" /> | ||
</ItemGroup> | ||
<ItemGroup> | ||
<PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="$(MicrosoftMLTensorFlowTestModelsVersion)" /> | ||
<PackageReference Include="Microsoft.ML.Onnx.TestModels" Version="$(MicrosoftMLOnnxTestModelsVersion)" /> | ||
</ItemGroup> | ||
</Project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using Microsoft.ML.RunTests; | ||
using Microsoft.ML.TestFramework; | ||
using Xunit; | ||
|
||
namespace Microsoft.ML.Functional.Tests | ||
{ | ||
public class PredictionScenarios | ||
{ | ||
/// <summary> | ||
/// Reconfigurable predictions: The following should be possible: A user trains a binary classifier, | ||
/// and through the test evaluator gets a PR curve, the based on the PR curve picks a new threshold | ||
/// and configures the scorer (or more precisely instantiates a new scorer over the same model parameters) | ||
/// with some threshold derived from that. | ||
/// </summary> | ||
[Fact] | ||
public void ReconfigurablePrediction() | ||
{ | ||
var mlContext = new MLContext(seed: 789); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
this intentional? #Resolved |
||
|
||
// Get the dataset, create a train and test | ||
var data = mlContext.Data.CreateTextLoader(TestDatasets.housing.GetLoaderColumns(), hasHeader: true) | ||
.Read(BaseTestClass.GetDataPath(TestDatasets.housing.trainFilename)); | ||
(var train, var test) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.2); | ||
|
||
// Create a pipeline to train on the housing data | ||
var pipeline = mlContext.Transforms.Concatenate("Features", new string[] { | ||
"CrimesPerCapita", "PercentResidental", "PercentNonRetail", "CharlesRiver", "NitricOxides", "RoomsPerDwelling", | ||
"PercentPre40s", "EmploymentDistance", "HighwayDistance", "TaxRate", "TeacherRatio"}) | ||
.Append(mlContext.Transforms.CopyColumns("Label", "MedianHomeValue")) | ||
.Append(mlContext.Regression.Trainers.OrdinaryLeastSquares()); | ||
|
||
var model = pipeline.Fit(train); | ||
|
||
var scoredTest = model.Transform(test); | ||
var metrics = mlContext.Regression.Evaluate(scoredTest); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we be asserting the metrics are in a certain range? #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call on checking valid ranges. I added a Common library to add those sorts of checks to. In reply to: 255167478 [](ancestors = 255167478) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you missed checking that new function in. In reply to: 255228907 [](ancestors = 255228907,255167478) |
||
|
||
Common.CheckMetrics(metrics); | ||
|
||
// Todo #2465: Allow the setting of threshold and thresholdColumn for scoring. | ||
// This is no longer possible in the API | ||
//var newModel = new BinaryPredictionTransformer<IPredictorProducing<float>>(ml, model.Model, trainData.Schema, model.FeatureColumn, threshold: 0.01f, thresholdColumn: DefaultColumnNames.Probability); | ||
//var newScoredTest = newModel.Transform(pipeline.Transform(testData)); | ||
//var newMetrics = mlContext.BinaryClassification.Evaluate(scoredTest); | ||
// And the Threshold and ThresholdColumn properties are not settable. | ||
//var predictor = model.LastTransformer; | ||
//predictor.Threshold = 0.01; // Not possible | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.RunTests; | ||
using Microsoft.ML.TestFramework; | ||
using Microsoft.ML.Trainers.HalLearners; | ||
using Xunit; | ||
|
||
namespace Microsoft.ML.Functional.Tests | ||
{ | ||
public class ValidationScenarios | ||
{ | ||
/// <summary> | ||
/// Cross-validation: Have a mechanism to do cross validation, that is, you come up with | ||
/// a data source (optionally with stratification column), come up with an instantiable transform | ||
/// and trainer pipeline, and it will handle (1) splitting up the data, (2) training the separate | ||
/// pipelines on in-fold data, (3) scoring on the out-fold data, (4) returning the set of | ||
/// metrics, trained pipelines, and scored test data for each fold. | ||
/// </summary> | ||
[Fact] | ||
void CrossValidation() | ||
{ | ||
var mlContext = new MLContext(seed: 789); | ||
|
||
// Get the dataset | ||
var data = mlContext.Data.CreateTextLoader(TestDatasets.housing.GetLoaderColumns(), hasHeader: true) | ||
.Read(BaseTestClass.GetDataPath(TestDatasets.housing.trainFilename)); | ||
|
||
// Create a pipeline to train on the sentiment data | ||
var pipeline = mlContext.Transforms.Concatenate("Features", new string[] { | ||
"CrimesPerCapita", "PercentResidental", "PercentNonRetail", "CharlesRiver", "NitricOxides", "RoomsPerDwelling", | ||
"PercentPre40s", "EmploymentDistance", "HighwayDistance", "TaxRate", "TeacherRatio"}) | ||
.Append(mlContext.Transforms.CopyColumns("Label", "MedianHomeValue")) | ||
.Append(mlContext.Regression.Trainers.OrdinaryLeastSquares()); | ||
|
||
// Compute the CV result | ||
var cvResult = mlContext.Regression.CrossValidate(data, pipeline, numFolds: 5); | ||
|
||
// Check that the results are valid | ||
Assert.IsType<RegressionMetrics>(cvResult[0].metrics); | ||
Assert.IsType<TransformerChain<RegressionPredictionTransformer<OlsLinearRegressionModelParameters>>>(cvResult[0].model); | ||
Assert.True(cvResult[0].scoredTestData is IDataView); | ||
Assert.Equal(5, cvResult.Length); | ||
|
||
// And validate the metrics | ||
foreach (var result in cvResult) | ||
Common.CheckMetrics(result.metrics); | ||
} | ||
} | ||
} |
This file was deleted.
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change necessary in this PR?
I don't think our tests should be calling this method - BTW #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is necessary if we want to use
LoadHousingRegressionDataset
in our tests because there is a race condition on the file lock, so tests will sometimes fail.Can you explain a bit more why you don't want to use this in tests? Is it that we don't want to use the SamplesUtils project in Tests, or that we shouldn't be downloading data for tests?
If it's the former, check out issue #2420 . We're going to make this a standalone Datasets/ (or some such name) outside of the NuGet project to use in Samples and Tests.
If it's the latter, we are already downloading datasets for tests. But now that I mention it, we can actually add an optional input to LoadHousingRegressionDataset and friends that can load the file from the tests/data/ directory. I'll add this capability now.
In reply to: 255236393 [](ancestors = 255236393)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
In reply to: 255256550 [](ancestors = 255256550,255236393)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 datasets our tests should use.
test\data
.test\data\external
through theDownloadExternalTestFiles
build step.We shouldn't have the test code be downloading random things. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. This has been updated to use the local dataset in the
test\data
folder. I'll chase down any other tests using theseDownload
commands as I migrate API-Scenario tests to Functional.Tests/In reply to: 255264819 [](ancestors = 255264819)