Adding functional tests for all training and evaluation tasks #2646

Merged
merged 12 commits into from
Feb 24, 2019
68 changes: 68 additions & 0 deletions test/Microsoft.ML.Functional.Tests/Common.cs
@@ -7,6 +7,7 @@
using System.Linq;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Data.Evaluators.Metrics;
using Microsoft.ML.Functional.Tests.Datasets;
using Xunit;

@@ -160,6 +161,73 @@ public static void AssertEqual(TypeTestData testType1, TypeTestData testType2)
Assert.True(testType1.Ug.Equals(testType2.Ug));
}

/// <summary>
/// Check that a <see cref="AnomalyDetectionMetrics"/> object is valid.
/// </summary>
/// <param name="metrics">The metrics object.</param>
public static void CheckMetrics(AnomalyDetectionMetrics metrics)
{
// Perform sanity checks on the metrics.
Assert.InRange(metrics.Auc, 0, 1);
Assert.InRange(metrics.DrAtK, 0, 1);
}

/// <summary>
/// Check that a <see cref="BinaryClassificationMetrics"/> object is valid.
/// </summary>
/// <param name="metrics">The metrics object.</param>
public static void CheckMetrics(BinaryClassificationMetrics metrics)

Contributor
@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019

BinaryClassificationMetrics

We added CalibratedBinaryClassificationMetrics recently. Strangely, you don't have them in your PR. #Closed

Contributor Author

Good catch. I hadn't noticed that.

{
// Perform sanity checks on the metrics.
Assert.InRange(metrics.Accuracy, 0, 1);
Assert.InRange(metrics.Auc, 0, 1);
Assert.InRange(metrics.Auprc, 0, 1);
Assert.InRange(metrics.F1Score, 0, 1);
Assert.InRange(metrics.NegativePrecision, 0, 1);
Assert.InRange(metrics.NegativeRecall, 0, 1);
Assert.InRange(metrics.PositivePrecision, 0, 1);
Assert.InRange(metrics.PositiveRecall, 0, 1);
}

/// <summary>
/// Check that a <see cref="ClusteringMetrics"/> object is valid.
/// </summary>
/// <param name="metrics">The metrics object.</param>
public static void CheckMetrics(ClusteringMetrics metrics)
{
// Perform sanity checks on the metrics.
Assert.True(metrics.AvgMinScore >= 0);
Assert.True(metrics.Dbi >= 0);
if (!double.IsNaN(metrics.Nmi))
Assert.True(metrics.Nmi >= 0 && metrics.Nmi <= 1);
}

/// <summary>
/// Check that a <see cref="MultiClassClassifierMetrics"/> object is valid.
/// </summary>
/// <param name="metrics">The metrics object.</param>
public static void CheckMetrics(MultiClassClassifierMetrics metrics)
{
// Perform sanity checks on the metrics.
Assert.InRange(metrics.AccuracyMacro, 0, 1);
Assert.InRange(metrics.AccuracyMicro, 0, 1);
Assert.True(metrics.LogLoss >= 0);
Assert.InRange(metrics.TopKAccuracy, 0, 1);
}

/// <summary>
/// Check that a <see cref="RankerMetrics"/> object is valid.
/// </summary>
/// <param name="metrics">The metrics object.</param>
public static void CheckMetrics(RankerMetrics metrics)
{
// Perform sanity checks on the metrics.
foreach (var dcg in metrics.Dcg)
Assert.True(dcg >= 0);
foreach (var ndcg in metrics.Ndcg)
Assert.InRange(ndcg, 0, 100);
}

/// <summary>
/// Check that a <see cref="RegressionMetrics"/> object is valid.
/// </summary>
78 changes: 78 additions & 0 deletions test/Microsoft.ML.Functional.Tests/Datasets/Iris.cs
@@ -0,0 +1,78 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.


using System;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A class for the Iris test dataset.
/// </summary>
internal sealed class Iris
{
[LoadColumn(0)]
public float Label { get; set; }

[LoadColumn(1)]
public float SepalLength { get; set; }

[LoadColumn(2)]
public float SepalWidth { get; set; }

[LoadColumn(4)]
public float PetalLength { get; set; }

[LoadColumn(5)]
public float PetalWidth { get; set; }

/// <summary>
/// The list of columns commonly used as features.
/// </summary>
public static readonly string[] Features = new string[] { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth" };

public static IDataView LoadAsRankingProblem(MLContext mlContext, string filePath, bool hasHeader, char separatorChar, int seed = 1)
{
// Load the Iris data.
var data = mlContext.Data.ReadFromTextFile<Iris>(filePath, hasHeader: hasHeader, separatorChar: separatorChar);

// Create a function that generates a random groupId.
var rng = new Random(seed);
Action<Iris, IrisWithGroup> generateGroupId = (input, output) =>
{
output.Label = input.Label;
// The standard set used in tests has 150 rows
output.GroupId = (ushort)rng.Next(0, 30);

Contributor
@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019

ushort

Any reason why it's ushort? #Resolved

Contributor Author

Remnants of when I was trying to read directly as a KeyType (see #2642).

output.PetalLength = input.PetalLength;
output.PetalWidth = input.PetalWidth;
output.SepalLength = input.SepalLength;
output.SepalWidth = input.SepalWidth;
};

// Describe a pipeline that generates a groupId and converts it to a key.
var pipeline = mlContext.Transforms.CustomMapping(generateGroupId, null)
.Append(mlContext.Transforms.Conversion.MapValueToKey("GroupId"));

// Transform the data
var transformedData = pipeline.Fit(data).Transform(data);

return transformedData;
}
}

/// <summary>
/// A class for the Iris dataset with a GroupId column.
/// </summary>
internal sealed class IrisWithGroup
{
public float Label { get; set; }
public ushort GroupId { get; set; }
public float SepalLength { get; set; }
public float SepalWidth { get; set; }
public float PetalLength { get; set; }
public float PetalWidth { get; set; }
}
}
20 changes: 20 additions & 0 deletions test/Microsoft.ML.Functional.Tests/Datasets/MnistOneClass.cs
@@ -0,0 +1,20 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Data;

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A class for reading in the MNIST One Class test dataset.
/// </summary>
internal sealed class MnistOneClass
{
[LoadColumn(0)]
public float Label { get; set; }

[LoadColumn(1, 784), VectorType(784)]
public float[] Features { get; set; }
}
}
20 changes: 20 additions & 0 deletions test/Microsoft.ML.Functional.Tests/Datasets/Sentiment.cs
@@ -0,0 +1,20 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Data;

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A class for reading in the Sentiment test dataset.
/// </summary>
internal sealed class TweetSentiment
{
[LoadColumn(0), ColumnName("Label")]
public bool Sentiment { get; set; }

[LoadColumn(1)]
public string SentimentText { get; set; }
}
}
@@ -0,0 +1,45 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.


using System;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A class containing one property per <see cref="DataKind"/>.

Contributor
@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019

/// A class containing one property per <see cref="DataKind"/>.

I don't get this sentence. What does DataKind have to do with MF? Your data is row+col+value triplets for a matrix. #Resolved

Contributor Author

Copy/paste error, unfortunately.

/// </summary>
/// <remarks>
/// This class has annotations for automatic deserialization from a file, and contains helper methods
/// for reading from a file and for generating a random dataset as an IEnumerable.
/// </remarks>
internal sealed class TrivialMatrixFactorization
{
[LoadColumn(0)]
public float Label { get; set; }

[LoadColumn(1)]
public uint MatrixColumnIndex { get; set; }

[LoadColumn(2)]
public uint MatrixRowIndex { get; set; }

public static IDataView LoadAndFeaturizeFromTextFile(MLContext mlContext, string filePath, bool hasHeader, char separatorChar)
{
// Load the data from a text file.
var data = mlContext.Data.ReadFromTextFile<TrivialMatrixFactorization>(filePath, hasHeader: hasHeader, separatorChar: separatorChar);

// Describe a pipeline to translate the uints to keys.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("MatrixColumnIndex")
.Append(mlContext.Transforms.Conversion.MapValueToKey("MatrixRowIndex"));

// Transform the data.
var transformedData = pipeline.Fit(data).Transform(data);

return transformedData;
}
}
}