Commit aae84eb

merge with latest master; fix merge conflicts
2 parents a81c3d2 + 2d351eb commit aae84eb

113 files changed

+2070
-944
lines changed

docs/code/MlNetCookBook.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -885,7 +885,7 @@ IEstimator<ITransformer> dynamicPipe = learningPipeline.AsDynamic;
885885
var binaryTrainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features");
886886

887887
// Append the OVA learner to the pipeline.
888-
dynamicPipe = dynamicPipe.Append(new Ova(mlContext, binaryTrainer));
888+
dynamicPipe = dynamicPipe.Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(binaryTrainer));
889889

890890
// At this point, we have a choice. We could continue working with the dynamically-typed pipeline, and
891891
// ultimately call dynamicPipe.Fit(data.AsDynamic) to get the model, or we could go back into the static world.
+64
@@ -0,0 +1,64 @@
1+
# ML.NET 0.10 Release Notes
2+
3+
[ML.NET](https://aka.ms/mlnet) 0.10 brings us one step closer to the stable v1 release. We understand that the API surface has been changing rapidly, and we deeply appreciate the support from the [ML.NET](https://aka.ms/mlnet) community. These changes are necessary to support a stable API for many years to come. In the upcoming 0.11 and 0.12 releases, before we reach 1.0, we will continue refining the API and improving the documentation.
4+
5+
We have also integrated [code coverage](https://codecov.io/gh/dotnet/machinelearning) tools into our CI systems and will continue to push for stability and quality in the code.
6+
7+
One of the milestones achieved in this release is moving `IDataView` into a new, separate assembly under the `Microsoft.Data.DataView` namespace. For detailed documentation on `IDataView`, see the [IDataView design principles](https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewDesignPrinciples.md).
8+
9+
### Installation
10+
11+
ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
12+
Core
13+
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
14+
for more details.
15+
16+
You can install the ML.NET NuGet package from the CLI using:
17+
```
18+
dotnet add package Microsoft.ML
19+
```
20+
21+
From the package manager:
22+
```
23+
Install-Package Microsoft.ML
24+
```
25+
26+
### Release Notes
27+
28+
Below are a few of the highlights from this release. There are many other improvements in the API.
29+
30+
* DataView moved into a separate assembly and NuGet package
31+
([#2220](https://github.com/dotnet/machinelearning/pull/2220))
32+
33+
* Improvements in the API for prediction engine
34+
([#2250](https://github.com/dotnet/machinelearning/pull/2250))
35+
36+
* Introducing Microsoft.ML.Recommender NuGet name instead of Microsoft.ML.MatrixFactorization name
37+
([#2081](https://github.com/dotnet/machinelearning/pull/2081))
38+
- Better naming for NuGet packages based on the scenario (Recommendations) instead of the trainer's name
39+
40+
* Support multiple 'feature columns' in FFM (Field-aware Factorization Machines)
41+
([#2205](https://github.com/dotnet/machinelearning/pull/2205))
42+
- Allows multiple feature column names in advanced trainer arguments so that certain FFM trainers can support multiple feature columns, as explained in issue [#2179](https://github.com/dotnet/machinelearning/issues/2179)
43+
44+
* Added support for loading a map from a file through a data view by using ValueMapperTransformer
45+
([#2232](https://github.com/dotnet/machinelearning/pull/2232))
46+
- This provides support for additional scenarios, such as a Text/NLP scenario ([#747](https://github.com/dotnet/machinelearning/issues/747)) in TensorFlowTransform where the model's expected input is a vector of integers
47+
48+
* Added support for running benchmarks on .NET Framework in addition to .NET Core.
49+
([#2157](https://github.com/dotnet/machinelearning/pull/2157))
50+
- Benchmarks can be based on [Microsoft.ML.Benchmarks](https://github.com/dotnet/machinelearning/tree/master/test/Microsoft.ML.Benchmarks)
51+
- This fixes issues like [#1945](https://github.com/dotnet/machinelearning/issues/1945)
52+
53+
* Added support for unfrozen TensorFlow models in GetModelSchema
54+
([#2112](https://github.com/dotnet/machinelearning/pull/2112))
55+
- Fixes issue [#2102](https://github.com/dotnet/machinelearning/issues/2102)
56+
57+
* Providing API for properly inspecting trees ([#2243](https://github.com/dotnet/machinelearning/pull/2243))
58+
59+
### Acknowledgements
60+
61+
Shoutout to [endintiers](https://github.com/endintiers),
62+
[hvitved](https://github.com/hvitved),
63+
[mareklinka](https://github.com/mareklinka), [kilick](https://github.com/kilick), and the [ML.NET](https://aka.ms/mlnet) team for their
64+
contributions as part of this release!
@@ -0,0 +1,84 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using Microsoft.ML.Data;
4+
5+
namespace Microsoft.ML.Samples.Dynamic
6+
{
7+
public static class BootstrapSample
8+
{
9+
public static void Example()
10+
{
11+
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
12+
// as a catalog of available operations and as the source of randomness.
13+
var mlContext = new MLContext();
14+
15+
// Get a small dataset as an IEnumerable and then read it as ML.NET's data type.
16+
IEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorSample> enumerableOfData = SamplesUtils.DatasetUtils.GenerateBinaryLabelFloatFeatureVectorSamples(5);
17+
var data = mlContext.Data.ReadFromEnumerable(enumerableOfData);
18+
19+
// Look at the original dataset
20+
Console.WriteLine($"Label\tFeatures[0]");
21+
foreach (var row in enumerableOfData)
22+
{
23+
Console.WriteLine($"{row.Label}\t{row.Features[0]}");
24+
}
25+
Console.WriteLine();
26+
// Expected output:
27+
// Label Features[0]
28+
// True 1.017325
29+
// False 0.6326591
30+
// False 0.0326252
31+
// True 0.8426974
32+
// True 0.9947656
33+
34+
// Now take a bootstrap sample of this dataset to create a new dataset. The bootstrap is a resampling technique that
35+
// creates a training set of the same size by picking with replacement from the original dataset. With the bootstrap,
36+
// we expect that the resampled dataset will contain about 63% of the distinct rows of the original dataset (i.e. 1-e^-1), with some
37+
// rows represented more than once.
38+
// BootstrapSample is a streaming implementation of the bootstrap that enables sampling from a dataset too large to hold in memory.
39+
// To enable streaming, BootstrapSample approximates the bootstrap by sampling each row according to a Poisson(1) distribution.
40+
// Note that this streaming approximation treats each row independently, thus the resampled dataset is not guaranteed to be the
41+
// same length as the input dataset.
42+
// Let's take a look at the behavior of the BootstrapSample by examining a few draws:
43+
for (int i = 0; i < 3; i++)
44+
{
45+
var resample = mlContext.Data.BootstrapSample(data, seed: (uint) i);
46+
47+
var enumerable = mlContext.CreateEnumerable<SamplesUtils.DatasetUtils.BinaryLabelFloatFeatureVectorSample>(resample, reuseRowObject: false);
48+
Console.WriteLine($"Label\tFeatures[0]");
49+
foreach (var row in enumerable)
50+
{
51+
Console.WriteLine($"{row.Label}\t{row.Features[0]}");
52+
}
53+
Console.WriteLine();
54+
}
55+
// Expected output:
56+
// Label Features[0]
57+
// True 1.017325
58+
// False 0.6326591
59+
// False 0.6326591
60+
// False 0.6326591
61+
// False 0.0326252
62+
// False 0.0326252
63+
// True 0.8426974
64+
// True 0.8426974
65+
66+
// Label Features[0]
67+
// True 1.017325
68+
// True 1.017325
69+
// False 0.6326591
70+
// False 0.6326591
71+
// False 0.0326252
72+
// False 0.0326252
73+
// False 0.0326252
74+
// True 0.9947656
75+
76+
// Label Features[0]
77+
// False 0.6326591
78+
// False 0.0326252
79+
// True 0.8426974
80+
// True 0.8426974
81+
// True 0.8426974
82+
}
83+
}
84+
}
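The comments in the sample above describe the streaming approximation: each row is emitted a Poisson(1)-distributed number of times, independently of every other row. A minimal sketch of that idea in plain C#, without any ML.NET types, might look like the following. The names `PoissonBootstrapSketch`, `SamplePoissonOne`, and `Resample` are hypothetical, and the use of Knuth's multiplication method to draw Poisson values is an assumption made for illustration; it is not taken from the ML.NET implementation.

```csharp
using System;
using System.Collections.Generic;

public static class PoissonBootstrapSketch
{
    // Draws one value from a Poisson(1) distribution using Knuth's
    // multiplication method: multiply uniform draws together until the
    // running product falls to e^-1 or below.
    public static int SamplePoissonOne(Random rng)
    {
        double limit = Math.Exp(-1.0);
        int count = 0;
        double product = rng.NextDouble();
        while (product > limit)
        {
            count++;
            product *= rng.NextDouble();
        }
        return count;
    }

    // Streams over the rows once and emits each row k times, where k is an
    // independent Poisson(1) draw. Rows with k == 0 are dropped.
    public static IEnumerable<T> Resample<T>(IEnumerable<T> rows, int seed)
    {
        var rng = new Random(seed);
        foreach (var row in rows)
        {
            int copies = SamplePoissonOne(rng);
            for (int i = 0; i < copies; i++)
            {
                yield return row;
            }
        }
    }
}
```

Because each row is handled independently, the output length only matches the input length on average. A row survives with probability 1 - e^-1 (about 63%), which is where the figure quoted in the comments comes from.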
@@ -0,0 +1,63 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using Microsoft.ML.Data;
4+
5+
namespace Microsoft.ML.Samples.Dynamic
6+
{
7+
/// <summary>
8+
/// Sample class showing how to use <see cref="DataOperationsCatalog.FilterByColumn"/>.
9+
/// </summary>
10+
public static class FilterByColumn
11+
{
12+
public static void Example()
13+
{
14+
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
15+
// as a catalog of available operations and as the source of randomness.
16+
var mlContext = new MLContext();
17+
18+
// Get a small dataset as an IEnumerable.
19+
IEnumerable<SamplesUtils.DatasetUtils.SampleTemperatureData> enumerableOfData = SamplesUtils.DatasetUtils.GetSampleTemperatureData(10);
20+
var data = mlContext.Data.ReadFromEnumerable(enumerableOfData);
21+
22+
// Before we apply a filter, examine all the records in the dataset.
23+
Console.WriteLine($"Date\tTemperature");
24+
foreach (var row in enumerableOfData)
25+
{
26+
Console.WriteLine($"{row.Date.ToString("d")}\t{row.Temperature}");
27+
}
28+
Console.WriteLine();
29+
// Expected output:
30+
// Date Temperature
31+
// 1/2/2012 36
32+
// 1/3/2012 36
33+
// 1/4/2012 34
34+
// 1/5/2012 35
35+
// 1/6/2012 35
36+
// 1/7/2012 39
37+
// 1/8/2012 40
38+
// 1/9/2012 35
39+
// 1/10/2012 30
40+
// 1/11/2012 29
41+
42+
// Filter the data by the values of the temperature. The lower bound is inclusive, the upper exclusive.
43+
var filteredData = mlContext.Data.FilterByColumn(data, columnName: "Temperature", lowerBound: 34, upperBound: 37);
44+
45+
// Look at the filtered data and observe that values outside [34,37) have been dropped.
46+
var enumerable = mlContext.CreateEnumerable<SamplesUtils.DatasetUtils.SampleTemperatureData>(filteredData, reuseRowObject: true);
47+
Console.WriteLine($"Date\tTemperature");
48+
foreach (var row in enumerable)
49+
{
50+
Console.WriteLine($"{row.Date.ToString("d")}\t{row.Temperature}");
51+
}
52+
53+
// Expected output:
54+
// Date Temperature
55+
// 1/2/2012 36
56+
// 1/3/2012 36
57+
// 1/4/2012 34
58+
// 1/5/2012 35
59+
// 1/6/2012 35
60+
// 1/9/2012 35
61+
}
62+
}
63+
}
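The filter in the sample above keeps rows whose value lies in a half-open interval: the lower bound is inclusive and the upper bound is exclusive. The same rule can be sketched in plain C# as follows. `RangeFilterSketch` and `FilterByRange` are hypothetical names used only for illustration and are not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeFilterSketch
{
    // Keeps the rows whose selected value v satisfies lowerBound <= v < upperBound.
    public static IEnumerable<T> FilterByRange<T>(
        IEnumerable<T> rows,
        Func<T, double> selector,
        double lowerBound,
        double upperBound)
    {
        return rows.Where(row =>
        {
            double value = selector(row);
            return value >= lowerBound && value < upperBound;
        });
    }
}
```

Applied to the ten temperatures listed in the sample with bounds 34 and 37, this keeps 36, 36, 34, 35, 35, and 35, which matches the expected output shown there.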
@@ -0,0 +1,75 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using Microsoft.ML.Data;
4+
using Microsoft.ML.SamplesUtils;
5+
6+
namespace Microsoft.ML.Samples.Dynamic
7+
{
8+
using MulticlassClassificationExample = DatasetUtils.MulticlassClassificationExample;
9+
10+
/// <summary>
11+
/// Sample class showing how to use <see cref="DataOperationsCatalog.FilterByKeyColumnFraction"/>.
12+
/// </summary>
13+
public static class FilterByKeyColumnFraction
14+
{
15+
public static void Example()
16+
{
17+
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
18+
// as a catalog of available operations and as the source of randomness.
19+
var mlContext = new MLContext();
20+
21+
// Get a small dataset as an IEnumerable.
22+
IEnumerable<MulticlassClassificationExample> enumerableOfData = DatasetUtils.GenerateRandomMulticlassClassificationExamples(10);
23+
var data = mlContext.Data.ReadFromEnumerable(enumerableOfData);
24+
25+
// Convert the string labels to keys
26+
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label");
27+
var transformedData = pipeline.Fit(data).Transform(data);
28+
29+
// Before we apply a filter, examine all the records in the dataset.
30+
var enumerable = mlContext.CreateEnumerable<MulticlassWithKeyLabel>(transformedData, reuseRowObject: true);
31+
Console.WriteLine($"Label\tFeatures");
32+
foreach (var row in enumerable)
33+
{
34+
Console.WriteLine($"{row.Label}\t({string.Join(", ", row.Features)})");
35+
}
36+
Console.WriteLine();
37+
// Expected output:
38+
// 1 (0.7262433, 0.8173254, 0.7680227, 0.5581612, 0.2060332, 0.5588848, 0.9060271, 0.4421779, 0.9775497, 0.2737045)
39+
// 2 (0.4919063, 0.6673147, 0.8326591, 0.6695119, 1.182151, 0.230367, 1.06237, 1.195347, 0.8771811, 0.5145918)
40+
// 3 (1.216908, 1.248052, 1.391902, 0.4326252, 1.099942, 0.9262842, 1.334019, 1.08762, 0.9468155, 0.4811099)
41+
// 4 (0.7871246, 1.053327, 0.8971719, 1.588544, 1.242697, 1.362964, 0.6303943, 0.9810045, 0.9431419, 1.557455)
42+
// 1 (0.5051292, 0.7159725, 0.1189577, 0.2734515, 0.9070979, 0.7947656, 0.3371603, 0.4572088, 0.146825, 0.2213147)
43+
// 2 (0.6100733, 0.9187268, 0.8198303, 0.6879681, 0.3949134, 1.078192, 1.025423, 0.9353975, 1.058219, 0.879749)
44+
// 3 (1.024866, 0.6184068, 1.295362, 1.29644, 0.4865799, 1.238579, 0.5701429, 1.044115, 1.226814, 0.6191877)
45+
// 4 (1.599973, 1.081366, 1.252205, 1.319726, 1.409463, 0.7009354, 1.329094, 1.318451, 0.7255273, 1.505176)
46+
// 1 (0.1891238, 0.4768099, 0.5407953, 0.3255007, 0.6710367, 0.4683977, 0.8334969, 0.8092038, 0.7936304, 0.764506)
47+
// 2 (1.13754, 0.4949968, 0.7227853, 0.8633928, 0.532589, 0.4867224, 1.02061, 0.4225179, 0.3868716, 0.2685189)
48+
49+
// Now filter down to half the keys, choosing the lower half of values
50+
var filteredData = mlContext.Data.FilterByKeyColumnFraction(transformedData, columnName: "Label", lowerBound: 0, upperBound: 0.5);
51+
52+
// Look at the data and observe that values above 2 have been filtered out
53+
var filteredEnumerable = mlContext.CreateEnumerable<MulticlassWithKeyLabel>(filteredData, reuseRowObject: true);
54+
Console.WriteLine($"Label\tFeatures");
55+
foreach (var row in filteredEnumerable)
56+
{
57+
Console.WriteLine($"{row.Label}\t({string.Join(", ", row.Features)})");
58+
}
59+
// Expected output:
60+
// 1 (0.7262433, 0.8173254, 0.7680227, 0.5581612, 0.2060332, 0.5588848, 0.9060271, 0.4421779, 0.9775497, 0.2737045)
61+
// 2 (0.4919063, 0.6673147, 0.8326591, 0.6695119, 1.182151, 0.230367, 1.06237, 1.195347, 0.8771811, 0.5145918)
62+
// 1 (0.5051292, 0.7159725, 0.1189577, 0.2734515, 0.9070979, 0.7947656, 0.3371603, 0.4572088, 0.146825, 0.2213147)
63+
// 2 (0.6100733, 0.9187268, 0.8198303, 0.6879681, 0.3949134, 1.078192, 1.025423, 0.9353975, 1.058219, 0.879749)
64+
// 1 (0.1891238, 0.4768099, 0.5407953, 0.3255007, 0.6710367, 0.4683977, 0.8334969, 0.8092038, 0.7936304, 0.764506)
65+
// 2 (1.13754, 0.4949968, 0.7227853, 0.8633928, 0.532589, 0.4867224, 1.02061, 0.4225179, 0.3868716, 0.2685189)
66+
}
67+
68+
private class MulticlassWithKeyLabel
69+
{
70+
public uint Label { get; set; }
71+
[VectorType(10)]
72+
public float[] Features { get; set; }
73+
}
74+
}
75+
}
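The sample above filters on a fraction of the key range rather than on raw values. One way to picture this, sketched below in plain C#, is to map each key to a position in the unit interval and then apply a half-open range test. The mapping used here, (key - 0.5) / keyCount for 1-based keys, and the treatment of key 0 as missing are assumptions made for illustration; this diff does not state how ML.NET performs the mapping. `KeyFractionFilterSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeyFractionFilterSketch
{
    // Assumption: key k (1-based) out of keyCount maps to the midpoint of
    // its bucket in the unit interval, (k - 0.5) / keyCount.
    public static double KeyToFraction(uint key, uint keyCount)
    {
        return (key - 0.5) / keyCount;
    }

    // Keeps the keys whose fraction f satisfies lowerBound <= f < upperBound.
    // Keys equal to 0 (assumed to mean "missing") or above keyCount are dropped.
    public static IEnumerable<uint> FilterByKeyFraction(
        IEnumerable<uint> keys,
        uint keyCount,
        double lowerBound,
        double upperBound)
    {
        foreach (uint key in keys)
        {
            if (key == 0 || key > keyCount)
            {
                continue;
            }
            double fraction = KeyToFraction(key, keyCount);
            if (fraction >= lowerBound && fraction < upperBound)
            {
                yield return key;
            }
        }
    }
}
```

With four distinct keys and bounds 0 and 0.5, keys 1 and 2 map to 0.125 and 0.375 and are kept, while keys 3 and 4 map to 0.625 and 0.875 and are dropped. This is consistent with the expected output in the sample, where only labels 1 and 2 remain.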
@@ -0,0 +1,57 @@
1+
using System.IO;
2+
using Microsoft.ML.Data;
3+
4+
namespace Microsoft.ML.Samples.Dynamic
5+
{
6+
public class ConvertToGrayscaleExample
7+
{
8+
// Sample that loads images from the file system, and converts them to grayscale.
9+
public static void ConvertToGrayscale()
10+
{
11+
var mlContext = new MLContext();
12+
13+
// Downloading a few images, and an images.tsv file, which contains a list of the files from the dotnet/machinelearning/test/data/images/.
14+
// If you inspect the file system after running this line, you will find an "images" folder containing 4 images and a .tsv file
15+
// enumerating the images.
16+
var imagesDataFile = SamplesUtils.DatasetUtils.DownloadImages();
17+
18+
// Preview of the content of the images.tsv file
19+
//
20+
// imagePath imageType
21+
// tomato.bmp tomato
22+
// banana.jpg banana
23+
// hotdog.jpg hotdog
24+
// tomato.jpg tomato
25+
26+
var data = mlContext.Data.CreateTextLoader(new TextLoader.Arguments()
27+
{
28+
Columns = new[]
29+
{
30+
new TextLoader.Column("ImagePath", DataKind.TX, 0),
31+
new TextLoader.Column("Name", DataKind.TX, 1),
32+
}
33+
}).Read(imagesDataFile);
34+
35+
var imagesFolder = Path.GetDirectoryName(imagesDataFile);
36+
// Image loading pipeline.
37+
var pipeline = mlContext.Transforms.LoadImages(imagesFolder, ("ImageObject", "ImagePath"))
38+
.Append(mlContext.Transforms.ConvertToGrayscale(("Grayscale", "ImageObject")));
39+
40+
var transformedData = pipeline.Fit(data).Transform(data);
41+
42+
// The transformedData IDataView contains the loaded images column, and the grayscaled column.
43+
// Preview of the transformedData
44+
var transformedDataPreview = transformedData.Preview();
45+
46+
// Preview of the content of transformedData
47+
// The images in the ImageObject and Grayscale columns are of type System.Drawing.Bitmap.
48+
//
49+
// ImagePath Name ImageObject Grayscale
50+
// tomato.bmp tomato {System.Drawing.Bitmap} {System.Drawing.Bitmap}
51+
// banana.jpg banana {System.Drawing.Bitmap} {System.Drawing.Bitmap}
52+
// hotdog.jpg hotdog {System.Drawing.Bitmap} {System.Drawing.Bitmap}
53+
// tomato.jpg tomato {System.Drawing.Bitmap} {System.Drawing.Bitmap}
54+
55+
}
56+
}
57+
}
