
Commit b5f9592

Merge branch 'master' into docsSamples
2 parents 2236dbe + 875ef00 commit b5f9592

72 files changed: +3262 -348 lines changed


.vsts-dotnet-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ phases:
     name: Windows_NT
     buildScript: build.cmd
     queue:
-      name: DotNetCore-Windows
+      name: Hosted VS2017

 - template: /build/ci/phase-template.yml
   parameters:

build/Dependencies.props

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
     <MicrosoftCodeAnalysisCSharpVersion>2.9.0</MicrosoftCodeAnalysisCSharpVersion>
     <MicrosoftCSharpVersion>4.5.0</MicrosoftCSharpVersion>
     <SystemCompositionVersion>1.2.0</SystemCompositionVersion>
-    <MicrosoftMLScoring>1.0.4-dev48825</MicrosoftMLScoring>
+    <MicrosoftMLScoring>1.1.0</MicrosoftMLScoring>
     <SystemIOFileSystemAccessControl>4.5.0</SystemIOFileSystemAccessControl>
     <SystemSecurityPrincipalWindows>4.5.0</SystemSecurityPrincipalWindows>
   </PropertyGroup>

docs/code/MlNetCookBook.md

Lines changed: 985 additions & 0 deletions
Large diffs are not rendered by default.

docs/code/MlNetHighLevelConcepts.md

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
# ML.NET high-level concepts

In this document, we give a brief overview of the ML.NET high-level concepts. It is mainly intended to describe the *model training* scenarios in ML.NET, since not all of these concepts are relevant to the simpler scenario of *prediction with an existing model*.

## List of high-level concepts

This document covers the following ML.NET concepts:

- *Data*, represented as an `IDataView` interface.
  - In ML.NET, data is very similar to a SQL view: it's a lazily-evaluated, immutable, cursorable, heterogeneous, schematized dataset.
  - An excellent document about the data interface is [IDataView Design Principles](IDataViewDesignPrinciples.md).
- *Transformer*, represented as an `ITransformer` interface.
  - In one sentence, a transformer is a component that takes data, does some work on it, and returns new 'transformed' data.
  - For example, you can think of a machine learning model as a transformer that takes features and returns predictions.
  - As another example, a 'text tokenizer' would take a single text column and output a vector column with the individual 'words' extracted from the texts.
- *Data reader*, represented as an `IDataReader<T>` interface.
  - The data reader is the ML.NET component that 'creates' data: it takes an instance of `T` and returns data out of it.
  - For example, a *TextLoader* is an `IDataReader<FileSource>`: it takes the file source and produces data.
- *Estimator*, represented as an `IEstimator<T>` interface.
  - This is an object that learns from data. The result of the learning is a *transformer*.
  - You can think of a machine learning *algorithm* as an estimator that learns on data and produces a machine learning *model* (which is a transformer).
- *Prediction function*, represented as a `PredictionFunction<TSrc, TDst>` class.
  - The prediction function can be seen as a machine that applies a transformer to one 'row', such as at prediction time.

## Data

In ML.NET, data is very similar to a SQL view: it's a lazily-evaluated, immutable, cursorable, heterogeneous, schematized dataset.

- It has a *Schema* (an instance of the `ISchema` interface) that contains the information about the data view's columns.
  - Each column has a *Name*, a *Type*, and an arbitrary set of *metadata* associated with it.
  - It is important to note that one of the types is the `vector<T, N>` type, which means that the column's values are *vectors of items of type T, with the size of N*. This is the recommended way to represent multi-dimensional data associated with every row, like pixels in an image, or tokens in a text.
  - The column's *metadata* contains information such as the 'slot names' of a vector column. The metadata itself is represented as another one-row *data* that is unique to each column.
- The data view is a source of *cursors*. Think SQL cursors: a cursor is an object that iterates through the data, one row at a time, and presents the available data.
  - Naturally, data can have as many active cursors over it as needed: since data itself is immutable, cursors are truly independent.
  - Note that cursors typically access only a subset of columns: for efficiency, we do not compute the values of columns that are not 'needed' by the cursor.

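To make the cursor behavior described above more concrete, here is a minimal, self-contained sketch. `ToyDataView`, `GetCursor`, and `ComputedValues` are hypothetical names invented for this illustration; they are not the `IDataView` API. The sketch only shows the idea that a cursor is lazy and computes values for the requested columns alone.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy type for illustration only; this is NOT the ML.NET IDataView.
public sealed class ToyDataView
{
    private readonly int _rowCount;
    private readonly Dictionary<string, Func<int, object>> _columns;

    // Counts how many column values were actually computed so far.
    public int ComputedValues { get; private set; }

    public ToyDataView(int rowCount, Dictionary<string, Func<int, object>> columns)
    {
        _rowCount = rowCount;
        _columns = columns;
    }

    // A 'cursor' is modeled as a lazy row sequence over the requested columns only.
    public IEnumerable<Dictionary<string, object>> GetCursor(params string[] neededColumns)
    {
        for (int row = 0; row < _rowCount; row++)
        {
            var values = new Dictionary<string, object>();
            foreach (var name in neededColumns)
            {
                values[name] = _columns[name](row); // computed on demand
                ComputedValues++;
            }
            yield return values;
        }
    }
}
```

In this toy model, requesting a cursor over one column of a two-column view never evaluates the other column, and nothing is evaluated until the cursor is actually advanced.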
## Transformer

A transformer is a component that takes data, does some work on it, and returns new 'transformed' data.

Here's the interface of `ITransformer`:
```c#
public interface ITransformer
{
    IDataView Transform(IDataView input);
    ISchema GetOutputSchema(ISchema inputSchema);
}
```

As you can see, the transformer can `Transform` input data to produce output data. The other method, `GetOutputSchema`, is a mechanism of *schema propagation*: it lets you see what the output data will look like for a given shape of the input data, without actually performing the transformation.

Most transformers in ML.NET tend to operate on one *input column* at a time and produce one *output column*. For example, a `new HashTransformer("foo", "bar")` would take the values from column "foo", hash them, and put them into column "bar".

It is also common for the input and output column names to be the same. In this case, the old column is 'replaced' with the new one. For example, a `new HashTransformer("foo")` would take the values from column "foo", hash them, and 'put them back' into "foo".

Any transformer will, of course, produce a new data view when `Transform` is called: remember, data views are immutable.

Another important consideration is that, because data is lazily evaluated, *transformers are lazy too*. Essentially, after you call
```c#
var newData = transformer.Transform(oldData);
```
no actual computation will happen: only after you get a cursor from `newData` and start consuming its values will `newData` invoke the `transformer`'s transformation logic (and even then, only if the `transformer` in question is actually needed to produce the requested columns).

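The laziness described above can be illustrated with plain LINQ deferred execution. This is only an analogy under stated assumptions: `ToyScaleTransformer` and `RowsProcessed` are hypothetical names, and the 'data' here is a simple sequence of numbers rather than an `IDataView`.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy transformer over a sequence of numbers; NOT the ML.NET ITransformer.
public sealed class ToyScaleTransformer
{
    private readonly double _factor;

    // Counts how many rows the transformation logic has actually touched.
    public int RowsProcessed { get; private set; }

    public ToyScaleTransformer(double factor) => _factor = factor;

    // Returns a deferred sequence: Select does no work until the result is enumerated.
    public IEnumerable<double> Transform(IEnumerable<double> input) =>
        input.Select(x =>
        {
            RowsProcessed++;
            return x * _factor;
        });
}
```

Calling `Transform` in this sketch leaves `RowsProcessed` at zero; the counter only moves once a consumer starts pulling values from the returned sequence.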
### Transformer chains

A useful property of a transformer is that *you can phrase a sequential application of transformers as yet another transformer*:

```c#
var fullTransformer = transformer1.Append(transformer2).Append(transformer3);
```

We utilize this property a lot in ML.NET: typically, the trained ML.NET model is a 'chain of transformers', which is, for all intents and purposes, a *transformer*.

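As a rough sketch of why a chain is itself a transformer, consider the toy types below. `IToyTransformer`, `ToyMap`, and `ToyChain` are hypothetical and exist only for this illustration; the real `Append` in ML.NET works on `ITransformer` and data views, which this sketch does not attempt to reproduce.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy interface; NOT the ML.NET ITransformer.
public interface IToyTransformer
{
    IEnumerable<double> Transform(IEnumerable<double> input);
}

// Applies a function to every value.
public sealed class ToyMap : IToyTransformer
{
    private readonly Func<double, double> _map;
    public ToyMap(Func<double, double> map) => _map = map;
    public IEnumerable<double> Transform(IEnumerable<double> input) => input.Select(_map);
}

// A chain applies its parts in order and implements the same interface as its parts.
public sealed class ToyChain : IToyTransformer
{
    private readonly IReadOnlyList<IToyTransformer> _parts;
    public ToyChain(params IToyTransformer[] parts) => _parts = parts;

    // Appending returns a new chain; the existing one is left unchanged.
    public ToyChain Append(IToyTransformer next) =>
        new ToyChain(_parts.Concat(new[] { next }).ToArray());

    public IEnumerable<double> Transform(IEnumerable<double> input) =>
        _parts.Aggregate(input, (data, part) => part.Transform(data));
}
```

Because `ToyChain` implements `IToyTransformer`, a chain can be passed anywhere a single transformer is expected, including into another chain.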
## Data reader

The data reader is the ML.NET component that 'creates' data: it takes an instance of `T` and returns data out of it.

Here's the exact interface of `IDataReader<T>`:
```c#
public interface IDataReader<in TSource>
{
    IDataView Read(TSource input);
    ISchema GetOutputSchema();
}
```
As you can see, the reader is capable of reading data (potentially multiple times, and from different 'inputs'), but the resulting data will always have the same schema, denoted by `GetOutputSchema`.

An interesting property to note is that you can create a new data reader by 'attaching' a transformer to an existing data reader. This way you can have a 'reader' with transformation behavior baked in:
```c#
var newReader = reader.Append(transformer1).Append(transformer2);
```

Another similarity to transformers is that, since data is lazily evaluated, *readers are lazy*: no (or minimal) actual 'reading' happens when you call `dataReader.Read()`; only when a cursor is requested on the resulting data does the reader begin to work.

## Estimator

The *estimator* is an object that learns from data. The result of the learning is a *transformer*.
Here is the interface of `IEstimator<T>`:
```c#
public interface IEstimator<out TTransformer>
    where TTransformer : ITransformer
{
    TTransformer Fit(IDataView input);
    SchemaShape GetOutputSchema(SchemaShape inputSchema);
}
```

You can easily imagine how *a sequence of estimators can be phrased as an estimator* of its own. In ML.NET, we rely on this property to create 'learning pipelines' that chain together different estimators:

```c#
var env = new LocalEnvironment(); // Initialize the ML.NET environment.
var estimator = new ConcatEstimator(env, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    .Append(new ToKeyEstimator(env, "Label"))
    .Append(new SdcaMultiClassTrainer(env, "Features", "Label")) // This is the actual 'machine learning algorithm'.
    .Append(new ToValueEstimator(env, "PredictedLabel"));

var endToEndModel = estimator.Fit(data); // This now contains all the transformers that were used at training.
```

One important property of estimators is that *estimators are eager, not lazy*: every call to `Fit` causes 'learning' to happen, which is potentially a time-consuming operation.

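One plausible way to picture how a chain of estimators is fitted is sketched below: each estimator is fitted on the data as transformed by the transformers fitted before it, and the result is a chain of those transformers. The delegate types and the `ToyEstimators` helpers are hypothetical and operate on plain number sequences; this is an assumption about the general idea, not a description of ML.NET's implementation.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types; NOT the ML.NET IEstimator/ITransformer interfaces.
public delegate IEnumerable<double> ToyTransformer(IEnumerable<double> input);
public delegate ToyTransformer ToyEstimator(IEnumerable<double> trainingData);

public static class ToyEstimators
{
    // Learns the mean and produces a transformer that subtracts it.
    public static ToyTransformer FitCenter(IEnumerable<double> data)
    {
        double mean = data.Average(); // eager: reads all the training data now
        return input => input.Select(x => x - mean);
    }

    // Learns the largest absolute value and produces a transformer that divides by it.
    public static ToyTransformer FitScale(IEnumerable<double> data)
    {
        double max = data.Max(x => Math.Abs(x)); // eager as well
        return input => input.Select(x => x / max);
    }

    // A chain of estimators is itself an estimator: fit each one on the data
    // as transformed by the previously fitted transformers.
    public static ToyEstimator Chain(params ToyEstimator[] estimators) => trainingData =>
    {
        var fitted = new List<ToyTransformer>();
        IEnumerable<double> current = trainingData;
        foreach (var estimator in estimators)
        {
            ToyTransformer transformer = estimator(current);
            fitted.Add(transformer);
            current = transformer(current).ToList();
        }
        // The fitted model is a chain of transformers, applied in order.
        return input => fitted.Aggregate(input, (data, t) => t(data));
    };
}
```

Note the asymmetry in this sketch: fitting walks over the training data immediately, while the transformers it returns remain lazy until their output is consumed.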
## Prediction function

The prediction function can be seen as a machine that applies a transformer to one 'row', such as at prediction time.

Once we obtain the model (which is a *transformer* that we either trained via `Fit()`, or loaded from somewhere), we can use it to make 'predictions' using the normal calls to `model.Transform(data)`. However, when we use this model in a real-life scenario, we often don't have a whole 'batch' of examples to predict on. Instead, we have one example at a time, and we need to make a timely prediction on each of them.

Of course, we can reduce this to batch prediction:
- Create a data view with exactly one row.
- Call `model.Transform(data)` to obtain the 'predicted data view'.
- Get a cursor over the resulting data.
- Advance the cursor one step to get to the first (and only) row.
- Extract the predicted values out of it.

The above algorithm can be implemented using [schema comprehension](SchemaComprehension.md), with two user-defined objects `InputExample` and `OutputPrediction`, as follows:

```c#
var inputData = env.CreateDataView(new InputExample[] { example });
var outputData = model.Transform(inputData);
var output = outputData.AsEnumerable<OutputPrediction>(env, reuseRowObject: false).Single();
```

But this would be cumbersome, and would incur performance costs.
Instead, we have a 'prediction function' object that performs the same work, but faster and more conveniently, via the extension method `MakePredictionFunction`:

```c#
var predictionFunc = model.MakePredictionFunction<InputExample, OutputPrediction>(env);
var output = predictionFunc.Predict(example);
```

The same `predictionFunc` can (and should!) be used multiple times, thus amortizing the initial cost of the `MakePredictionFunction` call.

The prediction function is *not re-entrant / thread-safe*: if you want to conduct predictions simultaneously with multiple threads, you need to have a prediction function per thread.
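
One possible way to keep a prediction function per thread is `System.Threading.ThreadLocal<T>`, sketched below. `ToyPredictionFunction` is a hypothetical stand-in for any object that is cheap to reuse but unsafe to share; in real code the factory would presumably call `MakePredictionFunction`, which is not shown here.

```c#
using System;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for a non-thread-safe prediction function; NOT the ML.NET class.
public sealed class ToyPredictionFunction
{
    private readonly double[] _buffer = new double[1]; // reused scratch state, unsafe to share

    public double Predict(double input)
    {
        _buffer[0] = input;
        return _buffer[0] * 2;
    }
}

public static class PerThreadPrediction
{
    private static int _created;

    // Number of instances created so far (at most one per thread that predicted).
    public static int Created => _created;

    // Each thread lazily creates, and then reuses, its own instance.
    private static readonly ThreadLocal<ToyPredictionFunction> PerThread =
        new ThreadLocal<ToyPredictionFunction>(() =>
        {
            Interlocked.Increment(ref _created);
            return new ToyPredictionFunction();
        });

    public static double Predict(double input) => PerThread.Value.Predict(input);

    // Predicts in parallel; every worker thread uses only its own instance.
    public static double[] PredictAll(double[] inputs) =>
        inputs.AsParallel().AsOrdered().Select(x => Predict(x)).ToArray();
}
```

This keeps the creation cost amortized within each thread while ensuring that no instance is ever used by two threads at once.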

docs/images/DCG.png

1.2 KB

docs/images/NDCG.png

1.03 KB

docs/release-notes/0.6/release-0.6.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# ML.NET 0.6 Release Notes

Today we are excited to release ML.NET 0.6, the biggest release of ML.NET ever (or at least since 0.5)! This release unveils the first iteration of new ML.NET APIs. These APIs enable various new tasks that weren't possible with the old APIs. Furthermore, we have added a transform to get predictions from [ONNX](http://onnx.ai/) models, expanded the functionality of the TensorFlow scoring transform, aligned various ML.NET types with .NET types, and more!

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the CLI using:
```
dotnet add package Microsoft.ML
```

From the package manager:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* New APIs for ML.NET

    * While the `LearningPipeline` APIs that were released with ML.NET 0.1 were easy to get started with, they had obvious limitations in functionality. Certain tasks that were possible with the internal version of ML.NET, like inspecting model weights, creating a transform-only pipeline, and training from an initial predictor, could not be done with `LearningPipeline`.
    * The important concepts for understanding the new API are introduced [here](https://github.com/dotnet/machinelearning/blob/3cdd3c8b32705e91dcf46c429ee34196163af6da/docs/code/MlNetHighLevelConcepts.md).
    * A cookbook that shows how to use these APIs for a variety of existing and new scenarios can be found [here](https://github.com/dotnet/machinelearning/blob/3cdd3c8b32705e91dcf46c429ee34196163af6da/docs/code/MlNetCookBook.md).
    * These APIs are still evolving, so we would love to hear any feedback or questions.
    * The `LearningPipeline` APIs have moved to the `Microsoft.ML.Legacy` namespace.

* Added a transform to score ONNX models ([#942](https://github.com/dotnet/machinelearning/pull/942))

    * [ONNX](http://onnx.ai/) is an open model format that enables developers to more easily move models between different tools.
    * There are various [collections of ONNX models](https://github.com/onnx/models) that can be used for tasks like image classification, emotion recognition, and object detection.
    * The [ONNX transform](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.onnxtransform?view=ml-dotnet) in ML.NET lets you pass data to an existing ONNX model (such as the models above) and get the score (prediction) from it.

* Enhanced TensorFlow model scoring functionality ([#853](https://github.com/dotnet/machinelearning/pull/853), [#862](https://github.com/dotnet/machinelearning/pull/862))

    * The [TensorFlow scoring transform](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.tensorflowtransform?view=ml-dotnet) released in ML.NET 0.5 enabled using 'frozen' TensorFlow models. In ML.NET 0.6, 'saved' TensorFlow models can also be used.
    * An API was added to extract information about the nodes in a TensorFlow model. This can help identify the input and output of a TensorFlow model. Example usage can be found [here](https://github.com/dotnet/machinelearning/blob/3cdd3c8b32705e91dcf46c429ee34196163af6da/src/Microsoft.ML.DnnAnalyzer/Microsoft.ML.DnnAnalyzer/DnnAnalyzer.cs).

* Replaced ML.NET's Dv type system with .NET's standard type system ([#863](https://github.com/dotnet/machinelearning/pull/863))

    * ML.NET previously had its own type system, which helped it deal more efficiently with things like missing values (a common case in ML). This type system required users to work with types like `DvText`, `DvBool`, `DvInt4`, etc.
    * This update replaces the Dv type system with .NET's standard type system to make ML.NET easier to use and to take advantage of innovation in .NET.
    * One effect of this change is that only floats and doubles have missing values, represented by NaN. More information can be found [here](https://github.com/dotnet/machinelearning/issues/673).

* Up to ~200x speedup in prediction engine performance for single records ([#973](https://github.com/dotnet/machinelearning/pull/973))

* Improved approach to dependency injection enables ML.NET to be used in additional .NET app models (e.g. Azure Functions) without messy workarounds ([#970](https://github.com/dotnet/machinelearning/pull/970), [#1022](https://github.com/dotnet/machinelearning/pull/1022))

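As a small illustration of the missing-value note in the highlights above, the sketch below treats NaN as 'missing' in a plain `float[]`. It uses only standard .NET calls and no ML.NET API; the `MissingValues` helper and its method names are hypothetical.

```c#
using System;
using System.Linq;

// Hypothetical helper for illustration; uses only standard .NET, no ML.NET types.
public static class MissingValues
{
    // Counts entries that are 'missing', i.e. NaN.
    // NaN never compares equal to anything (including itself), so IsNaN is required.
    public static int CountMissing(float[] values) => values.Count(v => float.IsNaN(v));

    // Mean over the non-missing entries only; NaN if every entry is missing.
    public static float MeanIgnoringMissing(float[] values) =>
        values.Where(v => !float.IsNaN(v)).DefaultIfEmpty(float.NaN).Average();
}
```

Integer and boolean types have no such sentinel value, which is consistent with the note that only floats and doubles can carry missing values.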
Additional issues closed in this milestone can be found
[here](https://github.com/dotnet/machinelearning/milestone/5?closed=1).

### Acknowledgements

Shoutout to [feiyun0112](https://github.com/feiyun0112), [jwood803](https://github.com/jwood803), [adamsitnik](https://github.com/adamsitnik), and the ML.NET team for their contributions as part of this release!

src/Microsoft.ML.Console/Microsoft.ML.Console.csproj

Lines changed: 4 additions & 3 deletions
@@ -1,9 +1,7 @@
 <Project Sdk="Microsoft.NET.Sdk">

   <PropertyGroup>
-    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
-    <DefineConstants>CORECLR</DefineConstants>
-    <TargetFramework>netcoreapp2.0</TargetFramework>
+    <TargetFramework>netcoreapp2.1</TargetFramework>
     <OutputType>Exe</OutputType>
     <AssemblyName>MML</AssemblyName>
     <StartupObject>Microsoft.ML.Runtime.Tools.Console.Console</StartupObject>
@@ -20,18 +18,21 @@
     <ProjectReference Include="..\Microsoft.ML.KMeansClustering\Microsoft.ML.KMeansClustering.csproj" />
     <ProjectReference Include="..\Microsoft.ML.LightGBM\Microsoft.ML.LightGBM.csproj" />
     <ProjectReference Include="..\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" />
+    <ProjectReference Include="..\Microsoft.ML.OnnxTransform\Microsoft.ML.OnnxTransform.csproj" />
     <ProjectReference Include="..\Microsoft.ML.Onnx\Microsoft.ML.Onnx.csproj" />
     <ProjectReference Include="..\Microsoft.ML.PCA\Microsoft.ML.PCA.csproj" />
     <ProjectReference Include="..\Microsoft.ML.PipelineInference\Microsoft.ML.PipelineInference.csproj" />
     <ProjectReference Include="..\Microsoft.ML.ResultProcessor\Microsoft.ML.ResultProcessor.csproj" />
     <ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
     <ProjectReference Include="..\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj" />
+    <ProjectReference Include="..\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" />
     <ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" />

     <NativeAssemblyReference Include="FastTreeNative" />
     <NativeAssemblyReference Include="CpuMathNative" />
     <NativeAssemblyReference Include="FactorizationMachineNative" />
     <NativeAssemblyReference Include="LdaNative" />
+    <NativeAssemblyReference Include="SymSgdNative" />
   </ItemGroup>

 </Project>

src/Microsoft.ML.Data/Evaluators/EvaluatorStaticExtensions.cs

Lines changed: 35 additions & 0 deletions
@@ -200,5 +200,40 @@ public static RegressionEvaluator.Result Evaluate<T>(
             args.LossFunction = new TrivialRegressionLossFactory(loss);
             return new RegressionEvaluator(env, args).Evaluate(data.AsDynamic, labelName, scoreName);
         }
+
+        /// <summary>
+        /// Evaluates scored ranking data.
+        /// </summary>
+        /// <typeparam name="T">The shape type for the input data.</typeparam>
+        /// <typeparam name="TVal">The type of data, before being converted to a key.</typeparam>
+        /// <param name="ctx">The ranking context.</param>
+        /// <param name="data">The data to evaluate.</param>
+        /// <param name="label">The index delegate for the label column.</param>
+        /// <param name="groupId">The index delegate for the groupId column.</param>
+        /// <param name="score">The index delegate for predicted score column.</param>
+        /// <returns>The evaluation metrics.</returns>
+        public static RankerEvaluator.Result Evaluate<T, TVal>(
+            this RankerContext ctx,
+            DataView<T> data,
+            Func<T, Scalar<float>> label,
+            Func<T, Key<uint, TVal>> groupId,
+            Func<T, Scalar<float>> score)
+        {
+            Contracts.CheckValue(data, nameof(data));
+            var env = StaticPipeUtils.GetEnvironment(data);
+            Contracts.AssertValue(env);
+            env.CheckValue(label, nameof(label));
+            env.CheckValue(groupId, nameof(groupId));
+            env.CheckValue(score, nameof(score));
+
+            var indexer = StaticPipeUtils.GetIndexer(data);
+            string labelName = indexer.Get(label(indexer.Indices));
+            string scoreName = indexer.Get(score(indexer.Indices));
+            string groupIdName = indexer.Get(groupId(indexer.Indices));
+
+            var args = new RankerEvaluator.Arguments() { };
+
+            return new RankerEvaluator(env, args).Evaluate(data.AsDynamic, labelName, groupIdName, scoreName);
+        }
     }
 }
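
The commit also adds `DCG.png` and `NDCG.png` under `docs/images`, which suggests that the ranking evaluation reports DCG and NDCG. For orientation, here is a sketch of a common textbook definition of these metrics for a single group. The gain and discount choices below are assumptions; the diff above does not show how `RankerEvaluator` computes them, and `ToyRankingMetrics` is a hypothetical helper.

```c#
using System;
using System.Linq;

// Textbook-style DCG/NDCG helpers for illustration; NOT taken from RankerEvaluator.
public static class ToyRankingMetrics
{
    // DCG@k = sum over the top k positions of (2^label - 1) / log2(position + 1),
    // where labels are listed in the order produced by the ranker (best score first)
    // and positions are 1-based (so the 0-based index i maps to log2(i + 2)).
    public static double Dcg(double[] labelsInRankedOrder, int k) =>
        labelsInRankedOrder
            .Take(k)
            .Select((label, i) => (Math.Pow(2, label) - 1) / Math.Log(i + 2, 2))
            .Sum();

    // NDCG@k divides DCG@k by the DCG of the ideal (label-descending) ordering.
    public static double Ndcg(double[] labelsInRankedOrder, int k)
    {
        double ideal = Dcg(labelsInRankedOrder.OrderByDescending(l => l).ToArray(), k);
        return ideal == 0 ? 0 : Dcg(labelsInRankedOrder, k) / ideal;
    }
}
```

Under this definition a ranking that already lists the most relevant items first scores an NDCG of 1, and any other ordering of distinct labels scores lower.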
