
Added dynamic API snippets to cookbook #1538


Merged · 16 commits · Nov 7, 2018

335 changes: 330 additions & 5 deletions in docs/code/MlNetCookBook.md
@@ -148,12 +148,12 @@ Label Workclass education marital-status

This is how you can read this data:
```diff
-// Create a new environment for ML.NET operations. It can be used for exception tracking and logging,
-// as well as the source of randomness.
-var env = new LocalEnvironment();
+// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
+// as a catalog of available operations and as the source of randomness.
+var mlContext = new MLContext();

 // Create the reader: define the data columns and where to find them in the text file.
-var reader = TextLoader.CreateReader(env, ctx => (
+var reader = mlContext.Data.TextReader(ctx => (
     // A boolean column depicting the 'target label'.
     IsOver50K: ctx.LoadBool(14),
     // Three text columns.
```

**@sfilipi** (Member, Nov 6, 2018), on `mlContext`: the examples of this file are actually tests in CookbookExamples.cs. I believe we want to keep them in sync. #Closed

**@shmoradims** (Author, Nov 6, 2018): We want the cookbook to show the preferred way to load data. It would be confusing for users to see both `mlContext.Data.TextReader` and `TextLoader.CreateReader`. I'd rather change the test instead.

@@ -303,6 +303,52 @@ private class InspectedRow
```csharp
}
```

You can also use the dynamic API to create the equivalent of the previous pipeline.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Create the reader: define the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        // A boolean column depicting the 'label'.
        new TextLoader.Column("IsOver50k", DataKind.BL, 0),
        // Three text columns.
        new TextLoader.Column("Workclass", DataKind.TX, 1),
        new TextLoader.Column("Education", DataKind.TX, 2),
        new TextLoader.Column("MaritalStatus", DataKind.TX, 3)
    },
    // First line of the file is a header, not a data row.
    HasHeader = true
});

// Start creating our processing pipeline. For now, let's just concatenate all the text columns
// together into one.
var dynamicPipeline = mlContext.Transforms.Concatenate("AllFeatures", "Education", "MaritalStatus");

// Let's verify that the data has been read correctly.
// First, we read the data file.
var data = reader.Read(dataPath);

// Fit our data pipeline and transform data with it.
var transformedData = dynamicPipeline.Fit(data).Transform(data);

// 'transformedData' is a 'promise' of data. Let's actually read it.
var someRows = transformedData
    // Convert to an enumerable of user-defined type.
    .AsEnumerable<InspectedRow>(mlContext, reuseRowObject: false)
    // Take a couple values as an array.
    .Take(4).ToArray();

// Extract the 'AllFeatures' column.
// This will give the entire dataset: make sure to only take several rows
// in case the dataset is huge. This is similar to the static API, except
// you have to specify the column name and type.
var featureColumns = transformedData.GetColumn<string[]>(mlContext, "AllFeatures")
    .Take(20).ToArray();
```

**@Zruty0** (Contributor, Nov 6, 2018), on `var mlContext = new MLContext();`: Please also add these snippets as tests to CookbookSamples: this way we ensure that they compile, and are updated when the API gets updated. #Resolved

**@shmoradims** (Author): yes, already created a new set of tests for the dynamic API.

**@sfilipi** (Member, Nov 6, 2018), on `TextLoader`: `mlContext.Data.TextReader` #Closed

**@shmoradims** (Author): updated them all.

**@JRAlexander** (Nov 6, 2018), on the column definitions: Shouldn't these be DataKind.Text and DataKind.Boolean instead? #Resolved

**@shmoradims** (Author): We have some equivalent names. We tend to prefer the short two-letter version:

    TX = 11,
    TXT = TX,
    Text = TX,

    BL = 12,
    Bool = BL,

**@JRAlexander** (Nov 6, 2018): Is that because not every type has an equivalent name? #Resolved

**@shmoradims** (Author): [John, are you using CodeFlow? If not, your comments will come in as different threads.] I clarified with Pete. We are planning to rename DataKind to be closer to .NET types. We'll need to update all of these once that change happens.

**@sfilipi** (Member, Nov 6, 2018), on `string`: does this work? I believe I had to do `ReadOnlyMemory` #Closed

**@shmoradims** (Author): surprisingly yes. I added actual tests for it.
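The enum aliasing shmoradims describes (Text is just another name for TX) can be reproduced with any plain C# enum. The following is a hypothetical miniature, not the actual ML.NET DataKind definition:

```csharp
using System;

// Hypothetical mini-version of the DataKind aliasing described above:
// alias members share one underlying value, so they are interchangeable.
enum Kind : byte
{
    TX = 11,
    TXT = TX,
    Text = TX,

    BL = 12,
    Bool = BL
}

class Program
{
    static void Main()
    {
        // Alias members compare equal because they are the same constant.
        Console.WriteLine(Kind.Text == Kind.TX);  // True
        Console.WriteLine(Kind.Bool == Kind.BL);  // True
        Console.WriteLine((byte)Kind.Text);       // 11
    }
}
```

This also explains the planned rename mentioned in the thread: adding or renaming an alias is non-breaking as long as the underlying values stay the same.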
## How do I train a regression model?

Generally, in order to train any model in ML.NET, you will go through three steps:
@@ -359,6 +405,45 @@ var learningPipeline = reader.MakeNewEstimator()
var model = learningPipeline.Fit(trainData);
```

You can also use the dynamic API to create the equivalent of the previous pipeline.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Step one: read the data as an IDataView.
// First, we define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        // We read the first 11 values as a single float vector.
        new TextLoader.Column("FeatureVector", DataKind.R4, 0, 10),

        // Separately, read the target variable.
        new TextLoader.Column("Target", DataKind.R4, 11),
    },
    // First line of the file is a header, not a data row.
    HasHeader = true,
    // Default separator is tab, but we need a semicolon.
    Separator = ";"
});

// Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
var trainData = reader.Read(trainDataPath);

// Step two: define the learning pipeline.

// We 'start' the pipeline with the output of the reader.
var dynamicPipeline =
    // First 'normalize' the data (rescale to be
    // between -1 and 1 for all examples), and then train the model.
    mlContext.Transforms.Normalize("FeatureVector")
    // Add the SDCA regression trainer.
    .Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent(label: "Target", features: "FeatureVector"));

// Step three. Train the pipeline.
var model = dynamicPipeline.Fit(trainData);
```

**@Zruty0** (Contributor, Nov 6, 2018), on ", and then train the model.": not needed #Resolved

**@shmoradims** (Author): removed from both static and dynamic snippets.

**@sfilipi** (Member, Nov 6, 2018), on "Train the": Fit the training data to the pipeline. Did you want to do transform? #Closed

**@shmoradims** (Author): Fixed the text. Not doing any transforms, just writing the equivalent of the static version.
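The comment above notes that readers are lazy: `reader.Read(...)` returns immediately and rows are only materialized when the data is accessed. The same deferred-execution behavior can be demonstrated with a plain C# iterator; a minimal sketch, independent of ML.NET:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static int rowsRead = 0;

    // A lazy 'reader': the iterator body does not run until it is enumerated.
    static IEnumerable<int> ReadRows()
    {
        for (int i = 0; i < 1000; i++)
        {
            rowsRead++;
            yield return i;
        }
    }

    static void Main()
    {
        var data = ReadRows();              // nothing has been read yet
        Console.WriteLine(rowsRead);        // 0

        var first = data.Take(3).ToArray(); // enumeration triggers reading
        Console.WriteLine(rowsRead);        // 3: only the rows we asked for
        Console.WriteLine(string.Join(",", first)); // 0,1,2
    }
}
```

This is why the cookbook repeatedly pairs `GetColumn` with `.Take(n)`: without it, enumeration would pull the whole dataset.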
## How do I verify the model quality?

This is the first question that arises after you train the model: how good is it, actually?
@@ -447,6 +532,46 @@ var learningPipeline = reader.MakeNewEstimator()
var model = learningPipeline.Fit(trainData).AsDynamic;
```

You can also use the dynamic API to create the equivalent of the previous pipeline.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Step one: read the data as an IDataView.
// First, we define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        new TextLoader.Column("SepalLength", DataKind.R4, 0),
        new TextLoader.Column("SepalWidth", DataKind.R4, 1),
        new TextLoader.Column("PetalLength", DataKind.R4, 2),
        new TextLoader.Column("PetalWidth", DataKind.R4, 3),
        // Label: kind of iris.
        new TextLoader.Column("Label", DataKind.TX, 4),
    },
    // Default separator is tab, but the dataset has comma.
    Separator = ","
});

// Retrieve the training data.
var trainData = reader.Read(irisDataPath);

// Build the training pipeline.
var dynamicPipeline =
    // Concatenate all the features together into one column 'Features'.
    mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    // Note that the label is text, so it needs to be converted to a key.
    .Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
    // Use the multi-class SDCA model to predict the label using features.
    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
    // Apply the inverse conversion from 'PredictedLabel' key back to string value.
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Train the model.
var model = dynamicPipeline.Fit(trainData);
```

**@Zruty0** (Contributor, Nov 6, 2018), on `ValueToKeyMappingEstimator`: `mlContext.Transforms.Categorical.MapValueToKey` #Resolved

**@shmoradims** (Author): changed.

**@sfilipi** (Member, Nov 6, 2018), on "key": column #Closed

**@sfilipi** (Member, Nov 6, 2018), on "Train the model.": technically you'd have to Transform if you say "train the model". #Resolved

**@shmoradims** (Author): keeping it the same as Pete's words from the static pipeline.
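To make the value-to-key conversion and its inverse concrete: conceptually, distinct text labels are mapped to contiguous integer keys, and the inverse map turns a predicted key back into its label. A minimal, hypothetical dictionary-based sketch, not ML.NET's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Hypothetical sketch: map distinct text labels to contiguous integer keys.
    static Dictionary<string, int> BuildValueToKey(IEnumerable<string> labels) =>
        labels.Distinct()
              .Select((value, index) => (value, key: index + 1)) // keys are 1-based here
              .ToDictionary(p => p.value, p => p.key);

    static void Main()
    {
        var labels = new[] { "setosa", "versicolor", "setosa", "virginica" };

        var valueToKey = BuildValueToKey(labels);
        var keyToValue = valueToKey.ToDictionary(p => p.Value, p => p.Key);

        // Forward mapping: text labels become keys the trainer can consume.
        var keys = labels.Select(l => valueToKey[l]).ToArray();
        Console.WriteLine(string.Join(",", keys)); // 1,2,1,3

        // Inverse mapping: a predicted key is turned back into its label,
        // which is what MapKeyToValue does for 'PredictedLabel' above.
        Console.WriteLine(keyToValue[3]); // virginica
    }
}
```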

Now, in order to use [schema comprehension](SchemaComprehension.md) for prediction, we define a pair of classes like the following:
```csharp
private class IrisInput
Expand Down Expand Up @@ -538,7 +663,7 @@ var trainData = mlContext.CreateStreamingDataView(churnData);
// We apply our FastTree binary classifier to predict the 'HasChurned' label.

var dynamicLearningPipeline = mlContext.Transforms.Categorical.OneHotEncoding("DemographicCategory")
-    .Append(new ColumnConcatenatingEstimator(mlContext, "Features", "DemographicCategory", "LastVisits"))
+    .Append(mlContext.Transforms.Concatenate("Features", "DemographicCategory", "LastVisits"))
.Append(mlContext.BinaryClassification.Trainers.FastTree("HasChurned", "Features", numTrees: 20));

var dynamicModel = dynamicLearningPipeline.Fit(trainData);
@@ -701,6 +826,42 @@ var normalizedData = pipeline.Fit(trainData).Transform(trainData);
var meanVarValues = normalizedData.GetColumn(r => r.MeanVarNormalized).ToArray();
```

You can achieve the same results using the dynamic API.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
Column = new[] {
// The four features of the Iris dataset will be grouped together as one Features column.
new TextLoader.Column("Features", DataKind.R4, 0, 3),
// Label: kind of iris.
new TextLoader.Column("Label", DataKind.TX, 4),
},
// Default separator is tab, but the dataset has comma.
Separator = ","
});

// Read the training data.
var trainData = reader.Read(dataPath);

// Apply all kinds of standard ML.NET normalization to the raw features.
var pipeline =
mlContext.Transforms.Normalize(
new NormalizingEstimator.MinMaxColumn("Features", "MinMaxNormalized", fixZero: true),
new NormalizingEstimator.MeanVarColumn("Features", "MeanVarNormalized", fixZero: true),
new NormalizingEstimator.BinningColumn("Features", "BinNormalized", numBins: 256));

// Let's train our pipeline of normalizers, and then apply it to the same data.
var normalizedData = pipeline.Fit(trainData).Transform(trainData);

// Inspect one column of the resulting dataset.
var meanVarValues = normalizedData.GetColumn<float[]>(mlContext, "MeanVarNormalized").ToArray();
```
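Conceptually, the normalizers above apply simple per-column arithmetic. A minimal sketch, assuming min-max with `fixZero` divides by the maximum absolute value (so zero stays zero), and mean-variance subtracts the mean and divides by the standard deviation; this is an illustration, not ML.NET's implementation:

```csharp
using System;
using System.Linq;

class Program
{
    // Assumed semantics for min-max with fixZero: divide by max absolute value,
    // so zero inputs stay zero and outputs land in [-1, 1].
    static float[] MinMaxFixZero(float[] values)
    {
        float maxAbs = values.Max(v => Math.Abs(v));
        return values.Select(v => v / maxAbs).ToArray();
    }

    // Mean-variance: subtract the mean, divide by the standard deviation.
    static float[] MeanVar(float[] values)
    {
        float mean = values.Average();
        float std = (float)Math.Sqrt(values.Select(v => (v - mean) * (v - mean)).Average());
        return values.Select(v => (v - mean) / std).ToArray();
    }

    static void Main()
    {
        var raw = new float[] { 1f, 2f, 4f };
        Console.WriteLine(string.Join(",", MinMaxFixZero(raw))); // 0.25,0.5,1
        Console.WriteLine(MeanVar(raw).Average());               // ~0: centered
    }
}
```

In the real pipeline these statistics are learned during `Fit` from the training data and then reapplied unchanged by `Transform`.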

## How do I train my model on categorical data?

Generally speaking, *all ML.NET learners expect the features as a float vector*. So, if some of your data is not natively a float, you will need to convert to floats.
@@ -782,6 +943,63 @@ var fullLearningPipeline = learningPipeline
var model = fullLearningPipeline.Fit(data);
```

You can achieve the same results using the dynamic API.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        new TextLoader.Column("Label", DataKind.BL, 0),
        // We will load all the categorical features into one vector column of size 8.
        new TextLoader.Column("CategoricalFeatures", DataKind.TX, 1, 8),
        // Similarly, load all numerical features into one vector of size 6.
        new TextLoader.Column("NumericalFeatures", DataKind.R4, 9, 14),
        // Let's also separately load the 'Workclass' column.
        new TextLoader.Column("Workclass", DataKind.TX, 1),
    },
    HasHeader = true
});

// Read the data.
var data = reader.Read(dataPath);

// Inspect the categorical columns to check that they are correctly read.
var catColumns = data.GetColumn<string[]>(mlContext, "CategoricalFeatures").Take(10).ToArray();

// Build several alternative featurization pipelines.
var dynamicPipeline =
    // Convert each categorical feature into one-hot encoding independently.
    mlContext.Transforms.Categorical.OneHotEncoding("CategoricalFeatures", "CategoricalOneHot")
    // Convert all categorical features into indices, and build a 'word bag' of these.
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("CategoricalFeatures", "CategoricalBag", CategoricalTransform.OutputKind.Bag))
    // One-hot encode the workclass column, then drop all the categories that have fewer than 10 instances in the train set.
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("Workclass", "WorkclassOneHot"))
    .Append(new CountFeatureSelector(mlContext, "WorkclassOneHot", "WorkclassOneHotTrimmed", count: 10));

// Let's train our pipeline, and then apply it to the same data.
var transformedData = dynamicPipeline.Fit(data).Transform(data);

// Inspect some columns of the resulting dataset.
var categoricalBags = transformedData.GetColumn<float[]>(mlContext, "CategoricalBag").Take(10).ToArray();
var workclasses = transformedData.GetColumn<float[]>(mlContext, "WorkclassOneHotTrimmed").Take(10).ToArray();

// Of course, if we want to train the model, we will need to compose a single float vector of all the features.
// Here's how we could do this:

var fullLearningPipeline = dynamicPipeline
    // Concatenate two of the 3 categorical pipelines, and the numeric features.
    .Append(mlContext.Transforms.Concatenate("Features", "NumericalFeatures", "CategoricalBag", "WorkclassOneHotTrimmed"))
    // Now we're ready to train. We chose our FastTree trainer for this classification task.
    .Append(mlContext.BinaryClassification.Trainers.FastTree(numTrees: 50));

// Train the model.
var model = fullLearningPipeline.Fit(data);
```

**@sfilipi** (Member, Nov 6, 2018), on "categorical columns": by looking at the first 10 records. #Closed

**@sfilipi** (Member, Nov 6, 2018), on `.Take(10)`: should we remove those #Closed

**@Zruty0** (Contributor): I think it's good: otherwise we'll materialize the entire column of data, which may be large.

**@sfilipi** (Member, Nov 6, 2018), on `.Fit(data)`: same here, Transform. #Resolved

**@shmoradims** (Author): Resolved offline.
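The difference between the 'indicator' (one-hot) and 'bag' output kinds above can be shown in miniature: one-hot marks a single vocabulary slot per value, while the bag variant sums the one-hot vectors of every slot in a vector column. A hypothetical sketch, not ML.NET's implementation:

```csharp
using System;
using System.Linq;

class Program
{
    // Hypothetical sketch of one-hot ('indicator') encoding over a fixed vocabulary.
    static float[] OneHot(string value, string[] vocabulary) =>
        vocabulary.Select(v => v == value ? 1f : 0f).ToArray();

    // 'Bag' output: sum the one-hot vectors of every slot in a vector column.
    static float[] Bag(string[] column, string[] vocabulary) =>
        column.Select(v => OneHot(v, vocabulary))
              .Aggregate(new float[vocabulary.Length],
                         (acc, vec) => acc.Zip(vec, (a, b) => a + b).ToArray());

    static void Main()
    {
        var vocab = new[] { "Private", "Federal-gov", "Self-emp" };

        // Indicator: exactly one slot is set per value.
        Console.WriteLine(string.Join(",", OneHot("Federal-gov", vocab))); // 0,1,0

        // Bag: counts accumulate across the slots of the vector column.
        Console.WriteLine(string.Join(",", Bag(new[] { "Private", "Private", "Self-emp" }, vocab))); // 2,0,1
    }
}
```

In the real transform, the vocabulary is learned from the training data during `Fit`.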

## How do I train my model on textual data?

Generally speaking, *all ML.NET learners expect the features as a float vector*. So, if some of your data is not natively a float, you will need to convert to floats.
@@ -852,6 +1070,60 @@ var embeddings = transformedData.GetColumn(x => x.Embeddings).Take(10).ToArray()
var unigrams = transformedData.GetColumn(x => x.BagOfWords).Take(10).ToArray();
```

You can achieve the same results using the dynamic API.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        new TextLoader.Column("IsToxic", DataKind.BL, 0),
        new TextLoader.Column("Message", DataKind.TX, 1),
    },
    HasHeader = true
});

// Read the data.
var data = reader.Read(dataPath);

// Inspect the message texts that are read from the file.
var messageTexts = data.GetColumn<string>(mlContext, "Message").Take(20).ToArray();

// Apply various kinds of text operations supported by ML.NET.
var dynamicPipeline =
    // One-stop shop to run the full text featurization.
    mlContext.Transforms.Text.FeaturizeText("Message", "TextFeatures")

    // Normalize the message for later transforms.
    .Append(mlContext.Transforms.Text.NormalizeText("Message", "NormalizedMessage"))

    // NLP pipeline 1: bag of words.
    .Append(new WordBagEstimator(mlContext, "NormalizedMessage", "BagOfWords"))

    // NLP pipeline 2: bag of bigrams, using hashes instead of dictionary indices.
    .Append(new WordHashBagEstimator(mlContext, "NormalizedMessage", "BagOfBigrams",
        ngramLength: 2, allLengths: false))

    // NLP pipeline 3: bag of tri-character sequences with TF-IDF weighting.
    .Append(mlContext.Transforms.Text.TokenizeCharacters("Message", "MessageChars"))
    .Append(new WordBagEstimator(mlContext, "MessageChars", "BagOfTrichar",
        ngramLength: 3, weighting: NgramTransform.WeightingCriteria.TfIdf))

    // NLP pipeline 4: word embeddings.
    .Append(mlContext.Transforms.Text.ExtractWordEmbeddings("NormalizedMessage", "Embeddings",
        WordEmbeddingsTransform.PretrainedModelKind.GloVeTwitter25D));

// Let's train our pipeline, and then apply it to the same data.
// Note that even on a small dataset of 70KB the pipeline above can take up to a minute to completely train.
var transformedData = dynamicPipeline.Fit(data).Transform(data);

// Inspect some columns of the resulting dataset.
var embeddings = transformedData.GetColumn<float[]>(mlContext, "Embeddings").Take(10).ToArray();
var unigrams = transformedData.GetColumn<float[]>(mlContext, "BagOfWords").Take(10).ToArray();
```

**@sfilipi** (Member, Nov 6, 2018), on `WordBagEstimator`: is this not part of the Text catalog? #Closed

**Contributor**: +1

**@shmoradims** (Author): not in the catalog yet. Senja will add it soon.

**@sfilipi** (Member, Nov 6, 2018), on `WordHashBagEstimator`: I think this is part of the text catalog. #Closed

**@shmoradims** (Author): not in the catalog yet. Senja will add it soon.
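To make "bag of tri-character sequences" concrete, here is a minimal sketch of extracting character-trigram counts from a message; TF-IDF weighting would then rescale each count by how rare the trigram is across all messages. This is an illustration only, not ML.NET's implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Count tri-character sequences in a message.
    static Dictionary<string, int> CharTrigrams(string text)
    {
        var counts = new Dictionary<string, int>();
        for (int i = 0; i + 3 <= text.Length; i++)
        {
            string gram = text.Substring(i, 3);
            counts[gram] = counts.TryGetValue(gram, out int c) ? c + 1 : 1;
        }
        return counts;
    }

    static void Main()
    {
        var grams = CharTrigrams("banana");
        Console.WriteLine(grams["ana"]);  // 2: b[ana]na and ban[ana]
        Console.WriteLine(grams.Count);   // 3 distinct trigrams: ban, ana, nan
    }
}
```

Character n-grams are robust to misspellings, which is why they complement the word-level pipelines above.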
## How do I train using cross-validation?

[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is a useful technique for ML applications. It helps estimate the variance of the model quality from one run to another and also eliminates the need to extract a separate test set for evaluation.
@@ -916,6 +1188,59 @@ var microAccuracies = cvResults.Select(r => r.metrics.AccuracyMicro);
Console.WriteLine(microAccuracies.Average());
```

You can achieve the same results using the dynamic API.
```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Step one: read the data as an IDataView.
// First, we define the reader: specify the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
{
    Column = new[] {
        new TextLoader.Column("SepalLength", DataKind.R4, 0),
        new TextLoader.Column("SepalWidth", DataKind.R4, 1),
        new TextLoader.Column("PetalLength", DataKind.R4, 2),
        new TextLoader.Column("PetalWidth", DataKind.R4, 3),
        // Label: kind of iris.
        new TextLoader.Column("Label", DataKind.TX, 4),
    },
    // Default separator is tab, but the dataset has comma.
    Separator = ","
});

// Read the data.
var data = reader.Read(dataPath);

// Build the training pipeline.
var dynamicPipeline =
    // Concatenate all the features together into one column 'Features'.
    mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    // Note that the label is text, so it needs to be converted to a key.
    .Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
    // Use the multi-class SDCA model to predict the label using features.
    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());

// Split the data 90:10 into train and test sets, train and evaluate.
var (trainData, testData) = mlContext.MulticlassClassification.TrainTestSplit(data, testFraction: 0.1);

// Train the model.
var model = dynamicPipeline.Fit(trainData);
// Compute quality metrics on the test set.
var metrics = mlContext.MulticlassClassification.Evaluate(model.Transform(testData));
Console.WriteLine(metrics.AccuracyMicro);

// Now run the 5-fold cross-validation experiment, using the same pipeline.
var cvResults = mlContext.MulticlassClassification.CrossValidate(data, dynamicPipeline, numFolds: 5);

// The results object is an array of 5 elements. For each of the 5 folds, we have metrics, model and scored test data.
// Let's compute the average micro-accuracy.
var microAccuracies = cvResults.Select(r => r.metrics.AccuracyMicro);
Console.WriteLine(microAccuracies.Average());
```

**@sfilipi** (Member, Nov 6, 2018), on `.Append(m`: does it need KeyToVal at the end? #Resolved

**@shmoradims** (Author): the static pipeline version above doesn't have the KeyToVal for this example, so I'm not adding it. This one is looking at averaged accuracies, so the string value of the label is not as important. The example in "How do I use the model to make one prediction" adds the ToKey, which looks at the transformed data.

**Member**: If they are looking for an example of how to do multiclass, and they use this, the users will have trouble making sense of their predictions. I think we have had several bugs along those lines.
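The 5-fold experiment above can be pictured as partitioning row indices into folds, each fold serving once as the test set while the rest trains the model. A minimal sketch using a simple modulo assignment, not ML.NET's actual splitting logic:

```csharp
using System;
using System.Linq;

class Program
{
    // Rows whose (index mod k) equals the fold number form that fold's test set.
    static int[] TestFold(int n, int k, int fold) =>
        Enumerable.Range(0, n).Where(i => i % k == fold).ToArray();

    static void Main()
    {
        int n = 10, k = 5;
        for (int fold = 0; fold < k; fold++)
        {
            var test = TestFold(n, k, fold);
            var train = Enumerable.Range(0, n).Except(test).ToArray();
            Console.WriteLine($"fold {fold}: test=[{string.Join(",", test)}] trainSize={train.Length}");
        }
        // Every row appears in exactly one test fold, so averaging the k
        // metric values evaluates the pipeline over the whole dataset.
    }
}
```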
## Can I mix and match static and dynamic pipelines?

Yes, we can have both of them in our codebase. The static pipelines are just a statically-typed way to build dynamic pipelines.