Added dynamic API snippets to cookbook #1538

Merged
merged 16 commits into from
Nov 7, 2018

shmoradims

Added the dynamic API equivalent for every snippet that used the static API, except for the snippet that uses onFit, which the dynamic API does not support yet.

@shmoradims shmoradims changed the title Updated cookbook with dynamic API Added dynamic API snippets to cookbook Nov 6, 2018

// Create the reader: define the data columns and where to find them in the text file.
var reader = TextLoader.CreateReader(env, ctx => (
var reader = mlContext.Data.TextReader(ctx => (
Member

@sfilipi sfilipi Nov 6, 2018

mlContext [](start = 13, length = 9)

The examples in this file are actually tests in CookbookExamples.cs. I believe we want to keep them in sync. #Closed

Author

@shmoradims shmoradims Nov 6, 2018

We want the cookbook to show the preferred way to load data. It would be confusing for users to see both mlContext.Data.TextReader and TextLoader.CreateReader. I'd rather change the test instead.


In reply to: 231016690 [](ancestors = 231016690)

var mlContext = new MLContext();

// Create the reader: define the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
Member

@sfilipi sfilipi Nov 6, 2018

TextLoader [](start = 17, length = 10)

mlContext.Data.TextReader #Closed

Author

updated them all.


In reply to: 231017639 [](ancestors = 231017639)

// This will give the entire dataset: make sure to only take several rows
// in case the dataset is huge. This is similar to the static API, except
// you have to specify the column name and type.
var featureColumns = transformedData.GetColumn<string[]>(mlContext, "AllFeatures")
Member

@sfilipi sfilipi Nov 6, 2018

string [](start = 47, length = 6)

Does this work? I believe I had to use ReadOnlyMemoryOf. #Closed

Author

Surprisingly, yes. I added actual tests for it.


In reply to: 231018069 [](ancestors = 231018069)

// Add the SDCA regression trainer.
.Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent(label: "Target", features: "FeatureVector"))

// Step three. Train the pipeline.
Member

@sfilipi sfilipi Nov 6, 2018

Train the [](start = 15, length = 9)

Fit the training data to the pipeline. Did you want to do transform? #Closed

Author

Fixed the text.
Not doing any transforms. Just writing the equivalent of the static version.


In reply to: 231018287 [](ancestors = 231018287)

.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
// Use the multi-class SDCA model to predict the label using features.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
// Apply the inverse conversion from 'PredictedLabel' key back to string value.
Member

@sfilipi sfilipi Nov 6, 2018

key [](start = 58, length = 3)

column #Closed

// Read the data.
var data = reader.Read(dataPath);

// Inspect the categorical columns to check that they are correctly read.
Member

@sfilipi sfilipi Nov 6, 2018

categorical columns [](start = 15, length = 19)

by looking at the first 10 records. #Closed

var transformedData = dynamicPipeline.Fit(data).Transform(data);

// Inspect some columns of the resulting dataset.
var categoricalBags = transformedData.GetColumn<float[]>(mlContext, "CategoricalBag").Take(10).ToArray();
Member

@sfilipi sfilipi Nov 6, 2018

.Take(10) [](start = 85, length = 9)

Should we remove those? #Closed

Contributor

I think it's good: otherwise we'll materialize the entire column of data, which may be large.


In reply to: 231019258 [](ancestors = 231019258)
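
The argument for keeping `.Take(10)` relies on LINQ's deferred execution. The following is a minimal sketch in plain C#, not ML.NET code: `ReadColumn` is a hypothetical stand-in for a lazily evaluated column, and it assumes the real column accessor also returns a lazy `IEnumerable<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Counts how many rows the hypothetical source actually produced.
    public static int RowsProduced;

    // Hypothetical stand-in for a lazily evaluated column of a large dataset.
    public static IEnumerable<float[]> ReadColumn(int totalRows)
    {
        for (int i = 0; i < totalRows; i++)
        {
            RowsProduced++;
            yield return new[] { (float)i };
        }
    }

    public static void Main()
    {
        // Take(10) stops pulling from the source once ten rows have been
        // yielded, so ToArray() materializes only the preview, not the
        // full million-row sequence.
        float[][] preview = ReadColumn(1_000_000).Take(10).ToArray();
        Console.WriteLine($"{preview.Length} rows kept, {RowsProduced} rows produced");
    }
}
```

Dropping the `Take(10)` call in this sketch would force `ToArray()` to enumerate and allocate every row, which is the cost the comment above warns about.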

.Append(mlContext.Transforms.Text.NormalizeText("Message", "NormalizedMessage"))

// NLP pipeline 1: bag of words.
.Append(new WordBagEstimator(mlContext, "NormalizedMessage", "BagOfWords"))
Member

@sfilipi sfilipi Nov 6, 2018

WordBagEstimator [](start = 16, length = 16)

Is this not part of the Text catalog? #Closed

Contributor

+1


In reply to: 231019528 [](ancestors = 231019528)

Author

Not in the catalog yet. Senja will add it soon.


In reply to: 231225526 [](ancestors = 231225526,231019528)

.Append(new WordBagEstimator(mlContext, "NormalizedMessage", "BagOfWords"))

// NLP pipeline 2: bag of bigrams, using hashes instead of dictionary indices.
.Append(new WordHashBagEstimator(mlContext, "NormalizedMessage", "BagOfBigrams",
Member

@sfilipi sfilipi Nov 6, 2018

WordHashBagEstimator [](start = 16, length = 20)

I think this is part of the text catalog. #Closed

Author

Not in the catalog yet. Senja will add it soon.


In reply to: 231019583 [](ancestors = 231019583)

// Note that the label is text, so it needs to be converted to key.
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
// Use the multi-class SDCA model to predict the label using features.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());
Member

@sfilipi sfilipi Nov 6, 2018

.Append(m [](start = 4, length = 9)

Does it need KeyToVal at the end? #Resolved

Author

The static pipeline version above it doesn't have the KeyToVal for this example, so I'm not adding it. This one looks at averaged accuracies, so the string value of the label is not as important.
The example in "How do I use the model to make one prediction" adds the ToKey, which looks at the transformed data.


In reply to: 231019826 [](ancestors = 231019826)

Member

If users are looking for an example of how to do multiclass classification and they use this one, they will have trouble making sense of their predictions. I think we have had several bugs along those lines.


In reply to: 231210991 [](ancestors = 231210991,231019826)

@JRAlexander JRAlexander left a comment

@shmoradims @Zruty0 - Will the following be updated with dynamic examples?

  • How do I verify the model quality?
  • How do I load data from multiple files?
  • What if my training data is not in a text file?

var reader = mlContext.Data.TextReader(new TextLoader.Arguments
{
Column = new[] {
// A boolean column depicting the 'label'.
@JRAlexander JRAlexander Nov 6, 2018

Shouldn't these be DataKind.Text and DataKind.Boolean instead? #Resolved

Author

We have some equivalent names. We tend to prefer the short two-letter version.

TX = 11,
TXT = TX,
Text = TX,

BL = 12,
Bool = BL,


In reply to: 231213230 [](ancestors = 231213230)
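
The "equivalent names" in the excerpt above are ordinary C# enum aliases: several members assigned the same underlying value. A minimal sketch of how such aliases behave follows; `SampleKind` is a hypothetical enum that copies only the members quoted in the reply and is not the real ML.NET `DataKind` definition.

```csharp
using System;

// Hypothetical enum mirroring the excerpt above; not the real ML.NET DataKind.
public enum SampleKind
{
    TX = 11,
    TXT = TX,
    Text = TX,

    BL = 12,
    Bool = BL,
}

public static class AliasDemo
{
    public static void Main()
    {
        // Aliased members share one underlying value, so they compare equal.
        Console.WriteLine(SampleKind.Text == SampleKind.TX);   // True
        Console.WriteLine((int)SampleKind.Bool);               // 12

        // Enum.Parse accepts any of the alias names and yields the same value.
        var parsed = (SampleKind)Enum.Parse(typeof(SampleKind), "TXT");
        Console.WriteLine(parsed == SampleKind.Text);          // True
    }
}
```

Because the aliases are interchangeable at the value level, choosing between the short and long names in a snippet is purely a readability decision.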

@JRAlexander JRAlexander Nov 6, 2018

Is that because not every type has an equivalent name? #Resolved

Author

[John are you using codeflow? If not, your comments will come as different threads.]

I clarified with Pete. We are planning to rename DataKind to be closer to .NET types. We'll need to update all of these once that change happens.


In reply to: 231290740 [](ancestors = 231290740)

```csharp
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();
Contributor

@Zruty0 Zruty0 Nov 6, 2018

var [](start = 0, length = 3)

Please also add these snippets as tests to CookbookSamples: this way we ensure that they compile, and are updated when the API gets updated. #Resolved

Author

Yes, I already created a new set of tests for the dynamic API.


In reply to: 231221675 [](ancestors = 231221675)

// We 'start' the pipeline with the output of the reader.
var dynamicPipeline =
// First 'normalize' the data (rescale to be
// between -1 and 1 for all examples), and then train the model.
Contributor

@Zruty0 Zruty0 Nov 6, 2018

, and then train the model. [](start = 41, length = 27)

not needed #Resolved

Author

removed from both static and dynamic snippets.


In reply to: 231222135 [](ancestors = 231222135)

// Concatenate all the features together into one column 'Features'.
mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
// Note that the label is text, so it needs to be converted to key.
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
Contributor

@Zruty0 Zruty0 Nov 6, 2018

ValueToKeyMappingEstimator [](start = 16, length = 26)

mlContext.Transforms.Categorical.MapValueToKey #Resolved

Author

changed.


In reply to: 231225091 [](ancestors = 231225091)

Contributor

@Zruty0 Zruty0 left a comment

:shipit:

// Apply the inverse conversion from 'PredictedLabel' column back to string value.
.Append(mlContext.Transforms.Conversion.MapKeyToValue(("PredictedLabel", "Data")));

// Train the model.
Member

@sfilipi sfilipi Nov 6, 2018

Train the model. [](start = 3, length = 16)

Technically, you'd have to Transform if you say "train the model". #Resolved

Author

Keeping it the same as Pete's wording from the static pipeline.


In reply to: 231290065 [](ancestors = 231290065)

.Append(mlContext.BinaryClassification.Trainers.FastTree(numTrees: 50));

// Train the model.
var model = fullLearningPipeline.Fit(data);
Member

@sfilipi sfilipi Nov 6, 2018

.Fit(data) [](start = 32, length = 10)

same here, Transform. #Resolved

Author

Resolved offline.


In reply to: 231290355 [](ancestors = 231290355)

Member

@sfilipi sfilipi left a comment

:shipit:

@shmoradims shmoradims merged commit 88ad2c2 into dotnet:master Nov 7, 2018
@shmoradims shmoradims deleted the update_cookbook branch December 12, 2018 22:07
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022

4 participants