Added dynamic API snippets to cookbook #1538
// Create the reader: define the data columns and where to find them in the text file.
var reader = TextLoader.CreateReader(env, ctx => (
var reader = mlContext.Data.TextReader(ctx => (
mlContext [](start = 13, length = 9)
The examples in this file are actually tests in CookbookExamples.cs. I believe we want to keep them in sync. #Closed
We want the cookbook to show the preferred way to load data. It would be confusing for users to see both mlContext.Data.TextReader and TextLoader.CreateReader. I'd rather change the test instead.
In reply to: 231016690 [](ancestors = 231016690)
docs/code/MlNetCookBook.md
Outdated
var mlContext = new MLContext();

// Create the reader: define the data columns and where to find them in the text file.
var reader = new TextLoader(mlContext, new TextLoader.Arguments
TextLoader [](start = 17, length = 10)
mlContext.Data.TextReader #Closed
// This will give the entire dataset: make sure to only take several rows
// in case the dataset is huge. This is similar to the static API, except
// you have to specify the column name and type.
var featureColumns = transformedData.GetColumn<string[]>(mlContext, "AllFeatures")
string [](start = 47, length = 6)
Does this work? I believe I had to do ReadOnlyMemoryOf #Closed
docs/code/MlNetCookBook.md
Outdated
// Add the SDCA regression trainer.
.Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent(label: "Target", features: "FeatureVector"))

// Step three. Train the pipeline.
Train the [](start = 15, length = 9)
Fit the training data to the pipeline. Did you want to do transform? #Closed
Fixed the text.
Not doing any transforms. Just writing equivalent of the static version.
In reply to: 231018287 [](ancestors = 231018287)
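For readers following this thread, the Fit versus Transform distinction can be illustrated with a small sketch. `MeanCenterEstimator` and `MeanCenterTransformer` are hypothetical names invented for this illustration and are not ML.NET types. The sketch only shows the general pattern: `Fit` learns parameters from data and returns a trained object, and `Transform` applies those parameters to data.

```csharp
using System;
using System.Linq;

// Hypothetical trained object: holds the parameter learned by Fit.
public sealed class MeanCenterTransformer
{
    public float Mean { get; }

    public MeanCenterTransformer(float mean)
    {
        Mean = mean;
    }

    // Applies the learned parameter; nothing is learned at this stage.
    public float[] Transform(float[] data)
    {
        return data.Select(x => x - Mean).ToArray();
    }
}

// Hypothetical untrained object: knows how to learn, but has no state yet.
public sealed class MeanCenterEstimator
{
    // Learns the mean from the training data and returns a trained transformer.
    // Note: Average() throws on an empty array.
    public MeanCenterTransformer Fit(float[] trainingData)
    {
        return new MeanCenterTransformer(trainingData.Average());
    }
}
```

Under this pattern, "training" corresponds to calling `Fit`, and producing output data corresponds to calling `Transform` on the object that `Fit` returned.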
docs/code/MlNetCookBook.md
Outdated
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
// Use the multi-class SDCA model to predict the label using features.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
// Apply the inverse conversion from 'PredictedLabel' key back to string value.
key [](start = 58, length = 3)
column #Closed
docs/code/MlNetCookBook.md
Outdated
// Read the data.
var data = reader.Read(dataPath);

// Inspect the categorical columns to check that they are correctly read.
categorical columns [](start = 15, length = 19)
by looking at the first 10 records. #Closed
var transformedData = dynamicPipeline.Fit(data).Transform(data);

// Inspect some columns of the resulting dataset.
var categoricalBags = transformedData.GetColumn<float[]>(mlContext, "CategoricalBag").Take(10).ToArray();
.Take(10) [](start = 85, length = 9)
should we remove those #Closed
I think it's good: otherwise we'll materialize the entire column of data, which may be large.
In reply to: 231019258 [](ancestors = 231019258)
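The point about `.Take(10)` can be sketched with plain LINQ, assuming the column is exposed as a lazily evaluated sequence. `LazyColumnSketch`, `ReadColumn`, and `RowsProduced` are hypothetical stand-ins written for this illustration; they are not part of ML.NET and say nothing about how `GetColumn` is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyColumnSketch
{
    // Counts how many rows the hypothetical column has produced so far.
    public static int RowsProduced;

    // Hypothetical stand-in for a lazily evaluated column: each row is
    // computed only when the consumer asks for it.
    public static IEnumerable<float[]> ReadColumn(int totalRows)
    {
        for (int i = 0; i < totalRows; i++)
        {
            RowsProduced++;
            yield return new[] { (float)i, i * 2f };
        }
    }

    // Returns only the first 'count' rows of the column.
    public static float[][] FirstRows(int totalRows, int count)
    {
        RowsProduced = 0;
        // Take(count) stops pulling from the source once it has 'count' rows,
        // so ToArray() copies only those rows instead of the whole column.
        return ReadColumn(totalRows).Take(count).ToArray();
    }
}
```

Calling `LazyColumnSketch.FirstRows(1000000, 10)` returns ten rows, and `RowsProduced` stays far below the total row count. Dropping the `Take` call would force `ToArray()` to enumerate and store every row.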
.Append(mlContext.Transforms.Text.NormalizeText("Message", "NormalizedMessage"))

// NLP pipeline 1: bag of words.
.Append(new WordBagEstimator(mlContext, "NormalizedMessage", "BagOfWords"))
WordBagEstimator [](start = 16, length = 16)
is this not part of the Text catalog? #Closed
not in the catalog yet. Senja will add it soon.
In reply to: 231225526 [](ancestors = 231225526,231019528)
docs/code/MlNetCookBook.md
Outdated
.Append(new WordBagEstimator(mlContext, "NormalizedMessage", "BagOfWords"))

// NLP pipeline 2: bag of bigrams, using hashes instead of dictionary indices.
.Append(new WordHashBagEstimator(mlContext, "NormalizedMessage", "BagOfBigrams",
WordHashBagEstimator [](start = 16, length = 20)
i think this is part of the text catalog. #Closed
// Note that the label is text, so it needs to be converted to key.
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
// Use the multi-class SDCA model to predict the label using features.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());
.Append(m [](start = 4, length = 9)
does it need KeyToVal at the end? #Resolved
The static pipeline version above doesn't have the KeyToVal for this example, so I'm not adding it. This one is looking at averaged accuracies, so the string value of the label is not as important.
The example in "How do I use the model to make one prediction" adds the ToKey, which looks at the transformed data.
In reply to: 231019826 [](ancestors = 231019826)
If users are looking for an example of how to do multiclass classification and they use this one, they will have trouble making sense of their predictions. I think we have had several bugs along those lines.
In reply to: 231210991 [](ancestors = 231210991,231019826)
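The concern about unreadable predictions can be illustrated with a sketch of what a value-to-key mapping and its inverse do conceptually. `LabelKeySketch` and its methods are hypothetical names for this illustration, and numbering keys from 1 in first-seen order is an assumption of the sketch, not a statement about ML.NET internals.

```csharp
using System;
using System.Collections.Generic;

public static class LabelKeySketch
{
    // Assigns each distinct label a numeric key in first-seen order.
    // Starting keys at 1 is an assumption made for this sketch.
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var map = new Dictionary<string, uint>();
        foreach (var label in labels)
        {
            if (!map.ContainsKey(label))
            {
                map[label] = (uint)(map.Count + 1);
            }
        }
        return map;
    }

    // Inverse lookup: turns a predicted key back into the original label text.
    public static string KeyToValue(Dictionary<string, uint> map, uint key)
    {
        foreach (var pair in map)
        {
            if (pair.Value == key)
            {
                return pair.Key;
            }
        }
        throw new KeyNotFoundException("No label is mapped to key " + key);
    }
}
```

Without the inverse step, a prediction surfaces as a bare number such as 2, which means nothing to a user unless the mapping back to the label text is applied.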
@shmoradims @Zruty0 - Will the following be updated with dynamic examples?
- How do I verify the model quality?
- How do I load data from multiple files?
- What if my training data is not in a text file?
var reader = mlContext.Data.TextReader(new TextLoader.Arguments
{
Column = new[] {
// A boolean column depicting the 'label'.
Shouldn't these be DataKind.Text and DataKind.Boolean instead? #Resolved
We have some equivalent names. We tend to prefer the short two-letter version.
TX = 11,
TXT = TX,
Text = TX,
BL = 12,
Bool = BL,
In reply to: 231213230 [](ancestors = 231213230)
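The equivalence described above is ordinary C# enum aliasing: several member names share one underlying value. The sketch below copies only the five members quoted in this comment into a hypothetical `DataKindSketch` enum. It is not the real `DataKind` definition, and the helper class is likewise invented for illustration.

```csharp
using System;

// Hypothetical enum that mirrors only the members quoted in the comment above.
public enum DataKindSketch
{
    TX = 11,
    TXT = TX,
    Text = TX,
    BL = 12,
    Bool = BL,
}

public static class DataKindAliases
{
    // Parses any of the alias names to the same underlying enum value.
    public static DataKindSketch Parse(string name)
    {
        return (DataKindSketch)Enum.Parse(typeof(DataKindSketch), name);
    }

    // Two names are equivalent when they resolve to the same value.
    public static bool AreEquivalent(string first, string second)
    {
        return Parse(first) == Parse(second);
    }
}
```

With this definition, `DataKindSketch.Text == DataKindSketch.TX` is true, so the short and long spellings are interchangeable in code. Calling `ToString()` on an aliased value is less predictable, because the runtime does not guarantee which of the equivalent names it returns.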
Is that because not every type has an equivalent name? #Resolved
[John are you using codeflow? If not, your comments will come as different threads.]
I clarified with Pete. We are planning to rename DataKind to be closer to .NET types. We'll need to update all of these once that change happens.
In reply to: 231290740 [](ancestors = 231290740)
```csharp | ||
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();
var [](start = 0, length = 3)
Please also add these snippets as tests to CookbookSamples: this way we ensure that they compile, and are updated when the API gets updated. #Resolved
Yes, I already created a new set of tests for the dynamic API.
In reply to: 231221675 [](ancestors = 231221675)
docs/code/MlNetCookBook.md
Outdated
// We 'start' the pipeline with the output of the reader.
var dynamicPipeline =
// First 'normalize' the data (rescale to be
// between -1 and 1 for all examples), and then train the model.
, and then train the model. [](start = 41, length = 27)
not needed #Resolved
docs/code/MlNetCookBook.md
Outdated
// Concatenate all the features together into one column 'Features'.
mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
// Note that the label is text, so it needs to be converted to key.
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"), TransformerScope.TrainTest)
ValueToKeyMappingEstimator [](start = 16, length = 26)
mlContext.Transforms.Categorical.MapValueToKey
#Resolved
// Apply the inverse conversion from 'PredictedLabel' column back to string value.
.Append(mlContext.Transforms.Conversion.MapKeyToValue(("PredictedLabel", "Data")));

// Train the model.
Train the model. [](start = 3, length = 16)
technically you'd have to Transform if you say "train the model". #Resolved
Keeping it the same as Pete's wording from the static pipeline.
In reply to: 231290065 [](ancestors = 231290065)
.Append(mlContext.BinaryClassification.Trainers.FastTree(numTrees: 50));

// Train the model.
var model = fullLearningPipeline.Fit(data);
.Fit(data) [](start = 32, length = 10)
same here, Transform. #Resolved
Added the dynamic API equivalent for all the snippets that were using static API, except for the snippet that uses onFit, which is not supported by dynamic API yet.