ValueMappingEstimator example #2222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
bfdbc4f
9ad04ff
9159ae5
beb48eb
b3d1df2
a50a29f
8542762
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
using System; | ||
using System.Collections.Generic; | ||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.Transforms.Conversions; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingExample | ||
{ | ||
class SampleInfertDataWithFeatures | ||
{ | ||
public float Age = 0; | ||
public string Education = default; | ||
public string EducationCategory = default; | ||
} | ||
|
||
///<summary> | ||
Review comment: This sample is missing the "So what?" aspect. Why would I use this transform, and where is it helpful? Can you give an example of a problem you can't solve without using this? #Resolved |
||
/// This example demonstrates the use of the ValueMappingEstimator by mapping string-to-string values. The ValueMappingEstimator uses | ||
/// each level of education as a key that maps to a string label, which is the value. | ||
/// The mapping looks like the following: | ||
/// <list> | ||
/// <item>0-5yrs -> Cat1</item> | ||
Review comment: For XML documentation, those need to be wrapped in a tag. It doesn't matter much in your case, because this will display inside a code block. #Resolved |
||
/// <item>6-11yrs -> Cat2</item> | ||
/// <item>12+yrs -> Cat3</item> | ||
/// </list> | ||
Review comment: This should typically go in a section. You can use simple, non-XML comments here. #Resolved |
||
/// </summary> | ||
public static void Run() | ||
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); | ||
|
||
|
||
// Get a small dataset as an IEnumerable. | ||
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData(); | ||
Review comment: var #Resolved
Reply: So the feedback was to not use var in this case because it's more explicit about what we are doing. #Resolved |
||
var trainData = ml.Data.ReadFromEnumerable(data); | ||
|
||
// Preview of the data. | ||
// | ||
// Age Case Education induced parity pooled.stratum row_num ... | ||
// 26.0 1.0 0-5yrs 1.0 6.0 3.0 1.0 ... | ||
// 42.0 1.0 0-5yrs 1.0 1.0 1.0 2.0 ... | ||
// 39.0 1.0 12+yrs 2.0 6.0 4.0 3.0 ... | ||
// 34.0 1.0 0-5yrs 2.0 4.0 2.0 4.0 ... | ||
// 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ... | ||
|
||
// Creating a list of keys based on the Education values from the dataset | ||
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable. | ||
Review comment: I think the users might wonder why we are doing this, although it is somewhat explained. Maybe try adding: if the list of keys and the list of values are known, they can be passed to the API. #Resolved |
||
var educationKeys = new List<string>() | ||
{ | ||
"0-5yrs", | ||
"6-11yrs", | ||
"12+yrs" | ||
}; | ||
|
||
// Creating a list of associated values that will map respectively to each educationKey | ||
var educationValues = new List<string>() | ||
Review comment: Explain what these are, just as you explained what the keys were. #Resolved |
||
{ | ||
"Cat1", | ||
"Cat2", | ||
"Cat3" | ||
}; | ||
|
||
// Constructs the ValueMappingEstimator, making the ML.NET pipeline. | ||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); | ||
Review comment: I think we want to create those through the mlContext. #Resolved
Reply: Yes, totally agree. I have updated these to be called from the mlContext. However, I was not able to do that for the KeyType example because the extension methods do not take the boolean for treatValuesAsKeyTypes. I have created an issue to track this: #2346
Review comment: Can you call this from |
||
|
||
// Fits the ValueMappingEstimator and transforms the data converting the Education to EducationCategory. | ||
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the resulting data as an IEnumerable of SampleInfertDataWithFeatures. This will contain the newly created column EducationCategory | ||
IEnumerable<SampleInfertDataWithFeatures> featureRows = ml.CreateEnumerable<SampleInfertDataWithFeatures>(transformedData, reuseRowObject: false); | ||
|
||
Console.WriteLine($"Example of mapping string->string"); | ||
Console.WriteLine($"Age\tEducation\tEducationLabel"); | ||
foreach (var featureRow in featureRows) | ||
{ | ||
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{featureRow.EducationCategory}"); | ||
} | ||
|
||
// Features column obtained post-transformation. | ||
// | ||
// Age Education EducationLabel | ||
Review comment: I'd expect a column with name "EducationCategory" #Resolved |
||
// 26 0-5yrs Cat1 | ||
// 42 0-5yrs Cat1 | ||
// 39 12+yrs Cat3 | ||
// 34 0-5yrs Cat1 | ||
// 35 6-11yrs Cat2 | ||
Review comment: Related to my comment above, what is the use of changing it from a string to a different string? Can you show what this can then be used for? #Resolved
Reply: Added a remarks section and added additional verbiage to each example. |
||
} | ||
} | ||
} |
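As a rough illustration of what the parallel key and value lists in this sample amount to, the sketch below pairs `keys[i]` with `values[i]` in an ordinary dictionary and looks up a few rows. It uses no ML.NET types; `ValueMapSketch` and `Build` are hypothetical names introduced only for this note, and the real estimator may behave differently (for example, in how it treats unmapped keys).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET: a plain-dictionary sketch of a value mapping.
public static class ValueMapSketch
{
    // Pairs keys[i] with values[i]; the two lists must have the same length.
    public static Dictionary<TKey, TValue> Build<TKey, TValue>(
        IReadOnlyList<TKey> keys, IReadOnlyList<TValue> values)
    {
        if (keys.Count != values.Count)
            throw new ArgumentException("keys and values must have the same length");

        var map = new Dictionary<TKey, TValue>();
        for (int i = 0; i < keys.Count; i++)
            map.Add(keys[i], values[i]); // Add throws if a key appears twice.
        return map;
    }
}

public static class Program
{
    public static void Main()
    {
        var educationKeys = new List<string> { "0-5yrs", "6-11yrs", "12+yrs" };
        var educationValues = new List<string> { "Cat1", "Cat2", "Cat3" };
        var map = ValueMapSketch.Build(educationKeys, educationValues);

        // Look up a few Education values, including one that has no mapping.
        foreach (var education in new[] { "0-5yrs", "12+yrs", "unknown" })
        {
            string category = map.TryGetValue(education, out var found) ? found : "(unmapped)";
            Console.WriteLine($"{education}\t{category}");
        }
    }
}
```

The point of the sketch is only that the two lists are matched by position, so their order and length must agree.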
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
using System; | ||
using System.Collections.Generic; | ||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.Transforms.Conversions; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingFloatToStringExample | ||
Review comment: Same comment as above: what problem does this solve? #Resolved
Review comment: This is not referenced from anything. #Resolved |
||
{ | ||
/// <summary> | ||
/// Helper class for retrieving the resulting data | ||
/// </summary> | ||
class SampleInfertDataWithInducedCategory | ||
{ | ||
public float Age = 0; | ||
public float Induced = 0.0f; | ||
public string InducedCategory = default; | ||
} | ||
|
||
///<summary> | ||
/// This example demonstrates the use of floating types as the key type for ValueMappingEstimator by mapping a float-to-string value. | ||
/// The mapping looks like the following: | ||
/// <list> | ||
/// <item>1.0 -> Cat1</item> | ||
/// <item>2.0 -> Cat2</item> | ||
/// </list> | ||
/// </summary> | ||
public static void Run() | ||
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); | ||
Review comment: mlContext #Resolved |
||
|
||
// Get a small dataset as an IEnumerable. | ||
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData(); | ||
var trainData = ml.Data.ReadFromEnumerable(data); | ||
|
||
// Creating a list of keys based on the induced value from the dataset | ||
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable. | ||
var inducedKeys = new List<float>() | ||
{ | ||
1.0f, | ||
2.0f | ||
}; | ||
|
||
// Creating a list of values. Each string maps to the key at the same position in inducedKeys. | ||
var inducedValues = new List<string>() | ||
{ | ||
"Cat1", | ||
"Cat2" | ||
}; | ||
|
||
// Constructs the ValueMappingEstimator, making the ML.NET pipeline. | ||
var pipeline = new ValueMappingEstimator<float, string>(ml, inducedKeys, inducedValues, ("InducedCategory", "Induced")); | ||
Review comment: Can you call this from
Reply: This should be updated. There are two cases that I cannot call from mlContext. I have created issues for that. |
||
|
||
// Fits the ValueMappingEstimator and transforms the data adding the InducedCategory column. | ||
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the resulting data as an IEnumerable of SampleInfertDataWithInducedCategory. This will contain the newly created column InducedCategory | ||
IEnumerable<SampleInfertDataWithInducedCategory> featureRows = ml.CreateEnumerable<SampleInfertDataWithInducedCategory>(transformedData, reuseRowObject: false); | ||
|
||
Console.WriteLine($"Example of mapping float->string"); | ||
Console.WriteLine($"Age\tInduced\tInducedCategory"); | ||
foreach (var featureRow in featureRows) | ||
{ | ||
Console.WriteLine($"{featureRow.Age}\t{featureRow.Induced}\t{featureRow.InducedCategory}"); | ||
} | ||
|
||
// Features column obtained post-transformation. | ||
// | ||
// Example of mapping float->string | ||
// Age Induced InducedCategory | ||
// 26 1 Cat1 | ||
// 42 1 Cat1 | ||
// 39 2 Cat2 | ||
// 34 2 Cat2 | ||
// 35 1 Cat1 | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
using System; | ||
using System.Collections.Generic; | ||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.Transforms.Conversions; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingStringToArrayExample | ||
Review comment: Same comment as above: what problem does this solve? #Resolved
Review comment: This is not referenced anywhere. #Resolved
Reply: Right, this is just another example that is available to show how this can be used. There is another example that is not used either. Do they all need to be referenced?
Reply: I think we should remove them. The purpose of this project is to populate the API reference website. The machinelearning-samples repo is the one where we can put more samples, if needed. If we keep them here, unreferenced, it will be hard to see what is used for what over time.
Reply: Is it possible to add multiple links? I initially had these in one file but moved them out into separate files so I would not have to update the line number references.
Reply: I added multiple links so they are all referenced now. |
||
{ | ||
/// <summary> | ||
/// Helper class for retrieving the resulting data | ||
/// </summary> | ||
class SampleInfertDataWithIntArray | ||
{ | ||
public float Age = 0; | ||
public string Education = default; | ||
public int[] EducationCategory = default; | ||
} | ||
|
||
///<summary> | ||
/// This example demonstrates the use of arrays as the values for the ValueMappingEstimator. It maps a set of keys of type string | ||
/// to integer arrays of variable length. | ||
/// The mapping looks like the following: | ||
/// <list> | ||
/// <item>0-5yrs -> 1,2,3,4</item> | ||
/// <item>6-11yrs -> 5,6,7</item> | ||
/// <item>12+yrs -> 42, 32</item> | ||
/// </list> | ||
/// </summary> | ||
public static void Run() | ||
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); | ||
Review comment: mlContext #Resolved |
||
|
||
// Get a small dataset as an IEnumerable. | ||
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData(); | ||
var trainData = ml.Data.ReadFromEnumerable(data); | ||
|
||
Review comment: nit: Empty line #Resolved |
||
// Creating a list of keys based on the Education values from the dataset | ||
var educationKeys = new List<string>() | ||
{ | ||
"0-5yrs", | ||
"6-11yrs", | ||
"12+yrs" | ||
}; | ||
|
||
// Sample list of associated array values | ||
var educationValues = new List<int[]>() | ||
{ | ||
new int[] { 1,2,3,4 }, | ||
new int[] { 5,6,7 }, | ||
new int[] { 42, 32 } | ||
}; | ||
|
||
// Constructs the ValueMappingEstimator, making the ML.NET pipeline. | ||
var pipeline = new ValueMappingEstimator<string, int>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); | ||
Review comment: Can you call this from
Review comment: Can we merge this sample with the other one? Maybe have a pipeline with the two estimators? Can we merge all these samples into one, with a pipeline with more than one estimator? #ByDesign |
||
|
||
// Fits the ValueMappingEstimator and transforms the data adding the EducationCategory column. | ||
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the resulting data as an IEnumerable of SampleInfertDataWithIntArray. This will contain the newly created column EducationCategory | ||
IEnumerable<SampleInfertDataWithIntArray> featuresColumn = ml.CreateEnumerable<SampleInfertDataWithIntArray>(transformedData, reuseRowObject: false); | ||
|
||
Console.WriteLine($"Example of mapping string->array"); | ||
Console.WriteLine($"Age\tEducation\tEducationLabel"); | ||
foreach (var featureRow in featuresColumn) | ||
{ | ||
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{string.Join(",", featureRow.EducationCategory)}"); | ||
} | ||
|
||
// Features column obtained post-transformation. | ||
// | ||
// Example of mapping string->array | ||
// Age Education EducationLabel | ||
// 26 0-5yrs 1,2,3,4 | ||
// 42 0-5yrs 1,2,3,4 | ||
// 39 12+yrs 42,32 | ||
// 34 0-5yrs 1,2,3,4 | ||
// 35 6-11yrs 5,6,7 | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
using System; | ||
using System.Collections.Generic; | ||
using Microsoft.Data.DataView; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.Transforms.Conversions; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingStringToKeyTypeExample | ||
Review comment: Same comment as above: what problem does this solve? #Resolved |
||
{ | ||
/// <summary> | ||
/// Helper class for retrieving the resulting data | ||
/// </summary> | ||
class SampleInfertDataWithFeatures | ||
|
||
{ | ||
public float Age = 0; | ||
public string Education = default; | ||
public string EducationCategory = default; | ||
} | ||
|
||
///<summary> | ||
/// This example demonstrates the use of KeyTypes by setting treatValuesAsKeyTypes to true in | ||
/// <see cref="ValueMappingEstimator.ValueMappingEstimator(IHostEnvironment, IEnumerable{TKey}, IEnumerable{TValue}, bool, (string input, string output)[])"/>. | ||
/// This is useful in cases where you want the output to be an integer key rather than the actual value. | ||
/// | ||
/// When using KeyTypes as a Value, the ValueMappingEstimator will do one of the following: | ||
/// 1) If the Value type is an unsigned int or unsigned long, the specified values are used directly as the KeyType values. | ||
/// 2) If the Value type is not an unsigned int or unsigned long, new KeyType values are generated for each unique value. | ||
/// | ||
/// In this example, the Value type is a string. Since we are setting treatValuesAsKeyTypes to true, | ||
/// the ValueMappingEstimator will generate its own KeyType values for each unique string. | ||
/// Like any KeyType, the generated keys carry the actual values as part of their metadata, so | ||
/// a KeyType can be converted back to the value it represents. To demonstrate | ||
/// the reverse lookup and to confirm the correct value is mapped, a KeyToValueMappingEstimator is added | ||
/// to the pipeline to convert back to the original value. | ||
Review comment: This is great, but what does having a KeyType do for me? (Explain this here.) #Resolved |
||
/// </summary> | ||
public static void Run() | ||
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); | ||
Review comment: mlContext #Resolved |
||
|
||
// Get a small dataset as an IEnumerable. | ||
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData(); | ||
var trainData = ml.Data.ReadFromEnumerable(data); | ||
|
||
// Creating a list of keys based on the Education values from the dataset | ||
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable. | ||
var educationKeys = new List<string>() | ||
{ | ||
"0-5yrs", | ||
"6-11yrs", | ||
"12+yrs" | ||
}; | ||
|
||
// Creating a list of values that are sample strings. These will be converted to KeyTypes | ||
var educationValues = new List<string>() | ||
{ | ||
"Cat1", | ||
"Cat2", | ||
"Cat3" | ||
}; | ||
|
||
// Generate the ValueMappingEstimator that will output KeyTypes even though our values are strings. | ||
// The KeyToValueMappingEstimator is added to provide a reverse lookup of the KeyType, converting the KeyType value back | ||
// to the original value. | ||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, true, ("EducationKeyType", "Education")) | ||
.Append(new KeyToValueMappingEstimator(ml, ("EducationCategory", "EducationKeyType"))); | ||
Review comment: use |
||
|
||
// Fits the ValueMappingEstimator and transforms the data adding the EducationKeyType column. | ||
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the resulting data as an IEnumerable of SampleInfertDataWithFeatures. | ||
IEnumerable<SampleInfertDataWithFeatures> featureRows = ml.CreateEnumerable<SampleInfertDataWithFeatures>(transformedData, reuseRowObject: false); | ||
|
||
Console.WriteLine($"Example of mapping string->keytype"); | ||
Console.WriteLine($"Age\tEducation\tEducationLabel"); | ||
foreach (var featureRow in featureRows) | ||
{ | ||
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{featureRow.EducationCategory}"); | ||
} | ||
|
||
// Features column obtained post-transformation. | ||
// | ||
// Age Education EducationLabel | ||
// 26 0-5yrs Cat1 | ||
// 42 0-5yrs Cat1 | ||
// 39 12+yrs Cat3 | ||
// 34 0-5yrs Cat1 | ||
// 35 6-11yrs Cat2 | ||
} | ||
} | ||
} |
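To make the key-type idea in this sample's summary more concrete, here is a minimal sketch in plain C# of the general technique: each distinct value gets a small integer key, and a reverse table turns a key back into its value. This is not ML.NET's implementation. `KeyTypeSketch`, `BuildKeys`, and `Invert` are hypothetical names, and numbering keys from 1 with 0 left for "missing" is an assumption made for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET: a plain-dictionary sketch of key encoding.
public static class KeyTypeSketch
{
    // Assigns each distinct value a 1-based key in first-seen order.
    // Assumption for this sketch: 0 is left unused to stand for "missing".
    public static Dictionary<string, uint> BuildKeys(IEnumerable<string> values)
    {
        var keys = new Dictionary<string, uint>();
        foreach (var value in values)
        {
            if (!keys.ContainsKey(value))
                keys[value] = (uint)(keys.Count + 1);
        }
        return keys;
    }

    // Inverts the key table so a key can be turned back into its original value.
    public static Dictionary<uint, string> Invert(Dictionary<string, uint> keys)
    {
        var reverse = new Dictionary<uint, string>();
        foreach (var pair in keys)
            reverse[pair.Value] = pair.Key;
        return reverse;
    }
}

public static class Program
{
    public static void Main()
    {
        var categories = new[] { "Cat1", "Cat1", "Cat3", "Cat1", "Cat2" };
        var keys = KeyTypeSketch.BuildKeys(categories);
        var reverse = KeyTypeSketch.Invert(keys);

        // Print each value, its integer key, and the value recovered from that key.
        foreach (var category in categories)
        {
            uint key = keys[category];
            Console.WriteLine($"{category}\t{key}\t{reverse[key]}");
        }
    }
}
```

The reverse table plays the role that the summary attributes to KeyType metadata: it is what makes the round trip back to the original string possible.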
Review comment: You need to reference your samples from the extensions; otherwise they won't show up anywhere.
For how to link it, see: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs#L115
The XML of the extensions should have this on it:
///
///
///
///
#Resolved