Skip to content

ValueMappingEstimator example #2222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 1, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 90 additions & 0 deletions docs/samples/Microsoft.ML.Samples/Dynamic/ValueMapping.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
using System;
using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Conversions;

namespace Microsoft.ML.Samples.Dynamic
{
public class ValueMappingExample
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueMappingExample [](start = 17, length = 19)

you need to reference your samples from the extensions, otherwise they won't show up anywhere.

For how to link it, see: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs#L115

The XML of the extensions should have this on it:
///
///
///
///

#Resolved

{
class SampleInfertDataWithFeatures
{
public float Age = 0;
public string Education = default;
public string EducationCategory = default;
}

///<summary>
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sample is missing the "So what?" aspect. Why would I use this transform, and where is it helpful?

Can you give an example of a problem you can't solve without using this? #Resolved

/// This example demonstrates the use of the ValueMappingEstimator by mapping string-to-string values. The ValueMappingEstimator uses
/// level of education as keys to a respective string label which is the value.
/// The mapping looks like the following:
/// <list>
/// <item>0-5yrs -> Cat1</item>
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0-5yrs -> Cat1 [](start = 22, length = 14)

For XML documentation, those need to be wrapped in a tag.

It doesn't matter much in your case, because this will display inside a code block. #Resolved

/// <item>6-11yrs -> Cat2</item>
/// <item>12+yrs -> Cat3</item>
/// </list>
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should typically go in a section.

you can use simple, non-xml comments here. #Resolved

/// </summary>
public static void Run()
{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlContext #Resolved


// Get a small dataset as an IEnumerable.
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData();
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> [](start = 12, length = 55)

var #Resolved

Copy link
Member Author

@singlis singlis Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the feedback was to not use var in this case because its more explicit about what we are doing. #Resolved

var trainData = ml.Data.ReadFromEnumerable(data);

// Preview of the data.
//
// Age Case Education induced parity pooled.stratum row_num ...
// 26.0 1.0 0-5yrs 1.0 6.0 3.0 1.0 ...
// 42.0 1.0 0-5yrs 1.0 1.0 1.0 2.0 ...
// 39.0 1.0 12+yrs 2.0 6.0 4.0 3.0 ...
// 34.0 1.0 0-5yrs 2.0 4.0 2.0 4.0 ...
// 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ...

// Creating a list of keys based on the Education values from the dataset
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable.
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the users might wonder why we are doing this, although it is somewhat explained.

Maybe try adding: if the list of keys and the list of values is known, they can be passed to the API. #Resolved

var educationKeys = new List<string>()
{
"0-5yrs",
"6-11yrs",
"12+yrs"
};

// Creating a list of associated values that will map respectively to each educationKey
var educationValues = new List<string>()
Copy link
Contributor

@rogancarr rogancarr Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain what these are, just as you explained what the keys were. #Resolved

{
"Cat1",
"Cat2",
"Cat3"
};

// Constructs the ValueMappingEstimator making the ML.net pipeline
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("EducationCategory", "Education"));
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new ValueMappingEstimator [](start = 27, length = 25)

i think we want to create those through the mlContext. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes totally agree, I have updated these to now be called from the mlContext. However I was not able to do that for the KeyType example due to the extension methods not taking in the boolean for treatValuesAsKeyTypes. I have created an issue to track this #2346


In reply to: 252779161 [](ancestors = 252779161)

Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call this from mlContext? #Resolved


// Fits the ValueMappingEstimator and transforms the data converting the Education to EducationCategory.
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData);

// Getting the resulting data as an IEnumerable of SampleInfertDataWithFeatures. This will contain the newly created column EducationCategory
IEnumerable<SampleInfertDataWithFeatures> featureRows = ml.CreateEnumerable<SampleInfertDataWithFeatures>(transformedData, reuseRowObject: false);

Console.WriteLine($"Example of mapping string->string");
Console.WriteLine($"Age\tEducation\tEducationLabel");
foreach (var featureRow in featureRows)
{
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{featureRow.EducationCategory}");
}

// Features column obtained post-transformation.
//
// Age Education EducationLabel
Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Age Education EducationLabel [](start = 14, length = 32)

I'd expect a column with name "EducationCategory" #Resolved

// 26 0-5yrs Cat1
// 42 0-5yrs Cat1
// 39 12+yrs Cat3
// 34 0-5yrs Cat1
// 35 6-11yrs Cat2
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment above, what is the use of changing it from a string to a different string? Can you show what this can then be used for? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a remarks section and added additional verbage to each example.


In reply to: 252778545 [](ancestors = 252778545)

}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
using System;
using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Conversions;

namespace Microsoft.ML.Samples.Dynamic
{
public class ValueMappingFloatToStringExample
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above — what problem does this solve? #Resolved

Copy link
Member

@sfilipi sfilipi Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueMappingFloatToStringExample [](start = 17, length = 32)

this is not referenced from anything. #Resolved

{
/// <summary>
/// Helper class for retrieving the resulting data
/// </summary>
class SampleInfertDataWithInducedCategory
{
public float Age = 0;
public float Induced = 0.0f;
public string InducedCategory = default;
}

///<summary>
/// This example demonstrates the use of floating types as the key type for ValueMappingEstimator by mapping a float-to-string value.
/// The mapping looks like the following:
/// <list>
/// <item>1.0 -> Cat1</item>
/// <item>2.0 -> Cat2</item>
/// </list>
/// </summary>
public static void Run()
{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlContext #Resolved


// Get a small dataset as an IEnumerable.
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData();
var trainData = ml.Data.ReadFromEnumerable(data);

// Creating a list of keys based on the induced value from the dataset
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable.
var inducedKeys = new List<float>()
{
1.0f,
2.0f
};

// Creating a list of values, these strings will map accordingly to each key.
var inducedValues = new List<string>()
{
"Cat1",
"Cat2"
};

// Constructs the ValueMappingEstimator making the ML.net pipeline
var pipeline = new ValueMappingEstimator<float, string>(ml, inducedKeys, inducedValues, ("InducedCategory", "Induced"));
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call this from mlContext? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated -- there are two cases that I can not call from mlContext. I have created issues for that.


In reply to: 252782089 [](ancestors = 252782089)


// Fits the ValueMappingEstimator and transforms the data adding the InducedCategory column.
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData);

// Getting the resulting data as an IEnumerable of SampleInfertDataWithInducedCategory. This will contain the newly created column InducedCategory
IEnumerable<SampleInfertDataWithInducedCategory> featureRows = ml.CreateEnumerable<SampleInfertDataWithInducedCategory>(transformedData, reuseRowObject: false);

Console.WriteLine($"Example of mapping float->string");
Console.WriteLine($"Age\tInduced\tInducedCategory");
foreach (var featureRow in featureRows)
{
Console.WriteLine($"{featureRow.Age}\t{featureRow.Induced}\t{featureRow.InducedCategory}");
}

// Features column obtained post-transformation.
//
// Example of mapping float->string
// Age Induced InducedCategory
// 26 1 Cat1
// 42 1 Cat1
// 39 2 Cat2
// 34 2 Cat2
// 35 1 Cat1
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
using System;
using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Conversions;

namespace Microsoft.ML.Samples.Dynamic
{
public class ValueMappingStringToArrayExample
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above — what problem does this solve? #Resolved

Copy link
Member

@sfilipi sfilipi Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueMappingStringToArrayExample [](start = 17, length = 32)

this is not referenced anywhere. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIght - this is just another example that is available to show how this can be used. There is another example that is not used either. Does they all need to be referenced?


In reply to: 252891445 [](ancestors = 252891445)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove them. The purpose of this project is to populate the API reference website.

the machinelearning-samples repo is the one where we can put more samples, if needed.

If we keep them here, unreferenced, it will be hard to see what is used for what, overtime.


In reply to: 252938208 [](ancestors = 252938208,252891445)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add multiple links? I initially had these in one file but moved out into separate files to not have to update line number reference.


In reply to: 252938844 [](ancestors = 252938844,252938208,252891445)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added multiple links so they are all referenced now.


In reply to: 252939988 [](ancestors = 252939988,252938844,252938208,252891445)

{
/// <summary>
/// Helper class for retrieving the resulting data
/// </summary>
class SampleInfertDataWithIntArray
{
public float Age = 0;
public string Education = default;
public int[] EducationCategory = default;
}

///<summary>
/// This example demonstrates the use arrays as the values for the ValueMappingEstimator. It maps a set of keys that are type string
/// to a integer arrays of variable length.
/// The mapping looks like the following:
/// <list>
/// <item>0-5yrs -> 1,2,3,4</item>
/// <item>6-11yrs -> 5,6,7</item>
/// <item>12+yrs -> 42, 32</item>
/// </list>
/// </summary>
public static void Run()
{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlContext #Resolved


// Get a small dataset as an IEnumerable.
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData();
var trainData = ml.Data.ReadFromEnumerable(data);

Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Empty line #Resolved

// Creating a list of keys based on the Education values from the dataset
var educationKeys = new List<string>()
{
"0-5yrs",
"6-11yrs",
"12+yrs"
};

// Sample list of associated array values
var educationValues = new List<int[]>()
{
new int[] { 1,2,3,4 },
new int[] { 5,6,7 },
new int[] { 42, 32 }
};

// Constructs the ValueMappingEstimator making the ML.net pipeline
var pipeline = new ValueMappingEstimator<string, int>(ml, educationKeys, educationValues, ("EducationCategory", "Education"));
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call this from mlContext ? #Resolved

Copy link
Member

@sfilipi sfilipi Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r<string, int> [](start = 51, length = 14)

can we merge this sample with the other one? maybe have a pipeline with the two estimators?

can we merge all this samples into one, with a pipeline with more than one estimator? #ByDesign


// Fits the ValueMappingEstimator and transforms the data adding the EducationCategory column.
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData);

// Getting the resulting data as an IEnumerable of SampleInfertDataWithIntArray. This will contain the newly created column EducationCategory
IEnumerable<SampleInfertDataWithIntArray> featuresColumn = ml.CreateEnumerable<SampleInfertDataWithIntArray>(transformedData, reuseRowObject: false);

Console.WriteLine($"Example of mapping string->array");
Console.WriteLine($"Age\tEducation\tEducationLabel");
foreach (var featureRow in featuresColumn)
{
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{string.Join(",", featureRow.EducationCategory)}");
}

// Features column obtained post-transformation.
//
// Example of mapping string->array
// Age Education EducationLabel
// 26 0 - 5yrs 1,2,3,4
// 42 0 - 5yrs 1,2,3,4
// 39 12 + yrs 42,32
// 34 0 - 5yrs 1,2,3,4
// 35 6 - 11yrs 5,6,7
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
using System;
using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Conversions;

namespace Microsoft.ML.Samples.Dynamic
{
public class ValueMappingStringToKeyTypeExample
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above — what problem does this solve? #Resolved

{
/// <summary>
/// Helper class for retrieving the resulting data
/// </summary>
class SampleInfertDataWithFeatures

{
public float Age = 0;
public string Education = default;
public string EducationCategory = default;
}

///<summary>
/// This example demonstrates the use of KeyTypes by setting treatValuesAsKeyTypes to true,
/// <see cref="ValueMappingEstimator.ValueMappingEstimator(IHostEnvironment, IEnumerable{TKey}, IEnumerable{TValue}, bool, (string input, string output)[])")/> to true.
/// This is useful in cases where you want the output to be integer based rather than the actual value.
///
/// When using KeyTypes as a Value, the ValueMappingEstimator will do one of the following:
/// 1) If the Value type is an unsigned int or unsigned long, the specified values are used directly as the KeyType values.
/// 2) If the Value type is not an unsigned int or unsigned long, new KeyType values are generated for each unique value.
///
/// In this example, the Value type is a string. Since we are setting treatValueAsKeyTypes to true,
/// the ValueMappingEstimator will generate its own KeyType values for each unique string.
/// As with KeyTypes, they contain the actual Value information as part of the metadata, therefore
/// we can convert a KeyType back to the actual value the KeyType represents. To demonstrate
/// the reverse lookup and to confirm the correct value is mapped, a KeyToValueEstimator is added
/// to the pipeline to convert back to the original value.
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but what does having a KeyType do for me? (Explain this here.) #Resolved

/// </summary>
public static void Run()
{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlContext #Resolved


// Get a small dataset as an IEnumerable.
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData();
var trainData = ml.Data.ReadFromEnumerable(data);

// Creating a list of keys based on the Education values from the dataset
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable.
var educationKeys = new List<string>()
{
"0-5yrs",
"6-11yrs",
"12+yrs"
};

// Creating a list of values that are sample strings. These will be converted to KeyTypes
var educationValues = new List<string>()
{
"Cat1",
"Cat2",
"Cat3"
};

// Generate the ValueMappingEstimator that will output KeyTypes even though our values are strings.
// The KeyToValueMappingEstimator is added to provide a reverse lookup of the KeyType, converting the KeyType value back
// to the original value.
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, true, ("EducationKeyType", "Education"))
.Append(new KeyToValueMappingEstimator(ml, ("EducationCategory", "EducationKeyType")));
Copy link
Contributor

@rogancarr rogancarr Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use mlContext. #Resolved


// Fits the ValueMappingEstimator and transforms the data adding the EducationKeyType column.
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData);

// Getting the resulting data as an IEnumerable of SampleInfertDataWithFeatures.
IEnumerable<SampleInfertDataWithFeatures> featureRows = ml.CreateEnumerable<SampleInfertDataWithFeatures>(transformedData, reuseRowObject: false);

Console.WriteLine($"Example of mapping string->keytype");
Console.WriteLine($"Age\tEducation\tEducationLabel");
foreach (var featureRow in featureRows)
{
Console.WriteLine($"{featureRow.Age}\t{featureRow.Education} \t{featureRow.EducationCategory}");
}

// Features column obtained post-transformation.
//
// Age Education EducationLabel
// 26 0-5yrs Cat1
// 42 0-5yrs Cat1
// 39 12+yrs Cat3
// 34 0-5yrs Cat1
// 35 6-11yrs Cat2
}
}
}
2 changes: 1 addition & 1 deletion src/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,7 @@ public static IEnumerable<SampleInfertData> GetInfertData()
data.Add(new SampleInfertData
{
RowNum = 2,
Education = "0-5yrs",
Education = "12+yrs",
Age = 39,
Parity = 6,
Induced = 2,
Expand Down