Skip to content

Multi column MapKeyToValue and MapValueToKey #3187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 7, 2019

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Apr 3, 2019

Towards #1209 more samples for MapKeyToValue and MapValueToKey

@sfilipi sfilipi added the documentation Related to documentation of ML.NET label Apr 3, 2019
@sfilipi sfilipi self-assigned this Apr 3, 2019
@codecov
Copy link

codecov bot commented Apr 3, 2019

Codecov Report

Merging #3187 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3187      +/-   ##
==========================================
- Coverage   72.54%   72.53%   -0.01%     
==========================================
  Files         807      807              
  Lines      144774   144774              
  Branches    16208    16208              
==========================================
- Hits       105021   105017       -4     
- Misses      35339    35343       +4     
  Partials     4414     4414
Flag Coverage Δ
#Debug 72.53% <ø> (-0.01%) ⬇️
#production 68.12% <ø> (-0.01%) ⬇️
#test 88.82% <ø> (ø) ⬆️
Impacted Files Coverage Δ
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 44.87% <ø> (ø) ⬆️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.26% <0%> (+0.15%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 85.11% <0%> (+0.4%) ⬆️

@codecov
Copy link

codecov bot commented Apr 3, 2019

Codecov Report

Merging #3187 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3187      +/-   ##
==========================================
+ Coverage    72.6%   72.62%   +0.01%     
==========================================
  Files         807      807              
  Lines      145077   145080       +3     
  Branches    16213    16213              
==========================================
+ Hits       105337   105366      +29     
+ Misses      35322    35296      -26     
  Partials     4418     4418
Flag Coverage Δ
#Debug 72.62% <ø> (+0.01%) ⬆️
#production 68.17% <ø> (+0.02%) ⬆️
#test 88.92% <ø> (ø) ⬆️
Impacted Files Coverage Δ
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 64.07% <ø> (ø) ⬆️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...osoft.ML.Recommender/SafeTrainingAndModelBuffer.cs 78.87% <0%> (ø) ⬆️
...osoft.ML.Recommender/MatrixFactorizationTrainer.cs 70.39% <0%> (ø) ⬆️
...oft.ML.Recommender/MatrixFactorizationPredictor.cs 86.21% <0%> (ø) ⬆️
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 97.31% <0%> (+0.05%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.9% <0%> (+0.2%) ⬆️
...oft.ML.StandardTrainers/Standard/SdcaMulticlass.cs 91.12% <0%> (+1.02%) ⬆️
...LogisticRegression/MulticlassLogisticRegression.cs 67.61% <0%> (+1.74%) ⬆️
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs 92.34% <0%> (+3.27%) ⬆️
... and 1 more


// at this point, the Label colum is tranformed from strings, to DataViewKeyType and
// the transformation has added the PredictedLabel column, with
var newPipeline = mlContext.Transforms.Conversion.MapKeyToValue(new[]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newPipeline [](start = 16, length = 11)

i am confused by this -- the newPipeline has no interaction with the previous pipeline.

So how does the newPipeline know about the mapping that pipeline generated in MapValueToKey #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what MapKeyToValue does. The mapping is saved in the Annotations of the column.


In reply to: 271831815 [](ancestors = 271831815)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see.. makes sense..

we may want to add that as comment in the sample .. am sure users will have the same question


In reply to: 271866086 [](ancestors = 271866086,271831815)

new LookupMap { Key = "6-11yrs" },
new LookupMap { Key = "25+yrs" }

};
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}; [](start = 12, length = 2)

whitespace #Resolved

@@ -100,8 +100,9 @@ internal static TypeConvertingEstimator ConvertType(this TransformsCatalog.Conve
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[KeyToValueMappingEstimator](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/ValueMappingStringToKeyType.cs)]
/// ]]></format>
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueToKeyInputOutputPair.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyToValueToKeyInputOutputPair [](start = 98, length = 30)

this API isnt for IOPair..also path looks incorrect

perhaps a case of misplaced example ? #Resolved

@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueValueToKey.cs)]
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/MapValueToKeyInputOutputPair.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueToKey [](start = 26, length = 10)

seems incorrect.. #Closed

@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueValueToKey.cs)]
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/MapValueToKeyInputOutputPair.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapValueToKeyInputOutputPair [](start = 98, length = 28)

same comment as above #Resolved

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversion/MapKeyToValueInputOutputPair.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueToKey [](start = 26, length = 10)

seems incorrect #Resolved


namespace Microsoft.ML.Samples.Dynamic
{
public class MapKeyToValueInputOutputPair
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapKeyToValueInputOutputPair [](start = 17, length = 28)

class and file name should be MapKeyToValueMultiColumn #Resolved


namespace Microsoft.ML.Samples.Dynamic
{
public static class MapValueToKeyInputOutputPair
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapValueToKeyInputOutputPair [](start = 24, length = 28)

we decided to use suffix MultiColumn for these APIs #Resolved

@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueValueToKey.cs)]
/// [!code-csharp[MapValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversion/MapValueToKeyManyColumn.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many [](start = 136, length = 4)

why fixing it in this PR .. this is not the multi column example ?

also.. file name is Multi not Many .. #Resolved

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[MapValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversions/MapKeyToValueMultiColumn.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyToValue [](start = 127, length = 10)

ValueToKey #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had them on the same file, at some point :)


In reply to: 271927396 [](ancestors = 271927396)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not see the change .. i.e use of MapValueToKeyMultiColumn.cs for this API instead of MapKeyToValueMultiColumn.cs


In reply to: 271930498 [](ancestors = 271930498,271927396)

@@ -100,8 +100,9 @@ internal static TypeConvertingEstimator ConvertType(this TransformsCatalog.Conve
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[KeyToValueMappingEstimator](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/ValueMappingStringToKeyType.cs)]
/// ]]></format>
/// [!code-csharp[MapKeyToValue](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueToKey.cs)]
Copy link
Member

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyToValueToKey.cs [](start = 101, length = 18)

I do not see this file in the codebase #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating it.


In reply to: 271959976 [](ancestors = 271959976,271935868)

Copy link
Member

@abgoswam abgoswam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
Copy link

@shmoradims shmoradims Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Trainers.MulticlassClassification [](start = 38, length = 34)

why are we removing this? we're using the long namespace for trainers to prevent name conflicts #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought every class had its own distinctive name?


In reply to: 271977575 [](ancestors = 271977575)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic, we call them the same between tasks. Reverting.


In reply to: 272288927 [](ancestors = 272288927,271977575)

var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);
Copy link

@shmoradims shmoradims Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have inline data like the other sample? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the most common use case for this transform would be this one: after multiclass/binary get back the original values, therefore used it in this context.


In reply to: 271977941 [](ancestors = 271977941)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gani's PR is not checked in. I can switch the Generate after he checks in.


In reply to: 272332287 [](ancestors = 272332287,271977941)


// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { StudyTime = "0-4yrs" , DevelopmentTime = "6-11yrs" },
Copy link

@shmoradims shmoradims Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6-11yrs [](start = 76, length = 7)

can we use something other than time? Maybe CourseName? I don't want to give the impression that the values of the two columns should be similar or related in any way. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i kept it to the same things because of the lookup map. It can have keys for two distinct categories, but i think if they they do multicolumn, it will most likely be for two separate columns that have the same categories of values.


In reply to: 271978738 [](ancestors = 271978738)

@sfilipi sfilipi force-pushed the moreConversionSamples branch from 4171b5b to a5758c3 Compare April 4, 2019 18:12
@sfilipi sfilipi force-pushed the moreConversionSamples branch from a5758c3 to 4e6c4af Compare April 5, 2019 15:45

// TransformedData obtained post-transformation.
//
// StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory
Copy link
Member

@abgoswam abgoswam Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevelopmentTime [](start = 48, length = 15)

copy paste error #Resolved

// This will contain the newly created columns.
features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false);

Console.WriteLine($" StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory");
Copy link
Member

@abgoswam abgoswam Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevelopmentTime [](start = 65, length = 15)

Course #Resolved

// This will contain the newly created columns.
features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false);

Console.WriteLine($" StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory");
Copy link

@shmoradims shmoradims Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevelopmentTimeCategory [](start = 84, length = 23)

CourseCategory #Resolved

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 7a0d395 into dotnet:master Apr 7, 2019
@sfilipi sfilipi deleted the moreConversionSamples branch April 7, 2019 07:11
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this pull request Apr 9, 2019
* Multi column MapKeyToValue and MapValueToKey
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants