Multi column MapKeyToValue and MapValueToKey #3187
Codecov Report
@@ Coverage Diff @@
## master #3187 +/- ##
==========================================
- Coverage 72.54% 72.53% -0.01%
==========================================
Files 807 807
Lines 144774 144774
Branches 16208 16208
==========================================
- Hits 105021 105017 -4
- Misses 35339 35343 +4
Partials 4414 4414
Codecov Report
@@ Coverage Diff @@
## master #3187 +/- ##
==========================================
+ Coverage 72.6% 72.62% +0.01%
==========================================
Files 807 807
Lines 145077 145080 +3
Branches 16213 16213
==========================================
+ Hits 105337 105366 +29
+ Misses 35322 35296 -26
Partials 4418 4418

// at this point, the Label column is transformed from strings to DataViewKeyType, and
// the transformation has added the PredictedLabel column, with
var newPipeline = mlContext.Transforms.Conversion.MapKeyToValue(new[]
newPipeline [](start = 16, length = 11)
I am confused by this: the newPipeline has no interaction with the previous pipeline. So how does the newPipeline know about the mapping that pipeline generated in MapValueToKey? #Resolved
That's what MapKeyToValue does. The mapping is saved in the Annotations of the column.
In reply to: 271831815 [](ancestors = 271831815)
Ah, I see, that makes sense. We may want to add that as a comment in the sample; I am sure users will have the same question.
In reply to: 271866086 [](ancestors = 271866086,271831815)
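The point made in this thread can be sketched as follows. This is a minimal illustration, not the sample under review: the `Row` and `RoundTrip` classes, the column names `LabelKey` and `LabelRestored`, and the three-row dataset are invented for the example. It assumes the ML.NET 1.x API surface (`MapValueToKey`, `MapKeyToValue`, and the `GetKeyValues` schema extension in `Microsoft.ML.Data`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class KeyAnnotationSketch
{
    // Hypothetical input row, introduced only for this illustration.
    private class Row
    {
        public string Label { get; set; }
    }

    // Hypothetical output row; data view columns not listed here are ignored.
    private class RoundTrip
    {
        public string Label { get; set; }
        public string LabelRestored { get; set; }
    }

    public static List<string> Run()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new Row { Label = "cat" },
            new Row { Label = "dog" },
            new Row { Label = "cat" }
        });

        // First estimator: map the string values to keys.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelKey", "Label");
        IDataView keyed = pipeline.Fit(data).Transform(data);

        // The original values are stored with the key column as annotations.
        var keyValues = default(VBuffer<ReadOnlyMemory<char>>);
        keyed.Schema["LabelKey"].GetKeyValues(ref keyValues);
        Console.WriteLine("Key values: " + string.Join(", ", keyValues.DenseValues()));

        // Second estimator, built independently of the first. It only needs
        // the annotations carried by the LabelKey column of its input.
        var newPipeline = mlContext.Transforms.Conversion.MapKeyToValue("LabelRestored", "LabelKey");
        IDataView restored = newPipeline.Fit(keyed).Transform(keyed);

        return mlContext.Data
            .CreateEnumerable<RoundTrip>(restored, reuseRowObject: false)
            .Select(r => r.LabelRestored)
            .ToList();
    }
}
```

Calling `KeyAnnotationSketch.Run()` returns the restored strings. The second estimator never sees the first one; the link between them is the data view that flows from one to the other.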
new LookupMap { Key = "6-11yrs" },
new LookupMap { Key = "25+yrs" }

};
}; [](start = 12, length = 2)
whitespace #Resolved
@@ -100,8 +100,9 @@ internal static TypeConvertingEstimator ConvertType(this TransformsCatalog.Conve
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// ]]></format>
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueToKeyInputOutputPair.cs)]
KeyToValueToKeyInputOutputPair [](start = 98, length = 30)
This API isn't for IOPair. Also, the path looks incorrect; perhaps a case of a misplaced example? #Resolved
@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/MapValueToKeyInputOutputPair.cs)]
ValueToKey [](start = 26, length = 10)
seems incorrect.. #Closed
@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/MapValueToKeyInputOutputPair.cs)]
MapValueToKeyInputOutputPair [](start = 98, length = 28)
same comment as above #Resolved
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
ValueToKey [](start = 26, length = 10)
seems incorrect #Resolved

namespace Microsoft.ML.Samples.Dynamic
{
public class MapKeyToValueInputOutputPair
MapKeyToValueInputOutputPair [](start = 17, length = 28)
The class and file name should be MapKeyToValueMultiColumn. #Resolved

namespace Microsoft.ML.Samples.Dynamic
{
public static class MapValueToKeyInputOutputPair
MapValueToKeyInputOutputPair [](start = 24, length = 28)
We decided to use the suffix MultiColumn for these APIs. #Resolved
@@ -173,7 +181,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[MapValueToKey](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversion/MapValueToKeyManyColumn.cs)]
Many [](start = 136, length = 4)
Why fix it in this PR? This is not the multi-column example. Also, the file name uses Multi, not Many. #Resolved
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
KeyToValue [](start = 127, length = 10)
ValueToKey #Resolved
I do not see the change, i.e. the use of MapValueToKeyMultiColumn.cs for this API instead of MapKeyToValueMultiColumn.cs.
In reply to: 271930498 [](ancestors = 271930498,271927396)
@@ -100,8 +100,9 @@ internal static TypeConvertingEstimator ConvertType(this TransformsCatalog.Conve
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// ]]></format>
/// [!code-csharp[MapKeyToValue](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValueToKey.cs)]
KeyToValueToKey.cs [](start = 101, length = 18)
I do not see this file in the codebase #Resolved
In reply to: 271935868 [](ancestors = 271935868)

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
.Trainers.MulticlassClassification [](start = 38, length = 34)
Why are we removing this? We're using the long namespace for trainers to prevent name conflicts. #Resolved
I thought every class had its own distinctive name?
In reply to: 271977575 [](ancestors = 271977575)
I see, we give them the same names across tasks. Reverting.
In reply to: 272288927 [](ancestors = 272288927,271977575)
var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);
can we have inline data like the other sample? #Resolved
I feel the most common use case for this transform is this one: getting back the original values after multiclass or binary classification. That is why I used it in this context.
In reply to: 271977941 [](ancestors = 271977941)
Gani's PR is not checked in. I can switch the Generate after he checks in.
In reply to: 272332287 [](ancestors = 272332287,271977941)

// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { StudyTime = "0-4yrs" , DevelopmentTime = "6-11yrs" },
6-11yrs [](start = 76, length = 7)
can we use something other than time? Maybe CourseName? I don't want to give the impression that the values of the two columns should be similar or related in any way. #Resolved
I kept both columns to the same kind of values because of the lookup map. It can have keys for two distinct categories, but I think that if users do multi-column mapping, it will most likely be for two separate columns that share the same categories of values.
In reply to: 271978738 [](ancestors = 271978738)
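The scenario described in this thread, two columns sharing one lookup map, might look like the sketch below. It reuses the `DataPoint`, `LookupMap`, and `TransformedData` names and the `StudyTime`/`DevelopmentTime` columns that appear in the diff excerpts, but the class bodies, the dataset rows, and the `MultiColumnLookupSketch` wrapper are assumptions made for illustration. It relies on the ML.NET 1.x overload of `MapValueToKey` that takes an `InputOutputColumnPair[]` and an optional `keyData` data view.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

public static class MultiColumnLookupSketch
{
    // Assumed shape of the input row; only the property names come from the diff.
    private class DataPoint
    {
        public string StudyTime { get; set; }
        public string DevelopmentTime { get; set; }
    }

    // Assumed shape of a lookup entry: a single column holding the allowed keys.
    private class LookupMap
    {
        public string Key { get; set; }
    }

    // Assumed shape of the output row, with one key column per input column.
    private class TransformedData : DataPoint
    {
        public uint StudyTimeCategory { get; set; }
        public uint DevelopmentTimeCategory { get; set; }
    }

    public static int Run()
    {
        var mlContext = new MLContext(seed: 0);

        // Illustrative rows; both columns draw from the same set of categories.
        var rawData = new[]
        {
            new DataPoint { StudyTime = "0-4yrs", DevelopmentTime = "6-11yrs" },
            new DataPoint { StudyTime = "6-11yrs", DevelopmentTime = "25+yrs" },
            new DataPoint { StudyTime = "25+yrs", DevelopmentTime = "0-4yrs" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rawData);

        // One lookup map, applied to every column pair listed below.
        var lookupData = new[]
        {
            new LookupMap { Key = "0-4yrs" },
            new LookupMap { Key = "6-11yrs" },
            new LookupMap { Key = "25+yrs" }
        };
        IDataView lookupIdvMap = mlContext.Data.LoadFromEnumerable(lookupData);

        var pipeline = mlContext.Transforms.Conversion.MapValueToKey(
            new[]
            {
                new InputOutputColumnPair("StudyTimeCategory", "StudyTime"),
                new InputOutputColumnPair("DevelopmentTimeCategory", "DevelopmentTime")
            },
            keyData: lookupIdvMap);

        IDataView transformedData = pipeline.Fit(data).Transform(data);
        var rows = mlContext.Data
            .CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false)
            .ToList();

        foreach (var row in rows)
        {
            Console.WriteLine($"{row.StudyTime}\t{row.StudyTimeCategory}\t" +
                $"{row.DevelopmentTime}\t{row.DevelopmentTimeCategory}");
        }
        return rows.Count;
    }
}
```

Because the single `keyData` view supplies the key set for every pair, this arrangement fits best when the columns share their categories, which is the case argued for in the reply above.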
4171b5b to a5758c3
Addressing Abhishek's comments
a5758c3 to 4e6c4af

// TransformedData obtained post-transformation.
//
// StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory
DevelopmentTime [](start = 48, length = 15)
copy paste error #Resolved
// This will contain the newly created columns.
features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false);

Console.WriteLine($" StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory");
DevelopmentTime [](start = 65, length = 15)
Course #Resolved
// This will contain the newly created columns.
features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false);

Console.WriteLine($" StudyTime StudyTimeCategory DevelopmentTime DevelopmentTimeCategory");
DevelopmentTimeCategory [](start = 84, length = 23)
CourseCategory #Resolved
* Multi column MapKeyToValue and MapValueToKey
Towards #1209 more samples for MapKeyToValue and MapValueToKey