-
Notifications
You must be signed in to change notification settings - Fork 1.9k
OutputTokens option in FeaturizeText API #2985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2985 +/- ##
==========================================
+ Coverage 72.35% 72.35% +<.01%
==========================================
Files 803 803
Lines 143296 143296
Branches 16155 16155
==========================================
+ Hits 103675 103678 +3
+ Misses 35194 35192 -2
+ Partials 4427 4426 -1
|
@@ -111,8 +111,8 @@ public sealed class Options : TransformInputBase | |||
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to keep numbers or remove them.", ShortName = "num", SortOrder = 8)] | |||
public bool KeepNumbers = TextNormalizingEstimator.Defaults.KeepNumbers; | |||
|
|||
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to output the transformed text tokens as an additional column.", ShortName = "tokens,showtext,showTransformedText", SortOrder = 9)] | |||
public bool OutputTokens; | |||
[Argument(ArgumentType.AtMostOnce, HelpText = "Column containing the transformed text tokens.", ShortName = "OutputTokens,tokens,showtext,showTransformedText", SortOrder = 9)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutputTokens,tokens,showtext,showTransformedText [](start = 121, length = 48)
not sure it make much sense to preserve old names since you basically change logic of this field. #Resolved
@@ -43,7 +43,7 @@ public void TextFeaturizerWorkout() | |||
.AsDynamic; | |||
|
|||
var feat = data.MakeNewEstimator() | |||
.Append(row => row.text.FeaturizeText(options: new TextFeaturizingEstimator.Options { OutputTokens = true, })); | |||
.Append(row => row.text.FeaturizeText(options: new TextFeaturizingEstimator.Options { OutputTokensColumnName = "Data_TransformedText", })); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data_TransformedText [](start = 129, length = 20)
No need in _
anymore, you can name it as you want! #Resolved
I verified it and it works.
The test went through successfully, validating this change does not affect the loading/saving of old models with this change. I believe the reason this works is because while saving the model, the In reply to: 474023125 [](ancestors = 474023125) Refers to: src/Microsoft.ML.Transforms/Text/TextFeaturizingEstimator.cs:675 in 5d6a2f6. [](commit_id = 5d6a2f6, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -35,7 +35,7 @@ public static void Example() | |||
{ | |||
KeepPunctuations = false, | |||
KeepNumbers = false, | |||
OutputTokens = true, | |||
OutputTokensColumnName = "OutputTokens", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutputTokens [](start = 42, length = 12)
TokenizedText
perhaps? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs one safety check.
☘️
Will treat it as bug fix #3002 . Do not want to block this PR on it since we are near code complete for project 13 issues In reply to: 474100886 [](ancestors = 474100886) Refers to: src/Microsoft.ML.Transforms/Text/TextFeaturizingEstimator.cs:334 in 40eb3f4. [](commit_id = 40eb3f4, deletion_comment = False) |
Fixes #2957