Conversion of DropSlots, MutualInformationFeatureSelection, and CountFeatureSelection into estimator and transformers #1683
@@ -179,9 +186,52 @@ public bool IsValid()
}
}

public sealed class ColumnInfo
You can add the same summary here as you have on the constructor. #Closed
Host.Assert(AreRangesValid(SlotsMin, SlotsMax));
}

// Factory method for SignatureLoadModel
Don't forget the period at the end of the sentence :) #Closed
private readonly ColumnType[] _dstTypes;
private readonly SlotDropper[] _slotDropper;
// Track if all the slots of the column are to be dropped.
private readonly bool[] _suppressed;
Why is this a one-dimensional array? Don't we have an array of columns, and slots within each column? #ByDesign
The ith element of this array is true if all slots are dropped from the ith column. We are given a set of columns and, for each column, the ranges of slots to drop. So the array does not indicate whether a given slot is dropped (that's what the ranges are for), but rather whether every slot in the ith column will be dropped.
In reply to: 236089331 [](ancestors = 236089331)
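To illustrate the distinction in the reply above, here is a minimal standalone sketch of a per-column flag derived from per-column slot ranges. It is not the transformer's actual implementation: `AllSlotsDropped`, the tuple-based range representation, and the convention that a null `Max` means "through the last slot" are all assumptions made for this example.

```csharp
using System;
using System.Linq;

public static class SuppressedSketch
{
    // Hypothetical helper (not part of ML.NET): returns true when the
    // inclusive ranges [Min, Max] together cover every slot 0..slotCount-1.
    // Assumption: a null Max means "through the last slot".
    public static bool AllSlotsDropped(int slotCount, (int Min, int? Max)[] ranges)
    {
        var dropped = new bool[slotCount];
        foreach (var (min, max) in ranges)
        {
            // Clamp the range to the valid slot indices of this column.
            int last = Math.Min(max ?? slotCount - 1, slotCount - 1);
            for (int slot = Math.Max(min, 0); slot <= last; slot++)
                dropped[slot] = true;
        }
        return dropped.All(d => d);
    }

    public static void Main()
    {
        // Two columns: one with 4 slots, one with 3 slots.
        int[] slotCounts = { 4, 3 };
        var rangesPerColumn = new[]
        {
            new (int Min, int? Max)[] { (0, 1) },            // drops slots 0-1 only
            new (int Min, int? Max)[] { (0, 0), (1, null) }, // drops every slot
        };

        // One flag per column, hence a one-dimensional array.
        var suppressed = new bool[slotCounts.Length];
        for (int i = 0; i < slotCounts.Length; i++)
            suppressed[i] = AllSlotsDropped(slotCounts[i], rangesPerColumn[i]);

        Console.WriteLine(string.Join(", ", suppressed));
    }
}
```

The ranges carry the per-slot information; the flag array only summarizes, per column, whether anything survives.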
Any thoughts @[email protected] ? In reply to: 440747947 [](ancestors = 440747947,440492847)
.Append(new CountFeatureSelector(Env, "bag_of_words", "bag_of_words_count", 10)
.Append(new MutualInformationFeatureSelector(Env, "bag_of_words", "bag_of_words_mi", labelColumn: "label")));
var est = new WordBagEstimator(ML, "text", "bag_of_words")
.Append(ML.Transforms.FeatureSelection.CountFeatureSelectingEstimator("bag_of_words", "bag_of_words_count", 10)
Why do you need the full name? #ByDesign
I wanted to test that I can access it from MLContext.
In reply to: 236791021 [](ancestors = 236791021)
Ongoing work on converting the transforms to estimators (#754). In this PR I convert DropSlots, MutualInformationFeatureSelection, and CountFeatureSelection into estimators and transformers, along with the related extensions.
In particular: