Conversion of DropSlots, MutualInformationFeatureSelection, and CountFeatureSelection into estimator and transformers #1683

Merged
merged 49 commits into from
Nov 27, 2018

Contributor

@artidoro artidoro commented Nov 20, 2018

Ongoing work on converting the transformers to estimators (#754). In this PR I convert DropSlots, MutualInformationFeatureSelection, and CountFeatureSelection into estimators and transformers, together with their corresponding extensions.

In particular:

  1. DropSlots is converted to a transformer.
  2. MutualInformationFeatureSelection and CountFeatureSelection are converted to estimators.
  3. For both estimators, I add static extensions as well as dynamic extensions to MLContext.
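
To illustrate the estimator/transformer split described above, here is a minimal, self-contained sketch. It is not the ML.NET implementation: `CountSelectingEstimator` and `SlotDroppingTransformer` are hypothetical stand-ins, rows are plain `float[]` arrays, and the selection rule (keep a slot when its count of non-zero values reaches a threshold) is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not the ML.NET types.
public sealed class SlotDroppingTransformer
{
    private readonly bool[] _keep;

    public SlotDroppingTransformer(bool[] keep) => _keep = keep;

    // Applies the learned selection: copies only the slots marked as kept.
    public float[] Transform(float[] row) =>
        row.Where((value, slot) => _keep[slot]).ToArray();
}

public sealed class CountSelectingEstimator
{
    private readonly long _minCount;

    public CountSelectingEstimator(long minCount) => _minCount = minCount;

    // Fit scans the training rows once, counts non-zero values per slot,
    // and returns a transformer that keeps slots whose count reaches _minCount.
    // Assumes every row has the same length as the first row.
    public SlotDroppingTransformer Fit(IReadOnlyList<float[]> rows)
    {
        int width = rows.Count == 0 ? 0 : rows[0].Length;
        var counts = new long[width];
        foreach (var row in rows)
            for (int slot = 0; slot < width; slot++)
                if (row[slot] != 0f)
                    counts[slot]++;
        return new SlotDroppingTransformer(counts.Select(c => c >= _minCount).ToArray());
    }
}
```

The point of the split is that all data-dependent work happens in `Fit`, while the returned transformer only applies a fixed slot mask. A transformer that needs no training, as DropSlots is described here, can skip the estimator step and be constructed directly from its slot ranges.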

@@ -179,9 +186,52 @@ public bool IsValid()
}
}

public sealed class ColumnInfo
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 25, 2018

ColumnInfo

You can add the same summary here as you have on the constructor. #Closed

Host.Assert(AreRangesValid(SlotsMin, SlotsMax));
}

// Factory method for SignatureLoadModel
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 25, 2018

Don't forget the period at the end of the sentence :) #Closed

Contributor Author

Thank you!


private readonly ColumnType[] _dstTypes;
private readonly SlotDropper[] _slotDropper;
// Track if all the slots of the column are to be dropped.
private readonly bool[] _suppressed;
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 25, 2018

Why is this a one-dimensional array? Don't we have an array of columns, with slots in each column? #ByDesign

Contributor Author

The ith element of this array is true if all slots are dropped from the ith column. We are given a set of columns and, for each column, the ranges of slots to drop. So this array does not indicate whether a given slot is dropped (that's what the ranges are for), but rather whether all slots of the column will be dropped.
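
As a rough sketch of that idea (not the code in this PR), the per-column flag could be derived from the slot ranges as follows. The class and method names are hypothetical, and the sketch assumes each column's ranges are inclusive, sorted by their minimum, and non-overlapping.

```csharp
using System;

// Hypothetical helper for illustration; not part of ML.NET.
public static class SuppressedSketch
{
    // Returns true when the inclusive ranges [mins[k], maxes[k]] together
    // cover every slot in 0..slotCount-1.
    // Assumes the ranges are sorted by min and do not overlap.
    public static bool AllSlotsDropped(int[] mins, int[] maxes, int slotCount)
    {
        int next = 0; // first slot not yet known to be dropped
        for (int k = 0; k < mins.Length; k++)
        {
            if (mins[k] > next)
                return false; // slot `next` is not covered, so it survives
            if (maxes[k] >= slotCount - 1)
                return true;  // coverage reaches the last slot
            next = Math.Max(next, maxes[k] + 1);
        }
        return next >= slotCount;
    }

    // One flag per column, which is why the array is one-dimensional.
    public static bool[] ComputeSuppressed(int[][] mins, int[][] maxes, int[] slotCounts)
    {
        var suppressed = new bool[slotCounts.Length];
        for (int col = 0; col < slotCounts.Length; col++)
            suppressed[col] = AllSlotsDropped(mins[col], maxes[col], slotCounts[col]);
        return suppressed;
    }
}
```

For example, a 3-slot column with the single range 0..2 is fully suppressed, while a 4-slot column with ranges 0..1 and 3..3 is not, because slot 2 survives.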


@artidoro
Contributor Author

Any thoughts @[email protected] ?


@Ivanidzo4ka
Contributor

I would send you to @sfilipi; words are not my expertise.


.Append(new CountFeatureSelector(Env, "bag_of_words", "bag_of_words_count", 10)
.Append(new MutualInformationFeatureSelector(Env, "bag_of_words", "bag_of_words_mi", labelColumn: "label")));
var est = new WordBagEstimator(ML, "text", "bag_of_words")
.Append(ML.Transforms.FeatureSelection.CountFeatureSelectingEstimator("bag_of_words", "bag_of_words_count", 10)
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

ML.Transforms.FeatureSelection

Why do you need the full name? #ByDesign

Contributor Author

I wanted to test that I can access it from MLContext.


Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@artidoro artidoro merged commit 3bf74ed into dotnet:master Nov 27, 2018
@artidoro artidoro deleted the dropslotspr branch January 5, 2019 00:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
4 participants