There are two transforms with the Friendly Name "Term Transform" #214
Comments
What is the proposed final name? Are we happy with either "Dictionarizer" or "TextToKey"? The first seems a bit funky (at least to my eye), but the second is not descriptive since obviously this can be applied to more than just text. So I prefer the first but only because being basically meaningless is preferable to being flat-out misleading and wrong. |
Yeah, I was about to say, "TextToKeyConverter" is a bit wrong as it can take in about anything and convert to Key, but @TomFinley beat me to it. I don't like either one, really. Do we want to expose the concept of To capture the primary use case, "LabelConverter" may be suitable. Scikit calls this a "LabelEncoder": http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html I may be wrong, but TensorFlow seems not to have a specialized transform for this, and uses the categorical featurizers:
|
mmm. ToKey... I actually like that name. ToKey. Super simple, fairly descriptive in five characters. Of course, hash is also a "to key." But it's still better than the original name "term." We sort of have to expose them. They're basically the same things as factors in R, and as far as I can tell you simply can't get around the fact that enumerations into sets is a fairly central concept of ML. Of course whether we actually wind up using "key" in the name, I don't know. LabelEncoder might be fine, indeed term's first name was "auto-label", but I worry somewhat about what will happen when someone uses the text metatransform, inspects the pipeline, and one of the first things there is them "label-encoding" their feature inputs. :) |
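For readers following the naming debate, here is a minimal sketch of the idea being named: assigning each distinct input value an integer key, regardless of whether the input is text. It is not ML.NET's implementation. The function names `build_key_map` and `to_key`, the zero-based numbering, and the `-1` marker for unseen values are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List


def build_key_map(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct value a dense integer key, in order of first appearance.

    Hypothetical helper: zero-based keys are an illustrative choice only.
    """
    key_map: Dict[Hashable, int] = {}
    for value in values:
        if value not in key_map:
            key_map[value] = len(key_map)
    return key_map


def to_key(values: Iterable[Hashable], key_map: Dict[Hashable, int],
           missing: int = -1) -> List[int]:
    """Replace each value with its key; unseen values get the `missing` marker."""
    return [key_map.get(value, missing) for value in values]


# Text input, the "label" use case.
labels = ["cat", "dog", "cat", "bird"]
label_map = build_key_map(labels)
label_keys = to_key(labels, label_map)

# Non-text input works the same way, which is why a "Text..." name is misleading.
readings = [3.5, 1.0, 3.5]
reading_keys = to_key(readings, build_key_map(readings))

# A value that was never seen while building the map.
unseen_keys = to_key(["fish"], label_map)
```

The same mapping serves both labels and features, which is the concern raised above about a name like "LabelEncoder" showing up on feature columns.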
@TomFinley, if it's ok I'd like to work on this. Was |
Hi @jwood803 I apologize, I just saw this now!! I'm not sure it was decided. @justinormont likes it (or at least so I presume from the fact that he suggested it), I like it, but other people that might have opinions on this matter have not weighed in -- the ones I can think of are @Zruty0 , @GalOshri , @eerhardt , @KrzysztofCwalina ... |
No worries, @TomFinley. 😄 I can go ahead and mess with it to get a PR out and we can go from there if that sounds like a plan. |
Hi @jwood803 , mmmmaaaaybe? I would just hate for this to happen, then everyone shoots it down. Some history, we named it Let's try this. I'm going to force discussion on the issue by naming my pigsty extension method in #870 to |
Sounds great, @TomFinley! I'll be on the lookout for the discussion. Thanks! 😄 |
Hi @jwood803 I think everyone agrees FYI (or at least, they let the PR go in), so let's possibly count on there being no particular disagreement on the point. |
@jwood803, @TomFinley : Correct, I like |
@TomFinley @justinormont Awesome! Thanks for the update. I'll start messing with this. Thanks! |
In the list of entry points there are two identical transforms, one with the name field: "Transforms.TextToKeyConverter" and the other with the name field: "Transforms.Dictionarizer".
Their Friendly Name field is the same: "Term Transform".
This will be confusing for systems interfacing with ml.net through the entry points; ml.net should not present the same entry point choice more than once.
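A consumer of the entry point list could detect this kind of collision with a check along the following lines. This is only a sketch: the dictionary keys `Name` and `FriendlyName`, the function `duplicate_friendly_names`, and the third sample entry are assumptions for illustration, not the actual layout of the ML.NET entry point catalog.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def duplicate_friendly_names(
    entry_points: Iterable[Mapping[str, str]],
) -> Dict[str, List[str]]:
    """Group entry point names by friendly name and keep only the collisions."""
    by_friendly: Dict[str, List[str]] = defaultdict(list)
    for entry_point in entry_points:
        by_friendly[entry_point["FriendlyName"]].append(entry_point["Name"])
    return {
        friendly: names
        for friendly, names in by_friendly.items()
        if len(names) > 1
    }


# The first two entries mirror the ones reported in this issue;
# the third is a made-up entry so the list has a non-colliding case.
catalog = [
    {"Name": "Transforms.TextToKeyConverter", "FriendlyName": "Term Transform"},
    {"Name": "Transforms.Dictionarizer", "FriendlyName": "Term Transform"},
    {"Name": "Transforms.HypotheticalOther", "FriendlyName": "Hypothetical Other"},
]

collisions = duplicate_friendly_names(catalog)
for friendly, names in collisions.items():
    print(f"{friendly!r} is shared by: {', '.join(names)}")
```

Run as part of a catalog validation step, a check like this would flag any friendly name shared by more than one entry point.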