There are two transforms with the Friendly Name "Term Transform" #214

sfilipi · 2018-05-23T17:25:43Z

In the list of entry points there are two identical transforms, one with the name field: "Transforms.TextToKeyConverter" and the other with the name field: "Transforms.Dictionarizer".
Their Friendly Name filed is the same: "Term Transform".

This will be confusing for systems interfacing with ml.net through the entry points; ml.net should not present the same entry point choice more than once.

TomFinley · 2018-05-23T21:54:34Z

What is the proposed final name? Are we happy with either "Dictionarizer" or "TextToKey"? The first seems a bit funky (at least to my eye), but the second is not descriptive since obviously this can be applied to more than just text. So I prefer the first but only because being basically meaningless is preferable to being flat-out misleading and wrong.

justinormont · 2018-05-23T22:47:40Z

Yeah, I was about to say, "TextToKeyConverter" is a bit wrong as it can take in about anything and convert to Key, but @TomFinley beat me to it.

I don't like either one, really. Do we want to expose the concept of keys? If so, "ToKey" is rather terse and hopefully descriptive. I'd say "AnythingToKey", but I don't know that every type, for instance the Picture type, can be converted to a Key type (and the future will bring more types).

To capture the primary use case, "LabelConverter" may be suitable. Scikit calls this a "LabelEncoder": http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

I maybe wrong but Tensorflow seems to not have a specialized transform for this, and uses the categorical featurizers:

categorical_column_with_hash_bucket(...): Represents sparse feature where ids are set by hashing.
categorical_column_with_identity(...): A _CategoricalColumn that returns identity values.
categorical_column_with_vocabulary_file(...): A _CategoricalColumn with a vocabulary file.
categorical_column_with_vocabulary_list(...): A _CategoricalColumn with in-memory vocabulary

TomFinley · 2018-05-24T06:11:24Z

mmm. ToKey... I actually like that name. ToKey. Super simple, fairly descriptive in five characters. Of course, hash is also a "to key." But it's still better than the original name "term."

We sort of have to expose them. They're basically the same things as factors in R, and as far as I can tell you simply can't get around the fact that enumerations into sets is a fairly central concept of ML. Of course whether we actually wind up using "key" in the name, I don't know.

LabelEncoder might be fine, indeed term's first name was "auto-label", but I worry somewhat about what will happen when someone uses the text metatransform, inspects the pipeline, and one of the first things there is them "label-encoding" their feature inputs. :)

jwood803 · 2018-08-22T22:15:45Z

@TomFinley, if it's ok I'd like to work on this. Was ToKey decided to be the name to change one of the Term Transform's to?

TomFinley · 2018-09-08T21:36:13Z

Hi @jwood803 I apologize, I just saw this now!!

I'm not sure it was decided. @justinormont likes it (or at least so I presume from the fact that he suggested it), I like it, but other people that might have opinions on this matter have not weighed in -- the ones I can think of are @Zruty0 , @GalOshri , @eerhardt , @KrzysztofCwalina ...

jwood803 · 2018-09-08T21:41:55Z

No worries, @TomFinley. 😄

I can go ahead and mess with it to get a PR out and we can go from there if that sounds like a plan.

TomFinley · 2018-09-09T20:00:23Z

Hi @jwood803 , mmmmaaaaybe? I would just hate for this to happen, then everyone shoots it down. Some history, we named it term four years ago, and the name stuck, not because anyone liked the name per se or thought it was so great, but because in those intervening four years every time someone suggested a new name for it, someone hated it even worse. (E.g., "dictionarizer..") My prior at this point is to assume everyone will hate any renaming to term.

Let's try this. I'm going to force discussion on the issue by naming my pigsty extension method in #870 to ToKey (while still the underlying classes have Term in their name). If it passes, then we'll consider that @justinormont's suggestion stands, and then the things named Term and whatnot can be renamed.

jwood803 · 2018-09-09T20:33:18Z

Sounds great, @TomFinley! I'll be on the lookout for the discussion. Thanks! 😄

TomFinley · 2018-09-12T20:41:49Z

Hi @jwood803 I think everyone agrees FYI (or at least, they let the PR go in), so let's possibly count on there being no particular disagreement on the point.

justinormont · 2018-09-12T21:10:28Z

@jwood803, @TomFinley : Correct, I like ToKey for being terse and hopefully descriptive.

jwood803 · 2018-09-12T22:09:08Z

@TomFinley @justinormont Awesome! Thanks for the update. I'll start messing with this. Thanks!

shauheen added the bug Something isn't working label May 23, 2018

shauheen added this to the 0518 milestone May 23, 2018

shauheen added good first issue Good for newcomers up-for-grabs A good issue to fix if you are trying to contribute to the project labels May 23, 2018

shauheen modified the milestones: 0518, 0618 May 29, 2018

shauheen modified the milestones: 0618, 0718 Jul 5, 2018

shauheen removed this from the 0718 milestone Aug 4, 2018

shauheen added this to the 0918 milestone Aug 31, 2018

artidoro self-assigned this Sep 7, 2018

TomFinley mentioned this issue Sep 9, 2018

Term estimators in static pipeline #870

Merged

jwood803 mentioned this issue Sep 15, 2018

Fix duplicate transform friendly name #926

Merged

shauheen modified the milestone: 0918 Oct 1, 2018

shauheen closed this as completed Oct 1, 2018

ghost locked as resolved and limited conversation to collaborators Mar 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

There are two transforms with the Friendly Name "Term Transform" #214

There are two transforms with the Friendly Name "Term Transform" #214

sfilipi commented May 23, 2018

TomFinley commented May 23, 2018

Uh oh!

justinormont commented May 23, 2018 •

edited

Loading

Uh oh!

TomFinley commented May 24, 2018

Uh oh!

jwood803 commented Aug 22, 2018

Uh oh!

TomFinley commented Sep 8, 2018

Uh oh!

jwood803 commented Sep 8, 2018

Uh oh!

TomFinley commented Sep 9, 2018 •

edited

Loading

Uh oh!

jwood803 commented Sep 9, 2018

Uh oh!

TomFinley commented Sep 12, 2018

Uh oh!

justinormont commented Sep 12, 2018

Uh oh!

jwood803 commented Sep 12, 2018

Uh oh!

There are two transforms with the Friendly Name "Term Transform" #214

There are two transforms with the Friendly Name "Term Transform" #214

Comments

sfilipi commented May 23, 2018

TomFinley commented May 23, 2018

Uh oh!

justinormont commented May 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TomFinley commented May 24, 2018

Uh oh!

jwood803 commented Aug 22, 2018

Uh oh!

TomFinley commented Sep 8, 2018

Uh oh!

jwood803 commented Sep 8, 2018

Uh oh!

TomFinley commented Sep 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jwood803 commented Sep 9, 2018

Uh oh!

TomFinley commented Sep 12, 2018

Uh oh!

justinormont commented Sep 12, 2018

Uh oh!

jwood803 commented Sep 12, 2018

Uh oh!

justinormont commented May 23, 2018 •

edited

Loading

TomFinley commented Sep 9, 2018 •

edited

Loading