Skip to content

Normalization API helpers #446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 3, 2018
Merged

Conversation

TomFinley
Copy link
Contributor

In which I introduce some helpers for normalization, and generally try to clean up the code-base. Addresses #433 . Hopefully will be used in #424, though I've changed the code here to use some of it where it made sense.

  • The Microsoft.ML.Data transform had a "hidden" dependency on Microsoft.ML.Transform project via dependency injection, for its existing "helper" for normalization (the console-app centric version). This has been resolved and replaced with direct instantiation. It required moving the normalizer files, however.

  • Introduction of helpers on NormalizeTransform for API-centric operations. (Not necessarily useful directly for console-application/GUI usage.)

  • Some documentation changes on RoleMappedSchema and RoleMappedData, though more non-cosmetic changes I'd expected would come with Direct API: RoleMappedSchema/Data Cleanup, Improvement #445 .

view = CompositeDataLoader.Create(env, loader,
new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component));
}
// REVIEW: This verbose constructor should be replaced with zeahmed's enhancements once #405 is committed.
Copy link
Contributor Author

@TomFinley TomFinley Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, can be made more brief with #405. #Closed

view = CompositeDataLoader.Create(env, loader,
new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component));
}
// REVIEW: This verbose constructor should be replaced with zeahmed's enhancements once #405 is committed.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zeahmed [](start = 76, length = 7)

I though we have tendency to remove aliases from codebase... #Resolved

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jun 28, 2018

                    groupInfo = ColumnInfo.CreateFromName(data.Schema, convCol.Name, "converted group id");

do we also need this line? it just if you remove labelInfo, groupInfo looks weird if it remaind untouched. #Resolved


Refers to: src/Microsoft.ML.FastTree/FastTree.cs:1361 in 08c9001. [](commit_id = 08c9001, deletion_comment = False)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley
Copy link
Contributor Author

                    groupInfo = ColumnInfo.CreateFromName(data.Schema, convCol.Name, "converted group id");

Yes, I should definitely get rid of it -- it's not actually used for anything.


In reply to: 401122953 [](ancestors = 401122953)


Refers to: src/Microsoft.ML.FastTree/FastTree.cs:1361 in 08c9001. [](commit_id = 08c9001, deletion_comment = False)

view = CompositeDataLoader.Create(env, loader,
new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component));
}
IDataView ApplyNormalizer(IHostEnvironment innerEnv, IDataView input)
Copy link
Contributor

@zeahmed zeahmed Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my PR goes in first, you will be able to use convenience constructor here...:) #Closed

Copy link
Contributor Author

@TomFinley TomFinley Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I even had a comment about that. :) But Ivan made me remove it. #Closed

@zeahmed
Copy link
Contributor

zeahmed commented Jun 28, 2018

I have bunch of changes in NormalizeColumn.cs which is being moved to different location in this PR. The best way to avoid extra work for me would be to let PR #405 go first and then this one.

What do you say? Can you hold this PR until #405 is merged? #Closed

/// This contains information about a column in an <see cref="IDataView"/>. It is essentially a convenience cache
/// containing the name, column index, and column type for the column. The intended usage is that users of <see cref="RoleMappedSchema"/>
/// will have a convenient method of getting the index and type without having to separately query it through the <see cref="ISchema"/>,
/// since practically the first thing a consumer of a <see cref="RoleMappedSchema"/> will want to do once they get a mappping is
Copy link
Contributor

@zeahmed zeahmed Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel comment/summary is not complete here as sentence is ending in is? #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. You go first then I'll adjust. This work in some ways will be helped by what you're doing as well, as you have noted. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I do that sometimes don't I.


In reply to: 198998270 [](ancestors = 198998270)

/// This method will not modify <paramref name="data"/> if the return from that is <c>null</c> or
/// <c>false</c>.</param>
/// <returns>True if the normalizer was applied and <paramref name="data"/> was modified</returns>
public static bool CreateIfNeeded(IHostEnvironment env, ref RoleMappedData data, ITrainer trainer)
Copy link
Contributor

@zeahmed zeahmed Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I will be using this method in #424 just before calling train method, correct? #Closed

Copy link
Contributor Author

@TomFinley TomFinley Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Also probably before the hypothetical cache-if-needed. #Closed

@TomFinley TomFinley changed the title [WIP] Normalization API helpers Normalization API helpers Jul 2, 2018
Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

{
// REVIEW: The role mapped data has the ability to have multiple columns fill the role of features, which is
// useful in some trainers that are nonetheless parameteric and can therefore benefit from normalization.
var featInfo = schema.Feature;
Copy link
Member

@eerhardt eerhardt Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) check schema for null? #Resolved

@@ -335,6 +335,20 @@ public static bool HasKeyNames(this ISchema schema, int col, int keyCount)
&& type.ItemType.IsText;
}

/// <summary>
/// Returns whether a column has the <see cref="Kinds.IsNormalized"/> metadata set to true.
Copy link
Member

@eerhardt eerhardt Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Do we instead want to document what IsNormalized means? "Returns whether a column <insert what IsNormalized means>". #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add a bit more information. Though that information, I might prefer to make that part of the Kinds static class, since the documentation of those are the primary source of truth. This is just something we added as a convenience on top of that.


In reply to: 199596084 [](ancestors = 199596084)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just some minor questions.

@TomFinley TomFinley force-pushed the tfinley/NormalHelpers branch from d9e0b4d to 3f1e454 Compare July 2, 2018 21:23
@TomFinley TomFinley merged commit 53c2a15 into dotnet:master Jul 3, 2018
@TomFinley TomFinley deleted the tfinley/NormalHelpers branch July 5, 2018 20:07
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
* API conveniences for the Normalize transform
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants