Skip to content

Converted LpNorm, GcNorm and Whitening transforms into transformers/estimators… #961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 21, 2018

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Sep 20, 2018

This PR fixes #959

@@ -88,5 +88,40 @@ public void TextTokenizationWorkout()
CheckEquality("Text", "tokenized.tsv");
Done();
}

[Fact]
public void LpGcNormAndWhiteningWorkout()
Copy link
Contributor Author

@zeahmed zeahmed Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LpGcNormAndWhiteningWorkout [](start = 20, length = 27)

Test is failing right now. There is data mismatch. Looking into it. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also move out of text featurizer tests


In reply to: 219011012 [](ancestors = 219011012)

@Zruty0 Zruty0 mentioned this pull request Sep 20, 2018
/// <param name="subMean">Subtract mean from each value before normalizing.</param>
/// <param name="useStdDev">Normalize by standard deviation rather than L2 norm.</param>
/// <param name="scale">Scale features by this value.</param>
public static Vector<float> GcNormalize(this Vector<float> input,
Copy link
Contributor

@Zruty0 Zruty0 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GcNormalize [](start = 36, length = 11)

While 'Lp' is fine (it's a mathematical notation L_p), 'Gc' is probably not. 'GlobalContrast' ? Does it even make sense to have both as extensions though? They seem to share a lot in common #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they have different set of parameters and mixing them into one would not look good. I have renamed GcNormalize to GlobalContrastNormalize.


In reply to: 219260214 [](ancestors = 219260214)

/// <param name="maxRows">Max number of rows.</param>
/// <param name="saveInverse">Whether to save inverse (recovery) matrix.</param>
/// <param name="pcaNum">PCA components to retain.</param>
public static Vector<float> Whiten(this Vector<float> input,
Copy link
Contributor

@Zruty0 Zruty0 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whiten [](start = 36, length = 6)

this sounds like a wacky thing to do to an arbitrary float array, is it not?
Do you think there is a good name that's more elaborate?

And if not, maybe consider not having the pigstension at all? #Resolved

var dataSource = new MultiFileSource(dataPath);

var reader = TextLoader.CreateReader(env,
c => (label: c.LoadFloat(11), features: c.LoadFloat(0, 10)),
Copy link
Contributor

@Zruty0 Zruty0 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11 [](start = 41, length = 2)

these are 0-based, are you sure? #Closed

Copy link
Contributor Author

@zeahmed zeahmed Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are i think.

features are from 0-10 and label is 11. Do you mean something else?


In reply to: 219263470 [](ancestors = 219263470)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it's my mistake


In reply to: 219289826 [](ancestors = 219289826,219263470)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zeahmed !

@TomFinley TomFinley merged commit e66e79b into dotnet:master Sep 21, 2018
@zeahmed zeahmed deleted the LpNorm_estimator branch January 30, 2019 21:31
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert LpNorm, GcNorm and Whitening Transforms to transformers/estimators.
3 participants