Skip to content

More Normalizer Scrubbing #2888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 14, 2019
Merged

More Normalizer Scrubbing #2888

merged 13 commits into from
Mar 14, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Mar 8, 2019

Follow up of #2865 so should fix #2928. Some public APIs were missed.

@wschin wschin requested review from artidoro, TomFinley and sfilipi March 8, 2019 06:04
@wschin wschin self-assigned this Mar 8, 2019
@wschin wschin changed the title More Normalizer Scrubbing More Normalizer Scrubbing (Just Renaming) Mar 8, 2019
@wschin wschin force-pushed the more-norm-scrub branch 2 times, most recently from 5709476 to d22809e Compare March 8, 2019 06:44
@wschin wschin force-pushed the more-norm-scrub branch from d22809e to 8c0dd58 Compare March 8, 2019 07:14
@codecov
Copy link

codecov bot commented Mar 8, 2019

Codecov Report

Merging #2888 into master will increase coverage by <.01%.
The diff coverage is 86.79%.

@@            Coverage Diff             @@
##           master    #2888      +/-   ##
==========================================
+ Coverage   72.18%   72.19%   +<.01%     
==========================================
  Files         796      796              
  Lines      142020   142032      +12     
  Branches    16046    16046              
==========================================
+ Hits       102523   102533      +10     
- Misses      35116    35117       +1     
- Partials     4381     4382       +1
Flag Coverage Δ
#Debug 72.19% <86.79%> (ø) ⬆️
#production 67.97% <82.67%> (ø) ⬆️
#test 88.3% <100%> (ø) ⬆️
Impacted Files Coverage Δ
...rosoft.ML.Transforms/FourierDistributionSampler.cs 84.16% <ø> (ø) ⬆️
test/Microsoft.ML.Tests/Transformers/RffTests.cs 100% <100%> (ø) ⬆️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.62% <100%> (ø) ⬆️
...Microsoft.ML.Data/Transforms/NormalizeColumnDbl.cs 66.3% <100%> (ø) ⬆️
...soft.ML.StaticPipe/LpNormalizerStaticExtensions.cs 100% <100%> (ø) ⬆️
...rosoft.ML.StaticPipelineTesting/StaticPipeTests.cs 95.27% <100%> (ø) ⬆️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <100%> (ø) ⬆️
src/Microsoft.ML.StaticPipe/TransformsStatic.cs 68.28% <100%> (ø) ⬆️
...Microsoft.ML.Data/Transforms/NormalizeColumnSng.cs 72.89% <100%> (ø) ⬆️
...osoft.ML.Functional.Tests/IntrospectiveTraining.cs 100% <100%> (ø) ⬆️
... and 41 more

@TomFinley TomFinley requested a review from rogancarr March 10, 2019 01:16
@@ -102,35 +102,35 @@ internal static ColumnOptionsBase Create(string outputColumnName, string inputCo
}
}

public abstract class FixZeroColumnOptionsBase : ColumnOptionsBase
public abstract class ControlZeroColumnOptionsBase : ColumnOptionsBase
Copy link
Member

@sfilipi sfilipi Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ControlZeroColumnOptionsBase [](start = 30, length = 28)

why change this? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixZero is a bool option applied to zeros. Those zeros may not be fixed if FixZero=false. Thus, I feel ControlZero sounds more appropriate.


In reply to: 264105076 [](ancestors = 264105076)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think that FixZero makes more sense because it's "fixing" something in place (and not "fixing" a problem). It is a bit more precise than "control".


In reply to: 264343033 [](ancestors = 264343033,264105076)

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixZero means zeros will not be changed right? However, if FixZero=false, zeros will be mapped to a data-dependent value (e.g., mean). It looks like we are able to change zeros to infinite amount of possibilities.


In reply to: 264825031 [](ancestors = 264825031,264343033,264105076)

Copy link
Contributor

@rogancarr rogancarr Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixZero=true means that I want to fix the value of zero in place. FixZero=false means don't fix zero in place. That seems consistent, doesn't it? ControlZero doesn't convey much to me.


In reply to: 264868717 [](ancestors = 264868717,264825031,264343033,264105076)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is not only for that flag's name. When appearing in the name of FixZeroColumnOptionsBase, no one knows FixZero is a Boolean variable. Your judgement is valid only when FixZero is a field.


In reply to: 265342165 [](ancestors = 265342165,264868717,264825031,264343033,264105076)

@@ -36,7 +36,7 @@ namespace Microsoft.ML.Transforms
/// This transformation is based on this paper by
/// <a href="http://pages.cs.wisc.edu/~brecht/papers/07.rah.rec.nips.pdf">Rahimi and Recht</a>.
/// </summary>
public sealed class RandomFourierExpansionTransformer : OneToOneTransformerBase
public sealed class RandomFourierKernelTransformer : OneToOneTransformerBase
Copy link
Contributor

@artidoro artidoro Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomFourierKernelTransformer [](start = 24, length = 30)

I am not convinced about this change. I know what Fourier expansion is, but I have tried looking up Fourier Kernel and I have not found anything.
Also it does not seem to fit with the summary above. #Resolved

Copy link
Member Author

@wschin wschin Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to search for Random Fourier Kernel Approximation. Here is a scikit-learn example: https://scikit-learn.org/stable/modules/kernel_approximation.html


In reply to: 264411188 [](ancestors = 264411188)

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will rename RandomFourierKernel to ApproximatedKernel everywhere (in iteration 5).


In reply to: 264443488 [](ancestors = 264443488,264411188)

@@ -71,15 +71,15 @@ public static class NormalizationCatalog
/// </format>
/// </example>
public static LpNormalizingEstimator LpNormalize(this TransformsCatalog catalog, string outputColumnName, string inputColumnName = null,
LpNormalizingEstimatorBase.NormFunction normKind = LpNormalizingEstimatorBase.Defaults.Norm, bool ensureZeroMean = LpNormalizingEstimatorBase.Defaults.LpEnsureZeroMean)
=> new LpNormalizingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName, normKind, ensureZeroMean);
LpNormalizingEstimatorBase.NormFunction norm = LpNormalizingEstimatorBase.Defaults.Norm, bool ensureZeroMean = LpNormalizingEstimatorBase.Defaults.LpEnsureZeroMean)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

norm [](start = 52, length = 4)

Do you think it make sense to unify names in this one, and one in TextFeaturizingEstimator (it's VectorNormalizer) #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their ranges are different. TextFeaturizingEstimator contains None but LpNormalizer doesn't, but it doesn't make sense to add None to the enum of LpNormalizer.


In reply to: 264822764 [](ancestors = 264822764)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case, I'm not taking about type of this parameter, I'm talking about it's name.


In reply to: 264872101 [](ancestors = 264872101,264822764)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use norm in TextFeaturizingEstimator.


In reply to: 264873318 [](ancestors = 264873318,264872101,264822764)

@@ -595,7 +595,7 @@ As a general rule, *if you use a parametric learner, you need to make sure your

ML.NET offers several built-in scaling algorithms, or 'normalizers':
- MinMax normalizer: for each feature, we learn the minimum and maximum value of it, and then linearly rescale it so that the values fit between -1 and 1.
- MeanVar normalizer: for each feature, compute the mean and variance, and then linearly rescale it to zero-mean, unit-variance.
- MeanVariance normalizer: for each feature, compute the mean and variance, and then linearly rescale it to zero-mean, unit-variance.
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MeanVariance [](start = 2, length = 12)

In the field, we usually use the term Standardize to reflect this normalization technique. This is very "statistics-y", but it does seem to be standard. How would everyone feel about changing "MeanVariance" to "Standardize", or at least offer a "Standardize" alias? #Resolved

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel MVN is super bad because it's already an operator in neural networks (e.g., ONNX, Caffe, CoreML).
Just for references:
https://apple.github.io/coremltools/coremlspecification/sections/NeuralNetwork.html#meanvariancenormalizelayerparams
https://github.com/onnx/onnx/blob/master/docs/Operators.md#MeanVarianceNormalization


In reply to: 264826182 [](ancestors = 264826182)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Plus it's more precise than Standardize. Let's keep it MVN. We can always add an alias for Standardize if people are up in arms.


In reply to: 264891367 [](ancestors = 264891367,264826182)

@@ -55,7 +55,7 @@ public static void Example()
//0.165 0.117 -0.547 0.014

// A pipeline to project Features column into L-p normalized vector.
var lpNormalizePipeline = ml.Transforms.LpNormalize(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), normKind: Transforms.LpNormalizingEstimatorBase.NormFunction.L1);
var lpNormalizePipeline = ml.Transforms.LpNormalize(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), norm: Transforms.LpNormalizingEstimatorBase.NormFunction.L1);
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

norm [](start = 134, length = 4)

We should be using long-form names here. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. LpNormalize will be replaced by LpNormNormalize.


In reply to: 264826528 [](ancestors = 264826528)

/// <remarks>Note that the statistics gathering and normalization is done independently per slot of the
/// vector values.
/// Note that if values are later transformed that are lower than the minimum, or higher than the maximum,
/// observed during fitting, that the output values may be outside the range of -1 to 1.</remarks>
/// <returns>The normalized column.</returns>
public static NormVector<float> Normalize(
this Vector<float> input, bool fixZero = FZ, long maxTrainingExamples = MaxTrain,
this Vector<float> input, bool ensureZeroUntouched = FZ, long maximumExampleCount = MaxTrain,
Copy link
Member

@sfilipi sfilipi Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureZeroUntouched [](start = 43, length = 19)

it is still fixZero in the arguments. #Resolved

@@ -1550,18 +1550,18 @@ public static Vector<float> FeaturizeText(this Scalar<string> input, Scalar<stri
}
}

public static class RffStaticExtenensions
public static class RandomFourierKernelMappingStaticExtenensions
Copy link
Member

@sfilipi sfilipi Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomFourierKernelMappingStaticExtenensions [](start = 24, length = 44)

we decided to go with acronyms #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would be RFKMStaticExtensions?


In reply to: 264836908 [](ancestors = 264836908)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore. Its acronyms will be AKM which is not a usual term. At least, it's my first time of seeing that.


In reply to: 264869094 [](ancestors = 264869094,264836908)

/// <param name="useCosAndSinBases">If <see langword="true"/>, use both of cos and sin basis functions to create two features for every random Fourier frequency.
/// Otherwise, only cos bases would be used.</param>
/// <param name="generator">Which kernel to use. (if it is null, <see cref="GaussianKernel"/> is used.)</param>
/// <param name="seed">The seed of the random number generator for generating the new features. If not specified global random would be used.</param>
public static Vector<float> LowerVectorSizeWithRandomFourierTransformation(this Vector<float> input,
int dimension = RandomFourierKernelMappingEstimator.Defaults.Rank, bool useCosAndSinBases = RandomFourierKernelMappingEstimator.Defaults.UseCosAndSinBases,
public static Vector<float> RandomFourierKernelMapper(this Vector<float> input,
Copy link
Member

@sfilipi sfilipi Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomFourierKernelMapper [](start = 36, length = 25)

shall we go with RfKernelMappingEstimator #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Artidoro. We feel ApproximatedKernel is a better name for describing this transform's behavior.


In reply to: 264837298 [](ancestors = 264837298)

}
}

public static class PcaEstimatorExtensions
Copy link
Member

@sfilipi sfilipi Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PcaEstimatorExtensions [](start = 24, length = 22)

keeper #Resolved

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What? I guess you mean to have PcaStaticExtensions.


In reply to: 264837387 [](ancestors = 264837387)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks.


In reply to: 264918309 [](ancestors = 264918309,264837387)

{
public readonly bool FixZero;

private protected FixZeroColumnOptionsBase(string outputColumnName, string inputColumnName, long maxTrainingExamples, bool fixZero)
: base(outputColumnName, inputColumnName, maxTrainingExamples)
private protected ControlZeroColumnOptionsBase(string outputColumnName, string inputColumnName, long maximumExampleCount, bool fixZero)
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ControlZeroColumnOptionsBase [](start = 30, length = 28)

Ditto above comment; I prefer FixZero. #Pending

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zeros are not fixed. It floats around with different data sets. At my first glance of seeing FixZeroColumnOptionsBase, I expected that this column will fix my 0s as they are because otherwise it should be called FixOrChangeZeroColumnOptionsBase (or just ControlZeroColumnOptionsBase because it offers a special control over zeros).


In reply to: 264865051 [](ancestors = 264865051)

@wschin wschin changed the title More Normalizer Scrubbing (Just Renaming) More Normalizer Scrubbing Mar 12, 2019
/// <param name="subMean">Subtract mean from each value before normalizing.</param>
public static Vector<float> LpNormalize(this Vector<float> input,
LpNormalizingEstimatorBase.NormFunction normKind = LpNormalizingEstimatorBase.Defaults.Norm,
bool subMean = LpNormalizingEstimatorBase.Defaults.LpEnsureZeroMean) => new OutPipelineColumn(input, normKind, subMean);
LpNormalizingEstimatorBase.NormFunction norm = LpNormalizingEstimatorBase.Defaults.Norm,
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

norm [](start = 52, length = 4)

Norm is not usually used as a stand-alone. Perhaps normalization? #Resolved

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe no, how would user call ||x||_1? A norm I'd expect.


In reply to: 264866893 [](ancestors = 264866893)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's stick with norm.


In reply to: 264918622 [](ancestors = 264918622,264866893)

/// <remarks>Note that the statistics gathering and normalization is done independently per slot of the
/// vector values.
/// Note that if values are later transformed that are lower than the minimum, or higher than the maximum,
/// observed during fitting, that the output values may be outside the range of -1 to 1.</remarks>
/// <returns>The normalized column.</returns>
public static NormVector<float> Normalize(
this Vector<float> input, bool fixZero = FZ, long maxTrainingExamples = MaxTrain,
this Vector<float> input, bool ensureZeroUntouched = FZ, long maximumExampleCount = MaxTrain,
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureZeroUntouched [](start = 43, length = 19)

I'd prefer to be consistent throughout the APIs on what we call this. Right now, there are three names that this parameter goes by. #Resolved

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool fixZero are all ensureZeroUntouched now.


In reply to: 264867559 [](ancestors = 264867559)

@@ -16,61 +16,61 @@ namespace Microsoft.ML.StaticPipe
/// </summary>
public static class NormalizerStaticExtensions
{
private const long MaxTrain = NormalizingEstimator.Defaults.MaxTrainingExamples;
private const long MaxTrain = NormalizingEstimator.Defaults.MaximumExampleCount;
private const bool FZ = NormalizingEstimator.Defaults.FixZero;
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we in-line these and just break lines in the code? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


In reply to: 264868132 [](ancestors = 264868132)

OnFitAffine<ImmutableArray<float>> onFit = null)
{
return NormalizeByMVCdfCore(input, fixZero, useLog, false, maxTrainingExamples, AffineMapper(onFit));
return NormalizeByMVCdfCore(input, ensureZeroUntouched, useLog, false, maximumExampleCount, AffineMapper(onFit));
}

/// <summary>
/// Learns an affine function based on the observed mean and standard deviation. This is less susceptible
/// to outliers as compared to <see cref="Normalize(Vector{double}, bool, long, OnFitAffine{ImmutableArray{double}})"/>.
/// </summary>
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be expanded upon at some time. Maybe in the docs go-around. #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree but not this huge PR.


In reply to: 264868709 [](ancestors = 264868709)

@@ -71,15 +71,15 @@ public static class NormalizationCatalog
/// </format>
/// </example>
public static LpNormalizingEstimator LpNormalize(this TransformsCatalog catalog, string outputColumnName, string inputColumnName = null,
Copy link
Contributor

@rogancarr rogancarr Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name isn't the best.... #Resolved

Copy link
Member Author

@wschin wschin Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about LpNormNormalize?


In reply to: 264869544 [](ancestors = 264869544)

@rogancarr
Copy link
Contributor

rogancarr commented Mar 12, 2019

              ],

Don't forget to add an alias for the old name when you change the name. You will also need to update the [arguments] in the original class to have this alias as well. #Resolved


Refers to: test/BaselineOutput/Common/EntryPoints/core_manifest.json:16559 in ed6e8b3. [](commit_id = ed6e8b3, deletion_comment = False)

@wschin
Copy link
Member Author

wschin commented Mar 12, 2019

              ],

Fixed. Thank you.


In reply to: 472168174 [](ancestors = 472168174)


Refers to: test/BaselineOutput/Common/EntryPoints/core_manifest.json:16559 in ed6e8b3. [](commit_id = ed6e8b3, deletion_comment = False)

@@ -383,7 +383,9 @@ public ColumnFunctionAccessor(ImmutableArray<ColumnOptions> infos)
[BestFriend]
internal readonly IReadOnlyList<IColumnFunction> ColumnFunctions;

public readonly ImmutableArray<ColumnOptions> Columns;
[BestFriend]
internal readonly ImmutableArray<ColumnOptions> Columns;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableArray [](start = 26, length = 14)

@rogancarr I've ask @wschin to hide this right now, for a reason we don't have common idea of how to do it. Hope you Ok with that. We can always expose it in v1.1
#Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to leave this public for this transform, as there is a user use-case.
Or we need to come up with another way to expose the model parameters for the normalizer, but we cannot hide them.


In reply to: 264910410 [](ancestors = 264910410)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is public now.


In reply to: 265306118 [](ancestors = 265306118,264910410)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can add a comment about not changing this back to internal, as it has customer usage.

I'll add a functional test about it


In reply to: 265317859 [](ancestors = 265317859,265306118,264910410)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a test added by Rogan.


In reply to: 265323244 [](ancestors = 265323244,265317859,265306118,264910410)

MinMaxNormalized: r.Features.Normalize(fixZero: true),
MeanVarNormalized: r.Features.NormalizeByMeanVar(fixZero: false),
MinMaxNormalized: r.Features.Normalize(ensureZeroUntouched: true),
MeanVarNormalized: r.Features.NormalizeByMeanVariance(ensureZeroUntouched: false),
Copy link
Member

@sfilipi sfilipi Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureZeroUntouched [](start = 74, length = 19)

nit: ensureZeroUnchanged #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the word touched, because it makes feel that Normalizer is a cute animal which sometime plays with zeros.


In reply to: 265304319 [](ancestors = 265304319)

bool subMean = LpNormalizingEstimatorBase.Defaults.LpEnsureZeroMean) => new OutPipelineColumn(input, normKind, subMean);
/// <param name="norm">Type of norm to use to normalize each sample.</param>
/// <param name="ensureZeroMean">Subtract mean from each value before normalizing.</param>
public static Vector<float> LpNormNormalize(this Vector<float> input,
Copy link
Member

@sfilipi sfilipi Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LpNormNormalize [](start = 36, length = 15)

is it too bad to keep it as is? NormNormalizing feels like a tongue twist :) #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was suggested by Rogan for alignment with other normalizations. Let me try NormalizeLpNorm.


In reply to: 265323674 [](ancestors = 265323674)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like that better, personally


In reply to: 265327419 [](ancestors = 265327419,265323674)

Copy link
Contributor

@rogancarr rogancarr Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LpNormalizer normalizes a vector in a column and not columns in an IDataView, so maybe we should just call it NormalizeVector.


In reply to: 265328368 [](ancestors = 265328368,265327419,265323674)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several normalizers around. We can't drop things like LpNorm and GlobalContrast.


In reply to: 265340604 [](ancestors = 265340604,265328368,265327419,265323674)

@@ -47,16 +47,16 @@ public Reconciler(LpNormalizingEstimatorBase.NormFunction normKind, bool ensureZ
foreach (var outCol in toOutput)
pairs.Add((outputNames[outCol], inputNames[((OutPipelineColumn)outCol).Input]));

return new LpNormalizingEstimator(env, pairs.ToArray(), _norm, _ensureZeroMean);
return new LpNormNormalizingEstimator(env, pairs.ToArray(), _norm, _ensureZeroMean);
}
}

/// <include file='../Microsoft.ML.Transforms/doc.xml' path='doc/members/member[@name="LpNormalize"]/*'/>
/// <param name="input">The column to apply to.</param>
Copy link
Contributor

@rogancarr rogancarr Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column to apply to. [](start = 32, length = 23)

Isn't this a row? Oh, I see. This should say something like "The column containing the vectors to apply the normalization to." #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.


In reply to: 265340830 [](ancestors = 265340830)

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with some suggestions on naming.

@@ -384,7 +384,11 @@ public ColumnFunctionAccessor(ImmutableArray<ColumnOptions> infos)
[BestFriend]
internal readonly IReadOnlyList<IColumnFunction> ColumnFunctions;

/// <summary>
/// The configuration of the normalizer. The i-th element describes the i-th input-output column pair.
/// </summary>
public readonly ImmutableArray<ColumnOptions> Columns;
Copy link
Contributor

@artidoro artidoro Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnOptions [](start = 39, length = 13)

Since you are doing the Normalizer scrubbing, we need to rename ColumnOptions here. It really does not make sense for it to be named like this.
#2854 #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like that thread converges already..


In reply to: 265349922 [](ancestors = 265349922)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@wschin wschin merged commit dde909a into dotnet:master Mar 14, 2019
@wschin wschin deleted the more-norm-scrub branch March 14, 2019 00:25
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scrubbing normalization transforms
5 participants