Skip to content

Conversion of Whitening Transform to estimator with pigstensions #1452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Nov 8, 2018

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Oct 30, 2018

Ongoing work on converting the transformers to estimators (#754). This PR completes the conversion of the Whitening transform to estimator (previously a TrainedWrapperEstimator).

: base(env, RegistrationName, Contracts.CheckRef(args, nameof(args)).Column,
input, TestColumn)
internal WhiteningTransform(IHostEnvironment env, Arguments args, IDataView inputData)
: base(Contracts.CheckRef(env, nameof(env)).Register(nameof(WhiteningTransform)), GetColumnPairs(args.Column))
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?
We usually cast arguments to ColumnInfo class and then call Transfrom(env,cols).MakeDataTransform(input) #Resolved

Copy link
Contributor Author

@artidoro artidoro Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I resolved the issue that you are talking about but let me double check.

Now I have:
Transfrom(env,input,cols).MakeDataTransform(input)

I remove the args, but I think I still need the input data? Is this correct?


In reply to: 229459092 [](ancestors = 229459092)

@artidoro artidoro changed the title WIP: Conversion of Whitening Transform to estimator with pigstensions Conversion of Whitening Transform to estimator with pigstensions Oct 30, 2018

internal const string Summary = "Apply PCA or ZCA whitening algorithm to the input.";

public const string LoaderSignature = "WhiteningTransform";
internal const string FriendlyName = "Whitening Transform";
internal const string LoaderSignature = "VectorWhiteningTransform";
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorWhiteningTransform [](start = 49, length = 24)

Why you changing LoaderSignature?
I think this would screw up loading old models.
At least we have LoaderSignatureOld for a reason. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good point I had done a find replace for this, and did not catch that. Will revert to the previous loadersignature.


In reply to: 231671590 [](ancestors = 231671590)

@@ -108,72 +115,95 @@ public bool TryUnparse(StringBuilder sb)
}
}

public sealed class ColInfoEx
public sealed class ColInfo
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColInfo [](start = 28, length = 7)

we call them ColumnInfo everywhere, let's stick to that pattern. #Closed

private readonly ColInfoEx[] _exes;

private const string RegistrationName = "Whitening";
private readonly ColInfo[] _infos;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_infos [](start = 35, length = 6)

across all transformers we call them _columns #Closed

// *** Binary format ***
// <base>
// foreach added column
// ColInfoEx
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColInfoEx [](start = 17, length = 9)

correct it. #Closed

/// <param name="kind">Whitening kind (PCA/ZCA).</param>
/// <param name="eps">Whitening constant, prevents division by zero.</param>
/// <param name="maxRows">Maximum number of rows used to train the transform.</param>
/// <param name="saveInverse">Whether to save inverse (recovery) matrix.</param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Whether to save inverse (recovery) matrix. [](start = 12, length = 80)

If purpose of this thing is only to tests, don't expose it.
Make it internal readonly property, and allow to set via arguments, but don't expose it in API. #Closed

private readonly Float[][] _models;
// Stores inverse ("recover") matrix as Float[] for each column. Temporarily internal as it's used in unit test.
// Stores whitening matrix as float[] for each column. _models[i] is the whitening matrix of the i-th input column.
private readonly float[][] _models;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_models [](start = 35, length = 7)

This doesn't have to be part of this PR, so feel free to create new issue.
We need to have way which allow user to access internal models of whitening transform and let user apply this internal models to suitable arrays.
Basically we need to have class which would expose FillValues and have inside itself _models values.
Plus we need to have onFit function for pigsty to pass this class out. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a new issue on that.


In reply to: 231675981 [](ancestors = 231675981)

{
long crowData = GetRowCount();
var infos = GetColumnInfos(args);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetColumnInfos [](start = 24, length = 14)

is it worth to have separate one line function? #Closed

}

// Check if the input column's type is supported. Note that only float vector with a known shape is allowed.
internal static string TestColumn(ColumnType t)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestColumn(ColumnType t) [](start = 31, length = 24)

TestType(ColumnType type) #Closed

{
var models = new float[columns.Length][];
var invModels = new float[columns.Length][];
int[] rowCounts;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int[] rowCounts; [](start = 12, length = 16)

simplify it. ( out int[] rowCounts) #Closed

// int: number of column pairs
// foreach column pair
// name of output col
// name of input col
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Closed

}
for (int i = 0; i < _parent.ColumnPairs.Length; i++)
{
var info = _parent._infos[i];
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var info = _parent._infos[i]; [](start = 20, length = 29)

you use info once, is it necessary? #Closed

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

if (crow == 0)
{
var matrixSize = ccol * ccol;
models[iinfo] = new float[matrixSize];
Copy link
Member

@wschin wschin Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no training data, shouldn't the model be an identity matrix? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I am turning them into identity matrices.


In reply to: 232023973 [](ancestors = 232023973)

models[iinfo] = new float[matrixSize];
invModels[iinfo] = new float[matrixSize];
continue;
}

// If there is no training data, simply initialize the model matrices.
if (crow == 0)
Copy link
Member

@wschin wschin Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated code block? #Resolved

_models[iinfo] = new float[matrixSize];
InvModels[iinfo] = new float[matrixSize];
models[iinfo] = new float[matrixSize];
invModels[iinfo] = new float[matrixSize];
continue;
}

// Compute covariance matrix (sigma).
Copy link
Member

@wschin wschin Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigma appears only here. Could we remove it? #Resolved

@artidoro artidoro merged commit 9f1c22e into dotnet:master Nov 8, 2018
@artidoro artidoro deleted the whitening branch January 5, 2019 00:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants