-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Conversion of Whitening Transform to estimator with pigstensions #1452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
: base(env, RegistrationName, Contracts.CheckRef(args, nameof(args)).Column, | ||
input, TestColumn) | ||
internal WhiteningTransform(IHostEnvironment env, Arguments args, IDataView inputData) | ||
: base(Contracts.CheckRef(env, nameof(env)).Register(nameof(WhiteningTransform)), GetColumnPairs(args.Column)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
We usually cast arguments to ColumnInfo class and then call Transfrom(env,cols).MakeDataTransform(input) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I resolved the issue that you are talking about but let me double check.
Now I have:
Transfrom(env,input,cols).MakeDataTransform(input)
I remove the args, but I think I still need the input data? Is this correct?
In reply to: 229459092 [](ancestors = 229459092)
test/Microsoft.ML.StaticPipelineTesting/Microsoft.ML.StaticPipelineTesting.csproj
Show resolved
Hide resolved
|
||
internal const string Summary = "Apply PCA or ZCA whitening algorithm to the input."; | ||
|
||
public const string LoaderSignature = "WhiteningTransform"; | ||
internal const string FriendlyName = "Whitening Transform"; | ||
internal const string LoaderSignature = "VectorWhiteningTransform"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorWhiteningTransform [](start = 49, length = 24)
Why you changing LoaderSignature?
I think this would screw up loading old models.
At least we have LoaderSignatureOld for a reason. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok good point I had done a find replace for this, and did not catch that. Will revert to the previous loadersignature.
In reply to: 231671590 [](ancestors = 231671590)
@@ -108,72 +115,95 @@ public bool TryUnparse(StringBuilder sb) | |||
} | |||
} | |||
|
|||
public sealed class ColInfoEx | |||
public sealed class ColInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColInfo [](start = 28, length = 7)
we call them ColumnInfo everywhere, let's stick to that pattern. #Closed
private readonly ColInfoEx[] _exes; | ||
|
||
private const string RegistrationName = "Whitening"; | ||
private readonly ColInfo[] _infos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_infos [](start = 35, length = 6)
across all transformers we call them _columns
#Closed
// *** Binary format *** | ||
// <base> | ||
// foreach added column | ||
// ColInfoEx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColInfoEx [](start = 17, length = 9)
correct it. #Closed
/// <param name="kind">Whitening kind (PCA/ZCA).</param> | ||
/// <param name="eps">Whitening constant, prevents division by zero.</param> | ||
/// <param name="maxRows">Maximum number of rows used to train the transform.</param> | ||
/// <param name="saveInverse">Whether to save inverse (recovery) matrix.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Whether to save inverse (recovery) matrix. [](start = 12, length = 80)
If purpose of this thing is only to tests, don't expose it.
Make it internal readonly property, and allow to set via arguments, but don't expose it in API. #Closed
private readonly Float[][] _models; | ||
// Stores inverse ("recover") matrix as Float[] for each column. Temporarily internal as it's used in unit test. | ||
// Stores whitening matrix as float[] for each column. _models[i] is the whitening matrix of the i-th input column. | ||
private readonly float[][] _models; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_models [](start = 35, length = 7)
This doesn't have to be part of this PR, so feel free to create new issue.
We need to have way which allow user to access internal models of whitening transform and let user apply this internal models to suitable arrays.
Basically we need to have class which would expose FillValues and have inside itself _models values.
Plus we need to have onFit function for pigsty to pass this class out. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
long crowData = GetRowCount(); | ||
var infos = GetColumnInfos(args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetColumnInfos [](start = 24, length = 14)
is it worth to have separate one line function? #Closed
} | ||
|
||
// Check if the input column's type is supported. Note that only float vector with a known shape is allowed. | ||
internal static string TestColumn(ColumnType t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestColumn(ColumnType t) [](start = 31, length = 24)
TestType(ColumnType type) #Closed
{ | ||
var models = new float[columns.Length][]; | ||
var invModels = new float[columns.Length][]; | ||
int[] rowCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int[] rowCounts; [](start = 12, length = 16)
simplify it. ( out int[] rowCounts
) #Closed
// int: number of column pairs | ||
// foreach column pair | ||
// name of output col | ||
// name of input col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
for (int i = 0; i < _parent.ColumnPairs.Length; i++) | ||
{ | ||
var info = _parent._infos[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var info = _parent._infos[i]; [](start = 20, length = 29)
you use info once, is it necessary? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (crow == 0) | ||
{ | ||
var matrixSize = ccol * ccol; | ||
models[iinfo] = new float[matrixSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no training data, shouldn't the model be an identity matrix? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, I am turning them into identity matrices.
In reply to: 232023973 [](ancestors = 232023973)
models[iinfo] = new float[matrixSize]; | ||
invModels[iinfo] = new float[matrixSize]; | ||
continue; | ||
} | ||
|
||
// If there is no training data, simply initialize the model matrices. | ||
if (crow == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeated code block? #Resolved
_models[iinfo] = new float[matrixSize]; | ||
InvModels[iinfo] = new float[matrixSize]; | ||
models[iinfo] = new float[matrixSize]; | ||
invModels[iinfo] = new float[matrixSize]; | ||
continue; | ||
} | ||
|
||
// Compute covariance matrix (sigma). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigma
appears only here. Could we remove it? #Resolved
Ongoing work on converting the transformers to estimators (#754). This PR completes the conversion of the Whitening transform to estimator (previously a TrainedWrapperEstimator).