-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Internalize and cleanup recommender project #2451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Internalize and cleanup recommender project #2451
Conversation
This reverts commit 9138979.
Codecov Report
@@ Coverage Diff @@
## master #2451 +/- ##
=========================================
Coverage ? 71.21%
=========================================
Files ? 786
Lines ? 140989
Branches ? 16114
=========================================
Hits ? 100408
Misses ? 36112
Partials ? 4469
|
/// <summary> | ||
/// Returns right approximation matrix. | ||
/// </summary> | ||
/// <returns>Flattened into one dimension array with size equal to <see cref="ApproximationRank"/> * <see cref="NumberOfColumns"/></returns> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dimension [](start = 40, length = 9)
dimensional #Closed
|
||
/// <summary> | ||
/// Approximation rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approximation rank. [](start = 16, length = 19)
Rank of approximation matrixes? #Closed
private readonly int _approximationRank; | ||
// Packed _numberOfRows by _approximationRank matrix. | ||
///<summary> The number of rows.</summary> | ||
public readonly int NumberOfRows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is one line summary a new convention? I remember we used to have
<summary>
...
</summary>
``` #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still use old one. This one is nice if you have short description.
In reply to: 254906400 [](ancestors = 254906400)
/// Returns left approximation matrix. | ||
/// </summary> | ||
/// <returns>Flattened into one-dimensional array with size equal to <see cref="NumberOfRows"/> * <see cref="ApproximationRank"/></returns> | ||
public float[] GetLeftFactorMatrix() | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any test for the two public functions, GetLeftFactorMatrix and GetRightFactorMatrix? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I modified one of tests see MatrixFactorizationTests.cs
In reply to: 254907007 [](ancestors = 254907007)
/// Returns left approximation matrix. | ||
/// </summary> | ||
/// <returns>Flattened into one-dimensional array with size equal to <see cref="NumberOfRows"/> * <see cref="ApproximationRank"/></returns> | ||
public float[] GetLeftFactorMatrix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float[] [](start = 15, length = 7)
Do we want to use IReadOnlyList
to avoid active copy? #Resolved
|
||
public ColumnType OutputType { get { return NumberType.Float; } } | ||
private ColumnType OutputType { get { return NumberType.Float; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ get { return NumberType.Float; } } [](start = 38, length = 36)
Maybe => NumberType.Float
? #Resolved
|
||
internal MatrixFactorizationPredictor(IHostEnvironment env, SafeTrainingAndModelBuffer buffer, KeyType matrixColumnIndexType, KeyType matrixRowIndexType) | ||
internal MatrixFactorizationModelParameters(IHostEnvironment env, SafeTrainingAndModelBuffer buffer, KeyType matrixColumnIndexType, KeyType matrixRowIndexType) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a public constructor for creating a MatrixFactorizationModelParameters from its public fields? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <remarks> | ||
/// Two values are allowed, <see cref="LossFunctionType.SquareLossRegression"/> or <see cref="LossFunctionType.SquareLossOneClass"/>. | ||
/// The <see cref="LossFunctionType.SquareLossRegression"/> means traditional collaborative filtering problem with squared loss. | ||
/// The <see cref="LossFunctionType.SquareLossOneClass"/> triggers one-class matrix factorization for implicit-feedback recommendation problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add For the first case, please see Equation (1) in [https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf) for the formulation considered. For the second case, please see Equation (3) in [http://yifanhu.net/PUB/cf.pdf
](http://yifanhu.net/PUB/cf.pdf`). #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check them as remarks on loss functions enum?
In reply to: 254911715 [](ancestors = 254911715,254911243)
/// </summary> | ||
/// <remarks> | ||
/// It's the weight of factor matrices' norms in the objective function minimized by matrix factorization's algorithm. A small value could cause over-fitting. | ||
/// </remarks> | ||
[Argument(ArgumentType.AtMostOnce, HelpText = "Regularization parameter. " + | ||
"It's the weight of factor matrices' norms in the objective function minimized by matrix factorization's algorithm. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
norms [](start = 53, length = 5)
Frobenius Norm #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Rank of approximation matrixes. | ||
/// </summary> | ||
/// <remarks> | ||
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m by n [](start = 42, length = 6)
m-by-n #Resolved
/// Rank of approximation matrixes. | ||
/// </summary> | ||
/// <remarks> | ||
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m by k [](start = 91, length = 6)
m-by-k #Resolved
/// Rank of approximation matrixes. | ||
/// </summary> | ||
/// <remarks> | ||
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k by n [](start = 102, length = 6)
k-by-n #Resolved
/// Rank of approximation matrixes. | ||
/// </summary> | ||
/// <remarks> | ||
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approxiation [](start = 120, length = 12)
approximation. #Resolved
|
||
/// <summary> | ||
/// Number of thread will be used during training. If unspecified all aviable threads will be use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thread [](start = 26, length = 6)
threads #Resolved
/// <] | ||
/// ]]></format> | ||
/// </example> | ||
public MatrixFactorizationTrainer MatrixFactorization( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MatrixFactorization [](start = 46, length = 19)
Loss function is an important parameter. Could we have it here? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as soon as I add loss function I need to add rest of option parameters.
We have another overload with options, if user want to specify loss functions, he can use it.
In reply to: 254913045 [](ancestors = 254913045)
_host.Assert(_leftFactorMatrix.Length == _numberOfRows * _approximationRank); | ||
_host.Assert(_rightFactorMatrix.Length == _numberofColumns * _approximationRank); | ||
buffer.Get(out NumberOfRows, out NumberOfColumns, out ApproximationRank, out var leftFactorMatrix, out var rightFactorMatrix); | ||
LeftFactorMatrix = Array.AsReadOnly<float>(leftFactorMatrix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array.AsReadOnly [](start = 31, length = 23)
Not sure if casting is necessary. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
var rightMatrix = model.Model.RightFactorMatrix; | ||
Assert.Equal(leftMatrix.Count, model.Model.NumberOfRows * model.Model.ApproximationRank); | ||
Assert.Equal(rightMatrix.Count, model.Model.NumberOfColumns * model.Model.ApproximationRank); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least Assert.Equal()
the first and the last elements in the returned arrays. #Resolved
}; | ||
|
||
var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(options); | ||
|
||
// Train a matrix factorization model. | ||
var model = pipeline.Fit(data); | ||
|
||
// Les's validate content of the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Les's [](start = 15, length = 5)
Let's
#Resolved
public ColumnType MatrixColumnIndexType { get; } | ||
public ColumnType MatrixRowIndexType { get; } | ||
internal ColumnType MatrixColumnIndexType { get; } | ||
internal ColumnType MatrixRowIndexType { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in Tom's last internalization PR, he left the methods that return the types of the input/output matrices public.
It could make sense that one wants to access that after, say after having loaded a previously trained model to know what type of data it can score. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal string MatrixColumnIndexColumnName { get; } | ||
internal string MatrixRowIndexColumnName { get; } | ||
internal ColumnType MatrixColumnIndexColumnType { get; } | ||
internal ColumnType MatrixRowIndexColumnType { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark here. Although I am not sure if it makes sense to make it available in both places? #ByDesign
/// See <a href="http://yifanhu.net/PUB/cf.pdf">Equation</a> (3). | ||
/// </remarks> | ||
SquareLossOneClass = 12 | ||
}; | ||
|
||
public sealed class Options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
Can we have a line of xml here too? #Resolved
@@ -403,7 +483,7 @@ private SafeTrainingAndModelBuffer PrepareBuffer() | |||
/// <param name="validationData">The validation data set.</param> | |||
public MatrixFactorizationPredictionTransformer Train(IDataView trainData, IDataView validationData = null) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this private? Or internal? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this one needed for the case when you train with validation data.
In reply to: 255196397 [](ancestors = 255196397)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var rightMatrix = model.Model.RightFactorMatrix; | ||
Assert.Equal(leftMatrix.Count, model.Model.NumberOfRows * model.Model.ApproximationRank); | ||
Assert.Equal(rightMatrix.Count, model.Model.NumberOfColumns * model.Model.ApproximationRank); | ||
Assert.Equal(leftMatrix[0], (double)0.3091519,5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftMatrix[0] [](start = 25, length = 13)
@wschin do you know why I get
on Mac:
Expected: 0.41953 (rounded from 0.419525027275085)
Actual: 0.30915 (rounded from 0.3091519)
on Linux:
Expected: 0.51506 (rounded from 0.515059947967529)
Actual: 0.30915 (rounded from 0.3091519)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and how you handle that discrepancy between mac/linux and windows before?
In reply to: 255225321 [](ancestors = 255225321)
fixes #2276