Matrix Factorization XML docs #3409

Merged: 5 commits on Apr 20, 2019
120 changes: 74 additions & 46 deletions src/Microsoft.ML.Recommender/MatrixFactorizationTrainer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,39 @@
namespace Microsoft.ML.Trainers
{
/// <summary>
/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
/// The <see cref="IEstimator{TTransformer}"/> to predict elements in a matrix using matrix factorization (also known as a type of collaborative filtering).
/// </summary>
/// <remarks>
/// <para>The basic idea of matrix factorization is finding two low-rank factor marcies to apporimate the training matrix.
/// In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
/// and the value at the location specified by the two indexes. For an example data structure of a tuple, one can use:
/// </para>
/// <code language="csharp">
/// <format type="text/markdown"><![CDATA[
/// To create this trainer, use [MatrixFactorization](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(System.String,System.String,System.String,System.Int32,System.Double,System.Int32))
/// or [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
///
/// ### Input and Output Columns
    /// There are three required input columns: one for matrix row indexes, one for matrix column indexes, and one for
    /// values (i.e., labels) in the matrix.
    /// Together they define a matrix in [COO](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) format.
    /// The type of the label column is <xref:System.Single>, while the other two columns are
    /// [key-typed](<xref:Microsoft.ML.Data.KeyDataViewType>) scalars.
///
/// | Output Column Name | Column Type | Description|
/// | -- | -- | -- |
/// | `Score` | <xref:System.Single> | The predicted matrix value at the location specified by input columns (row index column and column index column). |
///
/// ### Trainer Characteristics
/// | | |
/// | -- | -- |
/// | Machine learning task | Recommender systems |
/// | Is normalization required? | Yes |
/// | Is caching required? | Yes |
/// | Required NuGet in addition to Microsoft.ML | Microsoft.ML.Recommender |
///
/// ### Background
/// The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.
/// In this module, the expected training data (the factorized matrix) is a list of tuples.
/// Every tuple consists of a column index, a row index,
/// and the value at the location specified by the two indices. For an example data structure of a tuple, one can use:
///
/// ```csharp
    /// // The following variables define the shape of an m-by-n matrix. Indexes start with 0; that is, our indexing system
    /// // is 0-based.
/// const int m = 60;
Expand All @@ -48,41 +73,41 @@ namespace Microsoft.ML.Trainers
/// // The rating at the MatrixColumnIndex-th column and the MatrixRowIndex-th row.
/// public float Value;
/// }
/// </code>
/// <para> Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
/// This behavior is very helpful when building recommender systems.</para>
/// <para>To provide a better understanding on practical uses of matrix factorization, let's consider music recommendation as an example.
/// Assume that user IDs and music IDs are used as row and column indexes, respectively, and matrix's values are ratings provided by those users. That is,
/// rating <i>r</i> at row <i>r</i> and column <i>v</i> means that user <i>u</i> give <i>r</i> to item <i>v</i>.
/// An imcomplete matrix is very common because not all users may provide their feedbacks to all products (for example, no one can rate ten million songs).
/// Assume that<i>R</i> is a m-by-n rating matrix and the rank of the two factor matrices are<i>P</i> (m-by-k matrix) and <i>Q</i> (n-by-k matrix), where k is the approximation rank.
/// The predicted rating at the u-th row and the v-th column in <i>R</i> would be the inner product of the u-th row of P and the v-th row of Q; that is,
/// <i>R</i> is approximated by the product of <i>P</i>'s transpose and <i>Q</i>. This trainer implements
/// <a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf'>a stochastic gradient method</a> for finding <i>P</i>
/// and <i>Q</i> via minimizing the distance between<i> R</i> and the product of <i>P</i>'s transpose and Q.</para>.
/// <para>The underlying library used in ML.NET matrix factorization can be found on <a href='https://github.com/cjlin1/libmf'>a Github repository</a>. For users interested in the mathematical details, please see the references below.</para>
/// <list type = 'bullet'>
/// <item>
/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf' > A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems</a></description>
/// </item>
/// <item>
/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf' > A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization</a></description>
/// </item>
/// <item>
/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf' > LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems</a></description>
/// </item>
/// <item>
/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf' > Selection of Negative Samples for One-class Matrix Factorization</a></description>
/// </item>
/// </list>
/// </remarks>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[MF](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/MatrixFactorization.cs)]
/// ```
///
/// Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
/// This behavior is very helpful when building recommender systems.
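[Editorial note: the COO idea above can be illustrated with a language-neutral sketch, here in Python rather than the ML.NET API. The toy tuples below are hypothetical, mirroring the `MatrixElement` struct: each tuple holds a column index, a row index, and a value, and every position not listed stays missing.]

```python
import math

# Hypothetical toy data in COO format: (column index, row index, value), 0-based.
m, n = 4, 5  # the full matrix is m-by-n
tuples = [(0, 0, 5.0), (2, 1, 3.0), (4, 3, 1.0)]

# Build the m-by-n dense matrix; entries not listed in the tuples remain
# missing (NaN) -- exactly the entries a factorization can later fill in.
matrix = [[float("nan")] * n for _ in range(m)]
for col, row, value in tuples:
    matrix[row][col] = value

observed = sum(1 for r in matrix for v in r if not math.isnan(v))
print(observed)  # 3 observed entries; the remaining 17 are missing
```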
///
/// To provide a better understanding on practical uses of matrix factorization, let's consider music recommendation as an example.
/// Assume that user IDs and music IDs are used as row and column indexes, respectively, and matrix's values are ratings provided by those users.
    /// That is, rating $r$ at row $u$ and column $v$ means that user $u$ gives rating $r$ to item $v$.
    /// An incomplete matrix is very common because not all users provide feedback for all products (for example, no one can rate ten million songs).
    /// Assume that $R\in{\mathbb R}^{m\times n}$ is the m-by-n rating matrix and that the two low-[rank](https://en.wikipedia.org/wiki/Rank_(linear_algebra)) factor matrices are $P\in {\mathbb R}^{k\times m}$ and $Q\in {\mathbb R}^{k\times n}$, where $k$ is the approximation rank.
    /// The predicted rating at the $u$-th row and the $v$-th column in $R$ would be the inner product of the $u$-th column of $P$ and the $v$-th column of $Q$; that is, $R$ is approximated by the product of $P$'s transpose ($P^T$) and $Q$.
/// Note that $k$ is usually much smaller than $m$ and $n$, so $P^T Q$ is usually called a low-rank approximation of $R$.
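[Editorial note: a minimal numerical sketch of the low-rank approximation described above, in Python/NumPy rather than ML.NET. The dimensions are arbitrary toy values; the point is that $\hat R = P^T Q$ has rank at most $k$ and that entry $(u, v)$ is an inner product of columns.]

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 8, 2  # k is much smaller than m and n

# Factor matrices as in the text: P is k-by-m, Q is k-by-n.
P = rng.standard_normal((k, m))
Q = rng.standard_normal((k, n))

# R is approximated by P^T Q; entry (u, v) is the inner product of
# the u-th column of P and the v-th column of Q.
R_hat = P.T @ Q
u, v = 3, 5
assert R_hat.shape == (m, n)
assert np.linalg.matrix_rank(R_hat) <= k  # a low-rank approximation
assert np.isclose(R_hat[u, v], P[:, u] @ Q[:, v])
```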
///
/// This trainer includes a [stochastic gradient method](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) and a [coordinate descent method](https://en.wikipedia.org/wiki/Coordinate_descent) for finding $P$ and $Q$ via minimizing the distance between (non-missing part of) $R$ and its approximation $P^T Q$.
/// The coordinate descent method included is specifically for one-class matrix factorization where all observed ratings are positive signals (that is, all rating values are 1).
    /// Notice that the only way to invoke one-class matrix factorization is to assign [one-class squared loss](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass)
    /// to the [loss function](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options.LossFunction)
/// when calling [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
/// See Page 6 and Page 28 [here](https://www.csie.ntu.edu.tw/~cjlin/talks/facebook.pdf) for a brief introduction to standard matrix factorization and one-class matrix factorization.
/// The [default setting](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression) induces standard matrix factorization.
    /// The underlying library used in ML.NET matrix factorization can be found in [a GitHub repository](https://github.com/cjlin1/libmf).
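[Editorial note: the stochastic gradient method mentioned above can be sketched as follows. This is a conceptual Python illustration of plain SGD on the regularized squared loss over observed entries, not the LIBMF implementation; all sizes and hyperparameters are toy assumptions.]

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 10, 12, 3
# A ground-truth low-rank matrix and a sparse set of observed entries.
R_true = rng.standard_normal((k, m)).T @ rng.standard_normal((k, n))
observed = [(u, v) for u in range(m) for v in range(n) if rng.random() < 0.4]

P = 0.1 * rng.standard_normal((k, m))
Q = 0.1 * rng.standard_normal((k, n))
eta, lam = 0.05, 0.01  # learning rate and L2 regularization (cf. Lambda)

def loss():
    # Squared error on the observed (non-missing) part of R only.
    return sum((R_true[u, v] - P[:, u] @ Q[:, v]) ** 2 for u, v in observed)

before = loss()
for _ in range(50):  # epochs of SGD over the observed entries
    for u, v in observed:
        e = R_true[u, v] - P[:, u] @ Q[:, v]  # residual at (u, v)
        p, q = P[:, u].copy(), Q[:, v].copy()
        P[:, u] += eta * (e * q - lam * p)
        Q[:, v] += eta * (e * p - lam * q)
after = loss()
assert after < before  # the factorization fits the observed entries better
```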
///
/// For users interested in the mathematical details, please see the references below.
///
/// * For the multi-threading implementation of the used stochastic gradient method, see [A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf).
/// * For the computation happening inside a single thread, see [A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf).
/// * For the parallel coordinate descent method used and one-class matrix factorization formula, see [Selection of Negative Samples for One-class Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf).
/// * For details in the underlying library used, see [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf).
///
/// ]]>
/// </format>
/// </example>
/// </remarks>
/// <seealso cref="Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(string, string, string, int, double, int)"/>
/// <seealso cref="Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(MatrixFactorizationTrainer.Options)"/>
/// <seealso cref="Options"/>

@singlis singlis Apr 18, 2019


Options [](start = 23, length = 7)

is this one needed? #Resolved


yes


In reply to: 276861871 [](ancestors = 276861871)

public sealed class MatrixFactorizationTrainer : ITrainer<MatrixFactorizationModelParameters>,
IEstimator<MatrixFactorizationPredictionTransformer>
{
Expand All @@ -109,22 +134,25 @@ public enum LossFunctionType
};

/// <summary>
/// Advanced options for the <see cref="MatrixFactorizationTrainer"/>.
/// Options for the <see cref="MatrixFactorizationTrainer"/> as used in [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
/// </summary>
public sealed class Options
{
/// <summary>
            /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's column index.
            /// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
/// </summary>
public string MatrixColumnIndexColumnName;

/// <summary>
            /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's row index.
/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
/// </summary>
public string MatrixRowIndexColumnName;

/// <summary>
            /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's element value.
            /// The column data must be <see cref="System.Single"/>.
/// </summary>
public string LabelColumnName;

Expand Down Expand Up @@ -155,10 +183,10 @@ public sealed class Options
public double Lambda = Defaults.Lambda;

/// <summary>
/// Rank of approximation matrixes.
/// Rank of approximation matrices.
/// </summary>
/// <remarks>
/// If input data has size of m-by-n we would build two approximation matrixes m-by-k and k-by-n where k is approximation rank.
/// If input data has size of m-by-n we would build two approximation matrices m-by-k and k-by-n where k is approximation rank.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Latent space dimension (denoted by k). If the factorized matrix is m-by-n, " +
"two factor matrices found by matrix factorization are m-by-k and k-by-n, respectively. " +
Expand Down Expand Up @@ -194,13 +222,13 @@ public sealed class Options
/// <remarks>
/// Importance of unobserved (i.e., negative) entries' loss in one-class matrix factorization.
        /// In general, only a few of the matrix entries (e.g., less than 1%) in the training data are observed (i.e., positive).
/// To balance the contributions from unobserved and obverved in the overall loss function, this parameter is
/// To balance the contributions from unobserved and observed in the overall loss function, this parameter is
/// usually a small value so that the solver is able to find a factorization equally good to unobserved and observed
        /// entries. If only 10000 observed entries are present in a 200000-by-300000 training matrix, one can try Alpha = 10000 / (200000*300000 - 10000).
        /// When most entries in the training matrix are observed, one can use Alpha &gt;&gt; 1; for example, if only 10000 entries in the previous
        /// matrix are not observed, one can try Alpha = (200000 * 300000 - 10000) / 10000. Consequently,
/// Alpha = (# of observed entries) / (# of unobserved entries) can make observed and unobserved entries equally important
/// in the minimized loss function. However, the best setting in machine learning is alwasy data-depedent so user still needs to
/// in the minimized loss function. However, the best setting in machine learning is always data-dependent so user still needs to
/// try multiple values.
/// </remarks>
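[Editorial note: the arithmetic in the remark above can be checked directly. A short Python sketch using the remark's own toy numbers; nothing here is ML.NET API.]

```python
# Worked example from the remark: a 200000-by-300000 training matrix.
total = 200_000 * 300_000

# Case 1: only 10000 entries observed -> Alpha = observed / unobserved is tiny.
observed = 10_000
alpha_sparse = observed / (total - observed)

# Case 2: all but 10000 entries observed -> Alpha >> 1.
unobserved = 10_000
alpha_dense = (total - unobserved) / unobserved

# Either way, Alpha = (# observed) / (# unobserved) weights both kinds of
# entries equally in the minimized loss.
print(alpha_sparse, alpha_dense)
```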
[Argument(ArgumentType.AtMostOnce, HelpText = "Importance of unobserved entries' loss in one-class matrix factorization.")]
Expand All @@ -221,7 +249,7 @@ public sealed class Options
public double C = Defaults.C;

/// <summary>
/// Number of threads will be used during training. If unspecified all aviable threads will be use.
        /// Number of threads used during training. If unspecified, all available threads will be used.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of threads can be used in the training procedure.", ShortName = "t,numthreads")]
public int? NumberOfThreads;
Expand Down Expand Up @@ -351,7 +379,7 @@ internal MatrixFactorizationTrainer(IHostEnvironment env, Options options)
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
/// <param name="approximationRank">Rank of approximation matrixes.</param>
/// <param name="approximationRank">Rank of approximation matrices.</param>
/// <param name="learningRate">Initial learning rate. It specifies the speed of the training algorithm.</param>
/// <param name="numIterations">Number of training iterations.</param>
[BestFriend]
Expand Down
22 changes: 13 additions & 9 deletions src/Microsoft.ML.Recommender/RecommenderCatalog.cs
Expand Up @@ -43,18 +43,20 @@ internal RecommendationTrainers(RecommendationCatalog catalog)
}

/// <summary>
/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
/// Create <see cref="MatrixFactorizationTrainer"/>, which predicts element values in a matrix using matrix factorization.
/// </summary>
/// <remarks>
/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to apporimate the training matrix.</para>
/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.</para>
/// <para>In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
/// and the value at the location specified by the two indexes.
/// </para>
/// </remarks>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
/// <param name="approximationRank">Rank of approximation matrixes.</param>
/// <param name="labelColumnName">The name of the label column. The column data must be <see cref="System.Single"/>.</param>
/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.
/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.</param>
/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.
/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.</param>
/// <param name="approximationRank">Rank of approximation matrices.</param>
/// <param name="learningRate">Initial learning rate. It specifies the speed of the training algorithm.</param>
/// <param name="numberOfIterations">Number of training iterations.</param>
/// <example>
Expand All @@ -74,15 +76,17 @@ public MatrixFactorizationTrainer MatrixFactorization(
approximationRank, learningRate, numberOfIterations);

/// <summary>
/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
/// Create <see cref="MatrixFactorizationTrainer"/> with advanced options, which predicts element values in a matrix using matrix factorization.
/// </summary>
/// <remarks>
/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to apporimate the training matrix.</para>
/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.</para>
/// <para>In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
/// and the value at the location specified by the two indexes. The training configuration is encoded in <see cref="MatrixFactorizationTrainer.Options"/>.
/// To invoke one-class matrix factorization, user needs to specify <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass"/>.
/// The default setting <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression"/> is for standard matrix factorization problem.
/// </para>
/// </remarks>
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Trainer options.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
Expand Down