
Commit 3fa207d

Matrix Factorization XML docs (#3409)
1 parent 270df4f commit 3fa207d

File tree

2 files changed

+87
-55
lines changed


src/Microsoft.ML.Recommender/MatrixFactorizationTrainer.cs

+74-46
@@ -23,14 +23,39 @@
 namespace Microsoft.ML.Trainers
 {
 /// <summary>
-/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
+/// The <see cref="IEstimator{TTransformer}"/> to predict elements in a matrix using matrix factorization (also known as a type of collaborative filtering).
 /// </summary>
 /// <remarks>
-/// <para>The basic idea of matrix factorization is finding two low-rank factor marcies to apporimate the training matrix.
-/// In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
-/// and the value at the location specified by the two indexes. For an example data structure of a tuple, one can use:
-/// </para>
-/// <code language="csharp">
+/// <format type="text/markdown"><![CDATA[
+/// To create this trainer, use [MatrixFactorization](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(System.String,System.String,System.String,System.Int32,System.Double,System.Int32))
+/// or [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
+///
+/// ### Input and Output Columns
+/// There are three required input columns: one for matrix row indexes, one for matrix column indexes, and one for
+/// values (i.e., labels) in the matrix.
+/// Together they define a matrix in [COO](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) format.
+/// The type of the label column is <xref:System.Single>, while the other two columns are
+/// [key-typed](<xref:Microsoft.ML.Data.KeyDataViewType>) scalars.
+///
+/// | Output Column Name | Column Type | Description |
+/// | -- | -- | -- |
+/// | `Score` | <xref:System.Single> | The predicted matrix value at the location specified by the input columns (row index column and column index column). |
+///
+/// ### Trainer Characteristics
+/// | | |
+/// | -- | -- |
+/// | Machine learning task | Recommender systems |
+/// | Is normalization required? | Yes |
+/// | Is caching required? | Yes |
+/// | Required NuGet in addition to Microsoft.ML | Microsoft.ML.Recommender |
+///
+/// ### Background
+/// The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.
+/// In this module, the expected training data (the factorized matrix) is a list of tuples.
+/// Every tuple consists of a column index, a row index,
+/// and the value at the location specified by the two indices. For an example data structure of a tuple, one can use:
+///
+/// ```csharp
 /// // The following variables defines the shape of a m-by-n matrix. Indexes start with 0; that is, our indexing system
 /// // is 0-based.
 /// const int m = 60;
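The COO-style tuple list described in the doc text above can also be illustrated outside ML.NET. Below is a minimal Python sketch (an illustration only; the `MatrixElement` class and its field names are hypothetical, chosen to mirror the C# snippet's `MatrixColumnIndex`/`MatrixRowIndex`/`Value` fields):

```python
from dataclasses import dataclass

@dataclass
class MatrixElement:
    matrix_column_index: int  # 0-based column index, in [0, n)
    matrix_row_index: int     # 0-based row index, in [0, m)
    value: float              # the rating at that (row, column) position

# Shape of the m-by-n matrix, mirroring the C# snippet's constants.
m, n = 60, 100

# Only observed entries are listed; missing entries are simply absent,
# which is what lets matrix factorization fill them in later.
training_data = [
    MatrixElement(matrix_column_index=5,  matrix_row_index=2,  value=3.0),
    MatrixElement(matrix_column_index=17, matrix_row_index=2,  value=1.0),
    MatrixElement(matrix_column_index=5,  matrix_row_index=40, value=5.0),
]

# Every index must lie inside the declared matrix shape.
assert all(0 <= e.matrix_row_index < m and 0 <= e.matrix_column_index < n
           for e in training_data)
```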
@@ -48,41 +73,41 @@ namespace Microsoft.ML.Trainers
 /// // The rating at the MatrixColumnIndex-th column and the MatrixRowIndex-th row.
 /// public float Value;
 /// }
-/// </code>
-/// <para> Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
-/// This behavior is very helpful when building recommender systems.</para>
-/// <para>To provide a better understanding on practical uses of matrix factorization, let's consider music recommendation as an example.
-/// Assume that user IDs and music IDs are used as row and column indexes, respectively, and matrix's values are ratings provided by those users. That is,
-/// rating <i>r</i> at row <i>r</i> and column <i>v</i> means that user <i>u</i> give <i>r</i> to item <i>v</i>.
-/// An imcomplete matrix is very common because not all users may provide their feedbacks to all products (for example, no one can rate ten million songs).
-/// Assume that<i>R</i> is a m-by-n rating matrix and the rank of the two factor matrices are<i>P</i> (m-by-k matrix) and <i>Q</i> (n-by-k matrix), where k is the approximation rank.
-/// The predicted rating at the u-th row and the v-th column in <i>R</i> would be the inner product of the u-th row of P and the v-th row of Q; that is,
-/// <i>R</i> is approximated by the product of <i>P</i>'s transpose and <i>Q</i>. This trainer implements
-/// <a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf'>a stochastic gradient method</a> for finding <i>P</i>
-/// and <i>Q</i> via minimizing the distance between<i> R</i> and the product of <i>P</i>'s transpose and Q.</para>.
-/// <para>The underlying library used in ML.NET matrix factorization can be found on <a href='https://github.com/cjlin1/libmf'>a Github repository</a>. For users interested in the mathematical details, please see the references below.</para>
-/// <list type = 'bullet'>
-/// <item>
-/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf' > A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems</a></description>
-/// </item>
-/// <item>
-/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf' > A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization</a></description>
-/// </item>
-/// <item>
-/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf' > LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems</a></description>
-/// </item>
-/// <item>
-/// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf' > Selection of Negative Samples for One-class Matrix Factorization</a></description>
-/// </item>
-/// </list>
-/// </remarks>
-/// <example>
-/// <format type="text/markdown">
-/// <![CDATA[
-/// [!code-csharp[MF](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/MatrixFactorization.cs)]
+/// ```
+///
+/// Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
+/// This behavior is very helpful when building recommender systems.
+///
+/// To provide a better understanding on practical uses of matrix factorization, let's consider music recommendation as an example.
+/// Assume that user IDs and music IDs are used as row and column indexes, respectively, and the matrix's values are ratings provided by those users.
+/// That is, rating $r$ at row $u$ and column $v$ means that user $u$ gives $r$ to item $v$.
+/// An incomplete matrix is very common because not all users may provide their feedback on all products (for example, no one can rate ten million songs).
+/// Assume that $R\in{\mathbb R}^{m\times n}$ is an m-by-n rating matrix and that it is approximated by two low-[rank](https://en.wikipedia.org/wiki/Rank_(linear_algebra)) factor matrices $P\in {\mathbb R}^{k\times m}$ and $Q\in {\mathbb R}^{k\times n}$, where $k$ is the approximation rank.
+/// The predicted rating at the $u$-th row and the $v$-th column in $R$ would be the inner product of the $u$-th column of $P$ and the $v$-th column of $Q$; that is, $R$ is approximated by the product of $P$'s transpose ($P^T$) and $Q$.
+/// Note that $k$ is usually much smaller than $m$ and $n$, so $P^T Q$ is usually called a low-rank approximation of $R$.
+///
+/// This trainer includes a [stochastic gradient method](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) and a [coordinate descent method](https://en.wikipedia.org/wiki/Coordinate_descent) for finding $P$ and $Q$ via minimizing the distance between (the non-missing part of) $R$ and its approximation $P^T Q$.
+/// The coordinate descent method included is specifically for one-class matrix factorization, where all observed ratings are positive signals (that is, all rating values are 1).
+/// Notice that the only way to invoke one-class matrix factorization is to assign [one-class squared loss](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass)
+/// to the [loss function](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options.LossFunction)
+/// when calling [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
+/// See pages 6 and 28 [here](https://www.csie.ntu.edu.tw/~cjlin/talks/facebook.pdf) for a brief introduction to standard matrix factorization and one-class matrix factorization.
+/// The [default setting](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression) induces standard matrix factorization.
+/// The underlying library used in ML.NET matrix factorization can be found in [a GitHub repository](https://github.com/cjlin1/libmf).
+///
+/// For users interested in the mathematical details, please see the references below.
+///
+/// * For the multi-threaded implementation of the stochastic gradient method used, see [A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf).
+/// * For the computation happening inside a single thread, see [A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf).
+/// * For the parallel coordinate descent method used and the one-class matrix factorization formulation, see [Selection of Negative Samples for One-class Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf).
+/// * For details of the underlying library, see [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf).
+///
 /// ]]>
 /// </format>
-/// </example>
+/// </remarks>
+/// <seealso cref="Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(string, string, string, int, double, int)"/>
+/// <seealso cref="Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(MatrixFactorizationTrainer.Options)"/>
+/// <seealso cref="Options"/>
 public sealed class MatrixFactorizationTrainer : ITrainer<MatrixFactorizationModelParameters>,
 IEstimator<MatrixFactorizationPredictionTransformer>
 {
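The background text above describes minimizing the distance between the observed part of $R$ and $P^T Q$ with a stochastic gradient method. Here is a small self-contained Python/NumPy sketch of that idea (an illustration only, not the LIBMF implementation; the `factorize` function and its parameter values are hypothetical):

```python
import numpy as np

def factorize(entries, m, n, k=8, lam=0.02, lr=0.05, iters=500, seed=0):
    """Fit P (k-by-m) and Q (k-by-n) so that R[u, v] ~= P[:, u] @ Q[:, v]
    over the observed entries only, by stochastic gradient descent on a
    regularized squared loss."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((k, m))
    Q = 0.1 * rng.standard_normal((k, n))
    for _ in range(iters):
        for u, v, r in entries:
            p, q = P[:, u].copy(), Q[:, v].copy()
            err = r - p @ q                  # residual on one observed cell
            P[:, u] += lr * (err * q - lam * p)  # gradient step on P's column
            Q[:, v] += lr * (err * p - lam * q)  # gradient step on Q's column
    return P, Q

# Observed entries (row u, column v, rating r) of a tiny 3-by-3 matrix.
entries = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 5.0)]
P, Q = factorize(entries, m=3, n=3)

# The factorization reconstructs observed cells closely...
assert abs(float(P[:, 0] @ Q[:, 0]) - 5.0) < 1.0
# ...and the unobserved cell (2, 0) gets a prediction "for free".
predicted_missing = float(P[:, 2] @ Q[:, 0])
```

Note the update uses copies of the old columns so both gradient steps are taken from the same point, and the `lam` term is the regularization the trainer's `Lambda` option plays an analogous role to.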
@@ -109,22 +134,25 @@ public enum LossFunctionType
 };

 /// <summary>
-/// Advanced options for the <see cref="MatrixFactorizationTrainer"/>.
+/// Options for the <see cref="MatrixFactorizationTrainer"/> as used in [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
 /// </summary>
 public sealed class Options
 {
 /// <summary>
 /// The name of variable (i.e., Column in a <see cref="IDataView"/> type system) used as matrix's column index.
+/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
 /// </summary>
 public string MatrixColumnIndexColumnName;

 /// <summary>
 /// The name of variable (i.e., column in a <see cref="IDataView"/> type system) used as matrix's row index.
+/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
 /// </summary>
 public string MatrixRowIndexColumnName;

 /// <summary>
 /// The name variable (i.e., column in a <see cref="IDataView"/> type system) used as matrix's element value.
+/// The column data must be <see cref="System.Single"/>.
 /// </summary>
 public string LabelColumnName;

@@ -155,10 +183,10 @@ public sealed class Options
 public double Lambda = Defaults.Lambda;

 /// <summary>
-/// Rank of approximation matrixes.
+/// Rank of approximation matrices.
 /// </summary>
 /// <remarks>
-/// If input data has size of m-by-n we would build two approximation matrixes m-by-k and k-by-n where k is approximation rank.
+/// If the input data has size m-by-n, we build two approximation matrices of sizes m-by-k and k-by-n, where k is the approximation rank.
 /// </remarks>
 [Argument(ArgumentType.AtMostOnce, HelpText = "Latent space dimension (denoted by k). If the factorized matrix is m-by-n, " +
 "two factor matrices found by matrix factorization are m-by-k and k-by-n, respectively. " +
@@ -194,13 +222,13 @@ public sealed class Options
 /// <remarks>
 /// Importance of unobserved (i.e., negative) entries' loss in one-class matrix factorization.
 /// In general, only a few of matrix entries (e.g., less than 1%) in the training are observed (i.e., positive).
-/// To balance the contributions from unobserved and obverved in the overall loss function, this parameter is
+/// To balance the contributions from unobserved and observed entries in the overall loss function, this parameter is
 /// usually a small value so that the solver is able to find a factorization equally good to unobserved and observed
 /// entries. If only 10000 observed entries present in a 200000-by-300000 training matrix, one can try Alpha = 10000 / (200000*300000 - 10000).
 /// When most entries in the training matrix are observed, one can use Alpha >> 1; for example, if only 10000 in previous
 /// matrix is not observed, one can try Alpha = (200000 * 300000 - 10000) / 10000. Consequently,
 /// Alpha = (# of observed entries) / (# of unobserved entries) can make observed and unobserved entries equally important
-/// in the minimized loss function. However, the best setting in machine learning is alwasy data-depedent so user still needs to
+/// in the minimized loss function. However, the best setting in machine learning is always data-dependent, so the user still needs to
 /// try multiple values.
 /// </remarks>
 [Argument(ArgumentType.AtMostOnce, HelpText = "Importance of unobserved entries' loss in one-class matrix factorization.")]
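The Alpha heuristic in the remarks above is simple arithmetic and can be made concrete with a short Python sketch (the `suggested_alpha` helper name is hypothetical, not part of ML.NET):

```python
def suggested_alpha(n_observed, n_rows, n_cols):
    # Alpha ~= (# of observed entries) / (# of unobserved entries),
    # per the heuristic in the remarks above.
    n_unobserved = n_rows * n_cols - n_observed
    return n_observed / n_unobserved

# The doc's sparse example: 10,000 observed entries in a 200,000-by-300,000 matrix.
alpha = suggested_alpha(10_000, 200_000, 300_000)
assert alpha < 1e-6  # very sparse observations => a very small Alpha

# The doc's dense example: all but 10,000 entries observed => Alpha >> 1.
dense_alpha = suggested_alpha(200_000 * 300_000 - 10_000, 200_000, 300_000)
assert dense_alpha > 1
```

As the remarks note, this ratio only balances the two loss terms; the best value remains data-dependent and should be tuned.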
@@ -221,7 +249,7 @@ public sealed class Options
 public double C = Defaults.C;

 /// <summary>
-/// Number of threads will be used during training. If unspecified all aviable threads will be use.
+/// The number of threads used during training. If unspecified, all available threads are used.
 /// </summary>
 [Argument(ArgumentType.AtMostOnce, HelpText = "Number of threads can be used in the training procedure.", ShortName = "t,numthreads")]
 public int? NumberOfThreads;
@@ -351,7 +379,7 @@ internal MatrixFactorizationTrainer(IHostEnvironment env, Options options)
 /// <param name="labelColumnName">The name of the label column.</param>
 /// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
 /// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
-/// <param name="approximationRank">Rank of approximation matrixes.</param>
+/// <param name="approximationRank">Rank of approximation matrices.</param>
 /// <param name="learningRate">Initial learning rate. It specifies the speed of the training algorithm.</param>
 /// <param name="numIterations">Number of training iterations.</param>
 [BestFriend]

src/Microsoft.ML.Recommender/RecommenderCatalog.cs

+13-9
@@ -43,18 +43,20 @@ internal RecommendationTrainers(RecommendationCatalog catalog)
 }

 /// <summary>
-/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
+/// Create <see cref="MatrixFactorizationTrainer"/>, which predicts element values in a matrix using matrix factorization.
 /// </summary>
 /// <remarks>
-/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to apporimate the training matrix.</para>
+/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.</para>
 /// <para>In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
 /// and the value at the location specified by the two indexes.
 /// </para>
 /// </remarks>
-/// <param name="labelColumnName">The name of the label column.</param>
-/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
-/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
-/// <param name="approximationRank">Rank of approximation matrixes.</param>
+/// <param name="labelColumnName">The name of the label column. The column data must be <see cref="System.Single"/>.</param>
+/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.
+/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.</param>
+/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.
+/// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.</param>
+/// <param name="approximationRank">Rank of approximation matrices.</param>
 /// <param name="learningRate">Initial learning rate. It specifies the speed of the training algorithm.</param>
 /// <param name="numberOfIterations">Number of training iterations.</param>
 /// <example>
@@ -74,15 +76,17 @@ public MatrixFactorizationTrainer MatrixFactorization(
 approximationRank, learningRate, numberOfIterations);

 /// <summary>
-/// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
+/// Create <see cref="MatrixFactorizationTrainer"/> with advanced options, which predicts element values in a matrix using matrix factorization.
 /// </summary>
 /// <remarks>
-/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to apporimate the training matrix.</para>
+/// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.</para>
 /// <para>In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
 /// and the value at the location specified by the two indexes. The training configuration is encoded in <see cref="MatrixFactorizationTrainer.Options"/>.
+/// To invoke one-class matrix factorization, the user needs to specify <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass"/>.
+/// The default setting <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression"/> is for the standard matrix factorization problem.
 /// </para>
 /// </remarks>
-/// <param name="options">Advanced arguments to the algorithm.</param>
+/// <param name="options">Trainer options.</param>
 /// <example>
 /// <format type="text/markdown">
 /// <![CDATA[
