src/Microsoft.ML.Recommender/MatrixFactorizationTrainer.cs (+74 −46)
@@ -23,14 +23,39 @@
 namespace Microsoft.ML.Trainers
 {
     /// <summary>
-    /// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
+    /// The <see cref="IEstimator{TTransformer}"/> to predict elements in a matrix using matrix factorization (also known as a type of collaborative filtering).
     /// </summary>
     /// <remarks>
-    /// <para>The basic idea of matrix factorization is finding two low-rank factor marcies to apporimate the training matrix.
-    /// In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
-    /// and the value at the location specified by the two indexes. For an example data structure of a tuple, one can use:
-    /// </para>
-    /// <code language="csharp">
+    /// <format type="text/markdown"><![CDATA[
+    /// To create this trainer, use [MatrixFactorization](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(System.String,System.String,System.String,System.Int32,System.Double,System.Int32))
+    /// or [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
+    ///
+    /// ### Input and Output Columns
+    /// There are three input columns required, one for matrix row indexes, one for matrix column indexes, and one for
+    /// values (i.e., labels) in the matrix.
+    /// They together define a matrix in [COO](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) format.
+    /// The type of the label column is <xref:System.Single>, while the other two columns are
+    /// [key-typed](xref:Microsoft.ML.Data.KeyDataViewType) scalars.
+    ///
+    /// | Output Column Name | Column Type | Description |
+    /// | -- | -- | -- |
+    /// | `Score` | <xref:System.Single> | The predicted matrix value at the location specified by input columns (row index column and column index column). |
+    ///
+    /// ### Trainer Characteristics
+    /// |  |  |
+    /// | -- | -- |
+    /// | Machine learning task | Recommender systems |
+    /// | Is normalization required? | Yes |
+    /// | Is caching required? | Yes |
+    /// | Required NuGet in addition to Microsoft.ML | Microsoft.ML.Recommender |
+    ///
+    /// ### Background
+    /// The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.
+    /// In this module, the expected training data (the factorized matrix) is a list of tuples.
+    /// Every tuple consists of a column index, a row index,
+    /// and the value at the location specified by the two indices. For an example data structure of a tuple, one can use:
+    ///
+    /// ```csharp
     /// // The following variables define the shape of an m-by-n matrix. Indexes start with 0; that is, our indexing system
     /// // is 0-based.
     /// const int m = 60;
     /// const int n = 100;
     ///
     /// // A tuple of (column index, row index, value) that specifies one entry in the rating matrix.
     /// class MatrixElement
     /// {
     ///     // Matrix column index starts from 0 and is at most n-1.
     ///     [KeyType(n)]
     ///     public uint MatrixColumnIndex;
     ///     // Matrix row index starts from 0 and is at most m-1.
     ///     [KeyType(m)]
     ///     public uint MatrixRowIndex;
     ///     // The rating at the MatrixColumnIndex-th column and the MatrixRowIndex-th row.
     ///     public float Value;
     /// }
-    /// </code>
-    /// <para> Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
-    /// This behavior is very helpful when building recommender systems.</para>
-    /// <para>To provide a better understanding on practical uses of matrix factorization, let's consider music recommendation as an example.
-    /// Assume that user IDs and music IDs are used as row and column indexes, respectively, and matrix's values are ratings provided by those users. That is,
-    /// rating <i>r</i> at row <i>r</i> and column <i>v</i> means that user <i>u</i> give <i>r</i> to item <i>v</i>.
-    /// An imcomplete matrix is very common because not all users may provide their feedbacks to all products (for example, no one can rate ten million songs).
-    /// Assume that <i>R</i> is a m-by-n rating matrix and the rank of the two factor matrices are <i>P</i> (m-by-k matrix) and <i>Q</i> (n-by-k matrix), where k is the approximation rank.
-    /// The predicted rating at the u-th row and the v-th column in <i>R</i> would be the inner product of the u-th row of P and the v-th row of Q; that is,
-    /// <i>R</i> is approximated by the product of <i>P</i>'s transpose and <i>Q</i>. This trainer implements
-    /// <a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf'>a stochastic gradient method</a> for finding <i>P</i>
-    /// and <i>Q</i> via minimizing the distance between <i>R</i> and the product of <i>P</i>'s transpose and Q.</para>.
-    /// <para>The underlying library used in ML.NET matrix factorization can be found on <a href='https://github.com/cjlin1/libmf'>a Github repository</a>. For users interested in the mathematical details, please see the references below.</para>
-    /// <list type='bullet'>
-    /// <item>
-    /// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf'>A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems</a></description>
-    /// </item>
-    /// <item>
-    /// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf'>A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization</a></description>
-    /// </item>
-    /// <item>
-    /// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf'>LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems</a></description>
-    /// </item>
-    /// <item>
-    /// <description><a href='https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf'>Selection of Negative Samples for One-class Matrix Factorization</a></description>
-    /// </item>
-    /// </list>
+    /// ```
+    ///
+    /// Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill <i>missing values</i>.
+    /// This behavior is very helpful when building recommender systems.
+    ///
+    /// To provide a better understanding of practical uses of matrix factorization, let's consider music recommendation as an example.
+    /// Assume that user IDs and music IDs are used as row and column indexes, respectively, and the matrix's values are ratings provided by those users.
+    /// That is, rating $r$ at row $u$ and column $v$ means that user $u$ gives $r$ to item $v$.
+    /// An incomplete matrix is very common because not every user provides feedback on every product (for example, no one can rate ten million songs).
+    /// Assume that $R\in{\mathbb R}^{m\times n}$ is an m-by-n rating matrix and that the two low-rank factor matrices are $P\in {\mathbb R}^{k\times m}$ and $Q\in {\mathbb R}^{k\times n}$, where $k$ is the approximation [rank](https://en.wikipedia.org/wiki/Rank_(linear_algebra)).
+    /// The predicted rating at the $u$-th row and the $v$-th column in $R$ would be the inner product of the $u$-th column of $P$ and the $v$-th column of $Q$; that is, $R$ is approximated by the product of $P$'s transpose ($P^T$) and $Q$.
+    /// Note that $k$ is usually much smaller than $m$ and $n$, so $P^T Q$ is usually called a low-rank approximation of $R$.
+    ///
+    /// This trainer includes a [stochastic gradient method](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) and a [coordinate descent method](https://en.wikipedia.org/wiki/Coordinate_descent) for finding $P$ and $Q$ via minimizing the distance between the (non-missing part of) $R$ and its approximation $P^T Q$.
+    /// The coordinate descent method included is specifically for one-class matrix factorization, where all observed ratings are positive signals (that is, all rating values are 1).
+    /// Notice that the only way to invoke one-class matrix factorization is to assign [one-class squared loss](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass)
+    /// to the [loss function](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options.LossFunction)
+    /// when calling [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
+    /// See page 6 and page 28 [here](https://www.csie.ntu.edu.tw/~cjlin/talks/facebook.pdf) for a brief introduction to standard matrix factorization and one-class matrix factorization.
+    /// The [default setting](xref:Microsoft.ML.Trainers.MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression) induces standard matrix factorization.
+    /// The underlying library used in ML.NET matrix factorization can be found in [a GitHub repository](https://github.com/cjlin1/libmf).
+    ///
+    /// For users interested in the mathematical details, please see the references below.
+    ///
+    /// * For the multi-threading implementation of the stochastic gradient method used, see [A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_journal.pdf).
+    /// * For the computation happening inside a single thread, see [A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf).
+    /// * For the parallel coordinate descent method used and the one-class matrix factorization formula, see [Selection of Negative Samples for One-class Matrix Factorization](https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf).
+    /// * For details of the underlying library used, see [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf).
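To make the new "Input and Output Columns" docs above concrete, here is a minimal end-to-end sketch of the simple MatrixFactorization overload they reference. It assumes the Microsoft.ML and Microsoft.ML.Recommender packages; the MatrixElement class and the toy ratings are hypothetical stand-ins, not code from this diff.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// A COO entry: key-typed row/column indexes plus a float label, as the docs require.
class MatrixElement
{
    [KeyType(100)] public uint MatrixColumnIndex { get; set; }
    [KeyType(60)] public uint MatrixRowIndex { get; set; }
    public float Value { get; set; }
}

static class Demo
{
    static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Hypothetical toy ratings; real data would come from user feedback.
        var ratings = new List<MatrixElement>();
        for (uint i = 0; i < 500; i++)
            ratings.Add(new MatrixElement { MatrixColumnIndex = i % 100, MatrixRowIndex = i % 60, Value = i % 5 + 1 });
        IDataView trainData = mlContext.Data.LoadFromEnumerable(ratings);

        // The simple overload referenced by the new docs: label, column index, and row index columns.
        var estimator = mlContext.Recommendation().Trainers.MatrixFactorization(
            labelColumnName: nameof(MatrixElement.Value),
            matrixColumnIndexColumnName: nameof(MatrixElement.MatrixColumnIndex),
            matrixRowIndexColumnName: nameof(MatrixElement.MatrixRowIndex),
            approximationRank: 8);

        ITransformer model = estimator.Fit(trainData);

        // The output column is named "Score": the predicted value at each (row, column) location.
        IDataView predictions = model.Transform(trainData);
    }
}
```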
@@ -109,22 +134,25 @@ public enum LossFunctionType
     };
 
     /// <summary>
-    /// Advanced options for the <see cref="MatrixFactorizationTrainer"/>.
+    /// Options for the <see cref="MatrixFactorizationTrainer"/> as used in [MatrixFactorization(Options)](xref:Microsoft.ML.RecommendationCatalog.RecommendationTrainers.MatrixFactorization(Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options)).
     /// </summary>
     public sealed class Options
     {
         /// <summary>
         /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's column index.
+        /// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
         /// </summary>
         public string MatrixColumnIndexColumnName;
 
         /// <summary>
         /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's row index.
+        /// The column data must be <see cref="Microsoft.ML.Data.KeyDataViewType"/>.
         /// </summary>
         public string MatrixRowIndexColumnName;
 
         /// <summary>
         /// The name of the variable (i.e., column in a <see cref="IDataView"/> type system) used as the matrix's element value.
+        /// The column data must be <see cref="System.Single"/>.
         /// </summary>
         public string LabelColumnName;
 
@@ -155,10 +183,10 @@ public sealed class Options
         public double Lambda = Defaults.Lambda;
 
         /// <summary>
-        /// Rank of approximation matrixes.
+        /// Rank of approximation matrices.
         /// </summary>
         /// <remarks>
-        /// If input data has size of m-by-n we would build two approximation matrixes m-by-k and k-by-n where k is approximation rank.
+        /// If the input data has size m-by-n, we would build two approximation matrices, m-by-k and k-by-n, where k is the approximation rank.
         /// </remarks>
         [Argument(ArgumentType.AtMostOnce, HelpText = "Latent space dimension (denoted by k). If the factorized matrix is m-by-n, " +
             "two factor matrices found by matrix factorization are m-by-k and k-by-n, respectively. " +
@@ -194,13 +222,13 @@ public sealed class Options
         /// <remarks>
         /// Importance of unobserved (i.e., negative) entries' loss in one-class matrix factorization.
         /// In general, only a few of the matrix entries (e.g., less than 1%) in the training set are observed (i.e., positive).
-        /// To balance the contributions from unobserved and obverved in the overall loss function, this parameter is
+        /// To balance the contributions from unobserved and observed entries in the overall loss function, this parameter is
         /// usually a small value so that the solver is able to find a factorization equally good to unobserved and observed
         /// entries. If only 10000 observed entries are present in a 200000-by-300000 training matrix, one can try Alpha = 10000 / (200000*300000 - 10000).
         /// When most entries in the training matrix are observed, one can use Alpha >> 1; for example, if only 10000 entries in the previous
         /// matrix are not observed, one can try Alpha = (200000 * 300000 - 10000) / 10000. Consequently,
         /// Alpha = (# of observed entries) / (# of unobserved entries) can make observed and unobserved entries equally important
-        /// in the minimized loss function. However, the best setting in machine learning is alwasy data-depedent so user still needs to
+        /// in the minimized loss function. However, the best setting in machine learning is always data-dependent, so the user still needs to
         /// try multiple values.
         /// </remarks>
         [Argument(ArgumentType.AtMostOnce, HelpText = "Importance of unobserved entries' loss in one-class matrix factorization.")]
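A small, hypothetical helper (not part of the ML.NET API) that captures the Alpha rule of thumb from the remarks above:

```csharp
using System;

static class AlphaRuleOfThumb
{
    // Alpha = (# of observed entries) / (# of unobserved entries), per the remarks above.
    static double SuggestAlpha(long rows, long columns, long observedEntries)
    {
        long unobservedEntries = rows * columns - observedEntries;
        return (double)observedEntries / unobservedEntries;
    }

    static void Main()
    {
        // Mostly unobserved: 10000 observed cells in a 200000-by-300000 matrix
        // gives a very small Alpha, as the remarks suggest.
        Console.WriteLine(SuggestAlpha(200000, 300000, 10000));            // ~1.67e-07

        // Mostly observed: all but 10000 cells observed gives Alpha >> 1.
        long total = 200000L * 300000L;
        Console.WriteLine(SuggestAlpha(200000, 300000, total - 10000));    // ~6.0e+06
    }
}
```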
@@ -221,7 +249,7 @@ public sealed class Options
         public double C = Defaults.C;
 
         /// <summary>
-        /// Number of threads will be used during training. If unspecified all aviable threads will be use.
+        /// Number of threads used during training. If unspecified, all available threads will be used.
         /// </summary>
         [Argument(ArgumentType.AtMostOnce, HelpText = "Number of threads can be used in the training procedure.", ShortName = "t,numthreads")]
@@ -77,9 +79,10 @@
-    /// Train a matrix factorization model. It factorizes the training matrix into the product of two low-rank matrices.
+    /// Create <see cref="MatrixFactorizationTrainer"/> with advanced options, which predicts element values in a matrix using matrix factorization.
     /// </summary>
     /// <remarks>
-    /// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to apporimate the training matrix.</para>
+    /// <para>The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix.</para>
     /// <para>In this module, the expected training data is a list of tuples. Every tuple consists of a column index, a row index,
     /// and the value at the location specified by the two indexes. The training configuration is encoded in <see cref="MatrixFactorizationTrainer.Options"/>.
+    /// To invoke one-class matrix factorization, the user needs to specify <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass"/>.
+    /// The default setting, <see cref="MatrixFactorizationTrainer.LossFunctionType.SquareLossRegression"/>, is for the standard matrix factorization problem.
     /// </para>
     /// </remarks>
-    /// <param name="options">Advanced arguments to the algorithm.</param>
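Since the new remarks repeatedly point at the Options-based overload, here is a minimal sketch of invoking one-class matrix factorization through MatrixFactorizationTrainer.Options. It reuses the hypothetical MatrixElement class from the earlier sketch, and the Alpha and Lambda values are arbitrary illustrative assumptions.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

// ... load an IDataView of MatrixElement rows as in the earlier sketch ...
var mlContext = new MLContext(seed: 0);

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = nameof(MatrixElement.MatrixColumnIndex),
    MatrixRowIndexColumnName = nameof(MatrixElement.MatrixRowIndex),
    LabelColumnName = nameof(MatrixElement.Value),
    // Assigning SquareLossOneClass is the only way to invoke one-class matrix
    // factorization; the default, SquareLossRegression, gives the standard problem.
    LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
    Alpha = 0.01,   // arbitrary illustrative value
    Lambda = 0.025  // arbitrary illustrative value
};

var trainer = mlContext.Recommendation().Trainers.MatrixFactorization(options);
// Calling trainer.Fit(trainData) then yields a model whose "Score" column fills missing entries.
```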