Commit 3d6f66d

XML documentation for Calibrated and Non Calibrated SDCA Binary Trainers. (#3395)
* XML documentation for Calibrated and Non Calibrated SDCA Trainers.
* PR feedback.
* PR feedback.
* PR feedback.
1 parent 2153977 commit 3d6f66d

4 files changed

+96
-12
lines changed

docs/api-reference/algo-details-sdca.md

+34-3
@@ -1,12 +1,24 @@
11
### Training Algorithm Details
22
This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a
33
state-of-the-art optimization technique for convex objective functions. The
4-
algorithm can be scaled for use on large out-of-memory data sets due to a
5-
semi-asynchronized implementation that supports multi-threading.
4+
algorithm can be scaled because it's a streaming training algorithm as described
5+
in a [KDD best
6+
paper.](https://www.csie.ntu.edu.tw/~cjlin/papers/disk_decomposition/tkdd_disk_decomposition.pdf)
67

78
Convergence is underwritten by periodically enforcing synchronization between
89
primal and dual variables in a separate thread. Several choices of loss
9-
functions are also provided.
10+
functions are also provided such as
11+
[hinge-loss](https://en.wikipedia.org/wiki/Hinge_loss) and [logistic
12+
loss](http://www.hongliangjie.com/wp-content/uploads/2011/10/logistic.pdf).
13+
Depending on the loss used, the trained model can be, for example, [support
14+
vector machine](https://en.wikipedia.org/wiki/Support-vector_machine) or
15+
[logistic regression](https://en.wikipedia.org/wiki/Logistic_regression). The
16+
SDCA method combines several of the best properties such as the ability to do
17+
streaming learning (without fitting the entire data set into your memory),
18+
reaching a reasonable result with a few scans of the whole data set (for
19+
example, see experiments in [this
20+
paper](https://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf)), and spending no
21+
computation on zeros in sparse data sets.
1022

1123
Note that SDCA is a stochastic and streaming optimization algorithm. The result
1224
depends on the order of training data because the stopping tolerance is not
@@ -17,6 +29,25 @@ For reproducible results, it is recommended that one sets 'Shuffle' to False and
1729
'NumThreads' to 1. Elastic net regularization can be specified by the 'L2Const'
1830
and 'L1Threshold' parameters. Note that the 'L2Const' has an effect on the rate
1931
of convergence. In general, the larger the 'L2Const', the faster SDCA converges.
32+
Regularization is a method that can render an ill-posed problem more tractable
33+
by imposing constraints that provide information to supplement the data and that
34+
prevents overfitting by penalizing the model's magnitude, usually measured by some
35+
norm functions. This can improve the generalization of the model learned by
36+
selecting the optimal complexity in the bias-variance tradeoff. Regularization
37+
works by adding the penalty that is associated with coefficient values to the
38+
error of the hypothesis. An accurate model with extreme coefficient values would
39+
be penalized more, but a less accurate model with more conservative values would
40+
be penalized less. This learner supports [elastic net
41+
regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a
42+
linear combination of L1-norm (LASSO), $|| \boldsymbol{w} ||_1$, and L2-norm
43+
(ridge), $|| \boldsymbol{w} ||_2^2$ regularizations. L1-norm and L2-norm
44+
regularizations have different effects and uses that are complementary in
45+
certain respects. Using L1-norm can increase sparsity of the trained
46+
$\boldsymbol{w}$. When working with high-dimensional data, it shrinks small
47+
weights of irrelevant features to 0 and therefore no resource will be spent on
48+
those bad features when making predictions. L2-norm regularization is preferable
49+
for data that is not sparse and it largely penalizes the existence of large
50+
weights.
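
As an illustration that is not part of the committed text, the elastic net objective described in the paragraph above can be sketched as follows. The symbols $\ell$ (the chosen loss, for example hinge or logistic), $n$ (the number of training examples), and the weights $\lambda_1$, $\lambda_2$ are assumptions introduced here. The diff does not state how 'L1Threshold' and 'L2Const' map onto $\lambda_1$ and $\lambda_2$, so the sketch should not be read as the trainer's exact formulation.

```latex
\min_{\boldsymbol{w}} \;
  \frac{1}{n} \sum_{i=1}^{n} \ell\left(\boldsymbol{w}^\top \boldsymbol{x}_i,\, y_i\right)
  \;+\; \lambda_1 \, || \boldsymbol{w} ||_1
  \;+\; \frac{\lambda_2}{2} \, || \boldsymbol{w} ||_2^2
```

Setting $\lambda_1 = 0$ leaves a pure ridge penalty, and setting $\lambda_2 = 0$ leaves a pure LASSO penalty. A larger $\lambda_1$ drives more entries of $\boldsymbol{w}$ to exactly zero, which is the sparsity effect the paragraph describes.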
2051

2152
For more information, see:
2253
* [Scaling Up Stochastic Dual Coordinate

src/Microsoft.ML.StandardTrainers/Standard/SdcaBinary.cs

+56-3
@@ -1550,12 +1550,34 @@ private protected override BinaryPredictionTransformer<TModelParameters> MakeTra
15501550
/// The trained model is <a href='https://en.wikipedia.org/wiki/Calibration_(statistics)'>calibrated</a> and can produce probability by feeding the output value of the
15511551
/// linear function to a <see cref="PlattCalibrator"/>.
15521552
/// </summary>
1553-
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
1553+
/// <remarks>
1554+
/// <format type="text/markdown"><![CDATA[
1555+
/// To create this trainer, use [SdcaLogisticRegression](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32}))
1556+
/// or [SdcaLogisticRegression(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaLogisticRegressionBinaryTrainer.Options)).
1557+
///
1558+
/// [!include[io](~/../docs/samples/docs/api-reference/io-columns-binary-classification.md)]
1559+
///
1560+
/// ### Trainer Characteristics
1561+
/// | | |
1562+
/// | -- | -- |
1563+
/// | Machine learning task | Binary classification |
1564+
/// | Is normalization required? | Yes |
1565+
/// | Is caching required? | No |
1566+
/// | Required NuGet in addition to Microsoft.ML | None |
1567+
///
1568+
/// [!include[algorithm](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
1569+
/// ]]>
1570+
/// </format>
1571+
/// </remarks>
1572+
/// <seealso cref="StandardTrainersCatalog.SdcaLogisticRegression(BinaryClassificationCatalog.BinaryClassificationTrainers, string, string, string, float?, float?, int?)"/>
1573+
/// <seealso cref="StandardTrainersCatalog.SdcaLogisticRegression(BinaryClassificationCatalog.BinaryClassificationTrainers, SdcaLogisticRegressionBinaryTrainer.Options)"/>
1574+
/// <seealso cref="Options"/>
15541575
public sealed class SdcaLogisticRegressionBinaryTrainer :
15551576
SdcaBinaryTrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
15561577
{
15571578
/// <summary>
1558-
/// Options for the <see cref="SdcaLogisticRegressionBinaryTrainer"/>.
1579+
/// Options for the <see cref="SdcaLogisticRegressionBinaryTrainer"/> as used in
1580+
/// [SdcaLogisticRegression(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaLogisticRegressionBinaryTrainer.Options)).
15591581
/// </summary>
15601582
public sealed class Options : BinaryOptionsBase
15611583
{
@@ -1614,7 +1636,38 @@ private protected override SchemaShape.Column[] ComputeSdcaBinaryClassifierSchem
16141636
/// <summary>
16151637
/// The <see cref="IEstimator{TTransformer}"/> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method.
16161638
/// </summary>
1617-
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
1639+
/// <remarks>
1640+
/// <format type="text/markdown"><![CDATA[
1641+
/// To create this trainer, use [SdcaNonCalibrated](xref:Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,Microsoft.ML.Trainers.ISupportSdcaClassificationLoss,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32}))
1642+
/// or [SdcaNonCalibrated(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaNonCalibratedBinaryTrainer.Options)).
1643+
///
1644+
/// [!include[io](~/../docs/samples/docs/api-reference/io-columns-binary-classification.md)]
1645+
///
1646+
/// ### Trainer Characteristics
1647+
/// | | |
1648+
/// | -- | -- |
1649+
/// | Machine learning task | Binary classification |
1650+
/// | Is normalization required? | Yes |
1651+
/// | Is caching required? | No |
1652+
/// | Required NuGet in addition to Microsoft.ML | None |
1653+
///
1654+
/// ### Training Algorithm Details
1655+
/// This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions.
1656+
/// The algorithm can be scaled for use on large out-of-memory data sets due to a semi-asynchronized implementation that supports multi-threading.
1657+
/// Convergence is underwritten by periodically enforcing synchronization between primal and dual updates in a separate thread.
1658+
/// Several choices of loss functions are also provided. The SDCA method combines several of the best properties and capabilities of logistic regression and SVM algorithms.
1659+
/// Note that SDCA is a stochastic and streaming optimization algorithm. The results depend on the order of the training data.
1660+
/// For reproducible results, it is recommended that one sets 'Shuffle' to False and 'NumThreads' to 1.
1661+
/// Elastic net regularization can be specified by the 'L2Const' and 'L1Threshold' parameters. Note that the 'L2Const' has an effect on the rate of convergence.
1662+
/// In general, the larger the 'L2Const', the faster SDCA converges.
1663+
/// For more information, see: [Scaling Up Stochastic Dual Coordinate Ascent](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/main-3.pdf ) and
1664+
/// [Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization](http://www.jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf).
1665+
/// ]]>
1666+
/// </format>
1667+
/// </remarks>
1668+
/// <seealso cref="Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,Microsoft.ML.Trainers.ISupportSdcaClassificationLoss,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32})"/>
1669+
/// <seealso cref="Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaNonCalibratedBinaryTrainer.Options)"/>
1670+
/// <seealso cref="Options"/>
16181671
public sealed class SdcaNonCalibratedBinaryTrainer : SdcaBinaryTrainerBase<LinearBinaryModelParameters>
16191672
{
16201673
/// <summary>

src/Microsoft.ML.StandardTrainers/Standard/SdcaRegression.cs

+1-1
@@ -39,7 +39,7 @@ namespace Microsoft.ML.Trainers
3939
/// | Is caching required? | No |
4040
/// | Required NuGet in addition to Microsoft.ML | None |
4141
///
42-
/// [!include[io](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
42+
/// [!include[algorithm](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
4343
/// ]]>
4444
/// </format>
4545
/// </remarks>

src/Microsoft.ML.StandardTrainers/StandardTrainersCatalog.cs

+5-5
@@ -211,7 +211,7 @@ public static SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegression(
211211
}
212212

213213
/// <summary>
214-
/// Create <see cref="SdcaLogisticRegressionBinaryTrainer"/> using advanced options, which predicts a target using a linear classification model.
214+
/// Create <see cref="SdcaLogisticRegressionBinaryTrainer"/> with advanced options, which predicts a target using a linear classification model.
215215
/// </summary>
216216
/// <param name="catalog">The binary classification catalog trainer object.</param>
217217
/// <param name="options">Trainer options.</param>
@@ -233,11 +233,11 @@ public static SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegression(
233233
}
234234

235235
/// <summary>
236-
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/>.
236+
/// Create <see cref="SdcaNonCalibratedBinaryTrainer"/>, which predicts a target using a linear classification model.
237237
/// </summary>
238238
/// <param name="catalog">The binary classification catalog trainer object.</param>
239-
/// <param name="labelColumnName">The name of the label column.</param>
240-
/// <param name="featureColumnName">The name of the feature column.</param>
239+
/// <param name="labelColumnName">The name of the label column. The column data must be <see cref="System.Boolean"/>.</param>
240+
/// <param name="featureColumnName">The name of the feature column. The column data must be a known-sized vector of <see cref="System.Single"/>.</param>
241241
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
242242
/// <param name="lossFunction">The <a href="https://en.wikipedia.org/wiki/Loss_function">loss</a> function minimized in the training process. Defaults to <see cref="LogLoss"/> if not specified.</param>
243243
/// <param name="l2Regularization">The L2 weight for <a href='https://en.wikipedia.org/wiki/Regularization_(mathematics)'>regularization</a>.</param>
@@ -265,7 +265,7 @@ public static SdcaNonCalibratedBinaryTrainer SdcaNonCalibrated(
265265
}
266266

267267
/// <summary>
268-
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/> and advanced options.
268+
/// Create <see cref="SdcaNonCalibratedBinaryTrainer"/> using advanced options, which predicts a target using a linear classification model trained over boolean label data.
269269
/// </summary>
270270
/// <param name="catalog">The binary classification catalog trainer object.</param>
271271
/// <param name="options">Trainer options.</param>
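
For orientation, here is a minimal sketch of how the two catalog methods documented in this commit might be called. It is not taken from the commit. The `Row` and `SdcaSketch` types, the toy data, and the regularization value are hypothetical, and only parameter names that appear in the diff above (`labelColumnName`, `featureColumnName`, `lossFunction`, `l2Regularization`) are used. It assumes the Microsoft.ML NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical input row: a Boolean label and a known-sized vector of Single,
// matching the column requirements stated in the updated <param> docs.
public class Row
{
    public bool Label { get; set; }

    [VectorType(2)]
    public float[] Features { get; set; }
}

public static class SdcaSketch
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Toy data for illustration only.
        var rows = new List<Row>
        {
            new Row { Label = true,  Features = new[] { 1.0f, 0.9f } },
            new Row { Label = true,  Features = new[] { 0.8f, 1.0f } },
            new Row { Label = false, Features = new[] { 0.1f, 0.0f } },
            new Row { Label = false, Features = new[] { 0.0f, 0.2f } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Calibrated trainer: logistic loss with a Platt calibrator on top.
        var calibrated = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
            labelColumnName: "Label",
            featureColumnName: "Features",
            l2Regularization: 0.1f); // assumed value, not a recommendation

        // Non-calibrated trainer: the loss is selectable, here hinge loss.
        var nonCalibrated = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
            labelColumnName: "Label",
            featureColumnName: "Features",
            lossFunction: new HingeLoss(),
            l2Regularization: 0.1f);

        var calibratedModel = calibrated.Fit(data);
        var nonCalibratedModel = nonCalibrated.Fit(data);

        Console.WriteLine("Trained both SDCA binary models.");
    }
}
```

The calibrated variant yields a model that can produce probabilities, while the non-calibrated variant yields only the raw score of the linear function, which is why the two trainers have different model parameter types in the class declarations above.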
