XML documentation for Calibrated and Non Calibrated SDCA Binary Trainers. #3395

Merged
merged 4 commits into from
Apr 21, 2019
37 changes: 34 additions & 3 deletions docs/api-reference/algo-details-sdca.md
@@ -1,12 +1,24 @@
### Training Algorithm Details
This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a
state-of-the-art optimization technique for convex objective functions. The
algorithm can be scaled for use on large out-of-memory data sets due to a
semi-asynchronized implementation that supports multi-threading.
algorithm can be scaled because it's a streaming training algorithm, as described
in a [KDD best
paper](https://www.csie.ntu.edu.tw/~cjlin/papers/disk_decomposition/tkdd_disk_decomposition.pdf).

Convergence is underwritten by periodically enforcing synchronization between
primal and dual variables in a separate thread. Several choices of loss
functions are also provided.
functions are also provided, such as
[hinge loss](https://en.wikipedia.org/wiki/Hinge_loss) and [logistic
loss](http://www.hongliangjie.com/wp-content/uploads/2011/10/logistic.pdf).
Depending on the loss used, the trained model can be, for example, a [support
vector machine](https://en.wikipedia.org/wiki/Support-vector_machine) or a
[logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) model. The
SDCA method combines several useful properties, such as the ability to do
streaming learning (without fitting the entire data set into memory),
reaching a reasonable result with a few scans of the whole data set (for
example, see experiments in [this
paper](https://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf)), and spending no
computation on zeros in sparse data sets.
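
For reference, the two loss functions linked above are usually written as follows, for a label $y \in \{-1, +1\}$ and a linear score $f = \boldsymbol{w}^\top \boldsymbol{x} + b$. These are the common textbook forms rather than anything taken from the trainer's source, so the exact scaling or margin parameters used by the implementation may differ; the symbols $f$ and $b$ are notation assumed here for illustration.

$$
\ell_{\text{hinge}}(y, f) = \max(0,\ 1 - y f), \qquad
\ell_{\text{logistic}}(y, f) = \log\left(1 + e^{-y f}\right)
$$

With an L2 penalty, minimizing the hinge loss yields a linear support vector machine, while minimizing the logistic loss yields logistic regression.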

Note that SDCA is a stochastic and streaming optimization algorithm. The result
depends on the order of training data because the stopping tolerance is not
@@ -17,6 +29,25 @@ For reproducible results, it is recommended that one sets 'Shuffle' to False and
'NumThreads' to 1. Elastic net regularization can be specified by the 'L2Const'
and 'L1Threshold' parameters. Note that the 'L2Const' has an effect on the rate
of convergence. In general, the larger the 'L2Const', the faster SDCA converges.
Regularization is a method that can render an ill-posed problem more tractable
by imposing constraints that provide information to supplement the data. It also
prevents overfitting by penalizing the model's magnitude, usually measured by
some norm function. This can improve the generalization of the model learned by
selecting the optimal complexity in the bias-variance tradeoff. Regularization
works by adding the penalty that is associated with coefficient values to the
error of the hypothesis. An accurate model with extreme coefficient values would
be penalized more, but a less accurate model with more conservative values would
be penalized less. This learner supports [elastic net
regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a
linear combination of L1-norm (LASSO), $|| \boldsymbol{w} ||_1$, and L2-norm
(ridge), $|| \boldsymbol{w} ||_2^2$, regularizations. L1-norm and L2-norm
regularizations have different effects and uses that are complementary in
certain respects. Using L1-norm can increase sparsity of the trained
$\boldsymbol{w}$. When working with high-dimensional data, it shrinks small
weights of irrelevant features to 0, and therefore no resources will be spent on
those features when making predictions. L2-norm regularization is preferable
for data that is not sparse, and it largely penalizes the existence of large
weights.
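
As a sketch, an elastic-net-regularized training objective has the general form below, where $\ell$ is the chosen loss, $n$ is the number of training examples, and $\lambda_1, \lambda_2 \ge 0$ are illustrative names for the L1 and L2 weights. This notation is assumed for illustration; how 'L1Threshold' and 'L2Const' map onto $\lambda_1$ and $\lambda_2$, including any scaling, is not specified here.

$$
\min_{\boldsymbol{w}} \ \frac{1}{n} \sum_{i=1}^{n} \ell\left(y_i,\ \boldsymbol{w}^\top \boldsymbol{x}_i\right) + \lambda_1 || \boldsymbol{w} ||_1 + \frac{\lambda_2}{2} || \boldsymbol{w} ||_2^2
$$

Setting $\lambda_1 = 0$ leaves only the ridge penalty, and setting $\lambda_2 = 0$ leaves only the LASSO penalty.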

For more information, see:
* [Scaling Up Stochastic Dual Coordinate
59 changes: 56 additions & 3 deletions src/Microsoft.ML.StandardTrainers/Standard/SdcaBinary.cs
@@ -1550,12 +1550,34 @@ private protected override BinaryPredictionTransformer<TModelParameters> MakeTra
/// The trained model is <a href='https://en.wikipedia.org/wiki/Calibration_(statistics)'>calibrated</a> and can produce probability by feeding the output value of the
/// linear function to a <see cref="PlattCalibrator"/>.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// To create this trainer, use [SdcaLogisticRegression](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32}))
/// or [SdcaLogisticRegression(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaLogisticRegressionBinaryTrainer.Options)).
///
/// [!include[io](~/../docs/samples/docs/api-reference/io-columns-binary-classification.md)]
///
/// ### Trainer Characteristics
/// | | |
/// | -- | -- |
/// | Machine learning task | Binary classification |
/// | Is normalization required? | Yes |
/// | Is caching required? | No |
/// | Required NuGet in addition to Microsoft.ML | None |
///
/// [!include[algorithm](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
/// ]]>
/// </format>
/// </remarks>
/// <seealso cref="StandardTrainersCatalog.SdcaLogisticRegression(BinaryClassificationCatalog.BinaryClassificationTrainers, string, string, string, float?, float?, int?)"/>
/// <seealso cref="StandardTrainersCatalog.SdcaLogisticRegression(BinaryClassificationCatalog.BinaryClassificationTrainers, SdcaLogisticRegressionBinaryTrainer.Options)"/>
/// <seealso cref="Options"/>
public sealed class SdcaLogisticRegressionBinaryTrainer :
SdcaBinaryTrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
{
/// <summary>
/// Options for the <see cref="SdcaLogisticRegressionBinaryTrainer"/>.
/// Options for the <see cref="SdcaLogisticRegressionBinaryTrainer"/> as used in
/// [SdcaLogisticRegression(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaLogisticRegression(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaLogisticRegressionBinaryTrainer.Options)).
/// </summary>
public sealed class Options : BinaryOptionsBase
{
@@ -1614,7 +1636,38 @@ private protected override SchemaShape.Column[] ComputeSdcaBinaryClassifierSchem
/// <summary>
/// The <see cref="IEstimator{TTransformer}"/> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// To create this trainer, use [SdcaNonCalibrated](xref:Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,Microsoft.ML.Trainers.ISupportSdcaClassificationLoss,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32}))
/// or [SdcaNonCalibrated(Options)](xref:Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaNonCalibratedBinaryTrainer.Options)).
///
/// [!include[io](~/../docs/samples/docs/api-reference/io-columns-binary-classification.md)]
///
/// ### Trainer Characteristics
/// | | |
/// | -- | -- |
/// | Machine learning task | Binary classification |
/// | Is normalization required? | Yes |
/// | Is caching required? | No |
/// | Required NuGet in addition to Microsoft.ML | None |
///
/// ### Training Algorithm Details
/// This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions.
/// The algorithm can be scaled for use on large out-of-memory data sets due to a semi-asynchronized implementation that supports multi-threading.
/// Convergence is underwritten by periodically enforcing synchronization between primal and dual updates in a separate thread.
/// Several choices of loss functions are also provided. The SDCA method combines several of the best properties and capabilities of logistic regression and SVM algorithms.
/// Note that SDCA is a stochastic and streaming optimization algorithm. The results depend on the order of the training data.
/// For reproducible results, it is recommended that one sets 'Shuffle' to False and 'NumThreads' to 1.
/// Elastic net regularization can be specified by the 'L2Const' and 'L1Threshold' parameters. Note that the 'L2Const' has an effect on the rate of convergence.
/// In general, the larger the 'L2Const', the faster SDCA converges.
/// For more information, see: [Scaling Up Stochastic Dual Coordinate Ascent](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/main-3.pdf ) and
/// [Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization](http://www.jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf).
/// ]]>
/// </format>
/// </remarks>
/// <seealso cref="Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,System.String,System.String,System.String,Microsoft.ML.Trainers.ISupportSdcaClassificationLoss,System.Nullable{System.Single},System.Nullable{System.Single},System.Nullable{System.Int32})"/>
/// <seealso cref="Microsoft.ML.StandardTrainersCatalog.SdcaNonCalibrated(Microsoft.ML.BinaryClassificationCatalog.BinaryClassificationTrainers,Microsoft.ML.Trainers.SdcaNonCalibratedBinaryTrainer.Options)"/>
/// <seealso cref="Options"/>
public sealed class SdcaNonCalibratedBinaryTrainer : SdcaBinaryTrainerBase<LinearBinaryModelParameters>
{
/// <summary>
@@ -39,7 +39,7 @@ namespace Microsoft.ML.Trainers
/// | Is caching required? | No |
/// | Required NuGet in addition to Microsoft.ML | None |
///
/// [!include[io](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
/// [!include[algorithm](~/../docs/samples/docs/api-reference/algo-details-sdca.md)]
/// ]]>
/// </format>
/// </remarks>
10 changes: 5 additions & 5 deletions src/Microsoft.ML.StandardTrainers/StandardTrainersCatalog.cs
@@ -211,7 +211,7 @@ public static SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegression(
}

/// <summary>
/// Create <see cref="SdcaLogisticRegressionBinaryTrainer"/> using advanced options, which predicts a target using a linear classification model.
/// Create <see cref="SdcaLogisticRegressionBinaryTrainer"/> with advanced options, which predicts a target using a linear classification model.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="options">Trainer options.</param>
@@ -233,11 +233,11 @@ public static SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegression(
}

/// <summary>
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/>.
/// Create <see cref="SdcaNonCalibratedBinaryTrainer"/>, which predicts a target using a linear classification model.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="labelColumnName">The name of the label column. The column data must be <see cref="System.Boolean"/>.</param>
/// <param name="featureColumnName">The name of the feature column. The column data must be a known-sized vector of <see cref="System.Single"/>.</param>
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="lossFunction">The <a href="https://en.wikipedia.org/wiki/Loss_function">loss</a> function minimized in the training process. Defaults to <see cref="LogLoss"/> if not specified.</param>
/// <param name="l2Regularization">The L2 weight for <a href='https://en.wikipedia.org/wiki/Regularization_(mathematics)'>regularization</a>.</param>
@@ -265,7 +265,7 @@ public static SdcaNonCalibratedBinaryTrainer SdcaNonCalibrated(
}

/// <summary>
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/> and advanced options.
/// Create <see cref="SdcaNonCalibratedBinaryTrainer"/> using advanced options, which predicts a target using a linear classification model trained over boolean label data.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="options">Trainer options.</param>