API Reference needs to include expected column types #3127

singlis · 2019-03-28T01:28:55Z

Issue

Our API documentation for trainers, Evaluate, and CrossValidate needs to specify the expected column types.

For example:

Parameters
labelColumn
String
The name of the label column.

matrixColumnIndexColumnName
String
The name of the column hosting the matrix's column IDs.

matrixRowIndexColumnName
String
The name of the column hosting the matrix's row IDs.

Taken from here:
Matrix Factorization Help

Note that this takes:

labelColumn (really the name of the label column), whose type is String
matrixColumnIndexColumnName, also String
matrixRowIndexColumnName, also String

The type String provides no information about the column type that is actually expected or supported.

Expected

The documentation needs to say more about the column types each trainer expects and whether the trainer adds columns as a result of the transformation.

Suggestion

This can be added to the parameter description, for example:
<param name="labelColumn">The name of the label column. The label column must be one of the following ColumnType: DataKind.Int64, DataKind.Float,...</param>
Additional content about whether columns are added, and what those columns are, should go in the Remarks section. Each added column should also list its ColumnType.

singlis · 2019-03-28T01:30:14Z

Note that this also applies to the Evaluate and CrossValidate API references.

sfilipi · 2019-04-02T17:57:19Z

I think we should put the suggestion for the data type of the label in the summary of the trainer extensions documentation, since that is the string that IntelliSense displays when it gets added.

singlis · 2019-04-04T22:08:41Z

I fleshed out an example with the FieldAwareFactorizationMachine and put it on the extension method. The FieldAwareFactorizationMachine takes a featureColumnName, labelColumnName, and exampleWeightColumnName; each parameter that is a column name now has additional text in its param reference explaining the expected column type.

The FieldAwareFactorizationMachine also adds columns to the transformed data. To document the added columns, I added a table to the remarks using the XML docs list element (really a list with its type set to table). The table has the column name, the column type, and a description of what the column is.

@wschin - this could go on GetOutputSchema instead, but if we document the extension method rather than the class, it would be harder to find there.

@sfilipi the parameter reference for the label column can be duplicated in the summary if needed.

Here is the sample:

        /// <summary>
        /// Predict a target using a field-aware factorization machine algorithm.
        /// </summary>
        /// <remarks>
        /// Note that because there is only one feature column, the underlying model is equivalent to a standard factorization machine.
        /// The following columns will be added to the <see cref="IDataView"/> after transform:
        /// <list type="table">
        ///     <listheader>
        ///         <term>Column Name</term>
        ///         <term>Column Type</term>
        ///         <term>Description</term>
        ///     </listheader>
        ///     <item>
        ///         <term>Score</term>
        ///         <term><see cref="DataKind.Single"/></term>
        ///         <term>The unbounded score that was calculated by the trainer to determine the prediction.</term>
        ///     </item>
        ///     <item>
        ///         <term>PredictedLabel</term>
        ///         <term><see cref="DataKind.Boolean"/></term>
        ///         <term>The predicted label made by the trainer.</term>
        ///     </item>
        ///     <item>
        ///         <term>Probability</term>
        ///         <term><see cref="DataKind.Single"/></term>
        ///         <term>The probability of the score, this is used to determine the final predicted label.</term>
        ///     </item>
        /// </list>
        /// </remarks>
        /// <param name="catalog">The binary classification catalog trainer object.</param>
        /// <param name="featureColumnName">The name of the feature column. The <paramref name="featureColumnName"/> must refer to a column of type <see cref="DataKind.Single"/>.</param>
        /// <param name="labelColumnName">The name of the label column. The <paramref name="labelColumnName"/> must refer to a column of type <see cref="DataKind.Boolean"/>.</param>
        /// <param name="exampleWeightColumnName">The name of the example weight column (optional). The <paramref name="exampleWeightColumnName"/> must refer to a column of type <see cref="DataKind.Single"/>.</param>
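
To show what these doc comments would be describing from the caller's side, here is a rough sketch (not part of the proposed change) that trains the FieldAwareFactorizationMachine on a few in-memory rows and prints the schema after transform, so the added columns and their types can be compared against the remarks table. The `Row` class and its values are made up for illustration. The sketch assumes the Microsoft.ML NuGet package (1.0 or later) and that the feature column is a fixed-size vector of Single, which is how the published trainer reference describes it.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row, introduced only for this illustration.
public class Row
{
    // Assumption: the feature column is a known-size vector of Single.
    [VectorType(2)]
    public float[] Features { get; set; }

    // Boolean label, matching the labelColumnName param text in the sample.
    public bool Label { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up toy data; the values carry no meaning.
        var rows = new[]
        {
            new Row { Features = new[] { 0.1f, 0.9f }, Label = true },
            new Row { Features = new[] { 0.8f, 0.2f }, Label = false },
            new Row { Features = new[] { 0.2f, 0.7f }, Label = true },
            new Row { Features = new[] { 0.9f, 0.1f }, Label = false },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // The column names passed here are the string parameters whose
        // expected column types the issue asks to have documented.
        var trainer = mlContext.BinaryClassification.Trainers.FieldAwareFactorizationMachine(
            featureColumnName: "Features",
            labelColumnName: "Label");

        var model = trainer.Fit(data);
        IDataView transformed = model.Transform(data);

        // List every column in the output schema with its type, including
        // the columns added by the trainer.
        foreach (var column in transformed.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

Passing a column of a different type (for example, a text label) should fail schema validation when `Fit` is called, which is the kind of surprise the param text is meant to prevent.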

singlis · 2019-04-04T22:25:59Z

cc @shmoradims @glebuk as well.

singlis · 2019-04-09T16:46:30Z

I've broken down the items that need to be updated based upon the catalogs:

Catalog	APIs	Issue Reference
AnomalyDetection	Evaluate, Trainers.RandomizedPca
BinaryClassification	CrossValidate, CrossValidateNonCalibrated, Evaluate, EvaluateNonCalibrated, PermutationFeatureImportance, Trainers.AveragedPerceptron, Trainers.FastForest, Trainers.FastTree, Trainers.FieldAwareFactorizationMachine, Trainers.GeneralizedAdditiveModels, Trainers.LightGbm, Trainers.LinearSupportVectorMachines, Trainers.LogisticRegression, Trainers.StochasticDualCoordinateAscent, Trainers.StochasticDualCoordinateNonCalibrated, Trainers.StochasticGradientDescent, Trainers.StochasticGradientDescentNonCalibrated, Trainers.SymbolicStochasticGradientDescent
Clustering	CrossValidate, Evaluate, Trainers.KMeans
MulticlassClassification	CrossValidate, Evaluate, PermutationFeatureImportance, Trainers.LightGbm, Trainers.LogisticRegression, Trainers.NaiveBayes, Trainers.OneVsAll, Trainers.PairwiseCoupling, Trainers.StochasticDualCoordinateAscent
Ranking	Evaluate, PermutationFeatureImportance, Trainers.FastTree, Trainers.LightGbm
Regression	CrossValidate, Evaluate, PermutationFeatureImportance, Trainers.FastForest, Trainers.FastTree, Trainers.FastTreeTweedie, Trainers.GeneralizedAdditiveModels, Trainers.LightGbm, Trainers.OnlineGradientDescent, Trainers.OrdinaryLeastSquares, Trainers.PoissonRegression, Trainers.StochasticDualCoordinateAscent

shmoradims · 2019-05-20T22:50:45Z

I believe the input/output types were addressed for all trainers and transforms during the API reference project. Here's an example for FFM with input/output sub-section in remarks:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.fieldawarefactorizationmachinetrainer?view=ml-dotnet

singlis added the documentation (Related to documentation of ML.NET) label Mar 28, 2019

singlis mentioned this issue Mar 28, 2019

Getting started with ML .NET with in-memory data is *painful*. #3037

Closed

singlis mentioned this issue Apr 4, 2019

API reference - XML documentation template for transforms #3204

Closed

shmoradims closed this as completed May 20, 2019

ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
