
API Reference needs to include expected column types #3127

Closed
singlis opened this issue Mar 28, 2019 · 6 comments
Labels
documentation Related to documentation of ML.NET

Comments

@singlis
Member

singlis commented Mar 28, 2019

Issue

Our API documentation for trainers, Evaluate and CrossValidate needs to specify the expected column types.

For example:

Parameters
labelColumn
String
The name of the label column.

matrixColumnIndexColumnName
String
The name of the column hosting the matrix's column IDs.

matrixRowIndexColumnName
String
The name of the column hosting the matrix's row IDs.

Taken from here:
Matrix Factorization Help

Note that this takes:

  • labelColumn (really the name of the label column), whose type is string
  • matrixColumnIndexColumnName, also string
  • matrixRowIndexColumnName, also string

The type string provides no information on the actual expected/supported column type.
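
To illustrate the gap, here is a minimal sketch (not taken from the ML.NET docs) of how a caller currently has to discover a column's type: by inspecting the IDataView schema at run time. The Rating class, its property names, and the HasRawType helper are hypothetical, and the sketch assumes the Microsoft.ML NuGet package with 1.x API names.

```csharp
using System;
using Microsoft.ML;

// Hypothetical input row, for illustration only.
public sealed class Rating
{
    public float Label { get; set; }
    public string UserId { get; set; }
}

public static class ColumnTypeCheck
{
    // Hypothetical helper: true if the named column exists and its raw .NET type matches.
    public static bool HasRawType(DataViewSchema schema, string columnName, Type rawType)
    {
        DataViewSchema.Column? column = schema.GetColumnOrNull(columnName);
        return column.HasValue && column.Value.Type.RawType == rawType;
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new Rating { Label = 4.0f, UserId = "u1" },
        });

        // The trainer API only receives a column *name* (a string);
        // the actual column type is only visible in the schema.
        foreach (DataViewSchema.Column column in data.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }

        // Check the label column against the type a trainer might expect.
        Console.WriteLine(HasRawType(data.Schema, "Label", typeof(float)));
    }
}
```

Nothing in the string-typed parameter tells the caller which of these schema types a given trainer accepts, which is the information the reference pages are missing.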

Expected

There needs to be more documentation on the column types that trainers expect and on whether a trainer adds columns as a result of the transformation.

Suggestion

This can be added to the parameter description, for example:
<param name="labelColumn">The name of the label column. The label column must be one of the following ColumnType: DataKind.Int64, DataKind.Float,...</param>
Whether columns are added, and what those columns are, should be documented in the Remarks section. Each added column should also list its ColumnType.

@singlis added the documentation label Mar 28, 2019
@singlis
Member Author

singlis commented Mar 28, 2019

Note that this also applies to the Evaluate and CrossValidate API references.

@sfilipi
Member

sfilipi commented Apr 2, 2019

I think we should put the suggestion for the data type of the label in the summary of the trainer extensions documentation, since that is the string that IntelliSense displays when it gets added.

@singlis
Member Author

singlis commented Apr 4, 2019

I fleshed out an example with the FieldAwareFactorizationMachine and put it on the extension method. The FieldAwareFactorizationMachine takes a featureColumnName, labelColumnName and exampleWeightColumnName; each parameter that is a column name now has additional text in its param reference that explains the expected column type.

The FieldAwareFactorizationMachine also adds columns to the transformed data. To document the added columns, I added a table to the remarks using the XML docs list element with its type set to table. The table has the column name, expected column type, and a description of what the column is.

@wschin - this could be on the GetOutputSchema instead, but if we document the extension method rather than the class, this would be harder to find.

@sfilipi the parameter reference for label column can be duplicated in the summary if needed.

Here is the sample:

        /// <summary>
        /// Predict a target using a field-aware factorization machine algorithm.
        /// </summary>
        /// <remarks>
        /// Note that because there is only one feature column, the underlying model is equivalent to a standard factorization machine.
        /// The following columns will be added to the <see cref="IDataView"/> after transform:
        /// <list type="table">
        ///     <listheader>
        ///         <term>Column Name</term>
        ///         <term>Column Type</term>
        ///         <term>Description</term>
        ///     </listheader>
        ///     <item>
        ///         <term>Score</term>
        ///         <term><see cref="DataKind.Single"/></term>
        ///         <term>The unbounded score that was calculated by the trainer to determine the prediction.</term>
        ///     </item>
        ///     <item>
        ///         <term>PredictedLabel</term>
        ///         <term><see cref="DataKind.Boolean"/></term>
        ///         <term>The predicted label made by the trainer.</term>
        ///     </item>
        ///     <item>
        ///         <term>Probability</term>
        ///         <term><see cref="DataKind.Single"/></term>
        ///         <term>The probability of the score. This is used to determine the final predicted label.</term>
        ///     </item>
        /// </list>
        /// </remarks>
        /// <param name="catalog">The binary classification catalog trainer object.</param>
        /// <param name="featureColumnName">The name of the feature column. The <paramref name="featureColumnName"/> must refer to a column of type <see cref="DataKind.Single"/>.</param>
        /// <param name="labelColumnName">The name of the label column. The <paramref name="labelColumnName"/> must refer to a column of type <see cref="DataKind.Boolean"/>.</param>
        /// <param name="exampleWeightColumnName">The name of the example weight column (optional). The <paramref name="exampleWeightColumnName"/> must refer to a column of type <see cref="DataKind.Single"/>.</param>
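
As a rough illustration of why the documented output types matter to callers, the sketch below shows a prediction class whose properties mirror the three columns in the remarks table of the sample. It assumes the usual mapping of DataKind.Single to float and DataKind.Boolean to bool. The class name FfmPrediction and the Describe helper are hypothetical and not part of ML.NET.

```csharp
using System;
using System.Linq;

// Hypothetical prediction class; property names and types follow the
// Score / PredictedLabel / Probability rows of the remarks table.
public sealed class FfmPrediction
{
    public float Score { get; set; }
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
}

public static class PredictionShape
{
    // Hypothetical helper: lists "PropertyName: ClrTypeName" for each public property.
    public static string[] Describe(Type type) =>
        type.GetProperties()
            .Select(p => $"{p.Name}: {p.PropertyType.Name}")
            .ToArray();

    public static void Main()
    {
        // The CLR type names (Single, Boolean) line up with the DataKind names in the docs.
        foreach (string line in Describe(typeof(FfmPrediction)))
        {
            Console.WriteLine(line);
        }
    }
}
```

Without the table in the remarks, a caller would have to guess these property types or read them off the output schema after training.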

@singlis
Member Author

singlis commented Apr 4, 2019

cc @shmoradims @glebuk as well.

@singlis
Member Author

singlis commented Apr 9, 2019

I've broken down the items that need to be updated based upon the catalogs:

Catalog APIs Issue Reference
AnomalyDetection
  • Evaluate
  • Trainers.RandomizedPca
BinaryClassification
  • CrossValidate
  • CrossValidateNonCalibrated
  • Evaluate
  • EvaluateNonCalibrated
  • PermutationFeatureImportance
  • Trainers.AveragedPerceptron
  • Trainers.FastForest
  • Trainers.FastTree
  • Trainers.FieldAwareFactorizationMachine
  • Trainers.GeneralizedAdditiveModels
  • Trainers.LightGbm
  • Trainers.LinearSupportVectorMachines
  • Trainers.LogisticRegression
  • Trainers.StochasticDualCoordinateAscent
  • Trainers.StochasticDualCoordinateNonCalibrated
  • Trainers.StochasticGradientDescent
  • Trainers.StochasticGradientDescentNonCalibrated
  • Trainers.SymbolicStochasticGradientDescent
Clustering
  • CrossValidate
  • Evaluate
  • Trainers.KMeans
MulticlassClassification
  • CrossValidate
  • Evaluate
  • PermutationFeatureImportance
  • Trainers.LightGbm
  • Trainers.LogisticRegression
  • Trainers.NaiveBayes
  • Trainers.OneVsAll
  • Trainers.PairwiseCoupling
  • Trainers.StochasticDualCoordinateAscent
Ranking
  • Evaluate
  • PermutationFeatureImportance
  • Trainers.FastTree
  • Trainers.LightGbm
Regression
  • CrossValidate
  • Evaluate
  • PermutationFeatureImportance
  • Trainers.FastForest
  • Trainers.FastTree
  • Trainers.FastTreeTweedie
  • Trainers.GeneralizedAdditiveModels
  • Trainers.LightGbm
  • Trainers.OnlineGradientDescent
  • Trainers.OrdinaryLeastSquares
  • Trainers.PoissonRegression
  • Trainers.StochasticDualCoordinateAscent

@shmoradims

I believe the input/output types were addressed for all trainers and transforms during the API reference project. Here's an example for FFM with input/output sub-section in remarks:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.fieldawarefactorizationmachinetrainer?view=ml-dotnet

@ghost locked as resolved and limited conversation to collaborators Mar 23, 2022