Skip to content

Comments added to LearningPipeline class to make Intellisense more helpful. #50

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 9, 2018
90 changes: 90 additions & 0 deletions src/Microsoft.ML/LearningPipeline.cs
Original file line number Diff line number Diff line change
Expand Up @@ -26,25 +26,115 @@ public ScorerPipelineStep(Var<IDataView> data, Var<ITransformModel> model)
public Var<ITransformModel> Model { get; }
}


/// <summary>
/// LearningPipeline class is used to define the steps needed to perform desired machine learning task.<para/>
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perform desired [](start = 69, length = 15)

"perform desired" => "perform a desired"? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LearningPipeline [](start = 8, length = 16)

Class names should appear in <see tags`. However in this case, given that this is describing this class, I might just replace "LearningPipeline class is used" with "This class is used..."

/// The steps are defined by adding a data loader (e.g. <see cref="TextLoader"/>) followed by zero or more transforms (e.g. <see cref="Microsoft.ML.Transforms.TextFeaturizer"/>)
/// and atmost one trainer/learner (e.g. <see cref="Microsoft.ML.Trainers.FastTreeBinaryClassifier"/>) in the pipeline.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type-o atmost should be 2 words.

///
/// Data can be analyzed at every step by inspecting the LearningPipeline object in VS.Net debugger.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this line provides much value. Do you think we need it?

/// <example>
Copy link
Member

@eerhardt eerhardt May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/xmldoc/example it shows that <example> is not nested inside the <summary> section. Instead, it is its own top-level section. #Resolved

/// <para/>
/// For example,<para/>
/// <code>
/// var pipeline = new LearningPipeline();
/// pipeline.Add(new TextLoader &lt;SentimentData&gt; (dataPath, separator: ","));
/// pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
/// pipeline.Add(new FastTreeBinaryClassifier());
///
/// var model = pipeline.Train&lt;SentimentData, SentimentPrediction&gt;();
/// </code>
/// </example>
/// </summary>
[DebuggerTypeProxy(typeof(LearningPipelineDebugProxy))]
public class LearningPipeline : ICollection<ILearningPipelineItem>
{
private List<ILearningPipelineItem> Items { get; } = new List<ILearningPipelineItem>();

/// <summary>
/// Construct an empty LearningPipeline object.
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LearningPipeline [](start = 31, length = 16)

Generally use <see here. #Closed

/// </summary>
public LearningPipeline()
{
}

/// <summary>
/// Get the count of ML components in the LearningPipeline object
/// </summary>
public int Count => Items.Count;
public bool IsReadOnly => false;

/// <summary>
/// Add a data loader, transform or trainer into the pipeline.
/// Possible data loader(s), transforms and trainers options are
/// <para>
/// Data Loader:
/// <see cref="Microsoft.ML.TextLoader{TInput}" />
/// etc.
/// </para>
/// <para>
/// Transforms:
/// <see cref="Microsoft.ML.Transforms.Dictionarizer"/>,
/// <see cref="Microsoft.ML.Transforms.CategoricalOneHotVectorizer"/>
/// <see cref="Microsoft.ML.Transforms.MinMaxNormalizer"/>,
/// <see cref="Microsoft.ML.Transforms.ColumnCopier"/>,
/// <see cref="Microsoft.ML.Transforms.ColumnConcatenator"/>,
/// <see cref="Microsoft.ML.Transforms.TextFeaturizer"/>,
/// etc.
/// </para>
/// <para>
/// Trainers:
/// <see cref="Microsoft.ML.Trainers.AveragedPerceptronBinaryClassifier"/>,
/// <see cref="Microsoft.ML.Trainers.LogisticRegressor"/>,
/// <see cref="Microsoft.ML.Trainers.StochasticDualCoordinateAscentClassifier"/>,
/// <see cref="Microsoft.ML.Trainers.FastTreeRegressor"/>,
/// etc.
/// </para>
/// For a complete list of transforms and trainers, please see "Microsoft.ML.Transforms" and "Microsoft.ML.Trainers" namespaces.
/// </summary>
/// <param name="item"></param>
public void Add(ILearningPipelineItem item) => Items.Add(item);

/// <summary>
/// Remove all the transforms/trainers from the pipeline.
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transforms/trainers [](start = 27, length = 19)

Not loaders? It seems like this would also clear any loaders. #Closed

/// </summary>
public void Clear() => Items.Clear();

/// <summary>
/// Check if a specific loader/transform/trainer is in the pipeline?
/// </summary>
/// <param name="item">Any ML component (data loader, transform or trainer) defined as ILearningPipelineItem.</param>
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILearningPipelineItem [](start = 95, length = 21)

When having type names in documentation, it may be good practice for us to use the <see tags, so this would become <see cref="ILearningPipelineItem" />. The reason is: the XML docs will be generated with links, and attempts to, say, rename the type in VS will track these usages as well. #Closed

/// <returns>true/false</returns>
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns>true/false</returns [](start = 13, length = 27)

That the return value is either true or false is implied by the return value being bool. Can this be improved somehow? It seems like any "improvement" would just involve restating what what already written in the description, so is it acceptable in situations like this to just not include a <returns> tag? #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I typically copy the MSDN docs: https://msdn.microsoft.com/en-us/library/k5cf1d56(v=vs.110).aspx

Return Value
Type: System.Boolean
true if item is found in the ICollection; otherwise, false.

public bool Contains(ILearningPipelineItem item) => Items.Contains(item);

/// <summary>
/// Copy the pipeline items into an array.
/// </summary>
/// <param name="array">Array the items are copied to.</param>
/// <param name="arrayIndex">Index to start copying from.</param>
Copy link
Contributor

@TomFinley TomFinley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index to start copying from [](start = 37, length = 27)

The phrase "Index to start copying from", the "from" suggests that this is an index on the source, but rather, it is an index on the destination.

Perhaps the word "into" vs. "from" would be more clear... plus an explicit paramref, just to make things absolutely crystal clear. So something like maybe, Index of <paramref name="array" /> we start copying to. I imagine it could be wordsmithed better. :) #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically copy the MSDN docs for comments like this. See ICollection.CopyTo

array
Type: System.Array
The one-dimensional Array that is the destination of the elements copied from ICollection. The Array must have zero-based indexing.

index
Type: System.Int32
The zero-based index in array at which copying begins.

public void CopyTo(ILearningPipelineItem[] array, int arrayIndex) => Items.CopyTo(array, arrayIndex);
public IEnumerator<ILearningPipelineItem> GetEnumerator() => Items.GetEnumerator();

/// <summary>
/// Remove an item from the pipeline.
/// </summary>
/// <param name="item">ILearningPipelineItem to remove.</param>
/// <returns>true/false</returns>
public bool Remove(ILearningPipelineItem item) => Items.Remove(item);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

/// <summary>
/// Train the model using the ML components in the pipeline.
/// </summary>
/// <typeparam name="TInput">Type of data instances the model will be trained on. It's a custom type defined by the user according to the structure of data.
/// <para/>
/// E.g. please see "Microsoft.ML.Scenarios.ScenarioTests.SentimentData" in "Microsoft.ML.Tests.csproj" for input type definition for sentiment classification task.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think typical users will have access to the repo. So putting a reference like this in the XML ref docs wouldn't help that much. Maybe a FWLink to the 'getting started' doc instead?

/// The type is defined for a .csv file that contains sentiment classification data with Sentiment and SentimentText as two columns in the .csv file.
/// </typeparam>
/// <typeparam name="TOutput">Ouput type. The prediction will be return based on this type.
/// E.g. for sentiment classifcation scenario, the prediction type is defined at "Microsoft.ML.Scenarios.ScenarioTests.SentimentPrediction" in "Microsoft.ML.Tests.csproj".
/// </typeparam>
/// <returns>PredictionModel object. This is the model object used for prediction on new instances. </returns>
public PredictionModel<TInput, TOutput> Train<TInput, TOutput>()
where TInput : class
where TOutput : class, new()
Expand Down