Skip to content

Towards #3204 - ColumCopying documentation #3316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 15, 2019
Merged

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Apr 12, 2019

Adhering to the template in #3204 (comment) for the ColumnCopying estimator extensions, estimator, transformer.

@sfilipi sfilipi requested review from artidoro, shmoradims, natke and zeahmed and removed request for artidoro and zeahmed April 12, 2019 20:07
@codecov
Copy link

codecov bot commented Apr 12, 2019

Codecov Report

Merging #3316 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3316      +/-   ##
==========================================
+ Coverage   72.65%   72.65%   +<.01%     
==========================================
  Files         807      807              
  Lines      145191   145191              
  Branches    16223    16223              
==========================================
+ Hits       105485   105486       +1     
  Misses      35290    35290              
+ Partials     4416     4415       -1
Flag Coverage Δ
#Debug 72.65% <ø> (ø) ⬆️
#production 68.18% <ø> (ø) ⬆️
#test 88.97% <ø> (ø) ⬆️
Impacted Files Coverage Δ
.../Microsoft.ML.Data/Transforms/ExtensionsCatalog.cs 100% <ø> (ø) ⬆️
src/Microsoft.ML.Data/Transforms/ColumnCopying.cs 85.43% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <0%> (ø) ⬆️
src/Microsoft.ML.Transforms/NormalizerCatalog.cs 84.78% <0%> (ø) ⬆️
...StandardTrainers/Standard/LinearModelParameters.cs 60.31% <0%> (+0.26%) ⬆️

@@ -31,8 +31,22 @@
namespace Microsoft.ML.Transforms
{
/// <summary>
/// <see cref="ColumnCopyingEstimator"/> copies the input column to another column named as specified in the parameters of the transformation.
/// Estimator creating the <see cref="ColumnCopyingTransformer"/>.
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimator [](start = 8, length = 9)

Are we doing cref=IEstimator like in trainers, or is that too much? #Closed

Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the wording in the trainers i.e. "The for the XXX transformer." (Not fussed about whether we cref the Estimator or not).

I think using creating here is confusing, as we create the estimator object using the extension method. #Resolved

@@ -51,11 +51,15 @@ public InputOutputColumnPair(string outputColumnName, string inputColumnName = n
public static class TransformExtensionsCatalog
{
/// <summary>
/// Copies the input column to another column named as specified in <paramref name="outputColumnName"/>.
/// Creates a <see cref="ColumnCopyingEstimator"/> that copies the data from the column specified in <paramref name="inputColumnName"/>
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creates [](start = 12, length = 7)

Create #Closed

@@ -51,11 +51,15 @@ public InputOutputColumnPair(string outputColumnName, string inputColumnName = n
public static class TransformExtensionsCatalog
{
/// <summary>
/// Copies the input column to another column named as specified in <paramref name="outputColumnName"/>.
/// Creates a <see cref="ColumnCopyingEstimator"/> that copies the data from the column specified in <paramref name="inputColumnName"/>
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that [](start = 58, length = 5)

'that' -> ', which' ? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this that is valid, unless is a person. Then we need to use which? @natke?


In reply to: 275065191 [](ancestors = 275065191)

Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one that always gets me. Strictly, "that" is used when you are defining the thing you are talking about, and "which" is used when you are giving extra information about the thing. In this case I would say we are in the second category because we know we are talking about a ColumnCopyEstimator. That is a very long-winded way of saying my vote goes with "Creates a ### , which copies the data from the column specified in " #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for clearing it!


In reply to: 275080289 [](ancestors = 275080289)

/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
/// <param name="inputColumnName">Name of the columns to transform.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
/// The type of data, is the same as the one in the input column.</param>
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of data, [](start = 12, length = 17)

This column's data type will be the same as that of the input column? #Resolved

/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
/// The type of data, is the same as the one in the input column.</param>
/// <param name="inputColumnName">Name of the column to copy the data from.
/// This estimator operates over any type of data.</param>
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type of data [](start = 45, length = 12)

type of data -> data type?
#Closed

/// </summary>
/// <remarks>For more information on this estimator see <see cref="ColumnCopyingEstimator"/>.</remarks>
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on this estimator [](start = 41, length = 18)

let's just drop 'on this estimator' b/c this is the extension method. I think 'for more information see ...' is enough #Resolved

@@ -67,6 +81,10 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
}
}

/// <summary>
/// Transformer that creates a new column, named as specified in the output column name parameters, and
/// copies the data from the input column to this new column.
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ColumnCopyingEstimator should be the main page and this content should go there.

Here we can just say "The transformer resulted from fitting <cref=ColumnCopyingEstimator>" or something short that links back to the estimator.

#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!


In reply to: 275069923 [](ancestors = 275069923)

/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// ### Estimator FAQ
/// | Question | Answer |
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change similar to my PR #Resolved

/// | Question | Answer |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type. | Any |
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. [](start = 32, length = 1)

is the period necessary? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the above have a ? feels natural to have a . here. IDK, @natke?


In reply to: 275070222 [](ancestors = 275070222)

Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no to the "." :-) #Resolved

/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type. | Any |
/// | Output column data type. | The same as the data type in the input column. |
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can add a new row for "Associated transformer"? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? you mean the transformer that gets generated from Fit?

it is right below the table.


In reply to: 275070613 [](ancestors = 275070613)

/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type. | Any |
/// | Output column data type. | The same as the data type in the input column. |
///
Copy link

@shmoradims shmoradims Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// [](start = 4, length = 3)

we also need 'additional nugets' here

Copy link
Member Author

@sfilipi sfilipi Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that is going even on that components that are part of Microsoft.ML NuGet? Should we put it only on the ones that are not on the main NuGet?


In reply to: 275070680 [](ancestors = 275070680)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|"Required NuGet in addition to Microsoft.ML" | None|


In reply to: 275072871 [](ancestors = 275072871,275070680)

@@ -31,8 +31,22 @@
namespace Microsoft.ML.Transforms
{
/// <summary>
/// <see cref="ColumnCopyingEstimator"/> copies the input column to another column named as specified in the parameters of the transformation.
/// Estimator creating the <see cref="ColumnCopyingTransformer"/>.
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the wording in the trainers i.e. "The for the XXX transformer." (Not fussed about whether we cref the Estimator or not).

I think using creating here is confusing, as we create the estimator object using the extension method. #Resolved

/// | Question | Answer |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type. | Any |
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no to the "." :-) #Resolved

@@ -51,11 +51,15 @@ public InputOutputColumnPair(string outputColumnName, string inputColumnName = n
public static class TransformExtensionsCatalog
{
/// <summary>
/// Copies the input column to another column named as specified in <paramref name="outputColumnName"/>.
/// Creates a <see cref="ColumnCopyingEstimator"/> that copies the data from the column specified in <paramref name="inputColumnName"/>
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one that always gets me. Strictly, "that" is used when you are defining the thing you are talking about, and "which" is used when you are giving extra information about the thing. In this case I would say we are in the second category because we know we are talking about a ColumnCopyEstimator. That is a very long-winded way of saying my vote goes with "Creates a ### , which copies the data from the column specified in " #Resolved

/// </summary>
/// <param name="catalog">The transform's catalog</param>
/// <param name="columns">The pairs of input and output columns.</param>
/// <remarks>This estimator can operate over several columns.
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say this transform? #Resolved

/// <param name="catalog">The transform's catalog</param>
/// <param name="columns">The pairs of input and output columns.</param>
/// <remarks>This estimator can operate over several columns.
/// For more information on this estimator see <see cref="ColumnCopyingEstimator"/>.</remarks>
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop estimator? #Resolved

/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type | Any |
/// | Output column data type | The same as the data type in the input column. |
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nitpick: remove the "." #Resolved

@@ -51,11 +51,15 @@ public InputOutputColumnPair(string outputColumnName, string inputColumnName = n
public static class TransformExtensionsCatalog
{
/// <summary>
/// Copies the input column to another column named as specified in <paramref name="outputColumnName"/>.
/// Create a <see cref="ColumnCopyingEstimator"/> which copies the data from the column specified in <paramref name="inputColumnName"/>
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nit: add "," before which #Resolved

@@ -67,11 +71,13 @@ public static ColumnCopyingEstimator CopyColumns(this TransformsCatalog catalog,
=> new ColumnCopyingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName);

/// <summary>
/// Copies the input column, name specified in the first item of the tuple,
/// to another column, named as specified in the second item of the tuple.
/// Create a <see cref="ColumnCopyingEstimator"/> which copies the data from the column specified in <see cref="InputOutputColumnPair.InputColumnName" />
Copy link
Contributor

@natke natke Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add "," before which #Resolved

@sfilipi sfilipi self-assigned this Apr 13, 2019
@sfilipi sfilipi requested review from rogancarr and zeahmed April 15, 2019 16:38
/// copies the data from the input column to this new column.
/// ]]>
/// </format>
/// </remarks>
Copy link
Member

@wschin wschin Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you describe at least one way to use this transform? Maybe add a link to ../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/CopyColumns.cs like what we will do to trainers? If a user is reading this document, they want to use it in their pipeline. #Resolved

@@ -51,11 +51,15 @@ public InputOutputColumnPair(string outputColumnName, string inputColumnName = n
public static class TransformExtensionsCatalog
{
/// <summary>
/// Copies the input column to another column named as specified in <paramref name="outputColumnName"/>.
/// Create a <see cref="ColumnCopyingEstimator"/>, which copies the data from the column specified in <paramref name="inputColumnName"/>
/// to a new column: <paramref name="outputColumnName"/>.
Copy link
Member

@wschin wschin Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
/// to a new column: <paramref name="outputColumnName"/>.
/// to a new column <paramref name="outputColumnName"/>.
``` #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the:? I think i might leave it, since Natalie seems ok with it?


In reply to: 275449364 [](ancestors = 275449364)

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My only wish is to add example link to estimator.

/// </summary>
/// <remarks>For more information see <see cref="ColumnCopyingEstimator"/>.</remarks>
Copy link

@shmoradims shmoradims Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// For more information see . [](start = 8, length = 85)

we don't have this for trainers. we should be consistent. is this helpful?
@natke what's your take?

Copy link
Contributor

@natke natke Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do already link to ColumnCopyingEstimator above, but this doesn't hurt @shmoradims

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

/// </summary>
/// <remarks>For more information see <see cref="ColumnCopyingEstimator"/>.</remarks>
Copy link
Contributor

@natke natke Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do already link to ColumnCopyingEstimator above, but this doesn't hurt @shmoradims

@sfilipi sfilipi merged commit 64badcf into dotnet:master Apr 15, 2019
@sfilipi sfilipi deleted the 3204Simple branch April 15, 2019 21:09
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this pull request Apr 16, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants