XML documentation for five text related transforms #3418
Codecov Report
@@ Coverage Diff @@
## master #3418 +/- ##
==========================================
+ Coverage 72.7% 72.73% +0.02%
==========================================
Files 807 807
Lines 145171 145206 +35
Branches 16225 16230 +5
==========================================
+ Hits 105551 105613 +62
+ Misses 35201 35176 -25
+ Partials 4419 4417 -2
|
/// | Input column data type | Vector of [Keys](<xref:Microsoft.ML.Data.KeyDataViewType>) | | ||
/// | Output column data type | Known-sized vector of <xref:System.Single> | | ||
/// | ||
/// The resulting [NgramExtractingTransformer]<xref:Microsoft.ML.Transforms.Text.NgramExtractingTransformer> |
NgramExtractingTransformer]
If you make it a link, you should put the <xref> in parentheses: [text](url)
#Closed
And I'm not sure we even want to make it a URL. What is wrong with just <xref>?
In reply to: 276868333
If the anchor text is the same as the class/type name, you don't really need the anchor pattern. You can save some time by using autolinks: <xref:UID>
Also, we don't have a []<> pattern, so this won't work as is.
In reply to: 276868428
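For reference, the two link forms discussed in this thread can be sketched as follows. This is an illustration only: the `XrefSketch` namespace and `Placeholder` class are hypothetical, and the `<format type="text/markdown">` wrapper assumes the remarks are rendered by a DocFX-style Markdown pipeline.

```csharp
namespace XrefSketch
{
    /// <summary>
    /// Hypothetical placeholder type; it exists only to carry the remarks below.
    /// </summary>
    /// <remarks>
    /// <format type="text/markdown"><![CDATA[
    /// Autolink, for when the anchor text is just the type name:
    /// <xref:Microsoft.ML.Transforms.Text.NgramExtractingTransformer>
    ///
    /// Explicit anchor text, with the xref inside the parentheses:
    /// [NgramExtractingTransformer](xref:Microsoft.ML.Transforms.Text.NgramExtractingTransformer)
    /// ]]></format>
    /// </remarks>
    public sealed class Placeholder
    {
    }
}
```

The `[text]<xref:UID>` form from the diff matches neither pattern, which is why it would not resolve.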
[EnumValueDisplay("TF (Term Frequency)")] | ||
Tf = 0, | ||
|
||
/// <summary>Inverse Document Frequency. A ratio (the logarithm of inverse relative frequency) |
/// <summary
Thank you for adding this. Should we make it
<summary>
text
</summary>
? #Closed
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes | | ||
/// | Input column data type | <xref:System.ReadOnlyMemory{System.Char}> | |
<xref:System.ReadOnlyMemory{System.Char}>
Scalar or vector? #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of keys.</param> |
vector
unknown-size vector #Closed
/// Normalizes incoming text in <paramref name="inputColumnName"/> by changing case, removing diacritical marks, punctuation marks and/or numbers | ||
/// and outputs new text as <paramref name="outputColumnName"/>. | ||
/// Creates a <see cref="TextNormalizingEstimator"/>, which normalizes incoming text in <paramref name="inputColumnName"/> by changing case, | ||
/// removing diacritical marks, punctuation marks and/or numbers and outputs new text as <paramref name="outputColumnName"/>. |
It reads as if it always changes case and removes diacritics and punctuation, and only for numbers do you have the option to decide. #Closed
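To make the point concrete, here is a minimal sketch of how the options are independent of one another. It assumes the public `NormalizeText` catalog extension takes the same arguments that the diff passes to `TextNormalizingEstimator`; the `InputRow` and `OutputRow` classes and the sample string are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical row types, used only for this illustration.
public class InputRow
{
    public string Text { get; set; }
}

public class OutputRow
{
    public string Normalized { get; set; }
}

public static class NormalizeTextSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();
        var data = mlContext.Data.LoadFromEnumerable(new List<InputRow>
        {
            new InputRow { Text = "Café #42, OPEN!" }
        });

        // Every option is chosen separately: this call lowercases the text and
        // drops numbers, but keeps diacritics and punctuation.
        var estimator = mlContext.Transforms.Text.NormalizeText(
            outputColumnName: "Normalized",
            inputColumnName: "Text",
            caseMode: TextNormalizingEstimator.CaseMode.Lower,
            keepDiacritics: true,
            keepPunctuations: true,
            keepNumbers: false);

        var transformed = estimator.Fit(data).Transform(data);
        foreach (var row in mlContext.Data.CreateEnumerable<OutputRow>(transformed, reuseRowObject: false))
        {
            Console.WriteLine(row.Normalized);
        }
    }
}
```

A summary that says "and/or" for each of the four behaviors, rather than only before "numbers", would match this shape more closely.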
/// This column's data type will remain scalar of text or a vector of text depending on the input column data type.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, | ||
/// the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This estimator operates on text and vector of text data types.</param> |
and
or? #Closed
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// The output column is of type variable vector of string.</param> |
string
text, here and below #WontFix
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of text.</param> |
vector
unknown-size vector #ByDesign
variable-sized (as discussed in the chat)
In reply to: 277070083
@@ -124,10 +130,16 @@ public static class TextCatalog | |||
=> new TextNormalizingEstimator(Contracts.CheckRef(catalog, nameof(catalog)).GetEnvironment(), | |||
outputColumnName, inputColumnName, caseMode, keepDiacritics, keepPunctuations, keepNumbers); | |||
|
|||
/// <include file='doc.xml' path='doc/members/member[@name="WordEmbeddings"]/*' /> |
'doc.xml'
If doc.xml is no longer used, please remove the file. #Resolved
The file is still used by the transforms that Ivan and Senja are working on. I would rather remove it at the end and not modify it now, to avoid conflicts.
In reply to: 277032552
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.Single"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, | ||
/// the value of the <paramref name="outputColumnName"/> will be used as source. |
source
Let's not use 'source'. Just drop 'as source'; it reads fine without it. #Pending
I can modify it here, but it's everywhere; that's the pattern that a lot of our transforms follow.
In reply to: 277032960
@@ -161,10 +179,13 @@ public static class TextCatalog | |||
=> new WordEmbeddingEstimator(Contracts.CheckRef(catalog, nameof(catalog)).GetEnvironment(), | |||
outputColumnName, customModelFile, inputColumnName ?? outputColumnName); | |||
|
|||
/// <include file='doc.xml' path='doc/members/member[@name="WordEmbeddings"]/*' /> | |||
/// <summary> | |||
/// Create an <see cref="WordEmbeddingEstimator"/>, which is a text featurizer that converts vectors |
vectors
Singular or plural? If it's plural, please say 'multiple vectors' or 'one or more vectors'. #Resolved
/// <param name="ngramLength">Ngram length.</param> | ||
/// <param name="skipLength">Maximum number of tokens to skip when constructing an ngram.</param> | ||
/// <param name="skipLength">Number of tokens to skip between each ngram. By defaults no token is skipped.</param> |
defaults
default (no s) #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes | | ||
/// | Input column data type | <xref:System.ReadOnlyMemory{System.Char}> | |
<xref:System.ReadOnlyMemory{System.Char}>
We might need to change this. I'll let you know. #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | Vector of <xref:System.ReadOnlyMemory{System.Char}> | |
<x
Please add [text] as anchor text. #Resolved
/// | Input column data type | Vector of <xref:System.ReadOnlyMemory{System.Char}> | | ||
/// | Output column data type | Known-sized vector of <xref:System.Single> | | ||
/// | ||
/// The [WordEmbeddingTransformer](<xref:Microsoft.ML.Transforms.Text.WordEmbeddingTransformer>) produces a new column, |
(<xref:Microsoft.ML.Transforms.Text.WordEmbeddingTransformer>)
fix #Resolved
@@ -795,33 +826,43 @@ internal WordEmbeddingEstimator(IHostEnvironment env, string customModelFile, pa | |||
/// </summary> | |||
public enum PretrainedModelKind | |||
{ | |||
/// <summary>GloVe 50 dimensional word embeddings.</summary> |
/// <summary>GloVe 50 dimensional word embeddings.</summary>
/// <summary>
/// GloVe 50 dimensional word embeddings.
/// </summary>
#Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | <xref:System.ReadOnlyMemory{System.Char}> | |
Add [text] for anchor text. #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// The output column is of type variable vector of string.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This column should be of type string.</param> | ||
/// <param name="separators">The separators to use (uses space character by default).</param> |
Please don't change 'internal' stuff. It's extra work for you and for the reviewers. #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.Single"/>.</param> |
vector
known-sized #Resolved
/// <param name="inputColumnName">Name of the column to transform.</param> | ||
/// <param name="customModelFile">The path of the pre-trained embeddings model to use.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.Single"/>.</param> |
vector
known-sized #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a variable-sized vector of text.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This estimator operates of text data type.</param> |
text data type
We support just scalar or vector. #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | [Text](xref:Microsoft.ML.Data.TextDataViewType) or Vector of [Text](xref:Microsoft.ML.Data.TextDataViewType) | |
double space #Resolved
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | [Text](xref:Microsoft.ML.Data.TextDataViewType) or Vector of [Text](xref:Microsoft.ML.Data.TextDataViewType) | | ||
/// | Output column data type | The same as the data type in the input column | |
The same as the data type in the input column
We drop empty strings, so your vector would end up as a variable-sized vector. #Resolved
We don't support variable vectors? That's weird. #Resolved Refers to: src/Microsoft.ML.Transforms/Text/TextNormalizing.cs:545 in 3f4b997.
I mean, if it's a vector, make it a variable vector, and in any other case let it be scalar. In reply to: 485010348. Refers to: src/Microsoft.ML.Transforms/Text/TextNormalizing.cs:545 in 3f4b997.
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes | | ||
/// | Input column data type | Scalar of [Text](xref:Microsoft.ML.Data.TextDataViewType) | |
Scalar
I don't see any checks for that. We just check that the type is text, which can be true for scalar or vector. #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | Scalar of [Text](xref:Microsoft.ML.Data.TextDataViewType) | |
Scalar
Check IsColumnTypeValid(): we don't care whether it is scalar or vector, as long as the underlying type is Text. #Resolved
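As a rough illustration of the kind of check described here, the sketch below accepts a column type whenever its item type is text, whether the column is a scalar or a vector. `IsTextScalarOrVector` is a hypothetical helper written for this note, not the library's `IsColumnTypeValid()`; it assumes only the public `DataViewType`, `VectorDataViewType` and `TextDataViewType` classes from `Microsoft.ML.Data`.

```csharp
using Microsoft.ML.Data;

public static class ColumnTypeSketch
{
    // Hypothetical helper: NOT the actual IsColumnTypeValid() implementation.
    public static bool IsTextScalarOrVector(DataViewType type)
    {
        // For a vector column, look at the item type; a scalar column is its own item type.
        DataViewType itemType = type is VectorDataViewType vector ? vector.ItemType : type;

        // Vector size (known or variable) is deliberately not inspected.
        return itemType is TextDataViewType;
    }
}
```

Under a check of this shape, documenting the input as "Scalar of Text" would be narrower than what the code accepts.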
Good catch, just fixed it. In reply to: 485010465. Refers to: src/Microsoft.ML.Transforms/Text/TextNormalizing.cs:545 in 3f4b997.
Tracked by #3204.
This PR covers: