/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
27
-
/// This column's data type will be a vector of <see cref="System.UInt32"/>, or a scalar <see cref="System.UInt32"/> based on whether the input column data types
27
+
/// This column's data type will be a vector of keys, or a scalar key, based on whether the input column data types
28
28
/// are vectors or scalars.</param>
29
29
/// <param name="inputColumnName">Name of the column whose data will be hashed.
30
30
/// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
src/Microsoft.ML.Data/Transforms/Hashing.cs
+1-1
@@ -1113,7 +1113,7 @@ public override void Process()
1113
1113
/// | -- | -- |
1114
1114
/// | Does this estimator need to look at the data to train its parameters? | Yes, if the mapping of the hashes to the values is required. |
1115
1115
/// | Input column data type | Vector or scalars of numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [DateTime](xref:System.DateTime) and [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
1116
-
/// | Output column data type | Vector or scalar [System.Int32](xref:System.Int32).|
1116
+
/// | Output column data type | Vector or scalar [key](xref:Microsoft.ML.Data.KeyDataViewType)|
/// | Does this estimator need to look at the data to train its parameters? | Yes |
24
-
/// | Input column data type | Scalar numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [System.DateTime](xref:System.DateTime) or [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
25
-
/// | Output column data type | [key](xref:Microsoft.ML.Data.KeyDataViewType)|
24
+
/// | Input column data type | Scalar or vector of numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [System.DateTime](xref:System.DateTime) and [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
25
+
/// | Output column data type | Scalar or vector of [key](xref:Microsoft.ML.Data.KeyDataViewType)|
26
26
///
27
27
/// The ValueToKeyMappingEstimator builds up term vocabularies (dictionaries) mapping the input values to the keys in the dictionary.
28
28
/// If multiple columns are used, each column builds/uses exactly one vocabulary.
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
23
-
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
24
-
/// <param name="outputKind">Output kind: Bag (multi-set vector), Ind (indicator vector), Key (index), or Binary encoded indicator vector.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
24
+
/// This column's data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
25
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
26
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, this column's data type will be a key in the case of a scalar input column
27
+
/// or a vector of keys in the case of a vector input column.</param>
28
+
/// <param name="inputColumnName">Name of column to convert to one-hot vectors. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/>
29
+
/// will be used as source. This column's data type can be scalar or vector of numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.</param>
30
+
/// <param name="outputKind">Output kind: Bag (multi-set vector), Indicator (indicator vector), Key (index), or Binary encoded indicator vector.</param>
25
31
/// <param name="maximumNumberOfKeys">Maximum number of terms to keep per column when auto-training.</param>
26
-
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/> choosen they will be in the order encountered.
27
-
/// If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>, items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
32
+
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/>
33
+
/// is chosen, they will be in the order encountered. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>,
34
+
/// items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
28
35
/// <param name="keyData">Specifies an ordering for the encoding. If specified, this should be a single column data view,
29
36
/// and the key-values will be taken from that column. If unspecified, the ordering will be determined from the input data upon fitting.</param>
30
37
/// <example>
@@ -44,14 +51,21 @@ public static OneHotEncodingEstimator OneHotEncoding(this TransformsCatalog.Cate
/// <param name="columns">Specifies the names of the columns on which to apply the transformation.</param>
57
+
/// <remarks>If multiple columns are passed to the estimator, all of the columns will be processed in a single pass over the data.
58
+
/// Therefore, it is more efficient to specify one estimator with many columns than it is to specify many estimators each with a single column.</remarks>
/// <param name="columns">The pairs of input and output columns. The output columns' data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
61
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
62
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, the output columns' data type will be a key in the case of a scalar input column
63
+
/// or a vector of keys in the case of a vector input column.</param>
51
64
/// <param name="outputKind">Output kind: Bag (multi-set vector), Indicator (indicator vector), Key (index), or Binary encoded indicator vector.</param>
52
65
/// <param name="maximumNumberOfKeys">Maximum number of terms to keep per column when auto-training.</param>
53
-
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/> choosen they will be in the order encountered.
54
-
/// If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>, items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
66
+
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/>
67
+
/// is chosen, they will be in the order encountered. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>,
68
+
/// items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
55
69
/// <param name="keyData">Specifies an ordering for the encoding. If specified, this should be a single column data view,
56
70
/// and the key-values will be taken from that column. If unspecified, the ordering will be determined from the input data upon fitting.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
104
-
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
118
+
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
119
+
/// This column's data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
120
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
121
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, this column's data type will be a key in the case of a scalar input column
122
+
/// or a vector of keys in the case of a vector input column.</param>
123
+
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
124
+
/// This column's data type can be scalar or vector of numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.</param>
/// <param name="numberOfBits">Number of bits to hash into. Must be between 1 and 30, inclusive.</param>
107
127
/// <param name="seed">Hashing seed.</param>
108
128
/// <param name="useOrderedHashing">Whether the position of each term should be included in the hash.</param>
109
129
/// <param name="maximumNumberOfInverts">During hashing we construct mappings between original values and the produced hash values.
110
-
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one.
130
+
/// Text representations of original values are stored in the slot names of the metadata for the new column. Hashing,
131
+
/// as such, can map many initial values to one.
111
132
/// <paramref name="maximumNumberOfInverts"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
112
133
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
113
134
/// <example>
@@ -128,16 +149,22 @@ public static OneHotHashEncodingEstimator OneHotHashEncoding(this TransformsCata
/// Convert text columns into hash-based one-hot encoded vector columns.
152
+
/// Create a <see cref="OneHotHashEncodingEstimator"/>, which converts one or more input text columns specified by <paramref name="columns"/>
153
+
/// into as many columns of hash-based one-hot encoded vectors.
132
154
/// </summary>
155
+
/// <remarks>If multiple columns are passed to the estimator, all of the columns will be processed in a single pass over the data.
156
+
/// Therefore, it is more efficient to specify one estimator with many columns than it is to specify many estimators each with a single column.</remarks>
/// <param name="columns">Specifies the names of the columns on which to apply the transformation.</param>
158
+
/// <param name="columns">The pairs of input and output columns. The output columns' data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
159
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
160
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, the output columns' data type will be a key in the case of a scalar input column
161
+
/// or a vector of keys in the case of a vector input column.</param>
/// <param name="numberOfBits">Number of bits to hash into. Must be between 1 and 30, inclusive.</param>
137
164
/// <param name="seed">Hashing seed.</param>
138
165
/// <param name="useOrderedHashing">Whether the position of each term should be included in the hash.</param>
139
166
/// <param name="maximumNumberOfInverts">During hashing we construct mappings between original values and the produced hash values.
140
-
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one.
167
+
/// Text representations of original values are stored in the slot names of the metadata for the new column. Hashing, as such, can map many initial values to one.
141
168
/// <paramref name="maximumNumberOfInverts"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
142
169
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
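
The remarks in this change recommend passing several columns to one estimator so they are processed in a single pass. A minimal sketch of that usage follows, assuming the `Microsoft.ML` NuGet package is referenced. The `Row` class and the column names `Education`, `ZipCode`, `EducationEncoded` and `ZipCodeEncoded` are hypothetical and exist only for illustration.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Transforms;

public static class Program
{
    // Hypothetical input row; the property names are illustrative only.
    private class Row
    {
        public string Education { get; set; }
        public string ZipCode { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // A tiny in-memory data set with two scalar text columns.
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new Row { Education = "0-5yrs", ZipCode = "98005" },
            new Row { Education = "6-11yrs", ZipCode = "98052" },
            new Row { Education = "0-5yrs", ZipCode = "98005" },
        });

        // One estimator, two column pairs: both columns are handled together
        // instead of chaining two single-column estimators.
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(
            new[]
            {
                new InputOutputColumnPair("EducationEncoded", "Education"),
                new InputOutputColumnPair("ZipCodeEncoded", "ZipCode"),
            },
            // Key output: per the doc comments above, a scalar input column
            // should produce a scalar key column.
            outputKind: OneHotEncodingEstimator.OutputKind.Key,
            keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue);

        var transformed = pipeline.Fit(data).Transform(data);

        // Print each column's name and data type to inspect the output schema.
        foreach (var column in transformed.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

Switching `outputKind` to `Bag`, `Indicator` or `Binary` should instead yield vector columns of `System.Single`, as the updated `columns` documentation describes.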