/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
27
-
/// This column's data type will be a vector of <see cref="System.UInt32"/>, or a scalar <see cref="System.UInt32"/> based on whether the input column data types
27
+
/// This column's data type will be a vector of keys, or a scalar key, based on whether the input column data types
28
28
/// are vectors or scalars.</param>
29
29
/// <param name="inputColumnName">Name of the column whose data will be hashed.
30
30
/// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
src/Microsoft.ML.Data/Transforms/Hashing.cs
+1-1
@@ -1113,7 +1113,7 @@ public override void Process()
1113
1113
/// | -- | -- |
1114
1114
/// | Does this estimator need to look at the data to train its parameters? | Yes, if the mapping of the hashes to the values is required. |
1115
1115
/// | Input column data type | Vector or scalar of numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [DateTime](xref:System.DateTime) and [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
1116
-
/// | Output column data type | Vector or scalar [System.Int32](xref:System.Int32).|
1116
+
/// | Output column data type | Vector or scalar [key](xref:Microsoft.ML.Data.KeyDataViewType)|
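To make the changed output type concrete, here is a minimal sketch of hashing a value into a key rather than into a plain integer. It is not the library's implementation: `HashKeySketch`, `Fnv1a`, `ToKey` and `ToKeys` are hypothetical names, FNV-1a is only a stand-in hash function, and the accepted bit range is an assumption of the sketch. It assumes the usual key convention in which valid keys run from 1 to the key count and 0 stands for a missing value.

```csharp
using System;
using System.Text;

public static class HashKeySketch
{
    // FNV-1a, 32-bit. A stand-in only; not claimed to be the hash the library uses.
    public static uint Fnv1a(string text, uint seed)
    {
        unchecked
        {
            uint hash = 2166136261u ^ seed;
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    // Scalar input -> scalar key in [1, 2^numberOfBits]; 0 is left free for "missing".
    public static uint ToKey(string text, int numberOfBits, uint seed = 0)
    {
        // Range chosen for this sketch so that the largest key still fits in a uint.
        if (numberOfBits < 1 || numberOfBits > 31)
            throw new ArgumentOutOfRangeException(nameof(numberOfBits));

        uint mask = (1u << numberOfBits) - 1u;
        return (Fnv1a(text, seed) & mask) + 1u;
    }

    // Vector input -> vector of keys, one key per slot.
    public static uint[] ToKeys(string[] values, int numberOfBits, uint seed = 0)
    {
        var keys = new uint[values.Length];
        for (int i = 0; i < values.Length; i++)
            keys[i] = ToKey(values[i], numberOfBits, seed);
        return keys;
    }
}
```

Calling `HashKeySketch.ToKey("cat", 4)` yields a value between 1 and 16, and `ToKeys` returns one such key per input slot, which mirrors the "vector or scalar key" wording in the table row above.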
/// | Does this estimator need to look at the data to train its parameters? | Yes |
24
-
/// | Input column data type | Scalar numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [System.DateTime](xref:System.DateTime) or [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
25
-
/// | Output column data type | [key](xref:Microsoft.ML.Data.KeyDataViewType)|
24
+
/// | Input column data type | Scalar or vector of numeric, boolean, [text](xref:Microsoft.ML.Data.TextDataViewType), [System.DateTime](xref:System.DateTime) and [key](xref:Microsoft.ML.Data.KeyDataViewType) data types.|
25
+
/// | Output column data type | Scalar or vector of [key](xref:Microsoft.ML.Data.KeyDataViewType)|
26
26
///
27
27
/// The ValueToKeyMappingEstimator builds up term vocabularies (dictionaries) mapping the input values to the keys in the dictionary.
28
28
/// If multiple columns are used, each column builds/uses exactly one vocabulary.
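The vocabulary building described in these comments can be illustrated with a small sketch. This is an assumption-laden illustration, not the estimator's code: `VocabularySketch`, `Build` and `Lookup` are hypothetical names, the `byValue` flag loosely mirrors the `ByOccurrence` / `ByValue` orderings, and ordinal string comparison is assumed for the "default comparison" of text. Keys start at 1, with 0 used here for values that are not in the vocabulary.

```csharp
using System;
using System.Collections.Generic;

public static class VocabularySketch
{
    // Builds one vocabulary for one column: each distinct value gets a key 1..N.
    public static Dictionary<string, uint> Build(IEnumerable<string> values, bool byValue)
    {
        // Collect distinct terms in the order they are first seen.
        var seen = new HashSet<string>();
        var terms = new List<string>();
        foreach (string value in values)
        {
            if (seen.Add(value))
                terms.Add(value);
        }

        // Optionally reorder by value; ordinal comparison is case sensitive.
        if (byValue)
            terms.Sort(StringComparer.Ordinal);

        var vocabulary = new Dictionary<string, uint>();
        foreach (string term in terms)
            vocabulary[term] = (uint)vocabulary.Count + 1u;
        return vocabulary;
    }

    // Values outside the vocabulary map to 0 ("missing") in this sketch.
    public static uint Lookup(Dictionary<string, uint> vocabulary, string value)
    {
        return vocabulary.TryGetValue(value, out uint key) ? key : 0u;
    }
}
```

For the input `a, Z, A, a`, building by occurrence assigns `a = 1, Z = 2, A = 3`, while building by value assigns `A = 1, Z = 2, a = 3`.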
src/Microsoft.ML.Transforms/CategoricalCatalog.cs
+37-22
@@ -15,17 +15,22 @@ namespace Microsoft.ML
15
15
public static class CategoricalCatalog
16
16
{
17
17
/// <summary>
18
-
/// Create a <see cref="OneHotEncodingEstimator"/>, which converts the text input column specified by <paramref name="inputColumnName"/> into a column of one-hot encoded vectors named <paramref name="outputColumnName"/>.
18
+
/// Create a <see cref="OneHotEncodingEstimator"/>, which converts the input column specified by <paramref name="inputColumnName"/>
19
+
/// into a column of one-hot encoded vectors named <paramref name="outputColumnName"/>.
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
22
-
/// This column's data type will be a vector of floats for <paramref name="outputKind"/> Bag, Indicator, and Binary. For <paramref name="outputKind"/> Key, the data type will be a key in the case of a singleton input column or a vector of keys in the case of a vector input column.</param>
23
-
/// <param name="inputColumnName">Name of column to convert to one-hot vectors. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
24
-
/// This column's data type can be numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/></param>
23
+
/// This column's data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
24
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
25
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, this column's data type will be a key in the case of a scalar input column
26
+
/// or a vector of keys in the case of a vector input column.</param>
27
+
/// <param name="inputColumnName">Name of column to convert to one-hot vectors. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/>
28
+
/// will be used as source. This column's data type can be scalar or vector of numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.</param>
25
29
/// <param name="outputKind">Output kind: Bag (multi-set vector), Indicator (indicator vector), Key (index), or Binary encoded indicator vector.</param>
26
30
/// <param name="maximumNumberOfKeys">Maximum number of terms to keep per column when auto-training.</param>
27
-
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/> choosen they will be in the order encountered.
28
-
/// If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>, items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
31
+
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/>
32
+
/// is chosen, they will be in the order encountered. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>,
33
+
/// items are sorted according to their default comparison; for example, text sorting will be case sensitive ('A' then 'Z' then 'a').</param>
29
34
/// <param name="keyData">Specifies an ordering for the encoding. If specified, this should be a single column data view,
30
35
/// and the key-values will be taken from that column. If unspecified, the ordering will be determined from the input data upon fitting.</param>
31
36
/// <example>
@@ -45,16 +50,19 @@ public static OneHotEncodingEstimator OneHotEncoding(this TransformsCatalog.Cate
/// Create a <see cref="OneHotEncodingEstimator"/>, which converts the input text column specified by <see cref="InputOutputColumnPair.InputColumnName"/> into a column of one-hot encoded vectors named <see cref="InputOutputColumnPair.OutputColumnName"/>.
53
+
/// Create a <see cref="OneHotEncodingEstimator"/>, which converts one or more input text columns specified in <paramref name="columns"/>
54
+
/// into as many columns of one-hot encoded vectors.
/// <param name="columns">The pairs of input and output columns. The data type of the input column can be numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.
52
-
/// The data type of the output column will be a vector of floats for <paramref name="outputKind"/> Bag, Indicator, and Binary.
53
-
/// For <paramref name="outputKind"/> Key, the data type of the output column will be a key in the case of a singleton input column or a vector of keys in the case of a vector input column.</param>
/// <param name="columns">The pairs of input and output columns. The output columns' data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
58
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
59
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, the output columns' data type will be a key in the case of a scalar input column
60
+
/// or a vector of keys in the case of a vector input column.</param>
54
61
/// <param name="outputKind">Output kind: Bag (multi-set vector), Indicator (indicator vector), Key (index), or Binary encoded indicator vector.</param>
55
62
/// <param name="maximumNumberOfKeys">Maximum number of terms to keep per column when auto-training.</param>
56
-
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/> choosen they will be in the order encountered.
57
-
/// If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>, items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
63
+
/// <param name="keyOrdinality">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByOccurrence"/>
64
+
/// is chosen, they will be in the order encountered. If <see cref="ValueToKeyMappingEstimator.KeyOrdinality.ByValue"/>,
65
+
/// items are sorted according to their default comparison; for example, text sorting will be case sensitive ('A' then 'Z' then 'a').</param>
58
66
/// <param name="keyData">Specifies an ordering for the encoding. If specified, this should be a single column data view,
59
67
/// and the key-values will be taken from that column. If unspecified, the ordering will be determined from the input data upon fitting.</param>
/// Create a <see cref="OneHotHashEncodingEstimator"/>, which converts a text column specified by <paramref name="inputColumnName"/> into a hash-based one-hot encoded vector column named <paramref name="outputColumnName"/>.
111
+
/// Create a <see cref="OneHotHashEncodingEstimator"/>, which converts a text column specified by <paramref name="inputColumnName"/>
112
+
/// into a hash-based one-hot encoded vector column named <paramref name="outputColumnName"/>.
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
107
-
/// This column's data type will be a vector of floats for <paramref name="outputKind"/> Bag, Indicator, and Binary. For <paramref name="outputKind"/> Key, the data type will be a key in the case of a singleton input column or a vector of keys in the case of a vector input column.</param>
116
+
/// This column's data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
117
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
118
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, this column's data type will be a key in the case of a scalar input column
119
+
/// or a vector of keys in the case of a vector input column.</param>
108
120
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
109
-
/// This column's data type can be numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.</param>
121
+
/// This column's data type can be scalar or vector of numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.</param>
/// <param name="numberOfBits">Number of bits to hash into. Must be between 1 and 30, inclusive.</param>
112
124
/// <param name="seed">Hashing seed.</param>
113
125
/// <param name="useOrderedHashing">Whether the position of each term should be included in the hash.</param>
114
126
/// <param name="maximumNumberOfInverts">During hashing we construct mappings between original values and the produced hash values.
115
-
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one.
127
+
/// Text representations of the original values are stored in the slot names of the metadata for the new column. Hashing,
128
+
/// as such, can map many initial values to one.
116
129
/// <paramref name="maximumNumberOfInverts"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
117
130
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
118
131
/// <example>
@@ -133,12 +146,14 @@ public static OneHotHashEncodingEstimator OneHotHashEncoding(this TransformsCata
/// Create a <see cref="OneHotHashEncodingEstimator"/>, which converts the input text column specified by <see cref="InputOutputColumnPair.InputColumnName"/> into a column of hash-based one-hot encoded vectors named <see cref="InputOutputColumnPair.OutputColumnName"/>
149
+
/// Create a <see cref="OneHotHashEncodingEstimator"/>, which converts one or more input text columns specified by <paramref name="columns"/>
150
+
/// into as many columns of hash-based one-hot encoded vectors.
/// <param name="columns">The pairs of input and output columns. The data type of the input column can be numeric, text, boolean, <see cref="System.DateTime"/> or <see cref="System.DateTimeOffset"/>.
140
-
/// The data type of the output column will be a vector of floats for <paramref name="outputKind"/> Bag, Indicator, and Binary.
141
-
/// For <paramref name="outputKind"/> Key, the data type of the output column will be a key in the case of a singleton input column or a vector of keys in the case of a vector input column.</param>
153
+
/// <param name="columns">The pairs of input and output columns. The output columns' data type will be a vector of <see cref="System.Single"/> if <paramref name="outputKind"/> is
154
+
/// <see cref="OneHotEncodingEstimator.OutputKind.Bag"/>, <see cref="OneHotEncodingEstimator.OutputKind.Indicator"/>, or <see cref="OneHotEncodingEstimator.OutputKind.Binary"/>.
155
+
/// If <paramref name="outputKind"/> is <see cref="OneHotEncodingEstimator.OutputKind.Key"/>, the output columns' data type will be a key in the case of a scalar input column
156
+
/// or a vector of keys in the case of a vector input column.</param>
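The relationship between `outputKind` and the output column type documented above can be sketched in plain C#. This is only an illustration under stated assumptions: `OneHotSketch` and its methods are hypothetical names, keys are assumed to run from 1 to `keyCount` with 0 meaning missing, "Bag" is read as a per-slot count (following the "multi-set vector" wording), and the Binary kind is left out because its exact layout is not described in these comments.

```csharp
using System;

public static class OneHotSketch
{
    // Indicator: a scalar key becomes a float vector with a single 1 at the key's slot.
    public static float[] Indicator(uint key, int keyCount)
    {
        var vector = new float[keyCount];
        if (key >= 1 && key <= keyCount)
            vector[key - 1] = 1f;
        return vector; // all zeros when the key is missing (0) or out of range
    }

    // Bag: a vector of keys becomes one float vector of per-slot counts.
    public static float[] Bag(uint[] keys, int keyCount)
    {
        var vector = new float[keyCount];
        foreach (uint key in keys)
        {
            if (key >= 1 && key <= keyCount)
                vector[key - 1] += 1f;
        }
        return vector;
    }

    // Key: no vectorization; a scalar stays a key and a vector stays a vector of keys.
    public static uint[] Key(uint[] keys)
    {
        return (uint[])keys.Clone();
    }
}
```

With three possible keys, `OneHotSketch.Indicator(2, 3)` gives `[0, 1, 0]` and `OneHotSketch.Bag(new uint[] { 1, 3, 3 }, 3)` gives `[1, 0, 2]`, whereas `Key` keeps the key values unchanged.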