Towards #3204 - Documentation for MLContext.Transforms.Categorical #3388

Merged
merged 6 commits into from
Apr 21, 2019

Conversation

najeeb-kazmi
Member

Documentation for Categorical transforms using template given in #3204 and implemented in #3316.

@codecov

codecov bot commented Apr 18, 2019

Codecov Report

Merging #3388 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3388      +/-   ##
==========================================
+ Coverage   72.69%    72.7%   +<.01%     
==========================================
  Files         807      807              
  Lines      145172   145171       -1     
  Branches    16225    16225              
==========================================
+ Hits       105533   105542       +9     
+ Misses      35223    35217       -6     
+ Partials     4416     4412       -4
Flag Coverage Δ
#Debug 72.7% <ø> (ø) ⬆️
#production 68.23% <ø> (ø) ⬆️
#test 88.97% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/OneHotHashEncoding.cs 88.09% <ø> (ø) ⬆️
src/Microsoft.ML.Data/Transforms/ColumnCopying.cs 85.43% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/CategoricalCatalog.cs 68.42% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/OneHotEncoding.cs 86.07% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/KernelCatalog.cs 33.33% <0%> (ø) ⬆️
...icrosoft.ML.Transforms/RandomFourierFeaturizing.cs 83.41% <0%> (ø) ⬆️
src/Microsoft.ML.OnnxTransformer/OnnxTransform.cs 86.24% <0%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/ONNX.cs 100% <0%> (ø) ⬆️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.26% <0%> (+0.15%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 85.11% <0%> (+0.4%) ⬆️
... and 2 more

@codecov

codecov bot commented Apr 18, 2019

Codecov Report

Merging #3388 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3388   +/-   ##
=======================================
  Coverage   72.76%   72.76%           
=======================================
  Files         808      808           
  Lines      145452   145452           
  Branches    16244    16244           
=======================================
  Hits       105842   105842           
  Misses      35189    35189           
  Partials     4421     4421
Flag Coverage Δ
#Debug 72.76% <ø> (ø) ⬆️
#production 68.27% <ø> (ø) ⬆️
#test 89.04% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Transforms/Hashing.cs 92.32% <ø> (ø) ⬆️
...t.ML.Data/Transforms/ValueToKeyMappingEstimator.cs 88.67% <ø> (ø) ⬆️
src/Microsoft.ML.Data/Transforms/ColumnCopying.cs 85.43% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/CategoricalCatalog.cs 68.42% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/OneHotEncoding.cs 86.07% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/OneHotHashEncoding.cs 88.09% <ø> (ø) ⬆️
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 64.07% <ø> (ø) ⬆️
src/Microsoft.ML.Core/Data/ModelSaveContext.cs 91.57% <0%> (ø) ⬆️
src/Microsoft.ML.PCA/PcaTrainer.cs 79.94% <0%> (ø) ⬆️
... and 35 more

@najeeb-kazmi najeeb-kazmi added the documentation Related to documentation of ML.NET label Apr 18, 2019
@najeeb-kazmi najeeb-kazmi requested a review from natke April 19, 2019 22:30
@artidoro
Contributor

artidoro commented Apr 20, 2019

    /// and the key-values will be taken from that column. If unspecified, the ordering will be determined from the input data upon fitting.</param>

nit: There are some very long lines in this document; could you break them up? #Resolved


Refers to: src/Microsoft.ML.Transforms/CategoricalCatalog.cs:30 in 75fb8c4.

///
/// The output of this transform is specified by <xref:Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind>:
///
/// - <xref:Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind.Indicator> produces an [indicator vector](https://en.wikipedia.org/wiki/Indicator_vector).
Member

just checking: is this valid for markdown lists? I've always used *

Member Author

Both produce the same bullet

/// - <xref:Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind.Key> produces keys in a <xref:Microsoft.ML.Data.KeyDataViewType> column.
/// If the input column is a vector, the output contains a vector of [keys](xref:Microsoft.ML.Data.KeyDataViewType), where each slot of the
/// vector corresponds to the respective slot of the input vector.
/// If a category is not found in the built dictionary, it is assigned the value zero.
Member

@sfilipi sfilipi Apr 21, 2019

After "zero", suggest adding: "which represents the key missing from the dictionary."
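
To make the two output kinds in the doc comment above concrete, here is a small language-neutral sketch in plain Python. It is not ML.NET code: the helper names `build_dictionary`, `encode_key`, and `encode_indicator` are hypothetical, and the 1-based key numbering (in order of first appearance) and the all-zero indicator vector for unseen categories are assumptions made for the illustration.

```python
from typing import Dict, Hashable, Iterable, List


def build_dictionary(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct value a 1-based key, in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def encode_key(value: Hashable, mapping: Dict[Hashable, int]) -> int:
    """Key-style output: the category's key, or 0 if it is not in the dictionary."""
    return mapping.get(value, 0)


def encode_indicator(value: Hashable, mapping: Dict[Hashable, int]) -> List[int]:
    """Indicator-style output: a vector with a single 1 in the category's slot.

    An unseen category yields an all-zero vector (assumption of this sketch).
    """
    vector = [0] * len(mapping)
    key = mapping.get(value, 0)
    if key:
        vector[key - 1] = 1
    return vector


if __name__ == "__main__":
    dictionary = build_dictionary(["red", "green", "red", "blue"])
    print(encode_key("green", dictionary))
    print(encode_key("purple", dictionary))
    print(encode_indicator("blue", dictionary))
    # Vector input: each output slot corresponds to the same input slot.
    print([encode_key(v, dictionary) for v in ["blue", "purple"]])
```

In this sketch the value zero is reserved for "not in the dictionary", which is the point the suggested wording is meant to make explicit.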

The mutual information of two random variables X and Y is a measure of the mutual dependence between the variables.
Formally, the mutual information can be written as:
</para>
<para>I(X;Y) = E[log(p(x,y)) - log(p(x)) - log(p(y))]</para>
Contributor

@glebuk glebuk Apr 21, 2019

Wasn't there a latex formula that Senja has created just recently for this? https://github.com/dotnet/machinelearning/pull/3448/files

Member Author

This isn't applicable to the categorical transforms
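
For readers who want to sanity-check the formula quoted in this thread, I(X;Y) = E[log(p(x,y)) - log(p(x)) - log(p(y))], the following is a minimal Python sketch that evaluates it on a small joint probability table. The function name `mutual_information` and the nested-list table layout are assumptions of this example, not part of the PR; natural logarithms are used, so the result is in nats.

```python
import math
from typing import List


def mutual_information(joint: List[List[float]]) -> float:
    """Compute I(X;Y) from a joint probability table joint[x][y].

    Implements E[log p(x,y) - log p(x) - log p(y)], with the expectation
    taken over the joint distribution. Cells with zero probability
    contribute nothing to the sum.
    """
    p_x = [sum(row) for row in joint]            # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:
                total += p_xy * (math.log(p_xy) - math.log(p_x[i]) - math.log(p_y[j]))
    return total


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent
    dependent = [[0.5, 0.0], [0.0, 0.5]]         # Y fully determined by X
    print(mutual_information(independent))
    print(mutual_information(dependent))
```

Independent variables give a value of zero (up to floating-point error), and two perfectly dependent fair binary variables give log 2.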

Member

@sfilipi sfilipi left a comment

:shipit:

@najeeb-kazmi najeeb-kazmi merged commit 7aa5a45 into dotnet:master Apr 21, 2019
@najeeb-kazmi najeeb-kazmi deleted the 3204_categorical branch January 30, 2020 01:19
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022