Skip to content

Samples for CustomMapping, IndicateMissingValues, ReplaceMissingValues #3216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 10, 2019

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Apr 5, 2019

Related to #1209
Fixes #3117

Made samples for the multi-column setting of:

  • ReplaceMissingValues
  • IndicateMissingValues

I also made a sample to save and load the CustomMapping estimator.

@artidoro artidoro self-assigned this Apr 5, 2019
@artidoro artidoro added the documentation Related to documentation of ML.NET label Apr 5, 2019
@shmoradims
Copy link

shmoradims commented Apr 5, 2019

using System;

please remove Sample from filenames: CustomMapingSample -> CustomMapping

Same with the other file #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/CustomMappingSample.cs:1 in d32f027. [](commit_id = d32f027, deletion_comment = False)

// ReplaceMissingValues is used to create a column where missing values are replaced according to the ReplacementMode.
var defaultPipeline = mlContext.Transforms.ReplaceMissingValues(new[] {
new InputOutputColumnPair("MissingReplaced1", "Features1"),
new InputOutputColumnPair("MissingReplaced2", "Features2")
Copy link

@shmoradims shmoradims Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: we don't have an in-place transformation multicolumn sample. can we transform the data in-place and not have MissingReplaced1/MissingReplaced2?

@artidoro
Copy link
Contributor Author

artidoro commented Apr 8, 2019

Thank you for the review comments, I have update the code accordingly.

<PropertyGroup>
<TargetFramework>netcoreapp2.1</TargetFramework>
<OutputType>Exe</OutputType>
<AssemblyOriginatorKeyFile>$(ToolsDir)Test.snk</AssemblyOriginatorKeyFile>
Copy link
Contributor Author

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default the assemblies are signed using Open.snk. However, if we want to register an assembly in the component catalog using mlContext.ComponentCatalog.RegisterAssembly(), the assembly needs to pass the following condition:

private static bool CanContainExtensions(Assembly assembly)
{
if (assembly.FullName.StartsWith("Microsoft.ML.", StringComparison.Ordinal)
&& HasMLNetPublicKey(assembly))
{
return false;
}
return true;
}

So it needs to have a different signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with Eric, it could be even simpler not to sign this assembly. We don't need strong naming for the Samples assembly as it should not be referenced by external code anyways.

// Features: [-1, 2, -3] MissingReplaced: [-1, 2, -3]
// Features: [-1, NaN, -3] MissingReplaced: [-1, 0, -3]

// Mean ReplaceMode:
Copy link
Contributor

@rogancarr rogancarr Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Mean ReplaceMode: [](start = 12, length = 20)

Isn't there one more mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also maximum and minimum, but I don't think it adds much to add them to this sample.


In reply to: 273727261 [](ancestors = 273727261)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the documentation, if we have more than one, we should have all of them — right now, it looks inconsistent or incomplete.


In reply to: 273735371 [](ancestors = 273735371,273727261)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather remove the mean replacement mode in this sample then.
We are not trying to exhaust all the possible settings in our samples. Would that be fine?


In reply to: 273736506 [](ancestors = 273736506,273735371,273727261)

private class TransformedData : InputData
{
public bool IsUnderThirty { get; set; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra empty line.

// Expected output:
// Features1: [1, 1, 0] MissingIndicator1: [False, False, False] Features2: [1, 1] MissingIndicator2: [False, False]
// Features1: [0, NaN, 1] MissingIndicator1: [False, True, False] Features2: [NaN, 1] MissingIndicator2: [True, False]
// Features1: [-1, NaN, -3] MissingIndicator1: [False, True, False] Features2: [1, ∞] MissingIndicator2: [False, False]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT - spacing for MissingIndicator doesnt align with the above lines.

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codecov
Copy link

codecov bot commented Apr 10, 2019

Codecov Report

Merging #3216 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3216      +/-   ##
==========================================
- Coverage   72.62%   72.62%   -0.01%     
==========================================
  Files         807      807              
  Lines      145080   145080              
  Branches    16213    16213              
==========================================
- Hits       105369   105365       -4     
- Misses      35294    35298       +4     
  Partials     4417     4417
Flag Coverage Δ
#Debug 72.62% <ø> (-0.01%) ⬇️
#production 68.17% <ø> (-0.01%) ⬇️
#test 88.92% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/ExtensionsCatalog.cs 57.14% <ø> (ø) ⬆️
...rc/Microsoft.ML.Transforms/CustomMappingCatalog.cs 100% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs 73.7% <0%> (-0.34%) ⬇️
...StandardTrainers/Standard/LinearModelParameters.cs 60.05% <0%> (-0.27%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.26% <0%> (+0.15%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 85.31% <0%> (+0.6%) ⬆️

@artidoro artidoro merged commit 304170a into dotnet:master Apr 10, 2019
artidoro added a commit to artidoro/machinelearning that referenced this pull request Apr 10, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants