Skip to content

Binary FastTree/Forest samples using T4 templates. #3035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 20, 2019

Conversation

shmoradims
Copy link

Related to #2522. The *.cs files are auto-generated. Please review the .tt and .ttinclude files.

yield return new DataPoint
{
Label = label,
// Create random features that are correlated with label.
Copy link
Member

@wschin wschin Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add Note that data points with Label=false tend to have larger feature values. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


In reply to: 267439620 [](ancestors = 267439620)

// This example requires installation of additional NuGet package
// <a href="https://www.nuget.org/packages/Microsoft.ML.FastTree/">Microsoft.ML.FastTree</a>.
public static void Example()
{
Copy link
Member

@wschin wschin Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to execute those Example()s in a test? Just want to make sure they are always executable. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll add tests in a separate PR.


In reply to: 267441145 [](ancestors = 267441145)

var mlContext = new MLContext(seed: 0);

// Create a list of training examples.
var examples = GenerateRandomDataPoints(1000);
Copy link
Member

@wschin wschin Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples [](start = 16, length = 8)

dataPoints? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


In reply to: 267448220 [](ancestors = 267448220)

}

// Class used to capture predictions.
private class Prediction
Copy link
Member

@wschin wschin Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prediction [](start = 22, length = 10)

Not sure if we want to produce probability for calibrated models or socre for non-calibrated models. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had probabilities originally. But was surprised that FastForest wasn't calibrated and FastTree was. So for the sake of sharing the same T4 template and not having too much conditional formatting I'm using PredictedLabel which all binary trainers are guaranteed to produce. Scores are a bit confusing because it's not clear what -50 or +40.1 mean without calibration.


In reply to: 267449313 [](ancestors = 267449313)

public static class FastTreeWithOptions
{
// This example requires installation of additional NuGet package
// <a href="https://www.nuget.org/packages/Microsoft.ML.FastTree/">Microsoft.ML.FastTree</a>.
Copy link
Member

@wschin wschin Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful comment! #ByDesign

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! My only major concern is that we might need tests to make sure those samples can always be executed correctly in the future.

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great now. I think .ttinclude can be made more generic as we create more samples. Thanks for doing it @shmoradims.

@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #3035 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3035      +/-   ##
==========================================
+ Coverage   72.41%   72.41%   +<.01%     
==========================================
  Files         803      803              
  Lines      143851   143851              
  Branches    16173    16173              
==========================================
+ Hits       104171   104172       +1     
+ Misses      35258    35257       -1     
  Partials     4422     4422
Flag Coverage Δ
#Debug 72.41% <ø> (ø) ⬆️
#production 68.09% <ø> (ø) ⬆️
#test 88.61% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <ø> (ø) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.9% <0%> (+0.2%) ⬆️

@shmoradims shmoradims merged commit a2d7987 into dotnet:master Mar 20, 2019
@shmoradims shmoradims deleted the tree_binary_samples2 branch April 2, 2019 23:41
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants