Skip to content

Adding a binary classification PFI Example #1793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 20, 2018

Conversation

rogancarr
Copy link
Contributor

This PR adds a binary classification example. I added it to the same file as the regression example, and refactored out the file loading so that the examples focus more on the technique and less on file-loading. I also added a discussion on random fluctuations in PFI values.

Fixes #1766

@artidoro artidoro self-requested a review November 30, 2018 22:46
@@ -5,57 +5,22 @@

namespace Microsoft.ML.Samples.Dynamic
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should leave the samples in this project small and simple; they are meant to illustrate the API usage; not give a complete overview/tutorial on how to solve a problem.
Longer, more complete samples can go in the machinelarning-samples repo.

It might be overwhelming, and hard to digest, to see a long, long example in a page that is supposed to just tell you how to call the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this conceptual separation - that short code snippets to illustrate the API show up on the docs site, and that end-to-end samples go in the separate repo. Is there a way that we can add links to relevant end-to-end samples in the docs site though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. However, PFI is a bit challenging because the return values also need to be explained. What I've done here is try to make these short and easy, while showing how to use the API and interpret the results.

The latest commits (with rebase) refactor each task (e.g. Regression, BinaryClassification) into its own short file, adding a PfiHelper class to hold common things like data loading and linear-model-weight extraction.

One thing that adds to the length is a discussion of how to interpret the results. I believe these are appropriate for the API because these are the first sorts of questions people ask when you present PFI to them.

What do you think @sfilipi @montebhoover and @glebuk ?

Copy link
Contributor

@montebhoover montebhoover Dec 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this - the separation of the files makes each sample more manageable. I'd be curious if the prose explanation at the end of the sample would be better suited to show up in the docs outside of the example code, like in some sort of remarks section, but that is a minor improvement in my mind and we could make that later if desired.

@@ -5,57 +5,22 @@

namespace Microsoft.ML.Samples.Dynamic
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this conceptual separation - that short code snippets to illustrate the API show up on the docs site, and that end-to-end samples go in the separate repo. Is there a way that we can add links to relevant end-to-end samples in the docs site though?

@rogancarr rogancarr force-pushed the 1764_PFI_binaryClassification branch from 1dc2423 to 9a2ff1a Compare December 3, 2018 22:32
@rogancarr
Copy link
Contributor Author

Note: Rebased to bring in changes from master.

using System;
using System.Linq;

namespace Microsoft.ML.Samples.Dynamic.PermutationFeatureImportance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you need to merge with master anyway...

You already inside PermutationFeatureImportance namespace, I don't see much reason behind adding Pfi prefix to names, classes and methods.


namespace Microsoft.ML.Samples.Dynamic.PermutationFeatureImportance
{
public class PfiHelper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PfiHelper [](start = 17, length = 9)

make it internal as well?


namespace Microsoft.ML.Samples.Dynamic.PermutationFeatureImportance
{
public class PfiHelper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you add BinaryClassificationExample and Regression to our documentation.
Is it ok to have reference to helper class in documentation?
I doubt content of this file would be visible to user.

@Ivanidzo4ka
Copy link
Contributor

@rogancarr Is this still relevant and you just don't have time to update it, or this is obsolete now?

@rogancarr
Copy link
Contributor Author

@rogancarr Is this still relevant and you just don't have time to update it, or this is obsolete now?

This is relevant, but I'm waiting for the next PFI PR (#1844 ) to go through before I do merge fixes, since that will cause more merge issues.

@rogancarr rogancarr force-pushed the 1764_PFI_binaryClassification branch from 458c44d to c471bbb Compare December 20, 2018 18:35
@rogancarr rogancarr merged commit f25bd4b into dotnet:master Dec 20, 2018
@rogancarr rogancarr deleted the 1764_PFI_binaryClassification branch December 20, 2018 20:52
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants