Updating the XML Docs for Permutation Feature Importance #1733

Merged
merged 4 commits into from
Nov 27, 2018
Changes from 3 commits
@@ -12,10 +12,37 @@ namespace Microsoft.ML
public static class PermutationFeatureImportanceExtensions
{
/// <summary>
/// Permutation Feature Importance is a technique that calculates how much each feature 'matters' to the predictions.
/// Namely, how much the model's predictions will change if we randomly permute the values of one feature across the evaluation set.
/// If the quality doesn't change much, this feature is not very important. If the quality drops drastically, this was a really important feature.
/// Permutation Feature Importance (PFI) for Regression
/// </summary>
/// <remarks>
/// <para>
/// Permutation feature importance (PFI) is a technique to determine the global importance of features in a trained
/// machine learning model. PFI is a simple yet powerful technique motivated by Breiman in his Random Forest paper, section 10
/// (Breiman. <a href='https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf'>"Random Forests."</a> Machine Learning, 2001.)
/// The advantage of the PFI method is that it is model agnostic -- it works with any model that can be
/// evaluated -- and it can use any dataset, not just the training set, to compute feature importance metrics.
/// </para>
/// <para>
/// PFI works by taking a labeled dataset, choosing a feature, and permuting the values
/// for that feature across all the examples, so that each example now has a random value for the feature and
/// the original values for all other features. The evaluation metric (e.g. AUC or R-squared) is then calculated
/// for this modified dataset, and the change in the evaluation metric from the original dataset is computed.
/// The larger the change in the evaluation metric, the more important the feature is to the model.
/// PFI works by performing this permutation analysis across all the features of a model, one after another.
/// </para>
/// <para>
/// In this implementation, PFI computes the change in all possible regression evaluation metrics for each feature, and an
/// <code>ImmutableArray</code> of <code>RegressionEvaluator.Result</code> objects is returned. See the sample below for an
/// example of working with these results to analyze the feature importance of a model.
/// </para>
/// </remarks>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[PFI](~/../docs/samples/doc/samples/Microsoft.ML.Samples/Dynamic/PermutationFeatureImportance.cs)]
/// ]]>
/// </format>
/// </example>
/// <param name="ctx">The regression context.</param>
/// <param name="model">The model to evaluate.</param>
/// <param name="data">The evaluation data set.</param>
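For readers following the remarks in this hunk, the permutation procedure they describe can be sketched in plain C#. This is a minimal illustration under stated assumptions, not the ML.NET implementation: `PfiSketch`, `PermutationImportance`, `RSquared`, the `predict` delegate, and the toy data are all hypothetical names chosen for the example, and R-squared stands in for the full set of regression metrics that the documented method reports.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; none of these names come from ML.NET.
public static class PfiSketch
{
    // Hypothetical helper: R-squared of predictions against labels.
    // Assumes the labels are not all identical (otherwise ssTot is zero).
    static double RSquared(double[] labels, double[] predictions)
    {
        double mean = labels.Average();
        double ssTot = labels.Sum(y => (y - mean) * (y - mean));
        double ssRes = labels.Zip(predictions, (y, p) => (y - p) * (y - p)).Sum();
        return 1.0 - ssRes / ssTot;
    }

    // For each feature, returns metric(permuted data) - metric(original data).
    public static double[] PermutationImportance(
        Func<double[], double> predict, double[][] features, double[] labels, int seed = 0)
    {
        var rng = new Random(seed);
        double baseline = RSquared(labels, features.Select(predict).ToArray());
        int featureCount = features[0].Length;
        var deltas = new double[featureCount];

        for (int j = 0; j < featureCount; j++)
        {
            // Shuffle the values of feature j across all examples (Fisher-Yates).
            double[] column = features.Select(row => row[j]).ToArray();
            for (int i = column.Length - 1; i > 0; i--)
            {
                int k = rng.Next(i + 1);
                double tmp = column[i];
                column[i] = column[k];
                column[k] = tmp;
            }

            // Copy every row, replacing only feature j with its shuffled value.
            double[][] permuted = features
                .Select((row, i) =>
                {
                    var copy = (double[])row.Clone();
                    copy[j] = column[i];
                    return copy;
                })
                .ToArray();

            double score = RSquared(labels, permuted.Select(predict).ToArray());
            deltas[j] = score - baseline;
        }
        return deltas;
    }

    public static void Main()
    {
        // Toy setup (assumed for the example): the label depends only on feature 0.
        double[][] features = Enumerable.Range(0, 8)
            .Select(i => new double[] { i, (i * 7) % 5 })
            .ToArray();
        double[] labels = features.Select(row => 3.0 * row[0]).ToArray();
        Func<double[], double> predict = row => 3.0 * row[0];

        double[] deltas = PermutationImportance(predict, features, labels);
        for (int j = 0; j < deltas.Length; j++)
            Console.WriteLine($"feature {j}: change in R-squared = {deltas[j]:F3}");
    }
}
```

In this toy setup the model ignores feature 1, so permuting it leaves the predictions and therefore the metric unchanged, while permuting feature 0 can only lower R-squared from its baseline. The sketch evaluates on whatever rows are passed in, which mirrors the point in the remarks that PFI is not tied to the training set.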