diff --git a/src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs b/src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs
index 9162c82d24..4975a7250d 100644
--- a/src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs
+++ b/src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs
@@ -12,10 +12,37 @@ namespace Microsoft.ML
     public static class PermutationFeatureImportanceExtensions
     {
         ///
-        /// Permutation Feature Importance is a technique that calculates how much each feature 'matters' to the predictions.
-        /// Namely, how much the model's predictions will change if we randomly permute the values of one feature across the evaluation set.
-        /// If the quality doesn't change much, this feature is not very important. If the quality drops drastically, this was a really important feature.
+        /// Permutation Feature Importance (PFI) for Regression
         ///
+        ///
+        ///
+        /// Permutation feature importance (PFI) is a technique to determine the global importance of features in a trained
+        /// machine learning model. PFI is a simple yet powerful technique motivated by Breiman in his Random Forest paper, section 10
+        /// (Breiman. "Random Forests." Machine Learning, 2001.)
+        /// The advantage of the PFI method is that it is model agnostic -- it works with any model that can be
+        /// evaluated -- and it can use any dataset, not just the training set, to compute feature importance metrics.
+        ///
+        ///
+        /// PFI works by taking a labeled dataset, choosing a feature, and permuting the values
+        /// for that feature across all the examples, so that each example now has a random value for the feature and
+        /// the original values for all other features. The evaluation metric (e.g. AUC or R-squared) is then calculated
+        /// for this modified dataset, and the change in the evaluation metric from the original dataset is computed.
+        /// The larger the change in the evaluation metric, the more important the feature is to the model.
+        /// PFI works by performing this permutation analysis across all the features of a model, one after another.
+        ///
+        ///
+        /// In this implementation, PFI computes the change in all possible regression evaluation metrics for each feature, and an
+        /// ImmutableArray of RegressionEvaluator.Result objects is returned. See the sample below for an
+        /// example of working with these results to analyze the feature importance of a model.
+        ///
+        ///
+        ///
+        ///
+        ///
+        ///
+        ///
         /// The regression context.
         /// The model to evaluate.
         /// The evaluation data set.