get_feature_importance

Calculate and return the feature importances.

Method call format

get_feature_importance(data=None,
                       fstr_type=EFstrType.FeatureImportance,
                       prettified=False,
                       thread_count=-1,
                       verbose=False)

Parameters

Each parameter is listed with its possible types, description, and default value.
data
Possible types: catboost.Pool

The dataset for feature importance calculation.

The required dataset depends on the selected feature importance calculation type (specified in the fstr_type parameter):

  • ShapValues: Any dataset. Feature importances are calculated for every object in this dataset.
  • FeatureImportance: None if the model contains leaf weight information; otherwise, the same dataset that was used for training. All models trained with CatBoost version 0.9 or higher contain leaf weight information by default.

Default value: None. The parameter is required for the ShapValues type of feature importance and when the model does not contain leaf weight information.

fstr_type
Note.

It is recommended to use EFstrType for this parameter.

The type of feature importance to calculate.

Possible values:
  • FeatureImportance: The individual importance values for each of the input features.
  • ShapValues: For every input object, a vector with the contribution of each feature to the prediction, plus the expected value of the model prediction (the average prediction given no knowledge about the object).
  • Interaction: The value of the feature interaction strength for each pair of features.

Default value: FeatureImportance
prettified
Possible types: bool

Return the feature importances as a list of (feature_id, feature_importance) pairs, sorted by feature importance.

Should be used with the FeatureImportance fstr_type only.

Default value: False
thread_count
Possible types: int

The number of threads to use for the calculation.

Optimizes the speed of execution. This parameter doesn't affect results.

Default value: -1 (the number of threads is equal to the number of processor cores)
verbose
Possible types:
  • bool
  • int

The purpose of this parameter depends on the type of the given value:

  • bool — Output progress to stdout.

    Works with the ShapValues type of feature importance calculation.

  • int — The logging period.
Default value: False
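
To make the effect of the prettified parameter concrete, the following minimal sketch reproduces the described pairing and ordering in plain Python. The prettify helper and the importance values are hypothetical and exist only for illustration; this is not CatBoost's own implementation.

```python
def prettify(feature_ids, importances):
    """Pair each feature id with its importance, most important first."""
    pairs = zip(feature_ids, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up importances for a model with three features.
feature_ids = ["0", "1", "2"]
importances = [12.5, 60.0, 27.5]

for feature_id, importance in prettify(feature_ids, importances):
    print(feature_id, importance)
```

Keeping the plain list (prettified=False) preserves the original feature order, which is convenient when the values are indexed by feature position; the paired form is easier to read when only the ranking matters.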

Type of return value

Depends on the selected feature importance type (fstr_type):
  • FeatureImportance with the prettified parameter set to False: a list of length [n_features] with a float feature importance value for each feature
  • FeatureImportance with the prettified parameter set to True: a list of length [n_features] with (feature_id (string), feature_importance (float)) pairs, sorted by feature importance in descending order
  • ShapValues: np.array of shape (n_objects, n_features + 1) with float ShapValues for each (object, feature)
  • Interaction: a list of length [n_features] of three-element lists of (first_feature_index, second_feature_index, interaction_score (float))
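
The ShapValues and Interaction results can be unpacked with a few lines of plain Python. The sketch below assumes that the extra last column of a ShapValues row holds the expected value of the model prediction, and that adding the per-feature contributions to it gives the raw prediction for the object (the usual SHAP additivity property). The helper names and all numbers are hypothetical.

```python
def split_shap_row(row):
    """Split one ShapValues row into (per-feature contributions, expected value)."""
    contributions = list(row[:-1])
    expected_value = row[-1]  # assumption: the last column is the expected value
    return contributions, expected_value


def strongest_interaction(interactions):
    """Return the [first_index, second_index, score] triple with the highest score."""
    return max(interactions, key=lambda triple: triple[2])


# Made-up ShapValues row for a model with three features.
shap_row = [0.25, -0.5, 0.125, 1.0]
contributions, expected_value = split_shap_row(shap_row)
raw_prediction = sum(contributions) + expected_value

# Made-up Interaction result.
interactions = [[0, 1, 40.0], [0, 2, 35.0], [1, 2, 25.0]]
best_pair = strongest_interaction(interactions)
```

Because the helpers only use indexing and slicing, they work on rows of the returned np.array as well as on plain lists.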