Feature importance

CatBoost provides different types of feature importance calculation. Each type is listed below with its implementations:
  • The most important features in the formula: FeatureImportance, InternalFeatureImportance
  • The contribution of each feature to the formula: ShapValues
  • The features that work well together: Interaction, InternalInteraction

Usage specifics differ between CatBoost implementations; see the documentation of each implementation for details.

FeatureImportance

The individual importance values for each of the input features.

See the FeatureImportance file format.

Calculation principles
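The importance of a feature $F$ is calculated as follows:

$$feature\_importance_F = \sum_{trees,\, leafs_F} \left[ (v_1 - avr)^2 \cdot c_1 + (v_2 - avr)^2 \cdot c_2 \right], \qquad avr = \frac{v_1 \cdot c_1 + v_2 \cdot c_2}{c_1 + c_2}$$

  • $c_1$, $c_2$ are the numbers of documents in the leaves.
  • $v_1$, $v_2$ are the formula values in the leaves.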
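
The calculation can be illustrated with a minimal Python sketch. It assumes that each split on a feature contributes the document-weighted squared deviation of its two leaf values from their weighted average, and that these contributions are summed over all splits on the feature. The function name `split_contribution` and the leaf values and document counts are hypothetical and chosen only for illustration.

```python
def split_contribution(v1, c1, v2, c2):
    """Weighted squared deviation of two leaf values from their weighted average.

    v1, v2: formula values in the two leaves (assumed inputs).
    c1, c2: numbers of documents in the two leaves (assumed inputs).
    """
    avr = (v1 * c1 + v2 * c2) / (c1 + c2)
    return (v1 - avr) ** 2 * c1 + (v2 - avr) ** 2 * c2


# Hypothetical splits on a single feature: (v1, c1, v2, c2).
splits = [
    (1.0, 10, 3.0, 30),   # leaf values differ, so the split contributes
    (0.5, 20, 0.5, 20),   # identical leaf values contribute nothing
]

importance = sum(split_contribution(*s) for s in splits)
print(importance)
```

A split whose two leaves hold the same formula value adds nothing to the total, which matches the intuition that such a split does not change the prediction.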

If the model uses a combination of some of the input features instead of using them individually, an average feature importance for these features is calculated and output. For example, suppose the model uses a combination of the features f54, c56 and f77. The feature importance is first calculated for the combination as a whole. The resulting value is then divided by three and assigned to each of the three features.

If the model uses a feature both individually and in combination with other features, the total importance value of this feature is defined using the following formula:

$$feature\_total\_importance_j = feature\_importance_j + \sum_{i} average\_feature\_importance_i$$

  • $feature\_importance_j$ is the individual feature importance of the j-th feature.
  • $average\_feature\_importance_i$ is the average feature importance of the j-th feature in the i-th combinational feature.
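
The averaging and summation rules can be sketched in a few lines of Python. The helper `total_importance` and the importance values are hypothetical; only the feature names f54, c56 and f77 come from the example in the text.

```python
from collections import defaultdict


def total_importance(individual, combinations):
    """Combine individual and combination importances per input feature.

    individual: dict mapping a feature name to its individual importance.
    combinations: list of (tuple of feature names, importance of the combination).
    """
    totals = defaultdict(float)
    for feature, value in individual.items():
        totals[feature] += value
    for features, value in combinations:
        share = value / len(features)  # average importance per member feature
        for feature in features:
            totals[feature] += share
    return dict(totals)


# Hypothetical values: f54 is used on its own and inside one combination.
individual = {"f54": 4.0}
combinations = [(("f54", "c56", "f77"), 9.0)]

totals = total_importance(individual, combinations)
print(totals)
```

Here the combination importance of 9.0 is split into three shares of 3.0, so f54 ends up with its individual value plus one share, while c56 and f77 receive one share each.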

InternalFeatureImportance

The importance values both for each of the input features and for their combinations (if any).

See the InternalFeatureImportance file format.

Calculation principles
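The importance of a feature $F$ is calculated as follows:

$$feature\_importance_F = \sum_{trees,\, leafs_F} \left[ (v_1 - avr)^2 \cdot c_1 + (v_2 - avr)^2 \cdot c_2 \right], \qquad avr = \frac{v_1 \cdot c_1 + v_2 \cdot c_2}{c_1 + c_2}$$

  • $c_1$, $c_2$ are the numbers of documents in the leaves.
  • $v_1$, $v_2$ are the formula values in the leaves.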

If the model uses a combination of some of the input features instead of using them individually, an average feature importance for these features is calculated and output. For example, suppose the model uses a combination of the features f54, c56 and f77. The feature importance is first calculated for the combination as a whole. The resulting value is then divided by three and assigned to each of the three features.

If the model uses a feature both individually and in combination with other features, the total importance value of this feature is defined using the following formula:

$$feature\_total\_importance_j = feature\_importance_j + \sum_{i} average\_feature\_importance_i$$

  • $feature\_importance_j$ is the individual feature importance of the j-th feature.
  • $average\_feature\_importance_i$ is the average feature importance of the j-th feature in the i-th combinational feature.

ShapValues

A vector with the contributions of each feature to the prediction for every input object, together with the expected value of the model prediction for the object (the average prediction given no knowledge about the object).

  • $\phi_i$ is the contribution of the i-th feature.
  • $\phi_0$ is the expected value of the model prediction.

For a given object, the sum $\phi_0 + \sum_{i} \phi_i$ is equal to the prediction on this object.
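
The additivity property can be checked with a short Python sketch. It assumes a hypothetical layout in which one row holds the feature contributions followed by the expected value as the last element; the actual layout should be confirmed against the ShapValues file format. The numbers are made up for illustration.

```python
def split_shap_row(row):
    """Split one row into (feature contributions, expected value).

    Assumes the expected value is stored as the last element of the row.
    """
    return row[:-1], row[-1]


# Hypothetical row for one object: three contributions, then the expected value.
row = [0.5, -0.25, 0.125, 1.5]

contributions, expected_value = split_shap_row(row)
prediction = expected_value + sum(contributions)
print(prediction)
```

If the reconstructed value does not match the model's prediction for the object, the row is probably being read with the wrong layout.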

This is an implementation of the Consistent Individualized Feature Attribution for Tree Ensembles approach.

See the ShapValues file format.

Use the SHAP package to plot the returned values.

Calculation principles

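The feature importance is calculated as follows for each feature $i$:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$

  • $M$ is the number of input features.
  • $N$ is the set of all input features.
  • $S$ is the set of non-zero feature indexes (the features that are being observed and not unknown).
  • $f_x(S) = E[f(x) \mid x_S]$ is the model's prediction for the input $x$, where $E[f(x) \mid x_S]$ is the expected value of the function conditioned on a subset $S$ of the input features.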
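
For intuition, the subset-weighted sum can be evaluated by brute force on a tiny model. The Python sketch below is not the tree-specific algorithm used by CatBoost: it enumerates every subset, which is only feasible for a handful of features. It also assumes that the conditional expectation can be approximated by averaging the model output over a small background dataset, with the features in the subset fixed to the values of the explained object. The toy model, the background rows and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def value(model, x, background, subset):
    """Approximate f_x(S): fix the features in `subset` to x, average over background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all subsets of the other features."""
    m = len(x)
    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        contribution = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! for subsets of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for chosen in combinations(others, size):
                s = set(chosen)
                gain = value(model, x, background, s | {i}) - value(model, x, background, s)
                contribution += weight * gain
        phi.append(contribution)
    return phi


def model(row):
    """Hypothetical toy model: a linear term plus an interaction term."""
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [1.0, 1.0, 0.0]]
x = [3.0, 2.0, 1.0]

phi = shapley_values(model, x, background)
expected_value = value(model, x, background, set())
print(phi, expected_value)
```

The contributions plus the expected value add up to the model output for `x`, which is the additivity property described for ShapValues.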

Interaction

The value of the feature interaction strength for each pair of features.

See the Interaction file format.

Calculation principles

InternalInteraction

The value of the feature interaction strength for each pair of features that are used in the model. Internally, the model treats feature combinations as separate features, so every feature combination that is used in the model is listed separately. For example, if the model contains a feature named F1 and a combination of features {F2, F3}, the interaction between F1 and the combination {F2, F3} is listed in the output file.
  • The rows are sorted in descending order of the feature interaction strength value.
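
The ordering of the output can be sketched in Python as follows. The row layout (first feature, second feature, strength), the extra feature F4 and the strength values are hypothetical; see the InternalInteraction file format for the actual columns.

```python
def sort_by_strength(rows):
    """Return rows ordered by interaction strength, strongest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical rows: a feature combination appears as a single entry.
rows = [
    ("F1", "F4", 0.12),
    ("F1", "{F2, F3}", 0.57),
    ("F4", "{F2, F3}", 0.31),
]

sorted_rows = sort_by_strength(rows)
for first, second, strength in sorted_rows:
    print(f"{first}\t{second}\t{strength}")
```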

See the InternalInteraction file format.

Calculation principles