get_object_importance

Calculate the effect of objects from the training dataset on the optimized metric values for the objects from the input dataset:
  • Positive values reflect that the optimized metric increases.
  • Negative values reflect that the optimized metric decreases.

The higher the deviation from 0, the bigger the impact that an object has on the optimized metric.
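The sign and magnitude rules above can be illustrated with a small sketch. The helper name `describe_effect` and the score values are hypothetical and only serve the illustration; real scores come from get_object_importance.

```python
def describe_effect(score):
    """Hypothetical helper: describe a score by its sign, following the rules above."""
    if score > 0:
        return "optimized metric increases"
    if score < 0:
        return "optimized metric decreases"
    return "no effect"


# Made-up scores for three training objects, keyed by training dataset index.
example_scores = {0: 0.12, 1: -0.45, 2: 0.03}

# Rank by absolute deviation from 0, so the object with the biggest impact comes first.
ranked = sorted(example_scores, key=lambda i: abs(example_scores[i]), reverse=True)

for i in ranked:
    print(i, example_scores[i], describe_effect(example_scores[i]))
```

Sorting by absolute value rather than by raw value keeps strongly negative objects near the top, which matches the "deviation from 0" wording.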

The method implements the approach described in the paper "Finding Influential Training Samples for Gradient Boosted Decision Trees".

Method call format

get_object_importance(pool,
                      train_pool, 
                      top_size=-1, 
                      ostr_type='Average', 
                      update_method='SinglePoint', 
                      importance_values_sign='All', 
                      thread_count=-1)
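A minimal sketch of how the call might look in practice, with every argument spelled out. The wrapper name `per_object_importance` is hypothetical. It assumes that `model` is an already fitted CatBoost model and that `pool` and `train_pool` are catboost.Pool objects.

```python
def per_object_importance(model, pool, train_pool, top_size=5):
    """Hypothetical wrapper around get_object_importance.

    Assumes `model` is an already fitted CatBoost model and that `pool`
    and `train_pool` are catboost.Pool objects.
    """
    indices, scores = model.get_object_importance(
        pool,                          # objects whose metric values are examined
        train_pool,                    # the dataset the model was trained on
        top_size=top_size,             # return at most this many training objects
        ostr_type='PerObject',         # scores for each object of the input dataset
        update_method='SinglePoint',   # fastest and least accurate method
        importance_values_sign='All',  # keep both positive and negative effects
        thread_count=-1,               # use as many threads as there are cores
    )
    return indices, scores
```

With a fitted model this could be called as `indices, scores = per_object_importance(model, pool, train_pool)`.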

Parameters

Parameter: pool
Possible types: catboost.Pool
Description: The data for calculating object importances.
Default value: Required parameter

Parameter: train_pool
Possible types: catboost.Pool
Description: The dataset used for training.
Default value: Required parameter

Parameter: top_size
Possible types: int
Description: The maximum number of most important objects from the training dataset to return.
Default value: -1 (top size is not limited)

Parameter: ostr_type
Possible types: string
Description: The method for calculating the object importances.
Possible values:
  • Average: The average of scores of objects from the training dataset for every object from the input dataset.
  • PerObject: The scores of each object from the training dataset for each object from the input dataset.
Default value: Average

Parameter: update_method
Possible types: string
Description: The algorithm that sets the trade-off between accuracy and speed of the calculation.
Possible values:
  • SinglePoint: The fastest and least accurate method.
  • TopKLeaves: Uses the specified number of leaves. The higher the value, the more accurate and the slower the calculation.
  • AllPoints: The slowest and most accurate method.
Supported parameters:
  • top: The number of leaves to use for the TopKLeaves method.
For example, the following value sets the method to TopKLeaves and limits the number of leaves to 3:
TopKLeaves:top=3
Default value: SinglePoint

Parameter: importance_values_sign
Possible types: string
Description: Defines the type of effect that the objects from the training dataset must have on the optimized metric value for objects from the input dataset. Only the objects with the matching effect are output.
Possible values:
  • Positive
  • Negative
  • All
Default value: All

Parameter: thread_count
Possible types: int
Description: The number of threads to use for the calculation. Optimizes the speed of execution. This parameter doesn't affect results.
Default value: -1 (the number of threads is equal to the number of processor cores)
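The TopKLeaves:top=3 value format shown for update_method can be assembled programmatically. This is a small sketch; the helper name `top_k_leaves` is hypothetical and not part of CatBoost.

```python
def top_k_leaves(top):
    """Hypothetical helper: build an update_method value for TopKLeaves."""
    if not isinstance(top, int) or top < 1:
        raise ValueError("top must be a positive integer")
    return f"TopKLeaves:top={top}"


update_method = top_k_leaves(3)
print(update_method)  # TopKLeaves:top=3
```

The resulting string can then be passed as the update_method argument of get_object_importance.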

Type of return value

Two lists of lists with indices and scores.

For example, if the input dataset contains 3 rows and the training dataset contains 4 rows, the indices list takes the following structure:

[[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 4, 2]]

The scores list has the same structure with the corresponding scores instead of indices.
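Because both lists share the same structure, they can be combined position by position. The sketch below reuses the indices from the example above. The score values are made up for illustration, and the helper name `pair_importances` is hypothetical.

```python
# Indices copied from the example above; the scores are made-up illustration values.
indices = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 4, 2]]
scores = [[0.9, -0.4, 0.2, 0.0], [0.7, 0.3, -0.1, -0.6], [0.5, 0.1, -0.2, -0.8]]


def pair_importances(indices, scores):
    """Return, for each input object, a list of (train_index, score) pairs."""
    return [
        list(zip(index_row, score_row))
        for index_row, score_row in zip(indices, scores)
    ]


paired = pair_importances(indices, scores)
for input_row, pairs in enumerate(paired):
    print(input_row, pairs)
```

Each inner list corresponds to one row of the input dataset, and each pair links a training dataset index to its score.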