sum_models

Purpose

Blend trees and counters of two or more trained CatBoost models into a new model. Leaf values can be individually weighted for each input model. For example, it may be useful to blend models trained on different validation datasets.

Method call format

sum_models(models, 
           weights=None, 
           ctr_merge_policy='IntersectingCountersAverage')

Parameters

models
  Possible values: list of CatBoost models
  Description: A list of models to blend.
  Default value: required parameter

weights
  Possible values: list of numbers
  Description: A list of weights for the leaf values of each model. The length of this list must be equal to the number of blended models. A list of weights equal to 1.0/N for N blended models gives the average prediction. For example, the following list of weights gives the average prediction for four blended models:
  [0.25, 0.25, 0.25, 0.25]
  Default value: None (leaf value weights are set to 1 for all models)

ctr_merge_policy
  Possible values: string
  Description: The counters merging policy. Possible values:
  • FailIfCtrsIntersects — Ensure that the models have zero intersecting counters.
  • LeaveMostDiversifiedTable — Use the most diversified counters by the count of unique hash values.
  • IntersectingCountersAverage — Use the average ctr counter values in the intersecting bins.
  Default value: IntersectingCountersAverage
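The uniform weights described above can be built programmatically instead of being written out by hand. A minimal sketch (the model count of 4 is an arbitrary example):

```python
# Build uniform weights 1.0/N for N blended models, so sum_models
# produces the average prediction of the inputs.
n_models = 4
weights = [1.0 / n_models] * n_models
print(weights)  # [0.25, 0.25, 0.25, 0.25]
```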

Type of return value

CatBoost model

Example

from catboost import CatBoostClassifier, Pool, sum_models
from catboost.datasets import amazon
import numpy as np
from sklearn.model_selection import train_test_split

train_df, _ = amazon()

y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)

categorical_features_indices = np.where(X.dtypes != float)[0]

X_train, X_validation, y_train, y_validation = train_test_split(X, 
                                                                y, 
                                                                train_size=0.8, 
                                                                random_state=42)

train_pool = Pool(X_train, 
                  y_train, 
                  cat_features=categorical_features_indices)
validate_pool = Pool(X_validation, 
                     y_validation, 
                     cat_features=categorical_features_indices)

models = []
for i in range(5):
    model = CatBoostClassifier(iterations=100, 
                               random_seed=i)
    model.fit(train_pool, 
              eval_set=validate_pool)
    models.append(model)

models_avrg = sum_models(models, 
                         weights=[1.0/len(models)] * len(models))
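Conceptually, the blended model's raw prediction is the weighted sum of the input models' raw predictions, so uniform 1.0/N weights yield their mean. A toy NumPy sketch of that arithmetic (illustrative numbers only, not CatBoost internals):

```python
import numpy as np

# Raw (pre-sigmoid) predictions of 3 models on 2 samples -- toy values.
raw_preds = np.array([[0.2, 1.1],
                      [0.4, 0.9],
                      [0.6, 1.0]])
weights = np.full(3, 1.0 / 3)

# Weighted sum over the model axis; with uniform weights this is the mean.
blended = (weights[:, None] * raw_preds).sum(axis=0)
print(blended)  # [0.4 1. ]
```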