Speeding up the training

Note: Certain changes to these parameters can decrease the quality of the resulting model.

Iterations and learning rate

By default, CatBoost builds 1000 trees. The number of iterations can be decreased to speed up the training.

When the number of iterations is decreased, the learning rate needs to be increased to compensate. By default, the learning rate is defined automatically based on the number of iterations and the input dataset. Reducing the number of iterations is a good starting point for optimization.

The default learning rate is close to the optimal one, but it can be tuned to get the best possible quality. Look at the evaluation metric values on each iteration to tune the learning rate:
  • Decrease the learning rate if overfitting is observed.
  • Increase the learning rate if there is no overfitting and the error on the evaluation dataset is still decreasing on the last iterations.
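
As a minimal sketch in Python (the toy data and the chosen values are illustrative placeholders, not tuned recommendations), both parameters are passed to the model constructor:

```python
from catboost import CatBoostClassifier

# Toy placeholder data; substitute a real dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

# Fewer trees than the default 1000, with the learning rate raised
# explicitly to compensate (by default it is picked automatically).
model = CatBoostClassifier(iterations=500, learning_rate=0.1, verbose=False)
model.fit(X, y)
```
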
Parameters
  • CLI: -i, --iterations; Python: iterations (aliases: num_boost_round, n_estimators, num_trees); R: iterations
  • CLI: -w, --learning-rate; Python: learning_rate (alias: eta); R: learning_rate

Boosting type

By default, the boosting type is set to Ordered for small datasets. This prevents overfitting but is computationally expensive. Try setting this parameter to Plain to speed up the training.
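
A minimal Python sketch (toy data as above; whether Ordered is the actual default depends on the dataset size):

```python
from catboost import CatBoostClassifier

# Toy placeholder data; substitute a real dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

# Plain is the classic boosting scheme; it is cheaper to compute than
# Ordered, which guards against overfitting on small datasets.
model = CatBoostClassifier(boosting_type="Plain", verbose=False)
model.fit(X, y)
```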

Parameters
  • CLI: --boosting-type; Python: boosting_type; R: boosting_type

Bootstrap type

By default, the method for sampling the weights of objects is set to Bayesian. Training is faster if the Bernoulli method is used and the sample rate for bagging is set to a value smaller than 1.
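
A minimal Python sketch (the 0.66 sample rate is an arbitrary illustrative value, not a recommendation):

```python
from catboost import CatBoostClassifier

# Toy placeholder data; substitute a real dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

# Bernoulli bootstrap with a bagging rate below 1 trains each tree on a
# subsample of objects, which is cheaper than the Bayesian default.
model = CatBoostClassifier(bootstrap_type="Bernoulli", subsample=0.66, verbose=False)
model.fit(X, y)
```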

Parameters
  • CLI: --bootstrap-type; Python: bootstrap_type; R: bootstrap_type
  • CLI: --subsample; Python: subsample; R: subsample

One-hot encoding

By default, one-hot encoding is used for categorical features with 2 or fewer different values. For all other categorical features, statistics are calculated, which is more time-consuming than one-hot encoding.

Set a larger value for this parameter to speed up the training.
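
A minimal Python sketch (the value 10 and the toy categorical data are illustrative assumptions):

```python
from catboost import CatBoostClassifier

# Toy data with two categorical features; values are placeholders.
X = [["a", "x"], ["b", "y"], ["c", "x"], ["a", "y"], ["b", "x"], ["c", "y"]]
y = [0, 0, 1, 1, 0, 1]

# One-hot encode categorical features with up to 10 distinct values
# instead of calculating statistics for them.
model = CatBoostClassifier(one_hot_max_size=10, verbose=False)
model.fit(X, y, cat_features=[0, 1])
```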

Parameters
  • CLI: --one-hot-max-size; Python: one_hot_max_size; R: one_hot_max_size

Random subspace method

For datasets with hundreds of features, decreasing this parameter speeds up the training and usually does not affect the quality. It is not recommended to change the default value of this parameter for datasets with few (10 to 20) features.

For example, set the parameter to 0.1. In this case, the training requires roughly 20% more iterations to converge, but each iteration is performed roughly ten times faster. The training time is therefore much shorter, even though the resulting model contains more trees.
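
A minimal Python sketch (the two-column toy data only illustrates the syntax; rsm=0.1 is meant for datasets with hundreds of features):

```python
from catboost import CatBoostClassifier

# Toy placeholder data; substitute a real, wide dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

# Consider a random 10% of the features when selecting each split.
model = CatBoostClassifier(rsm=0.1, verbose=False)
model.fit(X, y)
```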

Parameters
  • CLI: --rsm; Python: rsm (alias: colsample_bylevel); R: rsm

Leaf estimation iterations

This parameter defines the rules for calculating leaf values after selecting the tree structures. The default value depends on the training objective and can slow down the training for datasets with a small number of features (for example, 10 features).

Try setting the value to 1 or 5 to speed up the training on datasets with a small number of features.
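
A minimal Python sketch (toy regression data; 1 is one of the values suggested above):

```python
from catboost import CatBoostRegressor

# Toy placeholder regression data; substitute a real dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0.2, 0.4, 0.9, 1.1, 1.6, 1.8]

# A single leaf-estimation step per tree instead of the
# objective-dependent default.
model = CatBoostRegressor(leaf_estimation_iterations=1, verbose=False)
model.fit(X, y)
```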

Parameters
  • CLI: --leaf-estimation-iterations; Python: leaf_estimation_iterations; R: leaf_estimation_iterations

Number of categorical features to combine

By default, combinations of categorical features are generated in a greedy way, which slows down the training.

To speed up the training, try turning off the generation of categorical feature combinations, or limit the number of categorical features that can be combined to two.

This parameter affects the training time only if the dataset contains categorical features.
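
A minimal Python sketch (toy data; this assumes the parameter counts the features allowed in a combination, so 1 turns combinations off and 2 limits them to pairs):

```python
from catboost import CatBoostClassifier

# Toy data with two categorical features; values are placeholders.
X = [["a", "x"], ["b", "y"], ["c", "x"], ["a", "y"], ["b", "x"], ["c", "y"]]
y = [0, 0, 1, 1, 0, 1]

# max_ctr_complexity=1 turns off categorical feature combinations;
# max_ctr_complexity=2 would allow combining at most two features.
model = CatBoostClassifier(max_ctr_complexity=1, verbose=False)
model.fit(X, y, cat_features=[0, 1])
```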

Parameters
  • CLI: --max-ctr-complexity; Python: max_ctr_complexity; R: max_ctr_complexity

Number of splits for numerical features

This parameter defines the number of splits considered for each feature. By default, it is set to 254 (if training is performed on CPU) or 128 (if training is performed on GPU).

The value of this parameter significantly impacts the speed of training on GPU. The smaller the value, the faster the training is performed.

Try setting this parameter to 32 if training is performed on GPU. In many cases this does not affect the quality of the model but significantly speeds up the training.

The value of this parameter does not significantly impact the speed of training on CPU. Try setting it to 254 for the best possible quality.
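
A minimal Python sketch (task_type="GPU" requires a CUDA-capable device; drop that argument to train on CPU with the default border count):

```python
from catboost import CatBoostClassifier

# Toy placeholder data; substitute a real dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

# 32 borders (splits) per numerical feature instead of the GPU
# default of 128, trading a little quality for faster training.
model = CatBoostClassifier(border_count=32, task_type="GPU", verbose=False)
model.fit(X, y)
```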

Parameters
  • CLI: -x, --border-count; Python: border_count (alias: max_bin); R: border_count