Parameter tuning

CatBoost provides a flexible interface for parameter tuning and can be configured to suit different tasks.

One-hot encoding

Attention. Do not use one-hot encoding during preprocessing: it hurts both the training speed and the resulting quality.

One-hot encoding can work well when categorical features have only a few distinct values.

Usually one-hot encoding does not significantly improve the quality of the model. If it is required, use the built-in parameters instead of preprocessing the dataset.
Parameters

CLI parameter: --one-hot-max-size
Python parameter: one_hot_max_size
R parameter: one_hot_max_size

Description: Use one-hot encoding for all features with a number of different values less than or equal to the given parameter value. CTRs are not calculated for such features.
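As a minimal sketch (assuming the Python catboost package and hypothetical column names), the threshold is passed in the training parameters instead of one-hot encoding the dataset during preprocessing. Only the parameter dict is built here, so nothing is trained:

```python
# Sketch: let CatBoost one-hot encode low-cardinality categorical features.
params = {
    "iterations": 500,
    "one_hot_max_size": 10,  # features with <= 10 distinct values are one-hot encoded
}
cat_features = ["city", "device"]  # hypothetical categorical columns

# With catboost installed, this would be used roughly as:
#   from catboost import CatBoostClassifier
#   model = CatBoostClassifier(**params)
#   model.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_val, y_val))
print(params)
```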

Number of trees

It is recommended to check that there is no obvious underfitting or overfitting before tuning any other parameters. In order to do this it is necessary to analyze the metric value on the test dataset and select the appropriate number of iterations.

This can be done by setting the number of iterations to a large value, using the overfitting detector parameters, and turning the use-best-model option on. In this case the resulting model contains only the first k iterations, where k is the iteration with the best loss value on the test dataset.

Also, the metric for choosing the best model may differ from the one used for optimizing the objective value. For example, it is possible to set the optimized function to Logloss and use the AUC function for the overfitting detector. To do so, use the evaluation metric parameter.

Parameters

CLI parameter: -i, --iterations
Python parameter: iterations
R parameter: iterations

Description: The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.

CLI parameter: --use-best-model
Python parameter: use_best_model
R parameter: use_best_model

Description: If this parameter is set, the number of trees that are saved in the resulting model is defined as follows:
  1. Build the number of trees defined by the training parameters.
  2. Use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric).

No trees are saved after this iteration. This option requires a validation dataset to be provided.

CLI parameter: --eval-metric
Python parameter: eval_metric
R parameter: eval_metric

Description: The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric).

Format:
<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]
Supported metrics:
  • RMSE
  • Logloss
  • MAE
  • CrossEntropy
  • Quantile
  • LogLinQuantile
  • Lq
  • MultiClass
  • MultiClassOneVsAll
  • MAPE
  • Poisson
  • PairLogit
  • PairLogitPairwise
  • QueryRMSE
  • QuerySoftMax
  • SMAPE
  • Recall
  • Precision
  • F1
  • TotalF1
  • Accuracy
  • BalancedAccuracy
  • BalancedErrorRate
  • Kappa
  • WKappa
  • LogLikelihoodOfPrediction
  • AUC
  • R2
  • MCC
  • BrierScore
  • HingeLoss
  • HammingLoss
  • ZeroOneLoss
  • MSLE
  • MedianAbsoluteError
  • PairAccuracy
  • AverageGain
  • PFound
  • NDCG
  • PrecisionAt
  • RecallAt
  • MAP
Examples:
R2
Quantile:alpha=0.3
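The format above can be sketched with a small helper (format_metric is a hypothetical function for illustration, not part of CatBoost):

```python
# Build an eval_metric string in the documented format:
#   <Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]
def format_metric(name, **params):
    if not params:
        return name
    args = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{name}:{args}"

# The two examples from the documentation:
print(format_metric("R2"))                   # -> R2
print(format_metric("Quantile", alpha=0.3))  # -> Quantile:alpha=0.3
```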
Overfitting detection settings

CLI parameter: --od-type
Python parameter: od_type
R parameter: od_type

Description: The type of the overfitting detector to use.

Possible values:
  • IncToDec
  • Iter

CLI parameter: --od-pval
Python parameter: od_pval
R parameter: od_pval

Description: The threshold for the IncToDec overfitting detector type. The training is stopped when the specified value is reached. Requires that a validation dataset was provided.

For best results, it is recommended to set a value in the range [1e-10; 1e-2].

The larger the value, the earlier overfitting is detected.

Restriction. Do not use this parameter with the Iter overfitting detector type.

CLI parameter: --od-wait
Python parameter: od_wait
R parameter: od_wait

Description: The number of iterations to continue the training after the iteration with the optimal metric value.

The purpose of this parameter differs depending on the selected overfitting detector type:
  • IncToDec — Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value.
  • Iter — Consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value.
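The Iter rule can be illustrated with a simplified sketch (this mimics the stopping logic described above, not CatBoost internals; lower metric values are assumed to be better):

```python
# Stop when od_wait iterations have passed since the iteration with the best
# (lowest) metric value on the validation dataset.
def iter_od_stop(metric_values, od_wait):
    best_iter = 0
    for i, value in enumerate(metric_values):
        if value < metric_values[best_iter]:
            best_iter = i
        if i - best_iter >= od_wait:
            return i  # training stops here
    return len(metric_values) - 1  # detector never triggered

losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66, 0.7]
print(iter_od_stop(losses, od_wait=3))  # best value at iteration 2, stops at 5
```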

Learning rate

This setting is used for reducing the gradient step. It affects the overall time of training: the smaller the value, the more iterations are required for training. Choose the value based on the performance expectations.

By default, the learning rate is defined automatically based on the dataset properties and the number of iterations. The automatically defined value should be close to the optimal one.

Possible ways of adjusting the learning rate depending on the overfitting results:
  • There is no overfitting on the last iterations of training (the training does not converge) — increase the learning rate.
  • Overfitting is detected — decrease the learning rate.
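The two adjustment rules can be sketched as a hypothetical helper (the factors 0.5 and 1.5 are illustrative choices, not CatBoost defaults):

```python
# Propose the next learning rate to try, following the rules above:
# overfitting detected -> smaller gradient step; training did not converge
# (no overfitting on the last iterations) -> larger gradient step.
def next_learning_rate(current, overfitting_detected):
    if overfitting_detected:
        return current / 2
    return current * 1.5

print(next_learning_rate(0.1, overfitting_detected=True))  # 0.05
print(next_learning_rate(0.03, overfitting_detected=False))
```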
Parameters

CLI parameter: -w, --learning-rate
Python parameter: learning_rate
R parameter: learning_rate

Description: The learning rate. Used for reducing the gradient step.

Tree depth

In most cases, the optimal depth ranges from 4 to 10. Values in the range from 6 to 10 are recommended.

Note.

The maximum depth of the trees is limited to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) when the training is performed on GPU.

Parameters

CLI parameter: -n, --depth
Python parameter: depth
R parameter: depth

Description: Depth of the tree.

The range of supported values depends on the processing unit type and the type of the selected loss function:
  • CPU — Any integer up to 16.
  • GPU — Any integer up to 8 for the pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.
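The documented limits can be encoded in a small validation sketch (an illustration of the rules above, not something CatBoost exposes):

```python
# Maximum tree depth by processing unit type and loss function,
# per the limits stated in the documentation.
PAIRWISE_MODES = {"YetiRank", "PairLogitPairwise", "QueryCrossEntropy"}

def max_depth(task_type, loss_function):
    if task_type == "GPU" and loss_function in PAIRWISE_MODES:
        return 8
    return 16

print(max_depth("GPU", "YetiRank"))  # 8
print(max_depth("CPU", "YetiRank"))  # 16
print(max_depth("GPU", "Logloss"))   # 16
```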

L2 regularization

Try different values for the regularizer to find the best possible one.

Parameters

CLI parameter: --l2-leaf-reg
Python parameter: l2_leaf_reg
R parameter: l2_leaf_reg

Description: L2 regularization coefficient. Used for leaf value calculation. Any positive value is allowed.

Random strength

Try setting different values for the random_strength parameter.

Parameters

CLI parameter: --random-strength
Python parameter: random_strength
R parameter: random_strength

Description: The score standard deviation multiplier. Use this parameter to avoid overfitting the model.

The value of this parameter is used when selecting splits. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset). The split with the highest score is selected.

Without this parameter the scores are deterministic. To introduce randomness, a normally distributed random variable is added to the score of each split. It has a zero mean and a variance that decreases during training; the value of this parameter is the multiplier of that variance.
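The mechanism can be illustrated with a toy sketch (this mirrors the description above, not CatBoost internals; the variance schedule is a made-up linear decay):

```python
import random

# Add zero-mean normal noise to each split score before picking the best split.
# The noise variance shrinks as training progresses and is scaled by
# random_strength; with random_strength=0 the choice is fully deterministic.
def pick_split(scores, iteration, total_iterations, random_strength, rng):
    variance = random_strength * (1 - iteration / total_iterations)
    sigma = variance ** 0.5
    noisy = [s + rng.gauss(0, sigma) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)

scores = [0.50, 0.49, 0.10]
print(pick_split(scores, iteration=10, total_iterations=100,
                 random_strength=0, rng=random.Random(0)))  # 0 (best raw score)
```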

Bagging temperature

Try setting different values for the bagging_temperature parameter.

Parameters

CLI parameter: --bagging-temperature
Python parameter: bagging_temperature
R parameter: bagging_temperature

Description: Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes.

Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from an exponential distribution if the value of this parameter is set to 1. All weights are equal to 1 if the value of this parameter is set to 0.

Possible values are in the range [0; +inf). The higher the value, the more aggressive the bagging is.
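One common way to illustrate this interpolation (a sketch, not CatBoost's exact implementation) is to draw each object weight as (-log(u)) ** t with u uniform on (0, 1): at t = 1 this gives exponentially distributed weights, and at t = 0 every weight is exactly 1.

```python
import math
import random

# Bayesian-bootstrap-style object weights parameterized by a temperature t.
def bayesian_bootstrap_weights(n, temperature, rng):
    return [(-math.log(rng.random())) ** temperature for _ in range(n)]

rng = random.Random(42)
print(bayesian_bootstrap_weights(3, temperature=0, rng=rng))  # [1.0, 1.0, 1.0]
print(bayesian_bootstrap_weights(3, temperature=1, rng=rng))  # exponential draws
```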

Border count

The number of splits for numerical features.

By default, it is set to 254 (if training is performed on CPU) or 128 (if training is performed on GPU).

The value of this parameter significantly impacts the speed of training on GPU. The smaller the value, the faster the training is performed (refer to the Number of splits for numerical features section for details).

128 splits are enough for many datasets. However, try to set the value of this parameter to 254 when training on GPU if the best possible quality is required.

The value of this parameter does not significantly impact the speed of training on CPU. Try to set it to 254 for the best possible quality.

Parameters

CLI parameter: -x, --border-count
Python parameter: border_count (alias: max_bin)
R parameter: border_count

Description: The number of splits for numerical features. Allowed values are integers from 1 to 255 inclusive.
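A toy illustration of quantile-based binning (CatBoost supports several quantization modes; this evenly spaced quantile scheme is only one simplified example of how borders can be chosen):

```python
# Pick at most border_count split borders for a numerical feature by taking
# evenly spaced quantiles of the sorted values, skipping duplicates.
def quantile_borders(values, border_count):
    ordered = sorted(values)
    borders = []
    for i in range(1, border_count + 1):
        idx = round(i * (len(ordered) - 1) / (border_count + 1))
        candidate = ordered[idx]
        if not borders or candidate > borders[-1]:
            borders.append(candidate)
    return borders

print(quantile_borders(list(range(100)), border_count=3))  # [25, 50, 74]
```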

Internal dataset order

Use this option if the objects in your dataset are given in the required order. In this case, random permutations are not performed during the Transforming categorical features to numerical features and Choosing the tree structure stages.

Parameters

CLI parameter: --has-time
Python parameter: has_time
R parameter: has_time

Description: Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages).

The Timestamp column type is used to determine the order of objects if specified in the input data.