# Training parameters

Training on GPU requires NVIDIA driver version 390.xx or higher.

## Common parameters

Parameter | Description | Default value |
---|---|---|
loss_function | The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]`. Supported metrics: RMSE, Logloss, MAE, CrossEntropy, Quantile, LogLinQuantile, Lq, MultiClass, MultiClassOneVsAll, MAPE, Poisson, PairLogit, PairLogitPairwise, QueryRMSE, QuerySoftMax, YetiRank, YetiRankPairwise. For example, use `'Quantile:alpha=0.1'` to train with Quantile and the coefficient alpha = 0.1. | RMSE |
custom_loss | Metrics whose values are output during training. These metrics are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]`. Supported metrics: RMSE, Logloss, MAE, CrossEntropy, Quantile, LogLinQuantile, Lq, MultiClass, MultiClassOneVsAll, MAPE, Poisson, PairLogit, PairLogitPairwise, QueryRMSE, QuerySoftMax, SMAPE, Recall, Precision, F1, TotalF1, Accuracy, BalancedAccuracy, BalancedErrorRate, Kappa, WKappa, LogLikelihoodOfPrediction, AUC, R2, MCC, BrierScore, HingeLoss, HammingLoss, ZeroOneLoss, MSLE, MedianAbsoluteError, PairAccuracy, AverageGain, PFound, NDCG, PrecisionAt, RecallAt, MAP, CtrFactor. Examples: to calculate CrossEntropy, use `c('CrossEntropy')` or simply `'CrossEntropy'`; to calculate Logloss and AUC, use `c('Logloss', 'AUC')`; to calculate Quantile with the coefficient alpha = 0.1, use `c('Quantile:alpha=0.1')`. Values of all custom metrics for the learn and validation datasets are saved to the metric output files (learn_error.tsv and test_error.tsv, respectively) in the directory specified by the --train-dir (train_dir) parameter. | None (use one of the metrics supported by the library) |
eval_metric | The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]`. Supported metrics: RMSE, Logloss, MAE, CrossEntropy, Quantile, LogLinQuantile, Lq, MultiClass, MultiClassOneVsAll, MAPE, Poisson, PairLogit, PairLogitPairwise, QueryRMSE, QuerySoftMax, SMAPE, Recall, Precision, F1, TotalF1, Accuracy, BalancedAccuracy, BalancedErrorRate, Kappa, WKappa, LogLikelihoodOfPrediction, AUC, R2, MCC, BrierScore, HingeLoss, HammingLoss, ZeroOneLoss, MSLE, MedianAbsoluteError, PairAccuracy, AverageGain, PFound, NDCG, PrecisionAt, RecallAt, MAP. | Optimized objective is used |
iterations | The maximum number of trees that can be built when solving machine learning problems. When other parameters that limit the number of iterations are used, the final number of trees may be less than this value. | 1000 |
learning_rate | The learning rate. Used for reducing the gradient step. | Defined automatically based on the dataset properties and training parameters if all of the following conditions are met: the binary classification problem is being solved, and some of the related parameters are not set. Otherwise, 0.03. |
random_seed | The random seed used for training. | 0 |
l2_leaf_reg | L2 regularization coefficient. Used for leaf value calculation. Any positive value is allowed. | 3 |
bootstrap_type | Bootstrap type. Defines the method for sampling the weights of objects. Supported methods: Poisson (GPU only), Bayesian, Bernoulli, No. | Bayesian |
bagging_temperature | Defines the settings of the Bayesian bootstrap, which is used by default in classification and regression modes. The Bayesian bootstrap assigns random weights to objects. If the value of this parameter is set to 1, the weights are sampled from the exponential distribution; if it is set to 0, all weights are equal to 1. Possible values are in the range [0; +∞). The higher the value, the more aggressive the bagging. | 1 |
subsample | Sample rate for bagging. This parameter can be used if one of the following bootstrap types is selected: Poisson, Bernoulli. | 0.66 |
sampling_frequency | Frequency to sample weights and objects when building trees. Supported values: PerTree, PerTreeLevel. | PerTreeLevel |
random_strength | The multiplier for the randomness added to split scores. Use this parameter to avoid overfitting the model. The value is used when selecting splits: on every iteration, each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset), and the split with the highest score is selected. The scores themselves have no randomness, so a normally distributed random variable is added to the score of each feature. It has a zero mean and a variance that decreases during training. The value of this parameter is the multiplier of the variance. | 1 |
use_best_model | If this parameter is set, the number of trees saved in the resulting model is defined as follows: build the number of trees defined by the training parameters, then use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric). No trees are saved after this iteration. This option requires a validation dataset. | True if a validation set is input (the train_pool parameter is defined) and at least one of the label values of objects in this set differs from the others; False otherwise. |
best_model_min_trees | The minimum number of trees that the best model should have. If set, the output model contains at least the given number of trees even if the best iteration falls within these trees. Should be used with the use_best_model parameter. | None (the minimum number of trees for the best model is not set) |
train_pool | The validation set used for the following processes: overfitting detection, best iteration selection, and monitoring metric changes. | None |
depth | Depth of the tree. The range of supported values depends on the processing unit type and the selected loss function: on CPU, any integer up to 16; on GPU, any integer up to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions. | 6 |
ignored_features | Indices of features to exclude from training. Non-negative indices that do not match any features are silently ignored. For example, if five features are defined for the objects in the dataset and this parameter is set to 42, the corresponding non-existent feature is ignored. The identifier corresponds to the feature's index. Feature indices used in training and feature importance are numbered from 0 to featureCount - 1. If a file is used as input data, any non-feature column types are ignored when calculating these indices. The identifiers of features to exclude should be listed in a vector. For example, if training should exclude features with the identifiers 1, 2, 7, 42, 43, 44, 45, set this parameter to `c(1, 2, 7, 42, 43, 44, 45)`. | None (use all features) |
one_hot_max_size | Use one-hot encoding for all features with a number of distinct values less than or equal to the given value. CTRs are not calculated for such features. | 2 |
has_time | Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). If the Timestamp column type is specified in the input data, it is used to determine the order of objects. | FALSE (not used; random permutations are generated) |
rsm | Random subspace method. The fraction of features to use at each split selection, when features are selected over again at random. The value must be in the range (0;1]. | 1 |
nan_mode | The method for processing NaN values in the input dataset. Possible values: Forbidden (NaN values are not supported; their presence raises an exception), Min (each NaN float feature is processed as the minimum value from the dataset), Max (each NaN float feature is processed as the maximum value from the dataset). Note: the method for processing NaN values can also be set in the Custom quantization borders and NaN modes input file; such values override the ones specified in this parameter. | Min |
fold_permutation_block_size | Objects in the dataset are grouped in blocks before the random permutations. This parameter defines the size of the blocks. The smaller the value, the slower the training. Large values may result in quality degradation. | Depends on the dataset size; ranges from 1 to 256 inclusive |
leaf_estimation_iterations | The number of gradient steps when calculating the values in leaves. | Depends on the training objective |
leaf_estimation_method | The method used to calculate the values in leaves. Possible values: Newton, Gradient. | Depends on the selected metric |
name | The experiment name to display in visualization tools. | experiment |
fold_len_multiplier | Coefficient for changing the length of folds. The value must be greater than 1. The best validation result is achieved with minimum values. With values close to 1, each iteration takes an amount of memory and time that is quadratic in the number of objects in the iteration. Thus, low values are feasible only when the number of objects is small. | 2 |
approx_on_full_history | The principles for calculating the approximated values. Possible values: TRUE (use all the preceding rows in the fold for calculating the approximated values; this mode is slower and in rare cases slightly more accurate), FALSE (use only a fraction of the fold for calculating the approximated values; the size of the fraction is 1/X, where X is the specified coefficient for changing the length of folds; this mode is faster and in rare cases slightly less accurate). | TRUE |
class_weights | Class weights. The values are used as multipliers for the object weights. This parameter can be used for solving classification and multiclassification problems. | None (the weight for all classes is set to 1) |
boosting_type | Boosting scheme. Possible values: Ordered (usually provides better quality on small datasets, but may be slower than the Plain scheme), Plain (the classic gradient boosting scheme). | Depends on the number of objects in the training dataset and the selected learning mode |
allow_const_label | Allow training on datasets in which all objects have the same label value. | False |
cat_features | A vector of categorical features indices. The indices are zero-based and can differ from the ones given in the Column descriptions file. | NULL (it is assumed that all columns are the values of numerical features) |

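The metric parameters above (`loss_function`, `custom_loss`, `eval_metric`) take strings of the form `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]`, and the common parameters are passed together as a named list. The sketch below is a minimal illustration under stated assumptions: `metric_string()` is a hypothetical helper that is not part of the catboost package, and the parameter values are arbitrary choices for a binary classification problem, not recommendations.

```r
# Hypothetical helper (not part of the catboost package): builds a metric
# string of the form "<Metric>[:<param 1>=<value>;..;<param N>=<value>]".
metric_string <- function(name, ...) {
  args <- list(...)
  if (length(args) == 0) {
    return(name)
  }
  # Join named arguments as "key=value" pairs separated by ";".
  pairs <- paste(names(args), unlist(args), sep = "=", collapse = ";")
  paste0(name, ":", pairs)
}

# Illustrative parameter list for binary classification.
params <- list(
  loss_function = "Logloss",
  custom_loss = c("AUC", "Accuracy"),  # reported only, not optimized
  eval_metric = "AUC",                 # overfitting detection / best model selection
  iterations = 1000,
  learning_rate = 0.03,
  depth = 6,
  l2_leaf_reg = 3,
  random_seed = 0,
  use_best_model = TRUE
)

print(metric_string("Quantile", alpha = 0.1))
# The list would then be passed to catboost.train() together with the
# learn and validation pools (not shown here).
```

Building the string with a helper is only a convenience; writing the literal `'Quantile:alpha=0.1'` is equivalent.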
## Overfitting detection settings

Parameter | Description | Default value |
---|---|---|
od_type | The type of the overfitting detector to use. Possible values: IncToDec, Iter. | IncToDec |
od_pval | The threshold for the IncToDec overfitting detector type. Training stops when the specified value is reached. Requires that a validation dataset is input. For best results, it is recommended to set a value in the range [10^-10; 10^-2]. The larger the value, the earlier overfitting is detected. Restriction: do not use this parameter with the Iter overfitting detector type. | 0 (overfitting detection is turned off) |
od_wait | The number of iterations to continue training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type: IncToDec (ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value), Iter (consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value). | 20 |
early_stopping_rounds | Set the overfitting detector type to Iter and stop the training after the specified number of iterations since the iteration with the optimal metric value. | FALSE |

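The stopping rule of the Iter detector (the one `early_stopping_rounds` selects) can be pictured as the loop below. This is a simplified sketch of the rule described for `od_wait`, not CatBoost's implementation; `iter_detector()` is a hypothetical name, and the sketch assumes an evaluation metric for which lower values are better.

```r
# Simplified sketch of the "Iter" rule: stop once od_wait iterations have
# passed since the iteration with the best (lowest) metric value.
iter_detector <- function(metric_values, od_wait) {
  best_iter <- 1
  for (i in seq_along(metric_values)) {
    if (metric_values[i] < metric_values[best_iter]) {
      best_iter <- i
    }
    if (i - best_iter >= od_wait) {
      return(list(stop_iter = i, best_iter = best_iter))
    }
  }
  # Detector never fired: training runs to the last iteration.
  list(stop_iter = length(metric_values), best_iter = best_iter)
}

metric <- c(0.50, 0.40, 0.35, 0.36, 0.37, 0.38)
print(iter_detector(metric, od_wait = 2))

# Parameter settings that select this detector (illustrative values).
od_params <- list(od_type = "Iter", od_wait = 20)
```

In this example the best value occurs at iteration 3 and training stops at iteration 5; with use_best_model enabled, no trees after iteration 3 would be saved.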
## Binarization settings

Parameter | Description | Default value |
---|---|---|
border_count | The number of splits for numerical features. Allowed values are integers from 1 to 255 inclusive. | 254 (on CPU) or 128 (on GPU) |
feature_border_type | The binarization mode for numerical features. Possible values: Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum. | GreedyLogSum |

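To make `border_count` and `feature_border_type` more concrete, the sketch below computes borders for a single numerical feature in base R using two simple rules in the spirit of the Uniform and Median modes, then maps values to bins with `findInterval()`. The helpers `uniform_borders()` and `median_borders()` are hypothetical simplifications; CatBoost's actual border selection (including the default GreedyLogSum) is more involved.

```r
# Uniform-style borders: split [min(x), max(x)] into border_count + 1
# equal-width intervals and return the interior cut points.
uniform_borders <- function(x, border_count) {
  points <- seq(min(x), max(x), length.out = border_count + 2)
  points[-c(1, length(points))]
}

# Median-style borders: quantile cut points so that bins hold roughly equal
# numbers of objects (duplicates removed for heavily tied data).
median_borders <- function(x, border_count) {
  probs <- seq_len(border_count) / (border_count + 1)
  unique(quantile(x, probs = probs, names = FALSE))
}

x <- c(0, 1, 2, 3, 10)
b <- uniform_borders(x, border_count = 4)
print(b)                   # cut points at 2, 4, 6, 8
print(findInterval(x, b))  # bin of each value (0 = below the first border)
print(median_borders(x, border_count = 1))
```

The outlier at 10 leaves most uniform bins nearly empty, while quantile-based borders follow the data; this is the kind of trade-off the different binarization modes address.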
## Multiclassification settings

Parameter | Description | Default value |
---|---|---|
classes_count | The upper limit for the numeric class label. Defines the number of classes for multiclassification. Only non-negative integers can be specified. The given integer should be greater than any of the label values. If this parameter is specified, the labels for all classes in the input dataset should be smaller than the given value. | maximum class label + 1 |

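The `classes_count` rule can be checked before training. The sketch below uses two hypothetical helpers: one derives the default (maximum class label + 1), the other verifies that every label is a non-negative integer smaller than a given `classes_count`.

```r
# Default used when classes_count is not set: maximum class label + 1.
default_classes_count <- function(labels) {
  max(labels) + 1
}

# TRUE if every label is a non-negative integer below classes_count.
labels_fit <- function(labels, classes_count) {
  all(labels >= 0 & labels == floor(labels) & labels < classes_count)
}

labels <- c(0, 2, 1, 2, 0)
print(default_classes_count(labels))
print(labels_fit(labels, classes_count = 3))
print(labels_fit(labels, classes_count = 2))

# Setting classes_count explicitly reserves room for labels (here 3 and 4)
# that do not occur in this sample.
mc_params <- list(loss_function = "MultiClass", classes_count = 5)
```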
## Performance settings

Parameter | Description | Default value |
---|---|---|
thread_count | The number of threads to use during training. Optimizes the speed of execution. This parameter doesn't affect results. | -1 (the number of threads is equal to the number of processor cores) |

## Processing unit settings

Parameter | Description | Default value |
---|---|---|
task_type | The processing unit type to use for training. Possible values: CPU, GPU. | CPU |
devices | IDs of the GPU devices to use for training (indices are zero-based). Format: `<unit ID>` for one device (for example, `devices='3'`); `<unit ID1>:<unit ID2>:..:<unit IDN>` for multiple devices (for example, `devices='0:1:3'`); `<unit ID1>-<unit IDN>` for a range of devices (for example, `devices='0-3'`). | -1 (all GPU devices are used if the GPU processing unit type is selected) |

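The three `devices` formats can be illustrated with a small parser. `parse_devices()` is a hypothetical helper that only shows how each format maps to a set of zero-based GPU IDs; CatBoost parses the string itself, so in practice the string is passed unchanged in the parameter list.

```r
# Hypothetical helper: expand a devices string into zero-based GPU IDs.
# Returns NULL for "-1", meaning "all available GPU devices".
parse_devices <- function(devices) {
  if (identical(devices, "-1")) {
    return(NULL)
  }
  if (grepl("-", devices, fixed = TRUE)) {
    # Range format: "<unit ID1>-<unit IDN>"
    bounds <- as.integer(strsplit(devices, "-", fixed = TRUE)[[1]])
    return(seq.int(bounds[1], bounds[2]))
  }
  # Single ID ("3") or colon-separated list ("0:1:3")
  as.integer(strsplit(devices, ":", fixed = TRUE)[[1]])
}

print(parse_devices("3"))
print(parse_devices("0:1:3"))
print(parse_devices("0-3"))

# Illustrative GPU settings.
gpu_params <- list(task_type = "GPU", devices = "0-3")
```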
## Output settings

Parameter | Description | Default value |
---|---|---|
logging_level | The logging level to output to stdout. Possible values: Silent (do not output any logging information to stdout), Verbose (output the optimized metric, the elapsed time of training and the remaining time of training), Info (output additional information and the number of trees), Debug (output debugging information). | Verbose |
metric_period | The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. Setting a value greater than 1 speeds up training. | 1 |
verbose | The frequency of iterations to print information to stdout. The value should be divisible by the value of metric_period (the frequency of iterations to calculate the values of objectives and metrics). Restriction: do not use this parameter with the logging_level parameter. | 1 |
train_dir | The directory for storing the files generated during training. | catboost_info |
model_size_reg | The model size regularization coefficient. The larger the value, the smaller the model size. Possible values are in the range [0; +∞). Large values reduce the number of feature combinations in the model. Note that the resulting quality of the model can be affected. Set the value to 0 to turn off model size optimization. | 0.5 |
allow_writing_files | Allow writing analytical and snapshot files during training. If set to FALSE, the snapshot and data visualization tools are unavailable. | TRUE |
save_snapshot | Enable snapshotting for restoring the training progress after an interruption. | None |
snapshot_file | The name of the file to save the training progress information in. This file is used for recovering training after an interruption. The behavior depends on whether the specified file exists in the file system: if it is missing, information about training progress is written to the specified file; if it exists, data is loaded from the specified file and training continues from where it left off. | File can't be generated or read. If the value is omitted, the file name is experiment.cbsnapshot. |
snapshot_interval | The interval between saving snapshots in seconds. The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training. | 600 |

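Two of the output rules above are easy to check ahead of time: `verbose` must be divisible by `metric_period`, and snapshots are written every `snapshot_interval` seconds plus once at the end of training. The helpers below, `verbose_ok()` and `snapshot_times()`, are hypothetical and only model those two rules; they do not call CatBoost.

```r
# TRUE if the printing frequency is compatible with the metric frequency.
verbose_ok <- function(verbose, metric_period) {
  verbose %% metric_period == 0
}

# Times (seconds since the start of training) at which snapshots would be
# taken: every snapshot_interval seconds, plus one at the end of training.
snapshot_times <- function(snapshot_interval, training_seconds) {
  periodic <- if (training_seconds >= snapshot_interval) {
    seq(snapshot_interval, training_seconds, by = snapshot_interval)
  } else {
    numeric(0)
  }
  unique(c(periodic, training_seconds))
}

print(verbose_ok(verbose = 50, metric_period = 10))
print(verbose_ok(verbose = 25, metric_period = 10))
print(snapshot_times(snapshot_interval = 600, training_seconds = 1500))

# Illustrative output settings (logging_level is deliberately not set,
# since it must not be combined with verbose).
output_params <- list(
  metric_period = 10,
  verbose = 50,
  train_dir = "catboost_info",
  save_snapshot = TRUE,
  snapshot_file = "experiment.cbsnapshot",
  snapshot_interval = 600
)
```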
## CTR settings

Parameter | Description | Default value |
---|---|---|
simple_ctr | Binarization settings for simple categorical features. Components: `CtrType`: the method for transforming categorical features to numerical features (supported methods for training on CPU: Borders, Buckets, BinarizedTargetMeanValue, Counter; supported methods for training on GPU: Borders, Buckets, FeatureFreq, FloatTargetMeanValue). `TargetBorderCount`: the number of borders for label value binarization; only used for regression problems; allowed values are integers from 1 to 255 inclusive; the default value is 1; available for training on CPU only. `TargetBorderType`: the binarization type for the label value; only used for regression problems; possible values: Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum; the default is MinEntropy; available for training on CPU only. `CtrBorderCount`: the number of splits for categorical features; allowed values are integers from 1 to 255 inclusive. `CtrBorderType`: the binarization type for categorical features (supported values for training on CPU: Uniform; supported values for training on GPU: Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum). `Prior`: use the specified priors during training (several values can be specified); possible formats: one number (added to the numerator), or two slash-delimited numbers, GPU only (sets a fraction: the first number is added to the numerator and the second to the denominator). | |
combinations_ctr | Binarization settings for combinations of categorical features. Components: `CtrType`: the method for transforming categorical features to numerical features (supported methods for training on CPU: Borders, Buckets, BinarizedTargetMeanValue, Counter; supported methods for training on GPU: Borders, Buckets, FeatureFreq, FloatTargetMeanValue). `TargetBorderCount`: the number of borders for label value binarization; only used for regression problems; allowed values are integers from 1 to 255 inclusive; the default value is 1; available for training on CPU only. `TargetBorderType`: the binarization type for the label value; only used for regression problems; possible values: Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum; the default is MinEntropy; available for training on CPU only. `CtrBorderCount`: the number of splits for categorical features; allowed values are integers from 1 to 255 inclusive. `CtrBorderType`: the binarization type for categorical features (supported values for training on CPU: Uniform; supported values for training on GPU: Uniform, Median). `Prior`: use the specified priors during training (several values can be specified); possible formats: one number (added to the numerator), or two slash-delimited numbers, GPU only (sets a fraction: the first number is added to the numerator and the second to the denominator). | |
counter_calc_method | The method for calculating the Counter CTR type. Possible values: SkipTest (objects from the validation dataset are not considered at all), Full (all objects from both the learn and validation datasets are considered). | Full |
max_ctr_complexity | The maximum number of categorical features that can be combined. | 4 |
ctr_leaf_count_limit | The maximum number of leaves with categorical features. If the number of leaves exceeds the specified value, some leaves are discarded. The leaves to be discarded are selected as follows: the leaves are sorted by the frequency of the values; the top N leaves are selected, where N is the value specified in the parameter; all leaves starting from N+1 are discarded. This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected. | None (the number of leaves with categorical features is not limited) |
store_all_simple_ctr | Ignore categorical features that are not used in feature combinations when choosing candidates for exclusion. Use this parameter with ctr_leaf_count_limit only. | False (both simple features and feature combinations are taken into account when limiting the number of leaves with categorical features) |
final_ctr_computation_mode | Final CTR computation mode. Possible values: Default (compute final CTRs for the learn and validation datasets), Skip (do not compute final CTRs for the learn and validation datasets; in this case, the resulting model cannot be applied. This mode decreases the size of the resulting model and can be useful for research purposes when only the metric values have to be calculated). | Default (supported on CPU and GPU) |

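The selection procedure described for `ctr_leaf_count_limit` (sort by frequency, keep the top N, discard the rest) can be sketched in base R. `keep_top_leaves()` is a hypothetical helper that applies the rule to the values of a single categorical feature; it illustrates the ranking idea only and is not how CatBoost stores CTR leaves internally.

```r
# Keep the n most frequent values; everything ranked n + 1 and below is
# discarded, mirroring the ctr_leaf_count_limit description.
keep_top_leaves <- function(values, n) {
  counts <- sort(table(values), decreasing = TRUE)
  kept <- names(counts)[seq_len(min(n, length(counts)))]
  list(kept = kept, discarded = setdiff(names(counts), kept))
}

colors <- c("red", "blue", "red", "green", "red", "blue")
print(keep_top_leaves(colors, n = 2))

# Illustrative CTR settings.
ctr_params <- list(ctr_leaf_count_limit = 2, store_all_simple_ctr = FALSE)
```

Here "red" (3 occurrences) and "blue" (2) are kept and "green" (1) is discarded. How ties in frequency are broken is not specified by the table, so the sketch avoids them.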
Parameter | Description | Default value |
---|---|---|

Common parameters | ||

loss_function | The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format:
- RMSE
- Logloss
- MAE
- CrossEntropy
- Quantile
- LogLinQuantile
- Lq
- MultiClass
- MultiClassOneVsAll
- MAPE
- Poisson
- PairLogit
- PairLogitPairwise
- QueryRMSE
- QuerySoftMax
- YetiRank
- YetiRankPairwise
Supported metrics: For example, use the following construction to calculate the value of Quantile with the coefficient :
| RMSE |

custom_loss | Metric values to output during training. These functions are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric).. Format:
- RMSE
- Logloss
- MAE
- CrossEntropy
- Quantile
- LogLinQuantile
- Lq
- MultiClass
- MultiClassOneVsAll
- MAPE
- Poisson
- PairLogit
- PairLogitPairwise
- QueryRMSE
- QuerySoftMax
- SMAPE
- Recall
- Precision
- F1
- TotalF1
- Accuracy
- BalancedAccuracy
- BalancedErrorRate
- Kappa
- WKappa
- LogLikelihoodOfPrediction
- AUC
- R2
- MCC
- BrierScore
- HingeLoss
- HammingLoss
- ZeroOneLoss
- MSLE
- MedianAbsoluteError
- PairAccuracy
- AverageGain
- PFound
- NDCG
- PrecisionAt
- RecallAt
- MAP
- CtrFactor
Supported metrics: Examples: Calculate the value of CrossEntropy: `c('CrossEntropy')` Or simply:`'CrossEntropy'` Calculate the values of Logloss and AUC: `c('Logloss', 'AUC')` - Calculate the value of Quantile with the coefficient
`c('Quantile:alpha=0.1')`
Values of all custom metrics for learn and validation datasets are saved to the Metric output files (learn_error.tsv and test_error.tsv respectively). The directory for these files is specified in the --train-dir (train_dir) parameter. | None (use one of the metrics supported by the library) |

eval_metric | The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format:
- RMSE
- Logloss
- MAE
- CrossEntropy
- Quantile
- LogLinQuantile
- Lq
- MultiClass
- MultiClassOneVsAll
- MAPE
- Poisson
- PairLogit
- PairLogitPairwise
- QueryRMSE
- QuerySoftMax
- SMAPE
- Recall
- Precision
- F1
- TotalF1
- Accuracy
- BalancedAccuracy
- BalancedErrorRate
- Kappa
- WKappa
- LogLikelihoodOfPrediction
- AUC
- R2
- MCC
- BrierScore
- HingeLoss
- HammingLoss
- ZeroOneLoss
- MSLE
- MedianAbsoluteError
- PairAccuracy
- AverageGain
- PFound
- NDCG
- PrecisionAt
- RecallAt
- MAP
Supported metrics:
| Optimized objective is used |

iterations | The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter. | 1000 |

learning_rate | The learning rate. Used for reducing the gradient step. | The default value is defined automatically based on the dataset properties and training parameters if all of the following conditions are met: The binary classification machine learning problem is being solved. Some parameters are not set (refer to the list)
The value is set to 0.03 otherwise. |

random_seed | The random seed used for training. | 0 |

l2_leaf_reg | L2 regularization coefficient. Used for leaf value calculation. Any positive values are allowed. | 3 |

bootstrap_type | Bootstrap type. Defines the method for sampling the weights of objects. Supported methods: - Poisson (supported for GPU only)
- Bayesian
- Bernoulli
- No
| Bayesian |

bagging_temperature | Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from exponential distribution if the value of this parameter is set to “1”. All weights are equal to 1 if the value of this parameter is set to “0”. Possible values are in the range . The higher the value the more aggressive the bagging is. | 1 |

subsample | Sample rate for bagging. This parameter can be used if one of the following bootstrap types is defined: - Poisson
- Bernoulli
| 0.66 |

sampling_frequency | Frequency to sample weights and objects when building trees. Supported values: - PerTree
- PerTreeLevel
| PerTreeLevel |

random_strength | Score the standard deviation multiplier. Use this parameter to avoid overfitting the model. The value of this parameter is used when selecting splits. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset). The split with the highest score is selected. The scores have no randomness. A normally distributed random variable is added to the score of the feature. It has a zero mean and a variance that decreases during the training. The value of this parameter is the multiplier of the variance. | 1 |

use_best_model | If this parameter is set, the number of trees that are saved in the resulting model is defined as follows: - Build the number of trees defined by the training parameters.
- Use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric).
No trees are saved after this iteration. This option requires a validation dataset to be provided. | True if a validation set is input (the train_pool parameter is defined) and at least one of the label values of objects in this set differs from the others. False otherwise. |

best_model_min_trees | The minimal number of trees that the best model should have. If set, the output model contains at least the given number of trees even if the best model is located within these trees. Should be used with the use_best_model parameter. | None (The minimal number of trees for the best model is not set) |

train_pool | The validation set for the following processes: - overfitting detector
- best iteration selection
- monitoring metrics' changes
| None |

depth | Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function: CPU — Any integer up to 16. GPU — Any integer up to 8 pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.
| 6 |

ignored_features | Indices of features to exclude from training. The non-negative indices that do not match any features are successfully ignored. For example, if five features are defined for the objects in the dataset and this parameter is set to “42”, the corresponding non-existing feature is successfully ignored. The identifier corresponds to the feature's index. Feature indices used in train and feature importance are numbered from 0 to featureCount – 1. If a file is used as input data then any non-feature column types are ignored when calculating these indices. For example, each row in the input file contains data in the following order: The identifiers of features to exclude should be enumerated at vector. For example, if training should exclude features with the identifiers 1, 2, 7, 42, 43, 44, 45, the value of this parameter should be set to | None (use all features) |

one_hot_max_size | Use one-hot encoding for all features with a number of different values less than or equal to the given parameter value. Ctrs are not calculated for such features. | 2 |

has_time | Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). The Timestamp column type is used to determine the order of objects if specified in the input data. | FALSE (not used; generate random permutations) |

rsm | Random subspace method. The percentage of features to use at each split selection, when features are selected over again at random. The value must be in the range (0;1]. | 1 |

nan_mode | The method to process NaN values in the input dataset. Possible values: - “Forbidden” — NaN values are not supported, their presence raises an exception.
- “Min” — Each NaN float feature is processed as the minimum value from the dataset.
- “Max” — Each NaN float feature is processed as the maximum value from the dataset.
Note. The method for processing NaN values can also be set in the Custom quantization borders and NaN modes input file. Such values override the ones specified in this parameter. | Min |

fold_permutation_block_size | Objects in the dataset are grouped in blocks before the random permutations. This parameter defines the size of the blocks. The smaller is the value, the slower is the training. Large values may result in quality degradation. | Default value differs depending on the dataset size and ranges from 1 to 256 inclusively |

leaf_estimation_iterations | The number of gradient steps when calculating the values in leaves. | Depends on the training objective |

leaf_estimation_method | The method used to calculate the values in leaves. Possible values: - Newton
- Gradient
| Default value depends on the selected metric |

name | The experiment name to display in visualization tools. | experiment |

fold_len_multiplier | Coefficient for changing the length of folds. The value must be greater than 1. The best validation result is achieved with minimum values. With values close to 1 (for example, ), each iteration takes a quadratic amount of memory and time for the number of objects in the iteration. Thus, low values are possible only when there is a small number of objects. | 2 |

approx_on_full_history | The principles for calculating the approximated values. Possible values: - “TRUE” — Use all the preceding rows in the fold for calculating the approximated values. This mode is slower and in rare cases slightly more accurate.
- “FALSE” — Use only а fraction of the fold for calculating the approximated values. The size of the fraction is calculated as follows: , where X is the specified coefficient for changing the length of folds. This mode is faster and in rare cases slightly less accurate
| TRUE |

class_weights | Class weights. The values are used as multipliers for the object weights. This parameter can be used for solving classification and multiclassification problems. For example, | None (the weight for all classes is set to 1) |

boosting_type | Boosting scheme. Possible values: - Ordered — Usually provides better quality on small datasets, but it may be slower than the Plain scheme.
- Plain — The classic gradient boosting scheme.
| Depends on the number of objects in the training dataset and the selected learning mode |

allow_const_label | Allow training models on datasets in which all objects have the same label value. | False |

cat_features | A vector of categorical feature indices. The indices are zero-based and can differ from the ones given in the Column descriptions file. | NULL (it is assumed that all columns are the values of numerical features) |
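The indices can differ from the Column descriptions file because that file numbers every input column, including non-feature columns such as the label. A minimal sketch, assuming only `Num` and `Categ` columns become features; `cat_feature_indices` is a hypothetical helper.

```python
FEATURE_TYPES = {"Num", "Categ"}  # assumption: only these columns are features


def cat_feature_indices(column_types):
    """Zero-based indices of categorical features among feature columns only."""
    feature_types = [t for t in column_types if t in FEATURE_TYPES]
    return [i for i, t in enumerate(feature_types) if t == "Categ"]


indices = cat_feature_indices(["Label", "Num", "Categ", "Num", "Categ"])
```

The categorical columns sit at positions 2 and 4 of the input, but at feature indices 1 and 3 once the label column is excluded.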

Overfitting detection settings | ||

od_type | The type of the overfitting detector to use. Possible values: - IncToDec
- Iter
| IncToDec |

od_pval | The threshold for the IncToDec overfitting detector type. Training stops when the specified threshold is reached. Requires a validation dataset. For best results, it is recommended to set a value in the range . The larger the value, the earlier overfitting is detected. Restriction. Do not use this parameter with the Iter overfitting detector type. | 0 (the overfitting detection is turned off) |

od_wait | The number of iterations to continue the training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type: - IncToDec — Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value.
- Iter — Consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value.
| 20 |

early_stopping_rounds | Set the overfitting detector type to Iter and stop the training after the specified number of iterations since the iteration with the optimal metric value. | FALSE |
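The Iter detector behavior described for od_wait and early_stopping_rounds can be sketched as a scan over per-iteration validation metric values. This is a simplified illustration: `iter_detector` is a hypothetical helper, and CatBoost's exact off-by-one convention may differ.

```python
def iter_detector(metric_values, od_wait, higher_is_better=False):
    """Return (stop_iteration, best_iteration), both zero-based.

    stop_iteration is None when the detector never fires.
    """
    best = 0
    for i, value in enumerate(metric_values):
        if higher_is_better:
            improved = value > metric_values[best]
        else:
            improved = value < metric_values[best]
        if improved:
            best = i
        elif i - best >= od_wait:
            # od_wait iterations have passed since the optimal value.
            return i, best
    return None, best
```

For a loss sequence `[5, 4, 3, 3.5, 3.6, 3.7]` with `od_wait=2`, the best iteration is 2 and training stops at iteration 4.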

Binarization settings | ||

border_count | The number of splits for numerical features. Allowed values are integers from 1 to 255 inclusive. | 254 (if training is performed on CPU) or 128 (if training is performed on GPU) |

feature_border_type | The binarization mode for numerical features. Possible values: - Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
| GreedyLogSum |
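Two of the simpler binarization modes can be contrasted in a sketch: Uniform splits the value range into segments of equal length, while a Median-style mode places borders so that each bucket holds roughly the same number of objects. These helpers are simplified illustrations, not CatBoost's implementation; GreedyLogSum and the other modes are not reproduced here.

```python
def uniform_borders(values, border_count):
    """Borders that split [min, max] into equal-length segments."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (border_count + 1)
    return [lo + step * (k + 1) for k in range(border_count)]


def quantile_borders(values, border_count):
    """Borders chosen so that buckets hold roughly equal numbers of objects."""
    s = sorted(values)
    n = len(s)
    borders = set()
    for k in range(1, border_count + 1):
        idx = k * n // (border_count + 1)
        if 0 < idx < n and s[idx - 1] < s[idx]:
            borders.add((s[idx - 1] + s[idx]) / 2)  # midpoint between neighbors
    return sorted(borders)


def binarize(value, borders):
    # Bucket index = number of borders strictly below the value.
    return sum(value > b for b in borders)


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
```

With one outlier (100), uniform borders put nine of ten values into the first bucket, while quantile borders spread them evenly; this is the kind of trade-off feature_border_type controls.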

Multiclassification settings | ||

classes_count | The upper limit for the numeric class label. Defines the number of classes for multiclassification. Only non-negative integers can be specified. If this parameter is specified, all class labels in the input dataset must be smaller than the given value. | maximum class label + 1 |

Performance settings | ||

thread_count | The number of threads to use during training. Optimizes the speed of execution. This parameter doesn't affect results. | -1 (the number of threads equals the number of processor cores) |

Processing units settings | ||

task_type | The processing unit type to use for training. Possible values: - CPU
- GPU
| CPU |

devices | IDs of the GPU devices to use for training (indices are zero-based). Format:
- `<unit ID>` for one device (for example, `3`)
- `<unit ID1>:<unit ID2>:..:<unit IDN>` for multiple devices (for example, `devices='0:1:3'`)
- `<unit ID1>-<unit IDN>` for a range of devices (for example, `devices='0-3'`)
| -1 (all GPU devices are used if the corresponding processing unit type is selected) |
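The three accepted formats expand to lists of device IDs as follows. `parse_devices` is a hypothetical helper that only illustrates the format; CatBoost parses the string itself, and returning `None` for `-1` ("all devices") is a choice made for this sketch.

```python
def parse_devices(spec):
    """Expand a devices string such as '3', '0:1:3' or '0-3' into GPU IDs."""
    spec = spec.strip()
    if spec == "-1":
        return None  # all available GPU devices
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-"))
        return list(range(first, last + 1))  # the range is inclusive
    return [int(part) for part in spec.split(":")]
```

Note that `'0-3'` selects four devices (0 through 3), while `'0:3'` selects only devices 0 and 3.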

Output settings | ||

logging_level | The logging level to output to stdout. Possible values:
- Silent — Do not output any logging information to stdout.
- Verbose — Output the following data to stdout: optimized metric, elapsed time of training, remaining time of training.
- Info — Output additional information and the number of trees.
- Debug — Output debugging information.
| Verbose |

metric_period | The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. Using this parameter speeds up training. | 1 |

verbose | The frequency of iterations to print the information to stdout. The value of this parameter should be divisible by the value of metric_period (the frequency of iterations to calculate the values of objectives and metrics). Restriction. Do not use this parameter with the logging_level parameter. | 1 |
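The divisibility requirement guarantees that every iteration selected for printing is also an iteration at which metrics were calculated. A minimal sketch with a hypothetical helper `logged_iterations`; it ignores any special handling of the first or last iteration that the library may apply.

```python
def logged_iterations(iterations, metric_period, verbose):
    """Zero-based iterations where metrics are calculated and where they are printed."""
    if verbose % metric_period != 0:
        raise ValueError("verbose must be divisible by metric_period")
    computed = [i for i in range(iterations) if i % metric_period == 0]
    printed = [i for i in computed if i % verbose == 0]
    return computed, printed


computed, printed = logged_iterations(10, 2, 4)
```

With `metric_period=2` and `verbose=4`, metrics are calculated at iterations 0, 2, 4, 6, 8 and printed at 0, 4, 8; a `verbose` of 3 would ask to print iterations that have no calculated values.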

train_dir | The directory for storing the files generated during training. | catboost_info |

model_size_reg | The model size regularization coefficient. The larger the value, the smaller the model size. Possible values are in the range . Large values reduce the number of feature combinations in the model. Note that the resulting quality of the model can be affected. Set the value to 0 to turn off the model size optimization option. | 0.5 |

allow_writing_files | Allow writing analytical and snapshot files during training. If set to “False”, the snapshot and data visualization tools are unavailable. | TRUE |

save_snapshot | Enable snapshotting for restoring the training progress after an interruption. | None |

snapshot_file | The name of the file to save the training progress information in. This file is used for recovering training after an interruption. Depending on whether the specified file exists in the file system: - Missing — Write information about training progress to the specified file.
- Exists — Load data from the specified file and continue training from where it left off.
| File can't be generated or read. If the value is omitted, the file name is experiment.cbsnapshot. |

snapshot_interval | The interval between saving snapshots in seconds. The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training. | 600 |
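The snapshot schedule described for snapshot_interval can be sketched as a list of times measured in seconds since the start of training. This idealized model is an assumption: real snapshots are written between iterations, so actual times are approximate, and `snapshot_times` is a hypothetical helper.

```python
def snapshot_times(training_seconds, interval):
    """Seconds since start at which snapshots are written (idealized)."""
    times = []
    t = interval  # the first snapshot comes one interval after the start
    while t < training_seconds:
        times.append(t)
        t += interval  # each later snapshot follows the previous one
    times.append(training_seconds)  # the last snapshot is taken at the end
    return times
```

With the default interval of 600 seconds, a 25-minute run writes snapshots at 600, 1200, and 1500 seconds, and a run shorter than 10 minutes writes only the final one.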

CTR settings | ||

simple_ctr | Binarization settings for simple categorical features. Format:
Components:
`CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU:
- Borders
- Buckets
- BinarizedTargetMeanValue
- Counter
Supported methods for training on GPU:
- Borders
- Buckets
- FeatureFreq
- FloatTargetMeanValue
`TargetBorderCount` — The number of borders for label value binarization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1. This option is available for training on CPU only.
`TargetBorderType` — The binarization type for the label value. Only used for regression problems. Possible values:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
By default, MinEntropy. This option is available for training on CPU only.
`CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.
`CtrBorderType` — The binarization type for categorical features. Supported values for training on CPU:
- Uniform
Supported values for training on GPU:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
`Prior` — Use the specified priors during training (several values can be specified). Possible formats:
- One number — Adds the value to the numerator.
- Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The first number is added to the numerator and the second is added to the denominator.
| |
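How a `Prior` enters a target-based statistic can be illustrated with a mean-target encoding that, in the spirit of ordered target statistics, uses only the objects preceding the current one. This shows where the numerator and denominator addends go; it is not the exact formula of any particular CtrType. `ordered_target_ctr` is a hypothetical helper, and defaulting the denominator addend to 1 for the one-number form is an assumption.

```python
def ordered_target_ctr(categories, targets, prior_num=0.0, prior_den=1.0):
    """Encode each object from the objects before it in the same category:
    (sum of preceding targets + prior_num) / (count of preceding + prior_den).
    """
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, targets):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + prior_num) / (c + prior_den))
        # Update statistics only after encoding, so an object never sees its own label.
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded


enc = ordered_target_ctr(["a", "b", "a", "a", "b"], [1, 0, 1, 0, 1], prior_num=0.5)
```

The first occurrence of every category is encoded purely by the prior (0.5 here), and the prior's influence shrinks as more objects of that category are seen.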

combinations_ctr | Binarization settings for combinations of categorical features. Format:
Components:
`CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU:
- Borders
- Buckets
- BinarizedTargetMeanValue
- Counter
Supported methods for training on GPU:
- Borders
- Buckets
- FeatureFreq
- FloatTargetMeanValue
`TargetBorderCount` — The number of borders for label value binarization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1. This option is available for training on CPU only.
`TargetBorderType` — The binarization type for the label value. Only used for regression problems. Possible values:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
By default, MinEntropy. This option is available for training on CPU only.
`CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.
`CtrBorderType` — The binarization type for categorical features. Supported values for training on CPU:
- Uniform
Supported values for training on GPU:
- Uniform
- Median
`Prior` — Use the specified priors during training (several values can be specified).
- One number — Adds the value to the numerator.
| |

counter_calc_method | The method for calculating the Counter CTR type. Possible values: - SkipTest — Objects from the validation dataset are not considered at all
- Full — All objects from both learn and validation datasets are considered
| Full |
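The difference between the two counting methods can be sketched with raw category frequencies. This is a simplification: `counter_ctr` is a hypothetical helper that returns plain counts, whereas the library's Counter CTR also applies its own normalization and priors, which are not reproduced here.

```python
from collections import Counter


def counter_ctr(learn_cats, test_cats, method="Full"):
    """Encode both datasets by category frequency (raw counts, illustration)."""
    if method == "Full":
        counts = Counter(learn_cats) + Counter(test_cats)  # learn + validation
    elif method == "SkipTest":
        counts = Counter(learn_cats)  # validation objects are ignored
    else:
        raise ValueError(f"unknown method: {method}")
    return [counts[c] for c in learn_cats], [counts[c] for c in test_cats]
```

With `Full`, a category that appears only in the validation set still gets a nonzero count; with `SkipTest`, it is encoded as never seen.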

max_ctr_complexity | The maximum number of categorical features that can be combined. | 4 |
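Why this limit matters can be seen by counting the distinct feature sets of size 1 to max_ctr_complexity that exist among n categorical features. This is only the size of the space the limit bounds; CatBoost does not necessarily evaluate every combination, and `max_combination_candidates` is a hypothetical helper.

```python
from math import comb


def max_combination_candidates(n_cat_features, max_ctr_complexity):
    """Number of distinct sets of 1..max_ctr_complexity categorical features."""
    return sum(comb(n_cat_features, k) for k in range(1, max_ctr_complexity + 1))
```

For 10 categorical features, the default limit of 4 allows 385 distinct feature sets, compared with 10 when combinations are disabled (limit 1).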

ctr_leaf_count_limit | The maximum number of leaves with categorical features. If the number of leaves exceeds the specified value, some leaves are discarded. The leaves to be discarded are selected as follows:
- The leaves are sorted by the frequency of the values.
- The top N leaves are selected, where N is the value specified in the parameter.
- All leaves starting from N+1 are discarded.
This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected. | None (the number of leaves with categorical features is not limited) |
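The three selection steps above can be sketched directly, assuming each "leaf" corresponds to a distinct categorical value whose statistics would be stored. That mapping and the helper `limit_ctr_leaves` are assumptions for illustration.

```python
from collections import Counter


def limit_ctr_leaves(category_values, limit):
    """Keep the `limit` most frequent values; the rest are discarded."""
    if limit is None:
        return set(category_values)  # no limit: every value is kept
    # Sort by frequency (most common first) and keep the top N.
    ranked = Counter(category_values).most_common()
    return {value for value, _ in ranked[:limit]}
```

With values `x, y, x, z, x, y, w` and a limit of 2, only `x` (3 occurrences) and `y` (2 occurrences) keep their statistics.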

store_all_simple_ctr | Ignore categorical features that are not used in feature combinations when choosing candidates for exclusion. Use this parameter with ctr_leaf_count_limit only. | False (both simple features and feature combinations are taken into account when limiting the number of leaves with categorical features) |

final_ctr_computation_mode | Final CTR computation mode. Possible values: - Default — Compute final CTRs for learn and validation datasets.
- Skip — Do not compute final CTRs for learn and validation datasets. In this case, the resulting model cannot be applied. This mode decreases the size of the resulting model. It can be useful for research purposes when only the metric values have to be calculated.
| Default |