Train a model
Training on GPU requires an NVIDIA driver of version 390.xx or higher.
Execution format
catboost fit -f <file path> [optional parameters]
Options
Option | Description | Default value | Supported processing units |
---|---|---|---|
Input file settings | |||
-f --learn-set | The path to the input file that contains the dataset description. | Required parameter (the path must be specified). | CPU and GPU |
-t --test-set | A comma-separated list of input files that contain the validation dataset description (the format must be the same as used in the training dataset). | Omitted. If this parameter is omitted, the validation dataset isn't used. | CPU and GPU Restriction. Only a single validation dataset can be input if the training is performed on GPU (--task-type is set to GPU) |
--cd --column-description | The path to the input file that contains the column descriptions. | If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features. | CPU and GPU |
--learn-pairs | The path to the input file that contains the pair descriptions. This information is used for calculation and optimization of Pairwise metrics. | Required parameter for the pairwise metrics (the path must be specified) | CPU and GPU |
--test-pairs | The path to the input file that contains the description of test pairs (the format must be the same as used for describing the training pairs). This information is used for calculation and optimization of Pairwise metrics. | Omitted (the test dataset is not used) | CPU and GPU |
--learn-group-weights | The path to the input file that contains the weights of groups. The dataset must contain the GroupId column in order to apply the file with the group weights. The weights from this file take precedence if they are also specified in the Dataset description file. | Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset) | CPU and GPU |
--test-group-weights | The path to the input file that contains the weights of groups for the validation dataset. The dataset must contain the GroupId column in order to apply the file with the group weights. The weights from this file take precedence if they are also specified in the Dataset description file. | Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset) | CPU and GPU |
--delimiter | The delimiter character used to separate the data in the dataset description input file. Only single char delimiters are supported. If the specified value contains more than one character, only the first one is used. | The input data is assumed to be tab-separated | CPU and GPU |
--has-header | Read the column names from the first line if this parameter is set to True. | False | CPU and GPU |
--params-file | The path to the input JSON file that contains the training parameters (see the example after the options table). Names of training parameters are the same as for the Python package or the R package. If a parameter is specified in both the JSON file and the corresponding command-line parameter, the command-line value is used. | Omitted | CPU and GPU |
--nan-mode | The method for processing missing values in the input dataset. Possible values: Forbidden (the presence of missing values raises an error), Min (missing values are processed as the minimum value for the feature), Max (missing values are processed as the maximum value for the feature).
Using the Min or Max value of this parameter guarantees that a split between missing values and other values is considered when selecting a new split in the tree. Note. The method for processing missing values can be set individually for each feature in the Custom quantization borders and missing value modes input file. Such values override the ones specified in this parameter. | Min | CPU and GPU |
Training parameters | |||
--loss-function | The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: <Metric>[:<parameter>=<value>;...;<parameter>=<value>]. For example, use the following construction to calculate the value of Quantile with the coefficient α set to 0.1: Quantile:alpha=0.1 | RMSE | CPU and GPU |
--custom-metric | Metric values to output during training. These functions are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: <Metric>[:<parameter>=<value>;...;<parameter>=<value>]. Several metrics can be specified as a comma-separated list, for example: Logloss,AUC.
Values of all custom metrics for learn and validation datasets are saved to the Metric output files (learn_error.tsv and test_error.tsv respectively). The directory for these files is specified in the --train-dir (train_dir) parameter. | None (do not output additional metric values) | CPU |
--eval-metric | The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: <Metric>[:<parameter>=<value>;...;<parameter>=<value>]. For example: AUC. | Optimized objective is used | CPU |
-i --iterations | The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter. | 1000 | CPU and GPU |
-w --learning-rate | The learning rate. Used for reducing the gradient step. | The default value is defined automatically for binary classification based on the dataset properties and the number of iterations if none of these parameters is set. In this case, the selected learning rate is printed to stdout and saved in the model. In other cases, the default value is 0.03. | CPU and GPU |
-r --random-seed | The random seed used for training. | 0 | CPU and GPU |
--l2-leaf-reg l2-leaf-regularizer | L2 regularization coefficient. Used for leaf value calculation. Any positive values are allowed. | 3 | CPU and GPU |
--bootstrap-type | Bootstrap type. Defines the method for sampling the weights of objects. Supported methods: Bayesian, Bernoulli, Poisson (supported for GPU only), No. | Bayesian | CPU and GPU |
--bagging-temperature | Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from an exponential distribution if the value of this parameter is set to 1. All weights are equal to 1 if the value of this parameter is set to 0. Possible values are in the range [0; +∞). | 1 | CPU and GPU |
--subsample | Sample rate for bagging. This parameter can be used if one of the following bootstrap types is defined: Poisson, Bernoulli. | 0.66 | CPU and GPU |
--sampling-frequency | Frequency to sample weights and objects when building trees. Supported values: PerTree, PerTreeLevel. | PerTreeLevel | CPU and GPU |
--random-strength | The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset), and the split with the highest score is selected. The scores themselves are deterministic; this parameter controls a normally distributed random variable that is added to the score of each feature. The variable has a zero mean and a variance that decreases during the training. The value of this parameter is the multiplier of the variance. Note. This parameter is not supported for the following loss functions: QueryCrossEntropy, YetiRankPairwise, PairLogitPairwise. | 1 | CPU and GPU |
--use-best-model | If this parameter is set, the number of trees that are saved in the resulting model is defined as follows: the specified number of iterations is trained, the iteration with the optimal metric value on the validation dataset is identified, and no trees are saved after this iteration. This option requires a validation dataset to be provided. | True if a validation set is input (the -t or the --test-set parameter is defined) and at least one of the label values of objects in this set differs from the others. False otherwise. | CPU and GPU |
--best-model-min-trees | The minimal number of trees that the best model should have. If set, the output model contains at least the given number of trees even if the best iteration occurs earlier. Should be used with the --use-best-model parameter. | The minimal number of trees for the best model is not set | CPU and GPU |
-n --depth | Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function: CPU: any integer up to 16; GPU: any integer up to 16 for most loss functions, and up to 8 for the pairwise modes (YetiRankPairwise, PairLogitPairwise, QueryCrossEntropy). | 6 | CPU and GPU |
-I --ignore-features | Indices of features to exclude from training. Non-negative indices that do not match any features are successfully ignored: for example, if five features are defined for the objects in the dataset and this parameter is set to 42, the corresponding non-existing feature is ignored. The identifier corresponds to the feature's index. Feature indices used in training and feature importance are numbered from 0 to featureCount – 1. If a file is used as input data, any non-feature column types are ignored when calculating these indices. Supported operators: “:” separates individual indices and “-” specifies an index range. For example, if training should exclude features with the identifiers 1, 2, 7, 42, 43, 44, 45, use the following construction: 1:2:7:42-45 | None (use all features) | CPU and GPU |
--one-hot-max-size | Use one-hot encoding for all features with a number of different values less than or equal to the given parameter value. CTRs are not calculated for such features. | 2 | CPU and GPU |
--has-time | Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). The Timestamp column type is used to determine the order of objects if specified in the input data. | False (not used; generates random permutations) | CPU and GPU |
--rsm | Random subspace method. The fraction of features to use at each split selection, when features are selected over again at random. The value must be in the range (0; 1]. | 1 | CPU |
--fold-permutation-block | Objects in the dataset are grouped in blocks before the random permutations. This parameter defines the size of the blocks. The smaller the value, the slower the training. Large values may result in quality degradation. | Default value differs depending on the dataset size and ranges from 1 to 256 inclusively | CPU and GPU |
--leaf-estimation-iterations | The number of gradient steps when calculating the values in leaves. | Depends on the training objective | CPU and GPU |
--leaf-estimation-method | The method used to calculate the values in leaves. Possible values: Newton, Gradient. | Depends on the mode | CPU and GPU |
--name | The experiment name to display in visualization tools. | experiment | CPU and GPU |
--prediction-type | A comma-separated list of prediction types to output during training for the validation dataset. This information is output if a validation dataset is provided. Supported prediction types: Probability, Class, RawFormulaVal. | RawFormulaVal | CPU |
--fold-len-multiplier | Coefficient for changing the length of folds. The value must be greater than 1. The best validation result is achieved with minimum values. With values close to 1 (for example, 1 + ε), each iteration takes a quadratic amount of memory and time for the number of objects in the iteration, so such values are practical only for small datasets. | 2 | CPU and GPU |
--approx-on-full-history | The principles for calculating the approximated values. Possible values: False (use only a fraction of the fold preceding the current object; faster), True (use all the preceding objects in the fold; slower but potentially more accurate). | False | CPU |
--class-weights | Class weights. The values are used as multipliers for the object weights. This parameter can be used for solving classification and multiclassification problems. For imbalanced datasets with binary classification, the weight multiplier can be set to 1 for class 0 and to (sum_negative/sum_positive) for class 1. Format: a comma-separated list of weights, one per class. For example: --class-weights 0.85,1.2,1 | None (the weight for all classes is set to 1) | CPU and GPU |
--boosting-type | Boosting scheme. Possible values: Ordered (usually provides better quality on small datasets), Plain (the classic gradient boosting scheme). | Depends on the number of objects in the training dataset and the selected learning mode | CPU and GPU Only the Plain mode is supported for the MultiClass loss on GPU |
--allow-const-label | Use it to train models with datasets that have equal label values for all objects. | False | CPU and GPU |
Overfitting detection settings | |||
--od-type | The type of the overfitting detector to use. Possible values: IncToDec, Iter. | IncToDec | CPU and GPU |
--od-pval | The threshold for the IncToDec overfitting detector type. The training is stopped when the specified value is reached. Requires a validation dataset. For best results, it is recommended to set a value in the range [1e-10; 1e-2]. The larger the value, the earlier overfitting is detected. Restriction. Do not use this parameter with the Iter overfitting detector type. | 0 (the overfitting detection is turned off) | CPU and GPU |
--od-wait | The number of iterations to continue the training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type: IncToDec: ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value; Iter: consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value. | 20 | CPU and GPU |
Binarization settings | |||
-x --border-count | The number of splits for numerical features. Allowed values are integers from 1 to 255 inclusively. | 254 (if training is performed on CPU) or 128 (if training is performed on GPU) | CPU and GPU |
--feature-border-type | The binarization mode for numerical features. Possible values: Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum. | GreedyLogSum | CPU and GPU |
--output-borders-file | Save quantization borders for the current dataset to a file. Refer to the file format description. | The file is not saved | GPU |
--input-borders-file | Load custom quantization borders and nanModes from a file (do not generate them). Borders are automatically generated before training if this parameter is not set. Refer to the file format description. | The results are not loaded | GPU |
Multiclassification settings | |||
--classes-count | The upper limit for the numeric class label. Defines the number of classes for multiclassification. Only non-negative integers can be specified. The given integer should be greater than any of the label values. If this parameter is specified and --class-names is not, the labels for all classes in the input dataset must be smaller than the given value. | maximum class label + 1 | CPU and GPU |
--class-names | Class names. Allows redefining the default values when using the MultiClass and Logloss metrics. If the upper limit for the numeric class label is specified, the number of class names should match this value. Attention. The number of class names must match the number of class weights specified in the --class-weights parameter and the number of classes specified in the --classes-count parameter. Format: a comma-separated list of names. For example: --class-names smartphone,touchphone,tablet | The class names are integers from 0 to classes_count – 1 | CPU and GPU |
Performance settings | |||
-T --thread-count | The number of threads to use during training. | The number of processor cores | CPU and GPU |
--used-ram-limit | Attempt to limit the amount of used CPU RAM. Restriction. In some cases it is impossible to limit the amount of CPU RAM used in accordance with the specified value. Format: <size><unit>. Supported units (case-insensitive): MB, KB, GB. For example: 2gb | None (memory usage is not limited) | CPU |
--gpu-ram-part | How much of the GPU RAM to use for training. | 0.95 | GPU |
--pinned-memory-size | How much pinned (page-locked) CPU RAM to use per GPU. | 1073741824 | GPU |
--gpu-cat-features-storage | The method for storing the categorical features' values. Possible values: CpuPinnedMemory, GpuRam. Tip. Use the CpuPinnedMemory value if feature combinations are used and the available GPU RAM is not sufficient. | GpuRam | GPU |
--data-partition | The method for splitting the input dataset between multiple workers. Possible values: FeatureParallel (split by features), DocParallel (split by objects). | Depends on the learning mode and the input dataset | GPU |
Processing unit settings | |||
--task-type | The processing unit type to use for training. Possible values: CPU, GPU. | CPU | CPU and GPU |
--devices | IDs of the GPU devices to use for training (indices are zero-based). Format: <id> for a single device, <id1>:<id2>:...:<idN> for multiple devices, or <id1>-<idN> for a range of devices. For example: 0:1:3 or 0-2. | -1 (use all devices) | GPU |
Output settings | |||
--logging-level | The logging level to output to stdout. Possible values: Silent, Verbose, Info, Debug. | Verbose | CPU and GPU |
--metric-period | The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. Setting a value greater than 1 speeds up the training. Note. It is recommended to increase the value of this parameter to maintain training speed if a GPU processing unit type is used. | 1 | CPU and GPU |
--verbose | The frequency of iterations to print the information to stdout. The value of this parameter should be divisible by the value of the frequency of iterations to calculate the values of objectives and metrics. Restriction. Do not use this parameter with the --logging-level parameter. | 1 | CPU and GPU |
--train-dir | The directory for storing the files generated during training. | Current directory | CPU and GPU |
--model-size-reg | The model size regularization coefficient. The larger the value, the smaller the model size. Possible values are in the range [0; 1]. Large values reduce the number of feature combinations in the model. Note that the resulting quality of the model can be affected. Set the value to 0 to turn off the model size optimization option. | 0.5 | CPU |
--snapshot-file | Settings for recovering training after an interruption. Depending on whether the specified file exists in the file system: if the file is missing, information about the training progress is written to it; if the file exists, training is resumed from the saved state. | File can't be generated or read. If the value is omitted, the file name is experiment.cbsnapshot. | CPU and GPU |
-m --model-file | The name of the resulting files with the model description. Used for solving other machine learning problems (for instance, applying a model) or defining the names of models in different output formats. Corresponding file extensions are added to the given value if several output formats are defined in the --model-format parameter. | model.* (model.bin if the model is output in Catboost format only) | CPU and GPU |
--model-format | A comma-separated list of output model formats. Possible values: CatboostBinary, AppleCoreML, cpp, python, json, onnx. | CatboostBinary | CPU and GPU |
--fstr-file | The name of the resulting file that contains regular feature importance data (see Feature importance). | The file is not generated | CPU |
--fstr-internal-file | The name of the resulting file that contains internal feature importance data (see Feature importance). | The file is not generated | CPU |
--eval-file | The name of the resulting file that contains the model values on the validation datasets. The format of the output file depends on the problem being solved and the number of input validation datasets. | Save the file to the current directory. The name of the file differs depending on the machine learning problem being solved and the selected metric. The file extension is eval. | CPU and GPU |
--json-log | The name of the resulting file that contains metric values and time information. | catboost_training.json | CPU and GPU |
--detailed-profile | Generate a file that contains profiler information. | The file is not generated | CPU and GPU |
--profiler-log | The name of the resulting file that contains profiler information. | catboost_profile.log | CPU and GPU |
--learn-err-log | The name of the resulting file that contains the metric value for the training dataset. | learn_error.tsv | CPU and GPU |
--test-err-log | The name of the resulting file that contains the metric value for the validation dataset. | test_error.tsv | CPU and GPU |
CTR settings | |||
--simple-ctr | Binarization settings for simple categorical features. Format: CtrType[:TargetBorderCount=<count>][:TargetBorderType=<type>][:CtrBorderCount=<count>][:CtrBorderType=<type>][:Prior=<num>/<denom>] | | CPU and GPU |
--combinations-ctr | Binarization settings for combinations of categorical features. Format: CtrType[:TargetBorderCount=<count>][:TargetBorderType=<type>][:CtrBorderCount=<count>][:CtrBorderType=<type>][:Prior=<num>/<denom>] | | CPU and GPU |
--per-feature-ctr | Per-feature binarization settings for categorical features. Format: <feature index>:CtrType[:TargetBorderCount=<count>][:TargetBorderType=<type>][:CtrBorderCount=<count>][:CtrBorderType=<type>][:Prior=<num>/<denom>] | | CPU and GPU |
--ctr-target-border-count | The maximum number of borders to use in target binarization for categorical features that need it. Allowed values are integers from 1 to 255 inclusively. The value of the TargetBorderCount component overrides this parameter if it is specified in --simple-ctr, --combinations-ctr, or --per-feature-ctr. | Number_of_classes - 1 for multiclassification problems when training on CPU, 1 otherwise | CPU and GPU |
--counter-calc-method | The method for calculating the Counter CTR type. Possible values: SkipTest (objects from the validation dataset are not considered), Full (all objects from both the training and validation datasets are considered). | Full | CPU and GPU |
--max-ctr-complexity | The maximum number of categorical features that can be combined. | 4 | CPU and GPU |
--ctr-leaf-count-limit | The maximum number of leaves with categorical features. If the quantity exceeds the specified value, a part of the leaves is discarded: the leaves are sorted by the frequency of the corresponding values, and only the most frequent ones (up to the specified limit) are kept. This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected. | The number of leaves with categorical features is not limited | CPU |
--store-all-simple-ctr | Ignore categorical features, which are not used in feature combinations, when choosing candidates for exclusion. Use this parameter with --ctr-leaf-count-limit only. | Both simple features and feature combinations are taken into account when limiting the number of leaves with categorical features | CPU |
--final-ctr-computation-mode | Final CTR computation mode. Possible values: Default (compute final CTRs for the resulting model), Skip (do not compute final CTRs; the resulting model cannot be applied). | Default | CPU and GPU |
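The --params-file option above accepts training parameters as JSON. The following sketch is illustrative only: the file name params.json and the dataset paths are placeholders, and the parameter names follow the Python package naming, as noted in the option's description.

```
# Write a minimal parameter file (hypothetical values).
cat > params.json <<'EOF'
{
    "iterations": 500,
    "learning_rate": 0.05,
    "loss_function": "Logloss"
}
EOF

# Command-line values take precedence over the JSON file,
# so this run trains 100 iterations, not 500.
catboost fit --learn-set train.tsv --column-description train.cd \
    --params-file params.json --iterations 100
```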
Usage examples
Train a model with 100 trees on a comma-separated pool with a header (RMSE is a regression objective):
catboost fit --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 100 --delimiter=',' --has-header
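As a further sketch, the following hedged example combines the overfitting detection options from the table above; the dataset paths are placeholders. The Iter detector stops training 40 iterations after the best validation AUC, and per the --use-best-model default, the saved model is truncated at the best iteration when a validation set is provided:

```
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd \
    --loss-function Logloss --eval-metric AUC --custom-metric AUC \
    --od-type Iter --od-wait 40
```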
Train on GPU:
catboost fit --learn-set ../pytest/data/adult/train_small --column-description ../pytest/data/adult/train.cd --task-type GPU
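A hedged variant of the GPU example that pins specific devices and raises the metric period, as the --metric-period note above recommends; the device IDs and paths are illustrative:

```
# Restrict training to GPUs 0 and 1 and compute metrics
# every 50 iterations to keep the GPUs busy.
catboost fit --learn-set ../pytest/data/adult/train_small \
    --column-description ../pytest/data/adult/train.cd \
    --task-type GPU --devices 0:1 --metric-period 50
```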