Cross-validation

Purpose

Training can be launched in cross-validation mode. In this case only the training dataset is required (the validation dataset should be omitted).

Execution format

catboost fit -f <file path> [-X <value>|-Y <value>] [--cv-rand <value>] [other parameters]

Options

Each option is listed with its description, followed by its default value.
-f

The path to the dataset to cross-validate.

Default value: Required parameter (the path must be specified).

-X

Perform cross validation to score the model by excluding a small fold from the training dataset and using it for testing.

Format for specifying values:

<n>/<k>
  • n is the identifier of the fold to exclude from training and use for testing (numbering starts from zero).
  • k is the number of folds to split the input data into.

The inequality n < k must be true.

For example, to exclude the fold indexed 13 from the training dataset and use it for testing, set the value 13/15.

The data is randomly shuffled before splitting.

Default value: Cross validation is not performed.
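The <n>/<k> format described above can be illustrated with a minimal Python sketch. It is not CatBoost's implementation: the function name classical_split, the round-robin assignment of shuffled rows to folds, and the use of Python's random module are all assumptions made for illustration. It only shows the idea of shuffling, splitting into k folds, and holding out fold n for testing.

```python
import random

def classical_split(num_rows, n, k, seed=0):
    """Return (train_idx, test_idx) with fold n of k held out for testing.

    Hypothetical helper: the fold assignment scheme is an assumption,
    not CatBoost's actual algorithm.
    """
    if not 0 <= n < k:
        raise ValueError("fold index n must satisfy 0 <= n < k")
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)       # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # assumed round-robin assignment
    test_idx = folds[n]                        # fold n is excluded from training
    train_idx = [i for j, fold in enumerate(folds) if j != n for i in fold]
    return train_idx, test_idx

# The 13/15 example from the text, on a hypothetical 150-row dataset.
train_idx, test_idx = classical_split(num_rows=150, n=13, k=15, seed=0)
print(len(train_idx), len(test_idx))  # 140 10
```

With 15 folds, roughly 1/15 of the rows are used for testing and the rest for training.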

-Y

Perform cross validation to score the model by excluding a small fold from the validation dataset and using it for training.

Format for specifying values:

<n>/<k>
  • n is the identifier of the fold to exclude from testing and use for training (numbering starts from zero).
  • k is the number of folds to split the input data into.

The inequality n < k must be true.

For example, to exclude the fold indexed 13 from the validation dataset and use it for training, set the value 13/15.

The data is randomly shuffled before splitting.

Default value: Cross validation is not performed.
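The -Y variant reverses the roles of the folds, which a short Python sketch may make easier to see. As before, this is an illustration under assumptions rather than CatBoost's code: the function name inverted_split and the round-robin fold assignment are hypothetical. Fold n is used for training and all remaining folds are used for testing.

```python
import random

def inverted_split(num_rows, n, k, seed=0):
    """Return (train_idx, test_idx) with only fold n of k used for training.

    Hypothetical helper: the fold assignment scheme is an assumption,
    not CatBoost's actual algorithm.
    """
    if not 0 <= n < k:
        raise ValueError("fold index n must satisfy 0 <= n < k")
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)       # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # assumed round-robin assignment
    train_idx = folds[n]                       # fold n is excluded from testing
    test_idx = [i for j, fold in enumerate(folds) if j != n for i in fold]
    return train_idx, test_idx

# The 13/15 example from the text, on a hypothetical 150-row dataset.
train_idx, test_idx = inverted_split(num_rows=150, n=13, k=15, seed=0)
print(len(train_idx), len(test_idx))  # 10 140
```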

--cv-rand

Use this as the seed value for random permutation of the data.

Permutation is performed before splitting the data for cross validation.

Each seed generates unique data splits.

It must be used together with the -X <value> or -Y <value> parameter.

Default value: 0
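The role of the seed can be sketched in Python as follows. This uses Python's own random module as a stand-in, so the actual permutations differ from the ones CatBoost generates. The function name permutation is hypothetical. The sketch only shows the general behavior of a seeded shuffle: the same seed reproduces the same order, and a different seed gives a different one.

```python
import random

def permutation(num_rows, seed):
    """Return row indices shuffled with the given seed (illustrative stand-in)."""
    order = list(range(num_rows))
    random.Random(seed).shuffle(order)
    return order

same_a = permutation(1000, seed=0)
same_b = permutation(1000, seed=0)
other = permutation(1000, seed=1)

print(same_a == same_b)  # True: the same seed reproduces the permutation
print(same_a == other)   # False: a different seed gives a different order
```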
other parameters

Any combination of the training parameters.

See the full list of default values in the Train a model section.
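To evaluate every fold in turn, one possible approach is to generate one command per fold index. The Python sketch below only builds and prints the command lines, using the options documented on this page. The helper name cv_commands and the file name train.tsv are hypothetical, and the sketch assumes that a catboost binary is available on the PATH when the commands are eventually run.

```python
import shlex

def cv_commands(dataset_path, k, seed=0, extra_args=()):
    """Build one 'catboost fit' command line per fold, for n = 0 .. k-1.

    Hypothetical helper: it only formats the commands and does not run them.
    """
    commands = []
    for n in range(k):
        argv = [
            "catboost", "fit",
            "-f", dataset_path,
            "-X", f"{n}/{k}",         # hold out fold n of k for testing
            "--cv-rand", str(seed),   # same seed so every run uses the same shuffle
            *extra_args,              # any other training parameters
        ]
        commands.append(shlex.join(argv))
    return commands

for command in cv_commands("train.tsv", k=3, seed=42):
    print(command)
# catboost fit -f train.tsv -X 0/3 --cv-rand 42
# catboost fit -f train.tsv -X 1/3 --cv-rand 42
# catboost fit -f train.tsv -X 2/3 --cv-rand 42
```

Keeping the seed fixed across the runs is a design choice in this sketch, so that the folds come from the same shuffled order.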