fit
Train a model.
Method call format
fit(X,
y=None,
cat_features=None,
pairs=None,
sample_weight=None,
group_id=None,
group_weight=None,
subgroup_id=None,
pairs_weight=None,
baseline=None,
use_best_model=None,
eval_set=None,
verbose=None,
logging_level=None,
plot=False,
column_description=None,
verbose_eval=None,
metric_period=None,
silent=None,
early_stopping_rounds=None,
save_snapshot=None,
snapshot_file=None,
snapshot_interval=None)
Parameters
Some parameters duplicate those of the CatBoost class constructor. In these cases, the values passed to the fit method take precedence. All other training parameters must be set in the constructor of the CatBoost class.
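The precedence rule can be pictured as a dictionary merge. The sketch below is not CatBoost code: `resolve_training_params` is a hypothetical helper, and treating `None` as "not specified" is an assumption based on the `None` defaults in the method signature.

```python
def resolve_training_params(constructor_params, fit_params):
    """Merge two parameter dicts so that explicitly set fit values win.

    Hypothetical helper for illustration only. Assumes that None means
    "not specified", mirroring the None defaults of fit().
    """
    resolved = dict(constructor_params)
    for name, value in fit_params.items():
        if value is not None:
            # A value given to fit overrides the constructor value.
            resolved[name] = value
    return resolved


# Values set in the constructor of the model.
constructor_params = {"iterations": 500, "metric_period": 1, "use_best_model": False}
# Values passed to fit; use_best_model is left unspecified.
fit_params = {"metric_period": 50, "use_best_model": None, "early_stopping_rounds": 20}

resolved = resolve_training_params(constructor_params, fit_params)
print(resolved)
```

In this sketch `metric_period` takes the value given to fit, `use_best_model` keeps its constructor value because fit left it unset, and `iterations` can only come from the constructor.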
Parameter | Possible types | Description | Default value | Supported processing units |
---|---|---|---|---|
X | catboost.Pool | The input training dataset in the form of a pool object. | Required parameter | CPU and GPU |
| The input training dataset in the form of a two-dimensional feature matrix. | |||
y |
| The target variables (in other words, the objects' label values) for the training dataset. Must be in the form of a one-dimensional array. The type of data in the array depends on the machine learning task being solved:
Note. Do not use this parameter if the input training dataset (specified in the X parameter) type is catboost.Pool. | None | CPU and GPU |
cat_features |
| A one-dimensional array of categorical column indices. If a catboost.Pool object is used for training, its categorical features must match those of the model. Note. Do not use this parameter if the input training dataset (specified in the X parameter) type is catboost.Pool. | None (all features are considered numerical) | CPU and GPU |
pairs |
| The pairs description in the form of a two-dimensional matrix of shape
This information is used for calculation and optimization of Pairwise metrics. | None | CPU |
sample_weight |
| The weight of each object in the input data in the form of one-dimensional array-like data. By default, it is set to 1 for all objects. | None | CPU and GPU |
group_id |
| Group identifiers for all input objects. Supported identifier types are:
| None | CPU |
group_weight |
| The weights of all objects within the defined groups from the input data in the form of one-dimensional array-like data. Used for calculating the final values of trees. By default, it is set to 1 for all objects in all groups. Restriction. Only one of the following parameters can be used at a time:
| None | CPU |
subgroup_id |
| Subgroup identifiers for all input objects. Supported identifier types are:
| None | CPU |
pairs_weight |
| The weight of each input pair of objects in the form of one-dimensional array-like data. The number of given values must match the number of specified pairs. This information is used for calculation and optimization of Pairwise metrics. By default, it is set to 1 for all pairs. | None | CPU |
baseline |
| Array of formula values for all input objects. The training starts from these values for all input objects instead of starting from zero. Note. Do not use this parameter if the input training dataset (specified in the X parameter) type is catboost.Pool. | None | CPU and GPU |
use_best_model | bool | If this parameter is set, the number of trees that are saved in the resulting model is defined as follows:
No trees are saved after this iteration. This option requires a validation dataset to be provided. | True if a validation set is input (the eval_set parameter is defined) and at least one of the label values of objects in this set differs from the others. False otherwise. | CPU |
eval_set |
| The validation dataset or datasets used for the following processes:
| None | CPU and GPU. Note. Only a single validation dataset can be input if the training is performed on GPU. |
verbose Alias: verbose_eval |
| The purpose of this parameter depends on the type of the given value:
Restriction. Do not use this parameter with the logging_level parameter. | 1 | CPU and GPU |
logging_level | string | The logging level to output to stdout. Possible values:
| None (corresponds to the Verbose logging level) | CPU and GPU |
plot | bool | Plot the following information during training:
| False | CPU |
column_description | string | The path to the input file that contains the column descriptions. The given file is used to build pools from the train and/or validation datasets, which are input from files. | None | CPU and GPU |
metric_period | int | The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. The usage of this parameter speeds up the training. Note. It is recommended to increase the value of this parameter to maintain training speed if a GPU processing unit type is used. | 1 | CPU and GPU |
silent | bool | Defines the logging level:
| False | CPU and GPU |
early_stopping_rounds | int | Sets the overfitting detector type to Iter and stops the training once the specified number of iterations has passed since the iteration with the optimal metric value. | False | CPU and GPU |
save_snapshot | bool | Enable snapshotting for restoring the training progress after an interruption. | None | CPU and GPU |
snapshot_file | string | The name of the file to save the training progress information in. This file is used for recovering training after an interruption. Depending on whether the specified file exists in the file system:
| CPU and GPU | |
snapshot_interval | int | The interval between saving snapshots in seconds. The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training. | 600 | CPU and GPU |
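To make the interaction between early_stopping_rounds and use_best_model more concrete, the following sketch simulates the stopping rule on a list of validation metric values. It is an illustration under stated assumptions, not CatBoost's implementation: `simulate_early_stopping` and the sample values are hypothetical, the metric is assumed to be lower-is-better, and the exact boundary behavior of the real overfitting detector may differ.

```python
def simulate_early_stopping(metric_values, early_stopping_rounds):
    """Return (best_iteration, stop_iteration) for a lower-is-better metric.

    Hypothetical helper: training stops once early_stopping_rounds
    iterations have passed since the iteration with the best value.
    """
    if not metric_values:
        raise ValueError("metric_values must not be empty")
    best_iteration = 0
    best_value = float("inf")
    for iteration, value in enumerate(metric_values):
        if value < best_value:
            # New optimum: remember it and keep training.
            best_value = value
            best_iteration = iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            # No improvement for the configured number of iterations.
            return best_iteration, iteration
    # The detector never fired; training ran to the last iteration.
    return best_iteration, len(metric_values) - 1


# Made-up validation loss, one value per iteration (zero-based).
validation_loss = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
best, stopped = simulate_early_stopping(validation_loss, early_stopping_rounds=3)

# With use_best_model enabled, no trees are kept after the best iteration.
trees_kept = best + 1
print(best, stopped, trees_kept)
```

In this sketch the improvement at the last position is never seen, because training stops three iterations after the optimum. That is the trade-off a small early_stopping_rounds value makes.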