select_threshold
Return the probability boundary required to achieve the specified false positive or false negative rate.
Method call format
select_threshold(model=None,
                 data=None,
                 curve=None,
                 FPR=None,
                 FNR=None,
                 thread_count=-1)
Parameters
Parameter | Possible types | Description | Default value |
---|---|---|---|
model | catboost.CatBoost | The trained model. | None |
data | | A set of samples to build the ROC curve with. Should not be used with the curve parameter. | None |
curve | tuple of three arrays (fpr, tpr, thresholds) | ROC curve points. Should not be used with the data parameter. Required if the data and model parameters are set to None. It is strongly recommended to use the output of the get_roc_curve function as the value of this parameter. The input data must meet the following criteria: | None |
FPR | float | Return the boundary at which the given FPR value is reached. Possible values of the parameter are in the range [0; 1]. Should not be used with the FNR parameter. | None. In this case the conditions for measuring the boundary depend on the value of the FNR parameter: |
FNR | float | Return the boundary at which the given FNR value is reached. Possible values of the parameter are in the range [0; 1]. Should not be used with the FPR parameter. | None. In this case the conditions for measuring the boundary depend on the value of the FPR parameter: |
thread_count | int | The number of threads to use. Optimizes the speed of execution. This parameter doesn't affect results. | -1 (the number of threads is equal to the number of processor cores) |
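To make the relationship between the curve, FPR and FNR parameters more concrete, the following pure-Python sketch picks a boundary from a set of ROC curve points. It is an illustration under stated assumptions, not CatBoost's actual implementation: the helper name `pick_boundary` is hypothetical, the curve is assumed to be a tuple of three equally long sequences (fpr, tpr, thresholds), FNR is taken to be 1 - TPR, and only the case where exactly one of the two rates is given is handled. The sample curve values are made up.

```python
def pick_boundary(curve, fpr=None, fnr=None):
    """Hypothetical helper: choose a threshold from ROC curve points.

    Illustration only; select_threshold may interpolate between points
    or break ties differently.
    """
    if (fpr is None) == (fnr is None):
        raise ValueError("Set exactly one of fpr or fnr")

    fprs, tprs, thresholds = curve
    points = list(zip(fprs, tprs, thresholds))

    if fpr is not None:
        # Keep the points whose FPR does not exceed the target,
        # then take the one with the highest TPR.
        candidates = [p for p in points if p[0] <= fpr]
        if not candidates:
            raise ValueError("No curve point satisfies the FPR target")
        best = max(candidates, key=lambda p: p[1])
    else:
        # FNR = 1 - TPR. Keep the points that meet the FNR target,
        # then take the one with the lowest FPR.
        candidates = [p for p in points if 1.0 - p[1] <= fnr]
        if not candidates:
            raise ValueError("No curve point satisfies the FNR target")
        best = min(candidates, key=lambda p: p[0])

    return best[2]


# Made-up ROC curve points: (fpr, tpr, thresholds)
sample_curve = ([0.0, 0.0, 0.5, 1.0],
                [0.0, 0.5, 1.0, 1.0],
                [1.0, 0.8, 0.4, 0.0])

print(pick_boundary(sample_curve, fpr=0.1))
print(pick_boundary(sample_curve, fnr=0.1))
```

A stricter FPR target pushes the boundary up, while a stricter FNR target pushes it down, which is why the two parameters are not meant to be set together.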
Type of return value
float
Usage examples
from catboost import CatBoostClassifier, Pool
from catboost.utils import get_roc_curve, select_threshold

train_data = [[1, 4],
              [2, 5],
              [4, 3],
              [0, 4]]
train_labels = [1, 1, 0, 1]

catboost_pool = Pool(train_data, train_labels)

model = CatBoostClassifier(learning_rate=0.03)
model.fit(train_data, train_labels, verbose=False)

# Build the ROC curve on the pool, then find the boundary
# at which the false positive rate reaches 0.01.
roc_curve_values = get_roc_curve(model, catboost_pool)
boundary = select_threshold(model,
                            curve=roc_curve_values,
                            FPR=0.01)
print(boundary)
0.506369291052
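One way to sanity-check a returned boundary is to compute the empirical false positive and false negative rates it produces on labeled data. The sketch below does this in plain Python. The helper name `rates_at_boundary` is hypothetical, the probabilities are made-up numbers rather than real model output, and it assumes an object is classified as positive when its probability is strictly greater than the boundary (whether CatBoost uses a strict or non-strict comparison is not stated in this section).

```python
def rates_at_boundary(labels, probabilities, boundary):
    """Hypothetical helper: return (FPR, FNR) for a given boundary.

    Assumes binary labels (0 or 1) and that an object is predicted
    positive when its probability is strictly greater than the boundary.
    """
    false_pos = false_neg = positives = negatives = 0
    for label, prob in zip(labels, probabilities):
        predicted_positive = prob > boundary
        if label == 1:
            positives += 1
            if not predicted_positive:
                false_neg += 1
        else:
            negatives += 1
            if predicted_positive:
                false_pos += 1

    # Guard against division by zero when a class is absent.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


labels = [1, 1, 0, 1]
probabilities = [0.9, 0.7, 0.3, 0.6]  # made-up positive-class probabilities

print(rates_at_boundary(labels, probabilities, 0.5))
print(rates_at_boundary(labels, probabilities, 0.65))
```

With a trained classifier, the probabilities would typically come from the positive-class column of `model.predict_proba(...)`, and the boundary from `select_threshold`.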