select_threshold

Return the probability boundary required to achieve the specified false positive or false negative rate.
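For a given probability threshold, FPR is the share of negative objects predicted as positive, and FNR is the share of positive objects predicted as negative. The minimal pure-Python sketch below makes this concrete. The helper name error_rates is hypothetical and not part of CatBoost. It assumes that label 1 is the positive class and that an object is predicted positive when its probability is >= threshold. CatBoost's own tie-breaking convention is not stated on this page.

```python
from typing import Sequence, Tuple


def error_rates(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Hypothetical helper: return (FPR, FNR) at a probability threshold.

    Assumes label 1 is the positive class and that an object is
    predicted positive when its probability is >= threshold.
    """
    fp = fn = positives = negatives = 0
    for label, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if label == 1:
            positives += 1
            if not predicted_positive:
                fn += 1  # a positive object is missed
        else:
            negatives += 1
            if predicted_positive:
                fp += 1  # a negative object is flagged as positive
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


labels = [1, 1, 0, 1, 0]
probs = [0.9, 0.4, 0.6, 0.7, 0.2]
print(error_rates(labels, probs, 0.5))  # (0.5, 0.3333333333333333)
```

Raising the threshold can only lower FPR and raise FNR. This is why a target value for either rate pins down a boundary.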

Method call format

select_threshold(model=None, 
                 data=None, 
                 curve=None, 
                 FPR=None,
                 FNR=None,
                 thread_count=-1)

Parameters

model
  Possible types: catboost.CatBoost
  Description: The trained model.
  Default value: None

data
  Possible types:
    • catboost.Pool
    • list of catboost.Pool
  Description: A set of samples to build the ROC curve with. Should not be used with the curve parameter.
  Default value: None

curve
  Possible types: tuple of three arrays (fpr, tpr, thresholds)
  Description: ROC curve points. Should not be used with the data parameter. Required if the data and model parameters are set to None.
  It is strongly recommended to use the output of the get_roc_curve function as the value of this parameter.
  The input data must meet the following criteria:
    • The threshold values must not increase.
    • No (fpr, tpr, threshold) triplet may be repeated.
  Default value: None

FPR
  Possible types: float
  Description: Return the boundary at which the given FPR (false positive rate) value is reached. Possible values are in the range [0; 1]. Should not be used with the FNR parameter.
  Default value: None. In this case, the condition for selecting the boundary depends on the value of the FNR parameter:
    • None: the boundary should satisfy FNR = FPR
    • float in the [0; 1] range: the boundary should satisfy the given FNR value

FNR
  Possible types: float
  Description: Return the boundary at which the given FNR (false negative rate) value is reached. Possible values are in the range [0; 1]. Should not be used with the FPR parameter.
  Default value: None. In this case, the condition for selecting the boundary depends on the value of the FPR parameter:
    • None: the boundary should satisfy FNR = FPR
    • float in the [0; 1] range: the boundary should satisfy the given FPR value

thread_count
  Possible types: int
  Description: The number of threads to use. It only affects execution speed, not the results.
  Default value: -1 (the number of threads equals the number of cores)
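The constraints listed for data, curve, FPR and FNR can be summarised as a small validator. This is a sketch under stated assumptions: check_select_threshold_args is a hypothetical name, it does not reproduce CatBoost's own error handling, and it reads "Required if the data and model parameters are set to None" as "curve must be given whenever data is not".

```python
from typing import Optional, Sequence, Tuple

Curve = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def check_select_threshold_args(model: object = None, data: object = None,
                                curve: Optional[Curve] = None,
                                FPR: Optional[float] = None,
                                FNR: Optional[float] = None) -> None:
    """Hypothetical validator: raise ValueError on documented misuse.

    model is accepted only to mirror the documented signature.
    """
    if data is not None and curve is not None:
        raise ValueError("data and curve should not be used together")
    if FPR is not None and FNR is not None:
        raise ValueError("FPR and FNR should not be used together")
    if curve is None and data is None:
        # Without data (and a model to score it), the curve must be given.
        raise ValueError("curve is required when data and model are None")
    for name, value in (("FPR", FPR), ("FNR", FNR)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0; 1], got {value}")
    if curve is not None:
        fpr, tpr, thresholds = curve
        if not len(fpr) == len(tpr) == len(thresholds):
            raise ValueError("fpr, tpr and thresholds must have equal length")
        # Criterion 1: thresholds must not increase along the curve.
        if any(nxt > cur for cur, nxt in zip(thresholds, thresholds[1:])):
            raise ValueError("threshold values must not increase")
        # Criterion 2: no (fpr, tpr, threshold) triplet may repeat.
        triplets = list(zip(fpr, tpr, thresholds))
        if len(set(triplets)) != len(triplets):
            raise ValueError("repeated (fpr, tpr, threshold) triplets")
```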

Type of return value

float
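The returned float is a probability threshold chosen by the FPR/FNR rules from the parameter list: a given FPR, a given FNR, or, when neither is set, the point where FNR = FPR. The sketch below shows one way such a boundary could be read off a curve tuple (fpr, tpr, thresholds). The name threshold_from_curve is hypothetical. Linear interpolation between neighbouring curve points and clamping to the curve ends are assumptions, so the result need not match select_threshold exactly.

```python
from typing import Optional, Sequence


def threshold_from_curve(fpr: Sequence[float], tpr: Sequence[float],
                         thresholds: Sequence[float],
                         FPR: Optional[float] = None,
                         FNR: Optional[float] = None) -> float:
    """Hypothetical sketch: interpolate a boundary on an ROC curve.

    Assumes thresholds are non-increasing, so FPR and TPR are
    non-decreasing along the curve (and FNR = 1 - TPR is non-increasing).
    """
    fnr = [1.0 - t for t in tpr]
    # Build a quantity q that is non-decreasing along the curve, plus a target.
    if FPR is not None:
        q, target = list(fpr), FPR
    elif FNR is not None:
        q, target = [-x for x in fnr], -FNR
    else:
        # Neither rate given: look for the point where FPR == FNR.
        q, target = [a - b for a, b in zip(fpr, fnr)], 0.0

    if target <= q[0]:
        return float(thresholds[0])
    for i in range(len(q) - 1):
        lo, hi = q[i], q[i + 1]
        if lo <= target <= hi:
            if hi == lo:
                return float(thresholds[i])
            # Linear interpolation between points i and i + 1 (an assumption).
            w = (target - lo) / (hi - lo)
            return float(thresholds[i] + w * (thresholds[i + 1] - thresholds[i]))
    return float(thresholds[-1])


fpr, tpr, thresholds = [0.0, 0.5, 1.0], [0.0, 1.0, 1.0], [1.0, 0.5, 0.0]
print(threshold_from_curve(fpr, tpr, thresholds, FPR=0.25))  # 0.75
```

Mapping all three cases onto one non-decreasing quantity keeps a single search loop. With a real model, the arrays could come from get_roc_curve(model, pool).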

Usage examples

from catboost import CatBoostClassifier, Pool
from catboost.utils import get_roc_curve, select_threshold

# A tiny toy dataset: two numeric features per object, binary labels
train_data = [[1, 4],
              [2, 5],
              [4, 3],
              [0, 4]]
train_labels = [1, 1, 0, 1]
catboost_pool = Pool(train_data, train_labels)

model = CatBoostClassifier(learning_rate=0.03)
model.fit(train_data, train_labels, verbose=False)

# Compute the ROC curve once and pass it as the curve argument
roc_curve_values = get_roc_curve(model, catboost_pool)

# Probability boundary at which the false positive rate reaches 0.01
boundary = select_threshold(model,
                            curve=roc_curve_values,
                            FPR=0.01)
print(boundary)
Output:
0.506369291052
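The example passes a curve produced by get_roc_curve. To show the shape of such a tuple and why the two input criteria (non-increasing thresholds, no repeated triplets) hold, here is a hypothetical pure-Python build_roc_curve. It is not CatBoost's implementation, and for real use the recommendation above to pass the output of get_roc_curve still applies. It assumes label 1 is the positive class and that objects with probability >= threshold count as positive.

```python
from typing import List, Sequence, Tuple


def build_roc_curve(labels: Sequence[int], probs: Sequence[float]
                    ) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: return (fpr, tpr, thresholds).

    Each point treats objects with probability >= threshold as positive.
    Thresholds are the distinct probabilities in decreasing order, so they
    never increase and no (fpr, tpr, threshold) triplet repeats.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(probs, labels), reverse=True)  # highest probability first
    fpr, tpr, thresholds = [], [], []
    tp = fp = 0
    for i, (p, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last object with this probability.
        if i + 1 == len(pairs) or pairs[i + 1][0] != p:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
            thresholds.append(p)
    return fpr, tpr, thresholds


fpr, tpr, thresholds = build_roc_curve([1, 1, 0, 1, 0], [0.9, 0.4, 0.6, 0.7, 0.2])
print(fpr)         # [0.0, 0.0, 0.5, 0.5, 1.0]
print(thresholds)  # [0.9, 0.7, 0.6, 0.4, 0.2]
```

Unlike many ROC implementations, this sketch adds no extra starting point at (0, 0). Tied probabilities collapse into a single point.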