create_cd

Generate the column descriptions file with the given structure.

Method call format

create_cd(label=None,
          cat_features=None,
          weight=None,
          baseline=None,
          doc_id=None,
          group_id=None,
          subgroup_id=None,
          timestamp=None,
          auxiliary_columns=None,
          feature_names=None,
          output_path='train.cd')

Parameters

label
  Possible types: int
  Description: The zero-based index of the column that defines the target variable (in other words, the object's label value).
  Default value: None

cat_features
  Possible types: int, list of int
  Description: Zero-based indices of the columns that define categorical features.
  Default value: None

weight
  Possible types: int
  Description: The zero-based index of the column that defines the object's weight.
  Default value: None

baseline
  Possible types: int
  Description: The zero-based index of the column that defines the initial formula values for all input objects.
  Default value: None

doc_id
  Possible types: int
  Description: The zero-based index of the column that defines the alphanumeric ID of the object.
  Default value: None

group_id
  Possible types: int
  Description: The zero-based index of the column that defines the identifier of the object's group.
  Default value: None

subgroup_id
  Possible types: int
  Description: The zero-based index of the column that defines the identifier of the object's subgroup.
  Default value: None

timestamp
  Possible types: int
  Description: The zero-based index of the column that defines the object's timestamp.
  Default value: None

auxiliary_columns
  Possible types: int, list of int
  Description: Zero-based indices of the columns that contain arbitrary data.
  Default value: None

feature_names
  Possible types: dict
  Description: A dictionary that maps zero-based column indices to the corresponding feature names. For example, the following dictionary sets the names of the features in the columns with indices 4, 5, and 12:
    feature_names = {
        4: 'Categ1',
        5: 'Categ2',
        12: 'Num1'
    }
  Default value: None

output_path
  Possible types: string
  Description: The path to the output file with column descriptions.
  Default value: 'train.cd'
Note. There is no parameter for columns of the Num type, because columns that contain numerical features don't need to be described.
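The mapping from parameters to file contents can be sketched without catboost. The snippet below is a minimal, hypothetical stand-in (`build_cd_lines` is not part of `catboost.utils`): it assumes each line of a column descriptions file has the form index<TAB>type<TAB>optional name, assumes the type tags `Label`, `Categ`, `Weight`, `Baseline`, `DocId`, `GroupId`, `SubgroupId`, `Timestamp`, `Auxiliary`, and `Num`, and, in line with the note above, leaves unnamed numerical columns out. Rejecting a column that is assigned two roles is a choice made for this sketch only; the output of `create_cd` itself may differ in detail.

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical helper type for this sketch; not part of catboost.
Indices = Union[int, Iterable[int], None]


def _as_list(value: Indices) -> List[int]:
    """Normalize a single index, an iterable of indices, or None to a list."""
    if value is None:
        return []
    if isinstance(value, int):
        return [value]
    return list(value)


def build_cd_lines(
    label: Optional[int] = None,
    cat_features: Indices = None,
    weight: Optional[int] = None,
    baseline: Optional[int] = None,
    doc_id: Optional[int] = None,
    group_id: Optional[int] = None,
    subgroup_id: Optional[int] = None,
    timestamp: Optional[int] = None,
    auxiliary_columns: Indices = None,
    feature_names: Optional[Dict[int, str]] = None,
) -> List[str]:
    """Sketch: return 'index<TAB>type[<TAB>name]' lines sorted by column index."""
    # Assumed type tag for each role (an assumption of this sketch).
    roles = [
        ("Label", label),
        ("Categ", cat_features),
        ("Weight", weight),
        ("Baseline", baseline),
        ("DocId", doc_id),
        ("GroupId", group_id),
        ("SubgroupId", subgroup_id),
        ("Timestamp", timestamp),
        ("Auxiliary", auxiliary_columns),
    ]
    types: Dict[int, str] = {}
    for type_tag, indices in roles:
        for index in _as_list(indices):
            if index in types:
                # Design choice of this sketch: one role per column.
                raise ValueError(
                    f"column {index} is both {types[index]} and {type_tag}"
                )
            types[index] = type_tag

    names = feature_names or {}
    # A named column with no explicit role is written as Num;
    # unnamed numerical columns are simply left out of the file.
    for index in names:
        types.setdefault(index, "Num")

    lines = []
    for index in sorted(types):
        line = f"{index}\t{types[index]}"
        name = names.get(index, "")
        if name:
            line += f"\t{name}"
        lines.append(line)
    return lines


for line in build_cd_lines(
    label=0,
    cat_features=[4, 5],
    feature_names={4: "Categ1", 12: "Num1"},
):
    print(line)
```

Writing the result to a file is then a matter of joining the lines with newlines; the real `create_cd` does this for you at `output_path`.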

Usage examples

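from catboost.utils import create_cd

# Names for the features in columns 4, 5 (categorical) and 12 (numerical)
feature_names = {
    4: 'Categ1',
    5: 'Categ2',
    12: 'Num1'
}

create_cd(
    label=0,                     # target values
    cat_features=[4, 5, 6],      # categorical features
    weight=1,                    # object weights
    baseline=2,                  # initial formula values
    doc_id=3,                    # object IDs
    group_id=7,                  # group identifiers
    subgroup_id=8,               # subgroup identifiers
    timestamp=9,                 # timestamps
    auxiliary_columns=[10, 11],  # arbitrary data
    feature_names=feature_names,
    output_path='train.cd'       # file to write
)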
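After the call, the generated file can be inspected with the standard library. This is a minimal sketch that assumes the tab-separated layout index<TAB>type[<TAB>name]; `read_cd` is a hypothetical helper, and the sample text below is made-up input rather than the guaranteed contents of train.cd.

```python
import io
from typing import Dict, TextIO, Tuple


def read_cd(stream: TextIO) -> Dict[int, Tuple[str, str]]:
    """Map column index -> (type, name); name is '' when the line has none."""
    columns: Dict[int, Tuple[str, str]] = {}
    for raw_line in stream:
        line = raw_line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split("\t")
        index, column_type = int(parts[0]), parts[1]
        name = parts[2] if len(parts) > 2 else ""
        columns[index] = (column_type, name)
    return columns


# Made-up sample text; for the real file use: with open("train.cd") as f: read_cd(f)
sample = io.StringIO("0\tLabel\n4\tCateg\tCateg1\n12\tNum\tNum1\n")
for index, (column_type, name) in sorted(read_cd(sample).items()):
    print(index, column_type, name or "-")
```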