Dataset description

The dataset file contains the following data for each object:
  • A list of features.
  • The target (optional).
  • Other types of data.

Feature indices used in training and in feature importance calculations are numbered from 0 to featureCount – 1. Columns of non-feature types are ignored when calculating these indices.
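The index rule above can be sketched in a few lines of Python. This is only an illustration: the `feature_indices` helper, the set of type names treated as features, and the sample layout are assumptions made for this example, not part of any library API.

```python
# Column types treated as features in this sketch (an assumption for illustration).
FEATURE_TYPES = {"Num", "Categ"}


def feature_indices(column_types):
    """Map column index -> feature index, skipping non-feature columns."""
    mapping = {}
    next_feature = 0
    for column_index, column_type in enumerate(column_types):
        if column_type in FEATURE_TYPES:
            mapping[column_index] = next_feature
            next_feature += 1
    return mapping


# Hypothetical layout: a label, three features, an auxiliary column, one more feature.
layout = ["Label", "Num", "Num", "Categ", "Auxiliary", "Num"]
print(feature_indices(layout))  # {1: 0, 2: 1, 3: 2, 5: 3}
```

With this layout the column indexed 5 becomes the feature indexed 3, because the label and auxiliary columns do not consume feature indices.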

  • List each object on a new line.
  • If group identifiers are present, all objects in the dataset must be grouped by them: objects with the same group identifier must follow each other in the dataset.

  • If the group weight is specified, it must be the same for all objects in one group.
  • Use any single-character delimiter to separate the values for a single object. The delimiter can be specified in the training parameters; the default is a tab.
  • Use the feature types that are specified in the column descriptions.
  • List features in the same order for all the objects.
  • Feature numbering starts from zero.
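The two group-related rules in the list above (contiguous group identifiers, and one weight per group) can be checked before training with a small script. The sketch below is a hypothetical helper, not a library function; it assumes the group identifier and group weight have already been extracted from each line, in file order.

```python
def check_groups(rows):
    """Check (group_id, group_weight) pairs given in file order.

    Returns a list of human-readable problems; the list is empty if
    every group is contiguous and has a single weight.
    """
    problems = []
    weights = {}      # group_id -> first weight seen for that group
    finished = set()  # groups whose contiguous run has already ended
    previous = None
    for line_number, (group_id, weight) in enumerate(rows, start=1):
        if group_id != previous:
            if group_id in finished:
                problems.append(
                    f"line {line_number}: group {group_id!r} is not contiguous"
                )
            if previous is not None:
                finished.add(previous)
            previous = group_id
        if weights.setdefault(group_id, weight) != weight:
            problems.append(
                f"line {line_number}: group {group_id!r} has inconsistent weight"
            )
    return problems


# Group "a" reappears after group "b", which breaks the contiguity rule.
print(check_groups([("a", 1.0), ("a", 1.0), ("b", 2.0), ("a", 1.0)]))
```

The check makes a single pass over the file and keeps only one entry per group, so it stays cheap even for large datasets.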

The dataset consists of 6 columns.

The first column (indexed 0) contains label values.

The label (target) takes binary values:
  • “0” stands for the absence of precipitation
  • “1” stands for the presence of precipitation

Columns indexed 1, 2, 3 and 5 contain features.

The column indexed 4 contains arbitrary data.

The tab-separated column descriptions file looks like this:
3<\t>Categ<\t>wind direction

The column indexed 3 contains a categorical feature, so the value in the second column of the description file is set to Categ. The name of this feature is set to “wind direction” in the third column of the description file.

Other features are numerical and are omitted from the column descriptions file.
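As a rough illustration of the description file layout (column index, column type, optional name), the following sketch parses such lines into a dictionary. The `parse_column_descriptions` helper is a hypothetical name introduced here; it only mirrors the three-field layout shown above and does not validate type names.

```python
def parse_column_descriptions(text):
    """Parse lines of the form index<TAB>type[<TAB>name]."""
    columns = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        index, column_type = int(parts[0]), parts[1]
        name = parts[2] if len(parts) > 2 else ""
        columns[index] = (column_type, name)
    return columns


# The single description line from the example, with real tab characters.
cd_text = "3\tCateg\twind direction\n"
print(parse_column_descriptions(cd_text))  # {3: ('Categ', 'wind direction')}
```

Columns that do not appear in the resulting dictionary are the ones omitted from the description file.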

The dataset file looks like this:
1<\t>-10<\t>5<\t>north<\t>Memphis TN<\t>753
0<\t>30<\t>1<\t>south<\t>Los Angeles CA<\t>760
0<\t>40<\t>0.1<\t>south<\t>Las Vegas NV<\t>705
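To make the column roles concrete, the sample rows can be read with Python's standard `csv` module. This is a minimal sketch under the layout described above (label in column 0, features in columns 1, 2, 3 and 5, arbitrary data in column 4); the `read_objects` helper and the dictionary keys are illustrative names, not part of any library.

```python
import csv
import io

# The three sample rows, with real tab characters as the delimiter.
# The negative value uses an ASCII minus sign so that float() can parse it.
SAMPLE = (
    "1\t-10\t5\tnorth\tMemphis TN\t753\n"
    "0\t30\t1\tsouth\tLos Angeles CA\t760\n"
    "0\t40\t0.1\tsouth\tLas Vegas NV\t705\n"
)


def read_objects(text, delimiter="\t"):
    """Split each line into label, numerical features, category, and extra data."""
    objects = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        objects.append({
            "label": int(row[0]),
            "numeric": [float(row[1]), float(row[2]), float(row[5])],
            "wind_direction": row[3],  # categorical feature (column 3)
            "auxiliary": row[4],       # arbitrary data, not a feature
        })
    return objects


for obj in read_objects(SAMPLE):
    print(obj)
```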