Python library installation

Note. Installation is supported only for the 64-bit version of Python.
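Because only 64-bit Python is supported, it can help to check the interpreter before installing. Below is a minimal sketch that uses only the standard library. The helper name `is_64bit_python` is illustrative and not part of CatBoost. It relies on the fact that a C pointer is 8 bytes on a 64-bit build.

```python
import struct
import sys


def is_64bit_python():
    """Return True if the running interpreter is a 64-bit build (illustrative helper)."""
    # struct.calcsize("P") is the size of a C pointer in bytes: 8 on 64-bit builds.
    return struct.calcsize("P") * 8 == 64


if __name__ == "__main__":
    # sys.executable shows which interpreter was checked.
    print(sys.executable, "is 64-bit:", is_64bit_python())
```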
Dependencies:
To install the Python package:
  1. Choose an installation method: pip install, build from source, or build a wheel package.

  2. (Optionally) Install additional packages for data visualization support.

  3. (Optionally) Test CatBoost.

pip install

Run the following command:

pip install catboost
Note.

The package installed with pip supports only CUDA 8.0. If GPU support is required and the installed CUDA version differs from 8.0, build the binary from source.
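A common pitfall after running `pip install catboost` is that `pip` belongs to a different interpreter than the one used to run the code. Below is a minimal sketch that checks whether a module can be found by the current interpreter. It uses only the standard library (`importlib.util.find_spec`), and the helper name `is_importable` is illustrative.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate module_name (illustrative helper)."""
    # find_spec locates a top-level module without executing it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pip must install into this interpreter for the import to succeed.
    print(sys.executable)
    print("catboost importable:", is_importable("catboost"))
```

If the check fails, running `python -m pip install catboost` with the same interpreter ties the installation to it.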

Build from source

The following packages are required for installation:

  • python2.7
  • python2.7-dev

The system compiler must be compatible with the CUDA Toolkit if GPU support is required. For example, gcc-6 and clang-3.9 are compatible with CUDA Toolkit 9.*, while gcc-7 is not. The steps required to change the system compiler depend on the OS.

To build the Python package from source:
  1. Install the libc header files on macOS and Linux:
    • macOS: xcode-select --install
    • Linux: install the appropriate package (for example, libc6-dev on Ubuntu)
  2. Clone the repository:

    git clone https://github.com/catboost/catboost.git
  3. Change to the catboost/catboost/python-package/catboost directory of the local copy of the CatBoost repository.
  4. Compile the library:
    ../../../ya make -r -DUSE_ARCADIA_PYTHON=no -DPYTHON_CONFIG=<depends on the Python version> [-DCUDA_ROOT=<path to CUDA>] [-DHAVE_CUDA=no] [-o <output directory>]
    • -DPYTHON_CONFIG should be set to:
      • python2-config for Python 2
      • python3-config for Python 3
    • -DCUDA_ROOT is the path to CUDA. This is an optional parameter required to support training on GPU.
    • -DHAVE_CUDA=no disables CUDA support, which speeds up compilation.

      By default, the package is built with CUDA support if CUDA Toolkit is installed.

    • -o sets the output directory for the compiled library. By default, the current directory is used.
    For example, the following command builds the package for Python 3 with support for training on GPU:
    ../../../ya make -r -DUSE_ARCADIA_PYTHON=no -DPYTHON_CONFIG=python3-config -DCUDA_ROOT=/usr/local/cuda
    Note.
    • To build on Windows, explicitly specify the PYTHON_INCLUDE and PYTHON_LIBRARIES variables:
      ../../../ya make -r -DUSE_ARCADIA_PYTHON=no -DPYTHON_INCLUDE="/I C:/Python27/include/" -DPYTHON_LIBRARIES="C:/Python27/libs/python27.lib"
    • The version of Xcode required for building on macOS is listed on the CUDA Toolkit download page of the NVIDIA site.
  5. To use the built module on macOS or Linux, add the parent directory (python-package) to PYTHONPATH:
    cd ../; export PYTHONPATH=$PYTHONPATH:$(pwd)
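The flags from step 4 map directly onto an argument list. Below is a sketch of a hypothetical helper, `ya_make_args`, that assembles the command from options. It also includes `windows_python_paths`, which guesses the Windows PYTHON_INCLUDE and PYTHON_LIBRARIES values from the running interpreter. That guess assumes the standard CPython or Anaconda layout (`<prefix>/libs/pythonXY.lib`, as in the Anaconda example below). Neither helper is part of CatBoost; the flag strings are copied from the commands above.

```python
import os
import shlex
import sys
import sysconfig


def ya_make_args(python_config=None, cuda_root=None, have_cuda=True,
                 output_dir=None, python_include=None, python_libraries=None):
    """Assemble the ya make argument list from step 4 (hypothetical helper)."""
    # Asking for a CUDA root while disabling CUDA is contradictory, so reject it.
    if cuda_root is not None and not have_cuda:
        raise ValueError("cuda_root conflicts with have_cuda=False")
    args = ["../../../ya", "make", "-r", "-DUSE_ARCADIA_PYTHON=no"]
    if python_config is not None:
        # python2-config for Python 2, python3-config for Python 3
        args.append("-DPYTHON_CONFIG=" + python_config)
    if python_include is not None:
        # Windows only: the value keeps the "/I " compiler prefix used in the docs
        args.append("-DPYTHON_INCLUDE=/I " + python_include)
    if python_libraries is not None:
        args.append("-DPYTHON_LIBRARIES=" + python_libraries)
    if cuda_root is not None:
        args.append("-DCUDA_ROOT=" + cuda_root)
    if not have_cuda:
        args.append("-DHAVE_CUDA=no")
    if output_dir is not None:
        args += ["-o", output_dir]
    return args


def windows_python_paths():
    """Guess PYTHON_INCLUDE and PYTHON_LIBRARIES for the current interpreter.

    Assumes the standard CPython/Anaconda layout on Windows: <prefix>/libs/pythonXY.lib.
    """
    include = sysconfig.get_paths()["include"]
    lib_name = "python%d%d.lib" % sys.version_info[:2]
    return include, os.path.join(sys.base_prefix, "libs", lib_name)


if __name__ == "__main__":
    # Equivalent of the Python 3 + GPU example command
    args = ya_make_args("python3-config", cuda_root="/usr/local/cuda")
    print(" ".join(shlex.quote(a) for a in args))
```

Passing the list to `subprocess.run(args, check=True)` from the catboost/catboost/python-package/catboost directory avoids shell quoting of values such as `/I C:/Python27/include/`. The `shlex.quote` output shown above is meant only for POSIX shells.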

Build a wheel package

To build a self-contained Python wheel, run the catboost/catboost/python-package/mk_wheel.py script.

Optional parameters:
  • -DCUDA_ROOT is the path to CUDA. This is an optional parameter required to support training on GPU.

For example, to build and install a wheel on Windows for Anaconda with support for training on GPU, run:

python.exe mk_wheel.py -DPYTHON_INCLUDE="/I C:\Anaconda2\include" -DPYTHON_LIBRARIES="C:\Anaconda2\libs\python27.lib" -DCUDA_ROOT="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"
C:\Anaconda2\Scripts\pip.exe install catboost-0.1.0.6-cp27-none-win_amd64.whl
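The tags in the wheel file name (`cp27`, `none` and `win_amd64` in the example above) record which interpreter and platform the wheel was built for. pip will not install a wheel whose tags do not match the target interpreter. Below is a simplified sketch that splits a file name according to the PEP 427 naming convention and compares the Python tag with the running CPython. It ignores compressed tag sets such as `cp27.cp36`, and the helper names are illustrative.

```python
import re
import sys

# PEP 427 wheel file name:
# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 components (illustrative helper)."""
    match = WHEEL_NAME.match(filename)
    if match is None:
        raise ValueError("not a wheel file name: " + filename)
    return match.groupdict()


def current_cpython_tag():
    """Python tag of the running CPython interpreter, e.g. 'cp27' or 'cp311'."""
    return "cp%d%d" % sys.version_info[:2]


if __name__ == "__main__":
    tags = parse_wheel_name("catboost-0.1.0.6-cp27-none-win_amd64.whl")
    print(tags)
    print("built for this interpreter:", tags["python"] == current_cpython_tag())
```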

Additional packages for data visualization support

Execute the following steps to support the data visualization feature in Jupyter Notebook:
  1. Install the ipywidgets package:

    pip install ipywidgets
  2. Turn on the widgets extension:

    jupyter nbextension enable --py widgetsnbextension

Test CatBoost

Use the following example to test CatBoost:

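import numpy
from catboost import CatBoostRegressor

# Four training objects, each with four numeric features
dataset = numpy.array([[1, 4, 5, 6], [4, 5, 6, 7], [30, 40, 50, 60], [20, 15, 85, 60]])
train_labels = [1.2, 3.4, 9.5, 24.5]
model = CatBoostRegressor(learning_rate=1, depth=6, loss_function='RMSE')
# fit() trains the model and returns it
fit_model = model.fit(dataset, train_labels)

# print() works in both Python 2 and Python 3
print(fit_model.get_params())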