Connecting the Logs API to ClickHouse

Attention.

Access tokens will not be accepted in URL parameters starting February 13, 2019. To continue working with the Yandex.Metrica API, set up authorization by passing the token in the HTTP header.

The outdated authorization method will be temporarily disabled on January 23, January 30, and February 6 for maintenance. Authorization using URL parameters will be unavailable on these dates.
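
As a rough illustration of header-based authorization, the sketch below builds (but does not send) a request with the token in the Authorization header rather than in the URL. The `OAuth` scheme name, the host in `API_HOST`, the sample path, and the helper name `build_request` are assumptions made for this example and are not taken from this page; check them against the API reference before relying on them.

```python
import urllib.request

# Assumed API host, used only for illustration.
API_HOST = "https://api-metrika.yandex.net"


def build_request(path, token):
    """Build a request that carries the token in the Authorization header.

    The URL itself contains no token, which is the point of the change
    described in the notice above.
    """
    return urllib.request.Request(
        API_HOST + path,
        headers={"Authorization": "OAuth " + token},  # assumed header scheme
    )


# Hypothetical path and placeholder token; nothing is sent over the network.
req = build_request("/management/v1/counters", "<your_token>")
```

To actually send the request, pass `req` to `urllib.request.urlopen`.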

ClickHouse allows you to work with non-aggregated statistical data from Yandex.Metrica that you receive via the Logs API. To connect the Logs API to ClickHouse:

  1. Download the Python integration script. You can use the git clone command to do this.

    git clone https://github.com/yndx-metrika/logs_api_integration.git
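
  2. Edit the config file (config.json) in the configs directory. The // comments in the sample are annotations only; standard JSON does not allow comments, so leave them out of the actual file: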

    {
        "token" : "<your_token>", // access token for the Yandex.Metrica API
        "counter_id": "<your_counter_id>", // counter number
        "visits_fields": [ // list of session parameters
            "ym:s:counterID",
            "ym:s:dateTime",
            "ym:s:date",
            "ym:s:firstPartyCookie"
        ],
        "hits_fields": [ // list of hit parameters
            "ym:pv:counterID",
            "ym:pv:dateTime",
            "ym:pv:date",
            "ym:pv:firstPartyCookie"
        ],
        "log_level": "INFO", // logging level
        "retries": 1, // number of attempts to restart the script after an error
        "retries_delay": 60, // interval between attempts
        "clickhouse": {
            "host": "http://localhost:8123", // address of a running instance of ClickHouse
            "user": "", // username for accessing the database
            "password": "", // password for accessing the database
            "visits_table": "visits_all", // name of the table for storing sessions
            "hits_table": "hits_all", // name of the table for storing hits
            "database": "default" // name of the database for tables
        }
    }

  3. Run the script. You must use the -source option to specify the data source: hits (pageviews) or visits (sessions). The -mode option selects one of the following modes:

    • history: Loads all data from the date the Yandex.Metrica counter was created through the day before yesterday.
    • regular: Loads data for the day before yesterday (recommended for regular downloads).
    • regular_early: Loads data for yesterday.

    Example of running the script:

    python metrica_logs_api.py -mode history -source visits

    You can also load data for a specific time period with the -start_date and -end_date options:

    python metrica_logs_api.py -source hits -start_date 2016-10-10 -end_date 2016-10-18
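
The configuration and launch steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the integration script: the helper names `strip_line_comments`, `load_annotated_config`, and `build_command` are hypothetical. The first two parse a config that still carries the explanatory // comments from the sample; the third assembles the command line from the options shown in the examples above. Whether the script accepts -mode together with a date range is not stated here, so the sketch does not enforce either choice.

```python
import json
import re

# Matches either a double-quoted JSON string (kept as-is) or a // comment (dropped).
# Matching strings first keeps values such as "http://localhost:8123" intact.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')

SOURCES = ("hits", "visits")
MODES = ("history", "regular", "regular_early")


def strip_line_comments(text):
    """Remove // comments that appear outside of string values."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ""
    return _STRING_OR_COMMENT.sub(keep_strings, text)


def load_annotated_config(text):
    """Parse config text after dropping the annotation comments."""
    return json.loads(strip_line_comments(text))


def build_command(source, mode=None, start_date=None, end_date=None):
    """Return the argument list for running metrica_logs_api.py."""
    if source not in SOURCES:
        raise ValueError("source must be one of %s" % (SOURCES,))
    if mode is not None and mode not in MODES:
        raise ValueError("mode must be one of %s" % (MODES,))
    cmd = ["python", "metrica_logs_api.py"]
    if mode is not None:
        cmd += ["-mode", mode]
    cmd += ["-source", source]
    if start_date is not None and end_date is not None:
        cmd += ["-start_date", start_date, "-end_date", end_date]
    return cmd


# A shortened sample in the same annotated style as the config above.
sample = """{
    "token" : "<your_token>", // access token
    "retries": 1, // number of attempts
    "clickhouse": {
        "host": "http://localhost:8123" // address of ClickHouse
    }
}"""
config = load_annotated_config(sample)
history_cmd = build_command("visits", mode="history")
range_cmd = build_command("hits", start_date="2016-10-10", end_date="2016-10-18")
```

The resulting lists can be passed to `subprocess.run` from the directory that contains the script.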