Sampling

When a counter has collected a large amount of statistical data, Yandex Metrica can calculate results from only part of the data. For example, it can process 1/10 of all sessions and then multiply the results by 10 where necessary.
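The scaling step can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the numbers are made up and are not part of the Yandex Metrica API.

```python
def estimate_total(sampled_count: int, sample_share: float) -> float:
    """Scale a count measured on a sample up to an estimate for all data."""
    if not 0 < sample_share <= 1:
        raise ValueError("sample_share must be in the interval (0, 1]")
    # Dividing by the share is the same as multiplying by 10 for a 1/10 sample.
    return sampled_count / sample_share


# Hypothetical numbers: 1250 sessions counted in a 1/10 sample.
print(round(estimate_total(1250, 0.1)))
```

The result is an estimate, so it can differ from the value that a calculation over all data would give.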

The process of forming this data selection is called sampling. Sampling lets you adjust the balance between the speed of getting the results and their accuracy.

For example, as a result of sampling, a report might not contain data on very rarely visited URLs or uncommon keywords.
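A rough way to see why rare items can disappear is to compute the chance that none of their sessions falls into the sample. The sketch below assumes that each session is kept independently with a probability equal to the sample share. That is a simplifying assumption for illustration, not a description of how Yandex Metrica actually selects data.

```python
def probability_missing(visits: int, sample_share: float) -> float:
    """Chance that none of `visits` sessions lands in the sample.

    Assumption: every session is kept independently with probability
    `sample_share`. The real selection mechanism may differ.
    """
    return (1 - sample_share) ** visits


# Compare rarely and frequently visited URLs for a 1/10 sample.
for visits in (1, 3, 10, 50):
    print(visits, round(probability_missing(visits, 0.1), 3))
```

Under this assumption, a URL with three sessions is absent from a 10% sample about 73% of the time, while a URL with 50 sessions is almost always represented.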

You can use the accuracy request parameter to manage sampling by setting the sample size to use for calculations.

This parameter can accept several values:

  • low — Returns a fast result based on a limited data sample.
  • medium — Returns the result based on a sample that combines speed and data accuracy.
  • high — Returns the most precise value by using the largest data sample. Requests in this mode may take longer to process.
  • full — Returns all data.

This parameter can also take a numerical value from the interval (0,1]:

  • 1 — No sampling (corresponds to the full value).
  • 0.1 or 0.01 — The share of data to use (10% or 1%). Any other value (for example, 0.42) is rounded to the nearest power of 10.

By default, the accuracy parameter is set to medium.
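As a minimal sketch, the accepted values described above could be checked on the client side before a request is sent. Only the accuracy parameter and its values come from this page. The helper names, the ids and metrics parameter names, and the counter number are assumptions used for illustration. The code builds a query string and does not send a request.

```python
from urllib.parse import urlencode

# Named levels listed in this section.
NAMED_LEVELS = {"low", "medium", "high", "full"}


def normalize_accuracy(value) -> str:
    """Validate an accuracy value and return it as a query-string value."""
    if isinstance(value, str) and value in NAMED_LEVELS:
        return value
    number = float(value)  # raises ValueError for unknown strings
    if not 0 < number <= 1:
        raise ValueError("numeric accuracy must be in the interval (0, 1]")
    return f"{number:g}"


def build_query(counter_id: int, metrics: str, accuracy="medium") -> str:
    """Build a query string; `ids` and `metrics` are assumed parameter names."""
    params = {
        "ids": counter_id,
        "metrics": metrics,
        "accuracy": normalize_accuracy(accuracy),
    }
    return urlencode(params)


# Hypothetical counter number and metric name.
print(build_query(12345, "ym:s:visits", accuracy="high"))
print(build_query(12345, "ym:s:visits", accuracy=0.1))
```

The sketch does not round numeric values itself. Based on the description above, that rounding is applied on the service side.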

In returned results, the applied sampling is described using the following parameters:

  • sample_share — The share of data used for calculating the result (value from 0 to 1).
  • sample_size — Number of rows in the data sample.
  • sample_space — Total number of rows in the source data (without sampling).
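The following sketch shows one way a client might read these fields. It assumes the response has already been parsed into a Python dictionary with the three fields at the top level. The function name and the sample numbers are invented for illustration.

```python
def describe_sampling(response: dict) -> str:
    """Summarize the sampling applied to a parsed response."""
    share = response["sample_share"]
    size = response["sample_size"]
    space = response["sample_space"]
    if share >= 1:
        return f"No sampling: all {space} rows were used."
    return f"Sampled: {size} of {space} rows used (sample_share={share})."


# Invented values, not real API output.
example = {"sample_share": 0.1, "sample_size": 52000, "sample_space": 520000}
print(describe_sampling(example))
```

A check like this can help decide whether to repeat a request with a higher accuracy value when precise figures are needed.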