General questions

  1. Who can use the features of Yandex SpeechKit?
  2. Can I use Yandex SpeechKit for commercial purposes?
  3. Who can get free access to SpeechKit?
  4. How do I get an API key?
  5. What is a UUID and how do I get one?
  6. Where can I use Yandex SpeechKit?
  7. We have a project that needs to use speech technologies. We are prepared to pay for the technology. Can you help?

Who can use the features of Yandex SpeechKit?

Yandex SpeechKit is available to anyone.

Commercial use of the SpeechKit Mobile SDK is available for businesses.

Individuals who use the SpeechKit Mobile SDK for private and community projects can use Yandex speech technologies free of charge.

Free usage of the SpeechKit Mobile SDK is regulated by the user agreement.

Can I use Yandex SpeechKit for commercial purposes?

Commercial use of the SpeechKit Mobile SDK is available for businesses.

To switch to commercial use, fill out the application to set up an agreement and send it to us.

For commercial use, the rate is 20 kopecks per request for speech synthesis or speech recognition. One request corresponds to 20 seconds of audio. If a server request processes a shorter data packet, it is automatically rounded up to 20 seconds.

At the end of the billing period, an invoice is issued based on the actual number of requests sent. The minimum charge is 200 rubles.
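
As a rough illustration of the pricing above, the following Python sketch estimates an invoice from the number of requests sent in a billing period. The function names are hypothetical, and the sketch assumes that the 200-ruble minimum applies once per invoice; the user agreement remains the authoritative source.

```python
RATE_KOPECKS_PER_REQUEST = 20    # rate stated in this FAQ
MIN_CHARGE_KOPECKS = 200 * 100   # 200 rubles; assumed to apply once per invoice


def invoice_kopecks(total_requests: int) -> int:
    """Estimate the invoice for one billing period, in kopecks.

    Each request is billed as one unit, because fragments shorter than
    20 seconds are rounded up to a full request.
    """
    if total_requests < 0:
        raise ValueError("total_requests must be non-negative")
    return max(MIN_CHARGE_KOPECKS, total_requests * RATE_KOPECKS_PER_REQUEST)


def format_rubles(kopecks: int) -> str:
    """Format an integer amount of kopecks as rubles with two decimals."""
    return f"{kopecks // 100}.{kopecks % 100:02d} RUB"


if __name__ == "__main__":
    for requests_sent in (500, 5000):
        print(requests_sent, format_rubles(invoice_kopecks(requests_sent)))
```

Amounts are kept as integer kopecks to avoid floating-point rounding. Under these assumptions, 500 requests cost 100 rubles at the per-request rate, so the 200-ruble minimum applies instead.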

Who can get free access to SpeechKit?

The SpeechKit Mobile SDK is free of charge if you make fewer than 10,000 requests per day.

The first month of using the SpeechKit Cloud API is free for everyone. During the free period, you can make up to 1,000 requests per day.

If you want to use automatic speech recognition in research or educational projects or for commercial purposes, write to us. Specify the key you are using and describe your project.

How do I get an API key?

To get a unique API key, send a request to speechkit@support.yandex.ru.

To confirm your request, we will ask you to tell us what tasks you are planning to perform and give us an estimate of the expected load.

What is a UUID and how do I get one?

A UUID (Universally Unique Identifier) is a universal user ID. It is unique for each user or device.

The developer generates the ID randomly. For our API, you need to pass it as a 32-digit hexadecimal string (without hyphens).
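
A minimal Python sketch of generating an ID in this format, using the standard `uuid` module. The function name is hypothetical. Because the ID identifies a user or device, an application would typically generate it once and store it rather than create a new one for every request.

```python
import uuid


def new_client_uuid() -> str:
    """Return a random UUID as 32 hexadecimal digits without hyphens."""
    # uuid4() generates a random UUID; .hex drops the hyphens.
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(new_client_uuid())
```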

Where can I use Yandex SpeechKit?

The SpeechKit library is currently used in Yandex mobile apps and services, as well as in projects of other app developers for iOS and Android.

SpeechKit is also integrated in industrial systems where automatic speech recognition is necessary.

We have a project that needs to use speech technologies. We are prepared to pay for the technology. Can you help?

At the moment, the SpeechKit Mobile SDK is provided “as is”. We do not customize the technology on request.

The features of the SpeechKit Mobile SDK are described in the documentation.

Speech technologies

  1. Where does speech recognition occur?
  2. What determines the quality of automatic speech recognition?
  3. Is it possible to improve the quality of recognition for a specific user?
  4. Is it possible to create new language models for speech recognition?
  5. We want to use SpeechKit to record meetings. Is this possible?
  6. We want to use SpeechKit to convert phone conversations or interviews to text and to flag certain words. Is this possible?

Where does speech recognition occur?

Speech recognition occurs on Yandex servers.

What determines the quality of automatic speech recognition?

The quality of recognition depends on the quality of the incoming sound, the encoding quality, the rate and clarity of speech, and the complexity and length of phrases. The topic of a voice query is also important: recognition works best when the topic matches the chosen language model.

Is it possible to improve the quality of recognition for a specific user?

Recognition quality can be improved by improving the quality of the input audio.

The Yandex acoustic models are trained on hundreds of thousands of speech recordings from different people and account for differences in pronunciation and accents. The models are continually being retrained with fresh data, so there is probably no need to adapt the system to a particular user.

Is it possible to create new language models for speech recognition?

We are not currently developing additional language models on request.

We want to use SpeechKit to record meetings. Is this possible?

Recognizing long audio files is a difficult task. We do not currently accept projects like this.

We want to use SpeechKit to convert phone conversations or interviews to text and to flag certain words. Is this possible?

No. SpeechKit is designed to recognize short speech fragments that are a maximum of 30 seconds long.

SpeechKit functionality can't be used for speech analysis tasks such as identifying specific words, evaluating the emotional tone of conversations, or matching a conversation to a script.
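
To check in advance whether a recording fits within the 30-second limit mentioned above, an application could measure its duration before sending it. The sketch below is one way to do this in Python for uncompressed WAV files, using the standard `wave` module. The function names are hypothetical and the code is not part of SpeechKit.

```python
import wave

MAX_FRAGMENT_SECONDS = 30.0  # maximum fragment length stated in this FAQ


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_fragment_limit(source) -> bool:
    """Return True if the recording is no longer than 30 seconds."""
    return wav_duration_seconds(source) <= MAX_FRAGMENT_SECONDS
```

Other audio formats would need a different way to read the duration.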

SpeechKit Mobile SDK

  1. What platforms can I use with the SpeechKit Mobile SDK?
  2. Do I need to create my own UI for speech recognition dialogs?
  3. What do I need to do in order to use the SDK in Delphi applications for Android?
  4. What is a UUID and how do I get one?
  5. Can I display the standard speech recognition UI in Russian for a system with a different localization?
  6. Why does the SDK request permission to get the user's coordinates?
  7. Why does the quality of speech recognition suffer when there is a bad internet connection?
  8. How can I get a voice activation model for English?
  9. In iOS, can I choose a virtual device to use for recording speech?
  10. Can I use the same API key for iOS and Android?

What platforms can I use with the SpeechKit Mobile SDK?

The SpeechKit Mobile SDK can be used for iOS and Android.

Do I need to create my own UI for speech recognition dialogs?

The library provides a standard user interface, so there is almost nothing you need to code yourself.

At the same time, the library has everything you need for embedding speech recognition in an existing or new interface.

What do I need to do in order to use the SDK in Delphi applications for Android?

You only need to create bridge files using these utilities:

What is a UUID and how do I get one?

A UUID (Universally Unique Identifier) is a universal user ID. It is unique for each user or device.

The developer generates the ID randomly. For our API, you need to pass it as a 32-digit hexadecimal string (without hyphens).

Can I display the standard speech recognition UI in Russian for a system with a different localization?

When creating the UI, you can explicitly specify which language to use.

Why does the SDK request permission to get the user's coordinates?

Information about the user's current coordinates provides better speech recognition results for certain types of queries (primarily geo-dependent ones). For example, this is used when recognizing street names in small towns.

This also helps us get information about where the majority of voice queries are coming from and display this information on a map.

Why does the quality of speech recognition suffer when there is a bad internet connection?

For slow internet connections, Yandex SpeechKit uses a compression format with some loss of audio data. You can choose the desired compression format manually.

How can I get a voice activation model for English?

We are working on voice activation for other languages, and we are interested in discussing collaboration. To get a model for English, Turkish, Ukrainian, German, French, or Spanish, write to us at voice@support.yandex.ru.

In iOS, can I choose a virtual device to use for recording speech?

Yes. To do this, write a class that implements the YSKAudioSource protocol (for more information, see Version 3.12.2. List of changes).

Can I use the same API key for iOS and Android?

Yes.