Request format

Send the text to be synthesized in an HTTPS GET request.

https://tts.voicetech.yandex.net/generate?
  key=<API key>
  & text=<text>
  & format=<mp3|wav|opus>
  & [quality=<hi|lo>]
  & lang=<ru-RU|en-US|uk-UK|tr-TR>
  & speaker=<jane|oksana|alyss|omazh|zahar|ermil>
  & [speed=<rate of speech>]
  & [emotion=<good|neutral|evil>]

key

API key. To get an API key, send a request to speechkit@support.yandex.ru.

text

The text to synthesize. Any characters other than Latin letters and digits must be URL-encoded. To disambiguate homographs, put + before the stressed vowel: def+ect.

Maximum length of the string: 2000 bytes.
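The two constraints above (URL encoding and the length limit) could be handled with a small helper like the one below. This is a minimal sketch: the name encode_text is hypothetical, and it assumes the 2000-byte limit is measured on the UTF-8 bytes of the text before URL encoding, which the description does not specify.

```python
from urllib.parse import quote

MAX_TEXT_BYTES = 2000  # limit stated in the parameter description


def encode_text(text: str) -> str:
    """Percent-encode text for the `text` query parameter (hypothetical helper)."""
    raw = text.encode("utf-8")
    # Assumption: the limit applies to the UTF-8 bytes before URL encoding.
    if len(raw) > MAX_TEXT_BYTES:
        raise ValueError(f"text is {len(raw)} bytes; the limit is {MAX_TEXT_BYTES}")
    # safe="" also encodes "+" (as %2B), so a stress marker such as def+ect
    # reaches the server as a literal plus sign rather than a space.
    # quote() leaves letters, digits and the unreserved characters "_.-~" as-is.
    return quote(text, safe="")
```

For example, encode_text("def+ect") returns "def%2Bect".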

format

File extension (format) of the synthesized file. Acceptable values:

  • mp3 — Audio in MPEG format, MPEG-1 Audio Layer 3 media container.

  • wav — Audio in PCM 16-bit format, WAV media container.

  • opus — Audio in Opus format, using OGG as a container.

quality (optional)

Sampling rate and bit rate of the synthesized PCM audio (WAV container). Acceptable values:

  • hi — Sampling rate of 48 kHz and bit rate of 768 kbit/s.

  • lo — Sampling rate of 8 kHz and bit rate of 128 kbit/s.

Default value: hi. Note that the quality parameter only affects the audio characteristics for format=wav.
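The bit rates listed above are consistent with uncompressed PCM, where bit rate = sampling rate × bits per sample × channels. The sketch below checks that arithmetic. The single-channel (mono) assumption is inferred from the numbers and is not stated in the description; the function name is hypothetical.

```python
def pcm_bit_rate_kbits(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Bit rate of uncompressed PCM audio in kbit/s (hypothetical helper)."""
    # Assumption: mono audio (channels=1); 16-bit samples come from the wav description.
    return sample_rate_hz * bits_per_sample * channels / 1000


print(pcm_bit_rate_kbits(48_000))  # hi: 768.0
print(pcm_bit_rate_kbits(8_000))   # lo: 128.0
```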

lang

Language.

Allowed values: ru-RU — Russian (default), en-US — English, uk-UK — Ukrainian, tr-TR — Turkish.

The language is not detected automatically.

Default value: ru-RU.

speaker

The voice for the synthesized speech. You can choose one of the following voices:

  • Female voices: jane, oksana, alyss and omazh.
  • Male voices: zahar and ermil.

speed (optional)

The rate (tempo) of the synthesized speech, given as a fractional number in the range from 0.1 to 3.0, where:

  • 3.0 is the fastest speech.

  • 1.0 is the average rate of human speech.

  • 0.1 is the slowest rate of speech.
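A client may want to reject out-of-range values before sending the request. The following is a minimal sketch of such a check; the helper name format_speed is hypothetical, and how the service itself reacts to out-of-range values is not described here.

```python
MIN_SPEED = 0.1  # slowest rate of speech
MAX_SPEED = 3.0  # fastest rate of speech


def format_speed(speed: float) -> str:
    """Validate the speech rate and format it for the `speed` parameter (hypothetical helper)."""
    value = float(speed)
    if not MIN_SPEED <= value <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {value}")
    # str() on a float keeps the fractional form, e.g. 1 becomes "1.0".
    return str(value)
```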

emotion (optional)

The emotional connotation of the voice. Acceptable values:

  • good — Cheerful, friendly.
  • evil — Irritated.
  • neutral — Neutral.

Default value: neutral.

Note.

The neutral value was previously called mixed (variable intonation). The mixed value is still supported, but it is considered deprecated.
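Client code that still passes the deprecated mixed value could map it to neutral before building the request. This is a sketch only; the names below are hypothetical and not part of the API.

```python
EMOTIONS = {"good", "neutral", "evil"}
# Deprecated value and its current replacement, per the note above.
DEPRECATED_EMOTIONS = {"mixed": "neutral"}


def normalize_emotion(value: str = "neutral") -> str:
    """Return a current `emotion` value, translating the deprecated one (hypothetical helper)."""
    value = DEPRECATED_EMOTIONS.get(value, value)
    if value not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {value!r}")
    return value
```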

Example

The URL below is an example of a request with the language set to Russian (lang=ru-RU).

https://tts.voicetech.yandex.net/generate?text=This%20text%20is%20ready&format=mp3&lang=ru-RU&speaker=zahar&emotion=good&key=<API key>

The response uses the format specified in the request (see the format parameter).
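A request like the one above could be assembled and sent from Python roughly as follows. This is a sketch using only the standard library: build_tts_url and synthesize_to_file are hypothetical names, the defaults are taken from the example URL, and error handling for failed requests is omitted because the error responses are not described in this section.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://tts.voicetech.yandex.net/generate"


def build_tts_url(key, text, fmt="mp3", lang="ru-RU", speaker="zahar", **optional):
    """Build a request URL; optional parameters are quality, speed and emotion."""
    params = {"text": text, "format": fmt, "lang": lang, "speaker": speaker}
    # Skip optional parameters that were passed as None.
    params.update({name: value for name, value in optional.items() if value is not None})
    params["key"] = key
    # quote_via=quote encodes spaces as %20 instead of "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def synthesize_to_file(url, path):
    """Send the GET request and save the returned audio bytes to a file."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
```

With a valid key, calling synthesize_to_file(build_tts_url("<API key>", "This text is ready", emotion="good"), "speech.mp3") would write the returned audio to speech.mp3 in the requested format.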