Speech synthesis (text-to-speech)

Speech synthesis is the process of generating speech from written text. The SpeechKit Mobile SDK can synthesize speech from any text and lets you choose the voice (male or female) and the intonation. The following languages are supported:


  • Russian

  • English

  • Ukrainian

  • Turkish
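The choices described above (language, voice, and intonation) can be pictured as one small settings object. The sketch below is only an illustration: `SynthesisSettings`, its enums, and the intonation labels are hypothetical names chosen for this example and are not part of the SpeechKit Mobile SDK API. Refer to the SDK reference for the actual classes and values.

```java
import java.util.Objects;

/** Hypothetical settings holder for illustration; NOT a SpeechKit Mobile SDK class. */
final class SynthesisSettings {

    /** The four languages listed in this section. */
    enum Language { RUSSIAN, ENGLISH, UKRAINIAN, TURKISH }

    /** The section distinguishes only male and female voices. */
    enum VoiceGender { MALE, FEMALE }

    /** Illustrative intonation labels; the real set of options is an assumption. */
    enum Intonation { NEUTRAL, CHEERFUL, IRRITATED }

    private final Language language;
    private final VoiceGender voice;
    private final Intonation intonation;

    SynthesisSettings(Language language, VoiceGender voice, Intonation intonation) {
        // Reject missing values early so a request is always fully specified.
        this.language = Objects.requireNonNull(language, "language");
        this.voice = Objects.requireNonNull(voice, "voice");
        this.intonation = Objects.requireNonNull(intonation, "intonation");
    }

    Language language() { return language; }
    VoiceGender voice() { return voice; }
    Intonation intonation() { return intonation; }

    /** Short human-readable summary, e.g. for logging. */
    String describe() {
        return language + "/" + voice + "/" + intonation;
    }
}
```

For example, `new SynthesisSettings(SynthesisSettings.Language.ENGLISH, SynthesisSettings.VoiceGender.FEMALE, SynthesisSettings.Intonation.NEUTRAL)` would describe a request for a neutral female English voice. Keeping the three choices in one immutable object makes it easy to validate them together before passing them to whatever synthesis call the SDK provides.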

Quality of speech synthesis

The quality of synthesized speech refers to how well it resembles a human voice and conveys emotion through intonation.

Yandex speech synthesis technology doesn't piece together fragments of recorded speech. Instead, it trains an acoustic model on a speaker's recordings, using a statistical approach based on a recurrent neural network. The resulting voice sounds somewhat artificial, but the speech is smooth, with natural intonation.

The statistical approach also allows us to change the parameters of existing voices. This means that you can choose the intonation used to vocalize a text.