Speech synthesis (text-to-speech)
Speech synthesis (text-to-speech) is the process of generating speech from written text. SpeechKit Cloud can synthesize speech from any text in several languages. You can also choose the voice (male or female) and the intonation.
Quality of speech synthesis
The quality of synthesized speech refers to how well it resembles a human voice and conveys emotion through intonation.
The Yandex speech synthesis technology doesn't piece together fragments of recorded speech. Instead, it trains an acoustic model on recordings of a voice actor's speech, using a statistical approach based on a recurrent neural network. The resulting voice sounds somewhat artificial, but the speech is smooth and has natural intonation.
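The following toy sketch shows how a recurrent acoustic model turns a sequence of input frames into a sequence of acoustic parameters. It assumes a simple Elman RNN in plain Python and is not Yandex's model. The class name `ToyAcousticRNN`, the dimensions, and the random, untrained weights are all illustrative. Each input frame stands for linguistic features (for example, which phoneme is being spoken). Each output frame stands for acoustic parameters (for example, pitch and spectral coefficients), which a typical parametric pipeline would pass to a vocoder to produce a waveform.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights in [-0.1, 0.1]. A trained model would learn these from recordings."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyAcousticRNN:
    """Hypothetical Elman RNN: linguistic features per frame -> acoustic parameters per frame."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(hidden_dim, in_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = make_matrix(out_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def forward(self, frames):
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            # The hidden state h carries context from earlier frames, so each output
            # depends on the preceding input as well as the current frame.
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(v) for v in pre]
            outputs.append(matvec(self.w_out, h))
        return outputs


model = ToyAcousticRNN(in_dim=4, hidden_dim=8, out_dim=3)
frames = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]]
acoustic = model.forward(frames)
print(len(acoustic), len(acoustic[0]))  # 3 3
```

The recurrence is the relevant design point. A model that mapped each frame independently would have no memory of neighboring sounds. Feeding the hidden state forward lets each output depend on context, which is one reason this kind of model can produce smooth transitions.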
The statistical approach also allows us to change the parameters of existing voices. This means you can choose the intonation used when voicing a text.
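Because the model outputs acoustic parameters rather than stitched recordings, intonation can in principle be adjusted by editing those parameters before the waveform is generated. The sketch below assumes that a pitch (F0) contour in Hz is among the model's outputs, with 0.0 marking unvoiced frames. The function `reshape_intonation` and its parameters are hypothetical and are not part of SpeechKit Cloud. The function widens or narrows pitch excursions around the mean and can optionally shift the whole contour.

```python
def reshape_intonation(f0_contour, range_scale, shift_hz=0.0):
    """Hypothetical helper: scale pitch excursions around the mean voiced F0.

    range_scale > 1 makes intonation more expressive, < 1 flatter.
    shift_hz raises or lowers the whole contour.
    Unvoiced frames (0.0) are left untouched.
    """
    voiced = [f for f in f0_contour if f > 0.0]
    if not voiced:
        return list(f0_contour)
    mean = sum(voiced) / len(voiced)
    return [
        (mean + (f - mean) * range_scale + shift_hz) if f > 0.0 else 0.0
        for f in f0_contour
    ]


contour = [120.0, 130.0, 0.0, 140.0, 110.0]
print(reshape_intonation(contour, range_scale=2.0))  # [115.0, 135.0, 0.0, 155.0, 95.0]
```

Editing a compact parameter such as F0 is much simpler than re-recording or re-cutting audio, which is the practical advantage the statistical approach offers here.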