Yandex SpeechKit Mobile SDK 3.12.2 for iOS reference guide

Yandex SpeechKit is a multi-platform library for integrating speech functionality in your mobile apps with minimal effort. The ultimate goal of SpeechKit is to provide users with the entire range of Yandex speech technologies.

SpeechKit architecture

The SpeechKit library supports several mobile platforms using the same implementation of the basic logic. The differences between platforms are in the platform abstraction layer (recording audio, networking, etc.), API wrappers, and platform-specific components such as GUI implementation. This approach simplifies development for multiple platforms and allows for ideal synchronization of functionality between them.

Mobile platforms differ in their culture and development practices. This affects such aspects as naming of classes and methods, object instantiation, error handling, and so on. We try to minimize these differences while also making sure that SpeechKit fits naturally into the ecosystem of each of the supported platforms.

Working with the SDK

  1. Initializing the SDK
  2. Speech recognition
  3. Speech recognition + UI
  4. Speech synthesis (text-to-speech)
  5. Voice activation

Initializing the SDK

Before you begin using any SpeechKit functionality, you need to configure YSKSpeechKit with your API key (you can get a key in the Developer Dashboard). To do this, set the apiKey property in the app:
[YSKSpeechKit sharedInstance].apiKey = @"developer_api_key";
YSKSpeechKit.sharedInstance().apiKey = "developer_api_key"

Speech recognition

Speech recognition uses an object of the YSKOnlineRecognizer class:
YSKOnlineRecognizerSettings *settings = [[YSKOnlineRecognizerSettings alloc] initWithLanguage:[YSKLanguage russian] model:[YSKOnlineModel queries]]; // 1
YSKOnlineRecognizer *recognizer = [[YSKOnlineRecognizer alloc] initWithSettings: settings];
recognizer.delegate = self; // 2
[recognizer prepare]; // 3
[recognizer startRecording]; // 4

let settings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries()) // 1
let recognizer = YSKOnlineRecognizer(settings: settings)
recognizer.delegate = self // 2
recognizer.prepare() // 3
recognizer.startRecording() // 4
  1. To create the YSKOnlineRecognizer object, specify which settings it will work with. The required settings are the language of recognized speech and the model. For the full list of settings, see the YSKOnlineRecognizerSettings class.
  2. To monitor changes to the state of the YSKOnlineRecognizer object, specify the delegate that will receive notifications about the recognition process.
  3. YSKOnlineRecognizer requires a network connection. Because of this, it may take slightly longer to start the recognition process the first time. To avoid this delay, call the -prepare method in advance so it can perform all the necessary setup.
    Note.

    If the -prepare method wasn't called explicitly, it will run automatically on the first start.

  4. The start of speech recognition. Asynchronous execution.

To get recognition results and monitor changes in the state of the YSKOnlineRecognizer object, implement the YSKRecognizerDelegate protocol. Main methods of the protocol:

  1. -recognizerDidStartRecording: — Notifies when audio recording begins.
  2. -recognizer:didReceivePartialResults:withEndOfUtterance: — Notifies when intermediate speech recognition results are available. The endOfUtterance flag indicates the end of the sentence. If true, recognition is complete.
  3. -recognizerDidFinishRecognition: — Notifies when the recognition process is complete.
  4. -recognizer:didFailWithError: — Notifies that an error occurred when the YSKOnlineRecognizer class object was working.

The YSKOnlineRecognizer object can be reused for repeated speech recognition. If you need to stop the recognition process before it finishes, call the -cancel method.
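For reference, here is a minimal sketch of a YSKRecognizerDelegate implementation built from the methods listed above. The parameter types (id<YSKRecognizing>, YSKRecognition) and the bestResultText accessor are assumptions about this SDK version, and the handling code is illustrative only; check the protocol declaration in the SDK headers.

// A minimal sketch; assumes self conforms to YSKRecognizerDelegate.
- (void)recognizerDidStartRecording:(id<YSKRecognizing>)recognizer {
    // Audio recording has begun: update the UI, e.g. show a microphone indicator.
}

- (void)recognizer:(id<YSKRecognizing>)recognizer didReceivePartialResults:(YSKRecognition *)results withEndOfUtterance:(BOOL)endOfUtterance {
    // Show the current best hypothesis. endOfUtterance == YES means the
    // utterance has ended and this result is final.
    NSLog(@"Partial result: %@", results.bestResultText);
}

- (void)recognizerDidFinishRecognition:(id<YSKRecognizing>)recognizer {
    // Recognition is complete; the same recognizer object can be reused.
}

- (void)recognizer:(id<YSKRecognizing>)recognizer didFailWithError:(NSError *)error {
    NSLog(@"Recognition error: %@", error.localizedDescription);
}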

Speech recognition + UI

You can also use the YSKRecognizerDialogController UI dialog to make it easier to integrate speech recognition into an app. It manages the entire recognition process, including the user interface for recognition and management of the YSKOnlineRecognizer and YSKPhraseSpotter objects. YSKRecognizerDialogController starts recognition immediately after opening. The dialog window closes automatically in the following cases:

  • The recognition result was received.
  • An error occurred.
  • The user closed or minimized the app.

The dialog handles screen rotation, app minimization, and any other events that may affect its appearance or the behavior of the YSKOnlineRecognizer object. The YSKRecognizerDialogController class object can be reused. All necessary resources are acquired when the window opens and released when it closes.

YSKOnlineRecognizerSettings *settings = [[YSKOnlineRecognizerSettings alloc] initWithLanguage:[YSKLanguage russian] model:[YSKOnlineModel queries]]; // 1
YSKRecognizerDialogController *dialog = [[YSKRecognizerDialogController alloc] initWithRecognizerSettings: settings];
dialog.delegate = self; // 2
dialog.shouldDisplayPartialResults = YES; // 3
dialog.shouldDisplayHypothesesList = YES; // 3
dialog.skin = [YSKLightDialogSkin new]; // 3
[dialog presentRecognizerDialogOverPresentingController:self animated:YES completion:nil]; // 4

let settings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries()) // 1
let dialog = YSKRecognizerDialogController(recognizerSettings: settings)
dialog.delegate = self // 2
dialog.shouldDisplayPartialResults = true // 3
dialog.shouldDisplayHypothesesList = true // 3
dialog.skin = YSKLightDialogSkin() // 3
dialog.presentRecognizerDialogOverPresenting(self, animated: true, completion: nil) // 4
  1. To create the YSKRecognizerDialogController class object, specify which settings it will work with. The required settings are the language of recognized speech and the model. For the full list of settings, see the YSKOnlineRecognizerSettings class.
  2. To monitor changes to the state of the YSKRecognizerDialogController object, specify the delegate that will receive notifications about the recognition process.
  3. You can specify additional settings for the dialog:
    • Show partial recognition results or a list of hypotheses if the result is ambiguous.
    • Set the appearance of the window to a light or dark theme.
  4. Opening the dialog window and starting speech recognition. You should only use this method to display the dialog window, because it applies the settings required for speech recognition. Using standard UIViewController presentation methods may cause the dialog to function incorrectly.

To get notifications about the main events that occur in the speech recognition process, implement the YSKRecognizerDialogControllerDelegate protocol. Main methods of the protocol:

  1. -recognizerDialogController:didFinishWithResult: — Called when the recognition process finishes successfully.
  2. -recognizerDialogController:didFailWithError: — Called if the recognition process failed with an error.
  3. -recognizerDialogControllerDidClose:automatically: — Called at the end of the animation for closing the dialog window. The dialog window closes automatically when recognition results or errors are received. The user can close the window without waiting for speech recognition results.
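For reference, a minimal sketch of a YSKRecognizerDialogControllerDelegate implementation based on the three methods above. The parameter types (in particular NSString for the result) are assumptions; check the protocol declaration in the SDK headers.

// A minimal sketch; assumes self conforms to YSKRecognizerDialogControllerDelegate.
- (void)recognizerDialogController:(YSKRecognizerDialogController *)controller didFinishWithResult:(NSString *)result {
    NSLog(@"Recognized: %@", result);
}

- (void)recognizerDialogController:(YSKRecognizerDialogController *)controller didFailWithError:(NSError *)error {
    NSLog(@"Dialog error: %@", error.localizedDescription);
}

- (void)recognizerDialogControllerDidClose:(YSKRecognizerDialogController *)controller automatically:(BOOL)automatically {
    // automatically == NO means the user closed the dialog manually,
    // without waiting for a recognition result.
}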

Speech synthesis (text-to-speech)

Speech synthesis and vocalization uses the YSKOnlineVocalizer class object:

YSKOnlineVocalizerSettings *settings = [[YSKOnlineVocalizerSettings alloc] initWithLanguage:[YSKLanguage english]]; // 1
YSKOnlineVocalizer *vocalizer = [[YSKOnlineVocalizer alloc] initWithSettings: settings];
vocalizer.delegate = self; // 2
[vocalizer prepare]; // 3
[vocalizer synthesize:@"Tomorrow's weather" mode:YSKTextSynthesizingModeAppend]; // 4

let settings = YSKOnlineVocalizerSettings(language: YSKLanguage.english()) // 1
let vocalizer = YSKOnlineVocalizer(settings: settings)
vocalizer.delegate = self // 2
vocalizer.prepare() // 3
vocalizer.synthesize("Tomorrow's weather", mode: .append) // 4
  1. To create the YSKOnlineVocalizer class object, specify which settings it will work with. The language of synthesized speech is a mandatory setting. For the full list of settings, see the YSKOnlineVocalizerSettings class.
  2. To monitor changes to the state of the YSKOnlineVocalizer class object, specify the delegate that will receive notifications about the beginning and end of speech synthesis, the beginning and end of playback of synthesized speech, and errors.
  3. YSKOnlineVocalizer requires a network connection. Because of this, it may take slightly longer to start the speech synthesis process the first time. To avoid this delay, call the -prepare method in advance so it can perform all the necessary setup.
    Note.

    If the -prepare method wasn't called explicitly, it will run automatically on the first synthesis.

  4. Speech synthesis of the passed text. Asynchronous execution.

To get speech synthesis results and monitor changes in the state of the YSKOnlineVocalizer object, implement the YSKVocalizerDelegate protocol. Main methods of the protocol:

  1. -vocalizer:didReceivePartialSynthesis: — Notifies when partial speech synthesis results are received. Depending on the task, you can save them to a file or play them using the built-in player.
  2. -vocalizerDidSynthesisDone: — Notifies when the speech synthesis process is completed.
  3. -vocalizer:didFailWithError: — Notifies that an error occurred when the YSKOnlineVocalizer class object was working.

The YSKOnlineVocalizer class object can be used for repeated speech synthesis. If you need to end the speech synthesis or vocalization process before it finishes, call the -cancel method.
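For reference, a minimal sketch of a YSKVocalizerDelegate implementation based on the methods above. The parameter types (id<YSKVocalizing>, YSKSynthesis) are assumptions; check the protocol declaration in the SDK headers.

// A minimal sketch; assumes self conforms to YSKVocalizerDelegate.
- (void)vocalizer:(id<YSKVocalizing>)vocalizer didReceivePartialSynthesis:(YSKSynthesis *)synthesis {
    // A chunk of synthesized audio has arrived: play it with the built-in
    // player or save it to a file, depending on the task.
}

- (void)vocalizerDidSynthesisDone:(id<YSKVocalizing>)vocalizer {
    // Synthesis is complete; the same vocalizer object can be reused.
}

- (void)vocalizer:(id<YSKVocalizing>)vocalizer didFailWithError:(NSError *)error {
    NSLog(@"Synthesis error: %@", error.localizedDescription);
}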

Voice activation

For voice activation, use the YSKPhraseSpotter class object. Voice activation detects a specific word or phrase in the incoming audio stream. The activation phrase is set in the language model of the YSKPhraseSpotter class object.

YSKPhraseSpotterSettings *settings = [[YSKPhraseSpotterSettings alloc] initWithModelPath:@"path/to/model"]; // 1
YSKPhraseSpotter *phraseSpotter = [[YSKPhraseSpotter alloc] initWithSettings: settings];
phraseSpotter.delegate = self; // 2
[phraseSpotter prepare]; // 3
[phraseSpotter start]; // 4

let settings = YSKPhraseSpotterSettings(modelPath: "path/to/model") // 1
let phraseSpotter = YSKPhraseSpotter(settings: settings)
phraseSpotter.delegate = self // 2
phraseSpotter.prepare() // 3
phraseSpotter.start() // 4
  1. To create the YSKPhraseSpotter class object, specify which settings it will work with. The mandatory setting is the path to the model for the YSKPhraseSpotter object. For the full list of settings, see the YSKPhraseSpotterSettings class.
  2. To monitor changes to the state of the YSKPhraseSpotter class object, specify the delegate that will receive notifications about the beginning of detection, recognition of the activation phrase, and errors.
  3. YSKPhraseSpotter does not require a network connection, but it may take some time to load the model. To avoid this delay, call the -prepare method in advance.
    Note.

    If the -prepare method wasn't called explicitly, it will run automatically on the first start.

  4. Starting the YSKPhraseSpotter class object. Asynchronous execution.

To get voice activation results and monitor changes in the state of the YSKPhraseSpotter class object, implement the YSKPhraseSpotterDelegate protocol. Main methods of the protocol:

  1. -phraseSpotterDidStarted: — Notifies when audio recording begins.
  2. -phraseSpotter:didSpotPhrase:withIndex: — Notifies when the activation phrase is detected in the audio stream.
  3. -phraseSpotter:didFailWithError: — Notifies that an error occurred when the YSKPhraseSpotter class object was working.

After the specified phrase is detected, the YSKPhraseSpotter object continues working. To stop it, call -stop.
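For reference, a minimal sketch of a YSKPhraseSpotterDelegate implementation based on the methods above. The parameter types for -phraseSpotter:didSpotPhrase:withIndex: are assumptions; check the protocol declaration in the SDK headers.

// A minimal sketch; assumes self conforms to YSKPhraseSpotterDelegate.
- (void)phraseSpotterDidStarted:(YSKPhraseSpotter *)phraseSpotter {
    // Audio recording has begun; the spotter is now listening for the phrase.
}

- (void)phraseSpotter:(YSKPhraseSpotter *)phraseSpotter didSpotPhrase:(NSString *)phrase withIndex:(NSInteger)phraseIndex {
    // The activation phrase was detected: for example, start full speech
    // recognition here. The spotter keeps running until -stop is called.
    NSLog(@"Activation phrase detected: %@", phrase);
}

- (void)phraseSpotter:(YSKPhraseSpotter *)phraseSpotter didFailWithError:(NSError *)error {
    NSLog(@"PhraseSpotter error: %@", error.localizedDescription);
}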

Need help?

If you experience problems with the SpeechKit Mobile SDK, try enabling logging using the logLevel property of the YSKSpeechKit class. The log provides additional information about what the system is doing at any given moment and may help you answer questions on your own.

[YSKSpeechKit sharedInstance].logLevel = YSKLogLevelDebug;
YSKSpeechKit.sharedInstance().logLevel = .debug

If the logs don't give you enough information, search the FAQ for an answer to your question or a description of a similar problem and solution.