Yandex SpeechKit Mobile SDK 3.12.2 for iOS reference guide
Yandex SpeechKit is a multi-platform library for integrating speech functionality in your mobile apps with minimal effort. The ultimate goal of SpeechKit is to provide users with the entire range of Yandex speech technologies.
The SpeechKit library supports several mobile platforms using the same implementation of the basic logic. The platforms differ only in the platform abstraction layer (audio recording, networking, and so on), API wrappers, and platform-specific components such as the GUI implementation. This approach simplifies development for multiple platforms and keeps their functionality closely synchronized.
Mobile platforms differ in their culture and development practices. This affects such aspects as naming of classes and methods, object instantiation, error handling, and so on. We try to minimize these differences while also making sure that SpeechKit fits naturally into the ecosystem of each of the supported platforms.
Working with the SDK
- Initializing the SDK
- Speech recognition
- Speech recognition + UI
- Speech synthesis (text-to-speech)
- Voice activation
Initializing the SDK
Objective-C:
[YSKSpeechKit sharedInstance].apiKey = @"developer_api_key";

Swift:
YSKSpeechKit.sharedInstance().apiKey = "developer_api_key"
Speech recognition
Speech recognition uses a YSKOnlineRecognizer object:
Objective-C:
YSKOnlineRecognizerSettings *settings = [[YSKOnlineRecognizerSettings alloc] initWithLanguage:[YSKLanguage russian] model:[YSKOnlineModel queries]]; // 1
YSKOnlineRecognizer *recognizer = [[YSKOnlineRecognizer alloc] initWithSettings:settings];
recognizer.delegate = self; // 2
[recognizer prepare]; // 3
[recognizer startRecording]; // 4

Swift:
let settings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries()) // 1
let recognizer = YSKOnlineRecognizer(settings: settings)
recognizer.delegate = self // 2
recognizer.prepare() // 3
recognizer.startRecording() // 4
- To monitor changes to the state of the YSKOnlineRecognizer object, specify the delegate that will receive notifications about the recognition process.
- YSKOnlineRecognizer requires a network connection, so starting the recognition process may take slightly longer the first time. To avoid this delay, call the -prepare method in advance so that it can perform all the necessary configuration.
Note. If the -prepare method wasn't called explicitly, it runs automatically on the first start.
- Starts speech recognition. The call is executed asynchronously.
To get notifications about the main events of the recognition process, implement the corresponding methods in the delegate:
- -recognizerDidStartRecording: — Notifies when audio recording begins.
- -recognizer:didReceivePartialResults:withEndOfUtterance: — Notifies when intermediate speech recognition results are available. The endOfUtterance flag indicates the end of the utterance: if it is true, recognition is complete.
- -recognizerDidFinishRecognition: — Notifies when the recognition process is complete.
Speech recognition + UI
You can also use the YSKRecognizerDialogController UI dialog to make it easier to integrate speech recognition into an app. It manages the entire recognition process, including the user interface for recognition and management of the YSKOnlineRecognizer and YSKPhraseSpotter objects. YSKRecognizerDialogController starts recognition immediately after opening. The dialog window closes automatically in the following cases:
- The recognition result was received.
- An error occurred.
- The user closed or minimized the app.
The dialog handles screen rotation, app minimization, and any other events that may affect its appearance or the behavior of the underlying YSKOnlineRecognizer object. A YSKRecognizerDialogController object can be reused: all necessary resources are acquired when the window opens and released when it closes.
Objective-C:
YSKOnlineRecognizerSettings *settings = [[YSKOnlineRecognizerSettings alloc] initWithLanguage:[YSKLanguage russian] model:[YSKOnlineModel queries]]; // 1
YSKRecognizerDialogController *dialog = [[YSKRecognizerDialogController alloc] initWithRecognizerSettings:settings];
dialog.delegate = self; // 2
dialog.shouldDisplayPartialResults = YES; // 3
dialog.shouldDisplayHypothesesList = YES; // 3
dialog.skin = [YSKLightDialogSkin new]; // 3
[dialog presentRecognizerDialogOverPresentingController:self animated:YES completion:nil]; // 4

Swift:
let settings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries()) // 1
let dialog = YSKRecognizerDialogController(recognizerSettings: settings)
dialog.delegate = self // 2
dialog.shouldDisplayPartialResults = true // 3
dialog.shouldDisplayHypothesesList = true // 3
dialog.skin = YSKLightDialogSkin() // 3
dialog.presentRecognizerDialogOverPresenting(self, animated: true, completion: nil) // 4
- To monitor changes to the state of the YSKRecognizerDialogController object, specify the delegate that will receive notifications about the recognition process.
- You can specify additional settings for the dialog:
- Show partial recognition results or a list of hypotheses if the result is ambiguous.
- Set the appearance of the window to a light or dark theme.
- Opens the dialog window and starts speech recognition. Use only this method to display the dialog window, because it applies the settings needed for speech recognition. Presenting the dialog with standard UIViewController methods may cause it to function incorrectly.
To get notifications about the main events that occur in the speech recognition process, implement the YSKRecognizerDialogControllerDelegate protocol. Main methods of the protocol:
- -recognizerDialogController:didFinishWithResult: — Called when the recognition process finishes successfully.
- -recognizerDialogController:didFailWithError: — Called if the recognition process failed with an error.
- -recognizerDialogControllerDidClose:automatically: — Called at the end of the animation for closing the dialog window. The dialog window closes automatically when recognition results or errors are received. The user can close the window without waiting for speech recognition results.
Speech synthesis (text-to-speech)
Speech synthesis (vocalization) uses a YSKOnlineVocalizer object:
Objective-C:
YSKOnlineVocalizerSettings *settings = [[YSKOnlineVocalizerSettings alloc] initWithLanguage:[YSKLanguage english]]; // 1
YSKOnlineVocalizer *vocalizer = [[YSKOnlineVocalizer alloc] initWithSettings:settings];
vocalizer.delegate = self; // 2
[vocalizer prepare]; // 3
[vocalizer synthesize:@"Tomorrow's weather" mode:YSKTextSynthesizingModeAppend]; // 4

Swift:
let settings = YSKOnlineVocalizerSettings(language: YSKLanguage.english()) // 1
let vocalizer = YSKOnlineVocalizer(settings: settings)
vocalizer.delegate = self // 2
vocalizer.prepare() // 3
vocalizer.synthesize("Tomorrow's weather", mode: .append) // 4
- To monitor changes to the state of the YSKOnlineVocalizer class object, specify the delegate that will receive notifications about the beginning and end of speech synthesis, the beginning and end of playback of synthesized speech, and errors.
- YSKOnlineVocalizer requires a network connection, so starting the speech synthesis process may take slightly longer the first time. To avoid this delay, call the -prepare method in advance so that it can perform all the necessary configuration.
Note. If the -prepare method wasn't called explicitly, it runs automatically at the time of the first speech synthesis.
- Synthesizes speech from the passed text. The call is executed asynchronously.
Main methods of the delegate:
- -vocalizer:didReceivePartialSynthesis: — Notifies when partial speech synthesis results are received. Depending on the task, you can save them to a file or play them using the built-in player.
- -vocalizerDidSynthesisDone: — Notifies when the speech synthesis process is completed.
Voice activation
For voice activation, use a YSKPhraseSpotter object. Voice activation detects a specific word or phrase in the incoming audio stream. The activation phrase is defined in the language model of the YSKPhraseSpotter object.
Objective-C:
YSKPhraseSpotterSettings *settings = [[YSKPhraseSpotterSettings alloc] initWithModelPath:@"path/to/model"]; // 1
YSKPhraseSpotter *phraseSpotter = [[YSKPhraseSpotter alloc] initWithSettings:settings];
phraseSpotter.delegate = self; // 2
[phraseSpotter prepare]; // 3
[phraseSpotter start]; // 4

Swift:
let settings = YSKPhraseSpotterSettings(modelPath: "path/to/model") // 1
let phraseSpotter = YSKPhraseSpotter(settings: settings)
phraseSpotter.delegate = self // 2
phraseSpotter.prepare() // 3
phraseSpotter.start() // 4
- To monitor changes to the state of the YSKPhraseSpotter class object, specify the delegate that will receive notifications about the beginning of detection, recognition of the activation phrase, and errors.
- Starts the YSKPhraseSpotter object. The call is executed asynchronously.
Main methods of the delegate:
- -phraseSpotterDidStarted: — Notifies when audio recording begins.
- -phraseSpotter:didSpotPhrase:withIndex: — Notifies when the activation phrase is detected in the audio stream.
If you experience problems with the SpeechKit Mobile SDK, try enabling logging using the logLevel property of the YSKSpeechKit class. The log provides additional information about what the system is doing and can help you diagnose the problem.
Objective-C:
[YSKSpeechKit sharedInstance].logLevel = YSKLogLevelDebug;

Swift:
YSKSpeechKit.sharedInstance().logLevel = .debug
If the logs don't give you enough information, search the FAQ for an answer to your question or a description of a similar problem and solution.