Versions of the SpeechKit Mobile SDK

The changes to the SpeechKit Mobile SDK are listed below. The release date of the corresponding version is shown in parentheses.

  1. Android
  2. iOS

Android

Repository on GitHub.

  1. Version 3.12.2
  2. Version 2.5
  3. Version 2.2
  4. Version 2.1

Version 3.12.2

Reference guide for version 3.12.2 (April 10, 2018)
List of changes
General
  • New audio sources (classes that implement the AudioSource interface). They existed in SpeechKit before, but were hidden inside the library components. Now you can set the audio source yourself (it doesn't have to be the device's standard audio input) and configure the audio session. If you have no special requirements, you can use the audio sources that have been added to the library: the AutoStartStopAudioSource and ManualStartStopAudioSource classes. You can read more about audio sources and how to work with them in the documentation.
  • New component settings. Now each component (Recognizer, Vocalizer, PhraseSpotter) has its own settings, which are set when creating an object of the corresponding class, and not via a global SpeechKit object.
  • The main library components are reusable. The Recognizer, Vocalizer, and PhraseSpotter components now have the prepare() method for performing lengthy operations that are necessary for configuring a component. This method is called automatically when the component is first started. If the timing of the first start of the component is critical, this method can be called in advance (see the sketch after this list).
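The following sketch shows how these pieces fit together in version 3.12.2. It reuses the Builder call from the migration example below; myListener and myAudioSource are placeholders for your own RecognizerListener and AudioSource implementations, and the exact Builder signature may differ between releases, so treat this as an illustration rather than a drop-in snippet.

OnlineRecognizer recognizer = new OnlineRecognizer.Builder(Language.RUSSIAN, OnlineModel.MAPS, false, myListener)
        .setAudioSource(myAudioSource) // e.g. AutoStartStopAudioSource or ManualStartStopAudioSource
        .build();
recognizer.prepare();        // optional: perform the lengthy initialization ahead of time
recognizer.startRecording(); // the first start is then fast and can be called repeatedly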
Migration guide from version 2.5 to version 3.12.2
Recognizer

The class for speech recognition is now called OnlineRecognizer to emphasize that a network connection is required. OnlineRecognizer has its own settings, which are defined in the OnlineRecognizer.Builder class.

To create an object of the OnlineRecognizer class, you must set the settings and the audio source that will send audio to the OnlineRecognizer class object. You can use one of the two built-in classes to do this (AutoStartStopAudioSource or ManualStartStopAudioSource), or create your own class that implements the AudioSource interface. The default is AutoStartStopAudioSource.

The OnlineRecognizer class object is now reusable, meaning the startRecording() method, which starts the speech recognition process, can be called repeatedly. The onPartialResults method of the RecognizerListener interface now also contains the final speech recognition result if endOfUtterance == true. The onRecognitionDone method of the RecognizerListener interface is called when the speech recognition process is finished.

Version 2.5
Recognizer recognizer = Recognizer.create(Recognizer.Language.RUSSIAN, Recognizer.Model.MAPS, new RecognizerListener() {}, false);
recognizer.start()
...
new RecognizerListener() {
            //...
            @Override
            public void onRecognitionDone(Recognizer recognizer, Recognition recognition) {
                reportRecognitionDone(recognition.getBestResultText());
                currentRecognizer = null;
            }
        };
Version 3.12.2
OnlineRecognizer recognizer = new OnlineRecognizer.Builder(Language.RUSSIAN, OnlineModel.MAPS, false, new RecognizerListener() {})
                .setAudioSource(new AudioSource() {})
                .setSoundFormat(SoundFormat.AUTO)
                .setAllowPlatformRecognizer(false)
                .setDisableAntimat(true)
                .setMusic(false)
                .setVadEnabled(true)
                .setEnablePunctuation(false)
                //...
                .build();
recognizer.prepare();
recognizer.startRecording();
...
new RecognizerListener() {
            @Override
            public void onPartialResults(Recognizer recognizer, Recognition recognition, boolean endOfUtterance) {
                if (endOfUtterance) {
                    // Result.
                }
            }
            @Override
            public void onRecognitionDone(@NonNull Recognizer recognizer) {
            }
        };
Phrase Spotter

PhraseSpotter is no longer a singleton class. It can be created, launched, and destroyed as many times as necessary. All work with the language model now happens inside PhraseSpotter; you just need to specify the path to the model.

To create an object of this class, define the settings that are described in PhraseSpotter.Builder. You can also set the audio source in the same way as OnlineRecognizer.

Version 2.5
PhraseSpotterModel phraseSpotterModel = new PhraseSpotterModel("your_path_here");
final Error loadError = phraseSpotterModel.load();
if (loadError.getCode() != Error.ERROR_OK) {
    //fix
}

final Error setModelError = PhraseSpotter.setModel(phraseSpotterModel);
if (setModelError.getCode() != Error.ERROR_OK) {
    //fix
}
PhraseSpotter.start()
Version 3.12.2
PhraseSpotter phraseSpotter = new PhraseSpotter.Builder("your_path_here", new PhraseSpotterListener() {}).build();
phraseSpotter.prepare();
phraseSpotter.start();
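As noted above, the audio source can be set the same way as for OnlineRecognizer. The setter in this sketch is an assumption based on that statement (it mirrors setAudioSource on OnlineRecognizer.Builder), and mySource stands for any AudioSource implementation:

PhraseSpotter phraseSpotter = new PhraseSpotter.Builder("your_path_here", new PhraseSpotterListener() {})
        .setAudioSource(mySource) // assumed to mirror OnlineRecognizer.Builder.setAudioSource
        .build();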
Vocalizer

The class for speech synthesis is now called OnlineVocalizer to emphasize that a network connection is required. To create an object of the OnlineVocalizer class, you must set the settings that are described in the OnlineVocalizer.Builder class.

OnlineVocalizer is now reusable, and the start() method has been replaced with synthesize(), which lets you specify the text for speech synthesis.

Version 2.5
Vocalizer vocalizer = Vocalizer.createVocalizer(Vocalizer.Language.RUSSIAN, true, Vocalizer.Voice.OKSANA);
vocalizer.start()
Version 3.12.2
OnlineVocalizer vocalizer = new OnlineVocalizer.Builder(Language.ENGLISH, this)
        .setEmotion(Emotion.GOOD)
        .setVoice(Voice.ERMIL)
        .build();
vocalizer.prepare();
vocalizer.synthesize("Tomorrow's weather", Vocalizer.TextSynthesizingMode.APPEND);
Recognizer GUI

The API for creating and starting RecognizerActivity is almost unchanged, but the UI has been completely redesigned, and there are some useful new settings.

Version 2.5
final Intent intent = new Intent(getContext(), RecognizerDialogActivity.class);
        intent.putExtra(RecognizerDialogActivity.EXTRA_LANGUAGE, Recognizer.Language.RUSSIAN);
        intent.putExtra(RecognizerDialogActivity.EXTRA_MODEL, Recognizer.Model.MAPS);
        intent.putExtra(RecognizerDialogActivity.EXTRA_NIGHT_THEME, false);
        getActivity().startActivityForResult(intent, RECOGNIZER_REQUEST_CODE);
Version 3.12.2
final Intent intent = new Intent(getContext(), RecognizerActivity.class);
        intent.putExtra(RecognizerActivity.EXTRA_LANGUAGE, Language.RUSSIAN.getValue());
        intent.putExtra(RecognizerActivity.EXTRA_MODEL, OnlineModel.MAPS.getName());
        intent.putExtra(RecognizerActivity.EXTRA_NIGHT_THEME, false);
        getActivity().startActivityForResult(intent, RECOGNIZER_REQUEST_CODE);

Version 2.5

Version 2.5 reference (15 February 2016)
List of changes
General
  • Added support for Android 6.
  • The library is available in Maven Central.
Speech recognition. New models and languages

Dictation mode is now available for all models.

You can include profanity in recognition results (see disableAntimat).

You don't have to stop voice activation before starting speech recognition. This significantly reduces the startup time for recognition.

Text-to-speech. New voices and languages
Voice activation
  • Better quality of command recognition (the acoustic model has been improved).
  • The model can be switched without stopping PhraseSpotter.
Other improvements
Migrating from version 2.2 to version 2.5
  1. Replace the constant names for the models:

    Recognizer.Model.freeform →  Recognizer.Model.NOTES
    
    Recognizer.Model.general →  Recognizer.Model.QUERIES
    
    Recognizer.Model.maps →  Recognizer.Model.MAPS
    
    Recognizer.Model.music →  Recognizer.Model.MUSIC

    Example:

    Recognizer rec = Recognizer.create(yourLng, Recognizer.Model.general, yourListener);
    // Replace with:
    Recognizer rec = Recognizer.create(yourLng, Recognizer.Model.QUERIES, yourListener);
  2. Replace the constant names for Russian and Turkish:

    Recognizer.Language.russian →  Recognizer.Language.RUSSIAN
    
    Recognizer.Language.turkish →  Recognizer.Language.TURKISH
  3. Add the onSpeechEnds method to the class that implements the RecognizerListener interface (see the sketch after this list).
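A minimal sketch of the new callback; the parameter list shown here is an assumption, so check the RecognizerListener reference for the exact signature in your SDK version.

new RecognizerListener() {
    // ...other RecognizerListener callbacks...

    @Override
    public void onSpeechEnds(Recognizer recognizer) {
        // Called when the end of speech is detected (assumed signature).
    }
};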

Version 2.2

Version 2.2 reference (30 October 2014)
List of changes
Added:
  • Voice activation.
  • Text-to-speech.
  • Dictation mode.

Version 2.1

Version 2.1 reference (9 April 2014)
List of changes
Added:
  • The GNU STL library can now be used as a shared library.
Fixed:
  • Improved stability on different devices.

iOS

Repository on GitHub.

  1. Version 3.12.2
  2. Version 2.5
  3. Version 2.2
  4. Version 2.1

Version 3.12.2

Reference guide for version 3.12.2 (April 10, 2018)
List of changes
General
  • New audio sources (classes that implement the YSKAudioSource protocol). They existed in SpeechKit before, but were hidden inside the library components. This caused a number of problems, because the audio sources had to work with AVAudioSession, of which only one is created for the entire app, and activating or deactivating the audio session can take a long time. Now you can set the audio source yourself (it doesn't have to be the device's standard audio input) and configure the audio session. If you have no special requirements, you can use the audio sources that have been added to the library: the YSKAutoAudioSource and YSKManualAudioSource classes. You can read more about audio sources and how to work with them in the documentation.
  • New component settings. Now each component (Recognizer, Vocalizer, PhraseSpotter) has its own settings, which are set when creating an object of the corresponding class, and not via a global YSKSpeechKit object.
  • The main library components are reusable. The Recognizer, Vocalizer, and PhraseSpotter components now have the -prepare method for performing lengthy operations that are necessary for configuring a component. This method is called automatically when the component is first started. If the timing of the first start of the component is critical, this method can be called in advance (see the sketch after this list).
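The following sketch shows how these pieces fit together in version 3.12.2, reusing the initializers from the migration examples below. Here prepare() is assumed to be the Swift spelling of -prepare; treat this as an illustration rather than a drop-in snippet.

fileprivate var recognizer: YSKOnlineRecognizer?
fileprivate let audioSource = YSKManualAudioSource()

func buildRecognizerInAdvance() {
  let settings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries())
  recognizer = YSKOnlineRecognizer(settings: settings, audioSource: audioSource)
  recognizer?.delegate = self
  recognizer?.prepare() // optional: perform the lengthy initialization ahead of the first start
}

func startRecognition() {
  // The audio session must be configured and activated first (see Audio Session).
  audioSource.start()   // a manual audio source is started explicitly
  recognizer?.startRecording()
}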
Migration guide from version 2.5 to version 3.12.2
Audio Session

The YSKAudioSessionHandler class has been created for working with an audio session. It allows you to configure and activate the audio session for the library components to work correctly.

If your app uses an audio session only when working with the SpeechKit library, we recommend using this class to configure the audio session.

If the app uses the audio session outside of the library (for audio and video playback, recording audio, and so on), you can also use this class, or configure the audio session independently.

You must set up and activate an audio session before you can start working with audio, and deactivate it when you have finished working with the audio source. You need an audio session if you:
  1. Use one of the built-in audio sources.
  2. Create an object of the YSKOnlineRecognizer or YSKPhraseSpotter class with the default audio source (YSKAutoAudioSource is used).
  3. Play synthesized speech through the built-in YSKOnlineVocalizer player.
  4. Interact with the audio in your app in a different way (playing sounds or videos, etc.).
Version 2.5
YSKSpeechKit.sharedInstance().setParameter(YSKDisableAudioSessionChanging, withValue: "true")
// Configuring the audio session.
...
// Deactivating the audio session.
Version 3.12.2
func activateAudioSession() {
  do {
    try YSKAudioSessionHandler.sharedInstance().activateAudioSession()
  }
  catch {
    // Handle the error.
  }
}
func deactivateAudioSession() {
  do {
    try YSKAudioSessionHandler.sharedInstance().deactivateAudioSession()
  }
  catch {
    // Handle the error.
  }
}
Recognizer

The class for speech recognition is now called YSKOnlineRecognizer to emphasize that a network connection is required. YSKOnlineRecognizer has its own settings, which are defined in the YSKOnlineRecognizerSettings class.

To create an object of the YSKOnlineRecognizer class, you must set the settings and the audio source that will send audio to the YSKOnlineRecognizer class object. You can use one of the two built-in classes to do this (YSKAutoAudioSource or YSKManualAudioSource), or create your own class that implements the YSKAudioSource protocol. If you use the -initWithSettings: method, the YSKAutoAudioSource audio source will be created by default.

The YSKOnlineRecognizer class object is now reusable, meaning the -startRecording method, which starts the speech recognition process, can be called repeatedly. The -recognizer:didReceivePartialResults:withEndOfUtterance: method of the YSKRecognizerDelegate protocol now also contains the final speech recognition result if endOfUtterance == true. The -recognizerDidFinishRecognition: method of the YSKRecognizerDelegate protocol is called when the speech recognition process is finished.

For the YSKOnlineRecognizer class object to be able to work with YSKAutoAudioSource, YSKManualAudioSource or another audio source that uses the device's standard audio input, you must configure and activate the audio session (see Audio Session).

Version 2.5
fileprivate var recognizer: YSKRecognizer?
fileprivate var recognitionResults: YSKRecognition?
...
func buildAndRunRecognizer() {
  YSKSpeechKit.sharedInstance().setParameter(YSKDisableAntimat, withValue: "false")
  YSKSpeechKit.sharedInstance().setParameter(YSKEnablePunctuation, withValue: "false")
  recognizer = YSKRecognizer(language: YSKRecognitionLanguageRussian, model: YSKRecognitionModelQueries)
  recognizer?.delegate = self
  recognizer?.start()
}
...
func recognizer(_ recognizer: YSKRecognizer!, didCompleteWithResults results: YSKRecognition!) {
  recognitionResults = results
  // Speech recognition process finished.
}
Version 3.12.2
fileprivate var recognizer: YSKOnlineRecognizer?
fileprivate var recognitionResults: YSKRecognition?
...
func buildAndRunRecognizer() {
  let recognizerSettings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries())
  recognizerSettings.disableAntimat = false
  recognizerSettings.enablePunctuation = false
  recognizer = YSKOnlineRecognizer(settings: recognizerSettings)
  recognizer?.delegate = self
  recognizer?.startRecording()
}

func recognizer(_ recognizer: YSKRecognizing, didReceivePartialResults results: YSKRecognition, withEndOfUtterance endOfUtterance: Bool) {
  if endOfUtterance {
    recognitionResults = results
  }
}

func recognizerDidFinishRecognition(_ recognizer: YSKRecognizing) {
  // Speech recognition process finished.
}
Phrase Spotter

YSKPhraseSpotter is no longer a singleton class. It can be created, launched, and destroyed as many times as necessary. All work with the language model now happens inside YSKPhraseSpotter; you just need to specify the path to the model.

To create an object of this class, define the settings that are described in YSKPhraseSpotterSettings. You can also set the audio source in the same way as YSKOnlineRecognizer.

For the YSKPhraseSpotter class object to be able to work with YSKAutoAudioSource, YSKManualAudioSource or another audio source that uses the device's standard audio input, you must configure and activate the audio session (see Audio Session).

Version 2.5
fileprivate var phraseSpotter: YSKPhraseSpotter?
fileprivate var phraseSpotterModel: YSKPhraseSpotterModel?
...
func buildAndRunPhraseSpotter() {
  phraseSpotterModel = YSKPhraseSpotterModel(configDirectory: pathToPhraseSpotterModel)
  phraseSpotterModel?.load()
  YSKPhraseSpotter.setModel(phraseSpotterModel)
  YSKPhraseSpotter.setDelegate(delegate)
  YSKPhraseSpotter.start()
}
...
func stopPhraseSpotterAndUnloadModel() {
  phraseSpotterModel?.unload()
  YSKPhraseSpotter.stop()
}
...
func phraseSpotter(_ phraseSpotter: YSKPhraseSpotter, didSpotPhrase phrase: String, with phraseIndex: Int) {
  // Phrase recognized.
}
Version 3.12.2
fileprivate var phraseSpotter: YSKPhraseSpotter?
...
func buildAndRunPhraseSpotter() {
  let spotterSettings = YSKPhraseSpotterSettings(modelPath: pathToPhraseSpotterModel)
  phraseSpotter = YSKPhraseSpotter(settings: spotterSettings)
  phraseSpotter?.delegate = self
  phraseSpotter?.start()
}
...
func phraseSpotter(_ phraseSpotter: YSKPhraseSpotter, didSpotPhrase phrase: String, with phraseIndex: Int) {
  // Phrase recognized.
}
Vocalizer

The class for speech synthesis is now called YSKOnlineVocalizer to emphasize that a network connection is required. To create an object of the YSKOnlineVocalizer class, you must define the settings that are described in the YSKOnlineVocalizerSettings class.

YSKOnlineVocalizer is now reusable, and the -start method has been replaced with -synthesize:mode:, which lets you specify the text for speech synthesis.

To play back synthesized speech through the built-in YSKOnlineVocalizer player, you need to configure and activate the audio session (see Audio Session).

Version 2.5
fileprivate var vocalizer: YSKVocalizer?
...
func buildAndRunVocalizer() {
  vocalizer = YSKVocalizer(text: text, language: YSKVocalizerLanguageRussian, autoPlay: true, voice: YSKVocalizerVoiceJane)
  vocalizer?.delegate = delegate
  vocalizer?.start()
}
...
func vocalizer(vocalizer: YSKVocalizer!, didFinishSynthesisWithResult result: YSKSynthesis!) {
  // Accumulating the results.
  // End of the speech synthesis process.
}
Version 3.12.2
fileprivate var vocalizer: YSKOnlineVocalizer?
...
func buildAndRunVocalizer() {
  let vocalizerSettings = YSKOnlineVocalizerSettings(language: YSKLanguage.russian())
  vocalizerSettings.voice = YSKVoice.jane()
  vocalizerSettings.autoPlaying = true
  vocalizer = YSKOnlineVocalizer(settings: vocalizerSettings)
  vocalizer?.delegate = self
  vocalizer?.synthesize(text, mode: YSKSynthesizingMode.Append)
}
...
func vocalizer(_ vocalizer: YSKVocalizing, didReceivePartialSynthesis result: YSKSynthesis) {
  // Accumulating the results.
}

func vocalizerDidSynthesisDone(_ vocalizer: YSKVocalizing) {
  // End of the speech synthesis process.
}
Recognizer GUI

YSKRecognizerDialogController gets the YSKOnlineRecognizer and YSKPhraseSpotter objects during initialization for speech recognition and voice activation on the error screen, respectively. If YSKPhraseSpotter was not set up, voice activation is not used on the error screen. This approach allows us to separate the UI part from the logic of speech recognition and voice activation.

YSKRecognizerDialogController is passed an object that implements the YSKRecognizing protocol (this can be YSKOnlineRecognizer or your own class that implements the YSKRecognizing protocol). If you want to use voice activation on the error screen, we recommend using a single object of the YSKManualAudioSource class when creating YSKOnlineRecognizer and YSKPhraseSpotter. In this case, YSKOnlineRecognizer and YSKPhraseSpotter will use the same audio source, which significantly speeds up the work of YSKRecognizerDialogController.

The old UI for YSKSpeechRecognitionViewController is no longer supported.

For the YSKOnlineRecognizer and YSKPhraseSpotter class objects to be able to work with YSKAutoAudioSource, YSKManualAudioSource, or another audio source that uses the device's standard audio input, you must configure and activate the audio session (see Audio Session).

Version 2.5
func buildAndRunRecognizerDialog() {
  let recognizerDialogController = YSKRecognizerDialogController(model: YSKRecognitionModelQueries, language: YSKRecognitionLanguageRussian)
  recognizerDialogController?.delegate = self
  recognizerDialogController?.shouldDisplayPartialResults = false
  recognizerDialogController?.shouldDisplayHypothesesList = false
  recognizerDialogController?.skin = YSKDarkDialogSkin()
  recognizerDialogController?.usePhraseSpotterForRetry = true
  recognizerDialogController?.presentRecognizerDialogOverPresentingController(self, animated: true, completion: nil)
}
...
func recognizerDialogController(controller: YSKRecognizerDialogController, didFinishWithResult result: String) {
  // Ending the recognition process and closing the dialog.
}
Version 3.12.2
fileprivate var audioSource = YSKManualAudioSource()
...
func buildAndRunRecognizerDialog() {
  let recognizerSettings = YSKOnlineRecognizerSettings(language: YSKLanguage.russian(), model: YSKOnlineModel.queries())
  recognizer = YSKOnlineRecognizer(settings: recognizerSettings, audioSource: audioSource)
  let spotterSettings = YSKPhraseSpotterSettings(modelPath: pathToPhraseSpotterModel)
  phraseSpotter = YSKPhraseSpotter(settings: spotterSettings, audioSource: audioSource)
  let recognizerDialogController = YSKRecognizerDialogController(recognizer: recognizer, phraseSpotter: phraseSpotter)
  recognizerDialogController?.delegate = self
  recognizerDialogController?.shouldDisplayPartialResults = false
  recognizerDialogController?.shouldDisplayHypothesesList = false
  recognizerDialogController?.skin = YSKDarkDialogSkin()
  audioSource.start()
  recognizerDialogController?.presentRecognizerDialogOverPresentingController(self, animated: true, completion: nil)
}
...
func recognizerDialogController(controller: YSKRecognizerDialogController, didFinishWithResult result: String) {
  audioSource.stop()
  // Ending the recognition process and closing the dialog.
}

Version 2.5

Version 2.5 reference (15 February 2016)
List of changes
General
Speech recognition. New models and languages

Dictation mode is now available for all models.

You can include profanity in recognition results (see YSKDisableAntimat).
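A brief sketch of how this looks with the 2.5-era API; the call shape follows the migration example above, and the value semantics ("true" turns the profanity filter off, so profanity stays in the results) is an assumption, so check the YSKDisableAntimat reference for details.

// Keep profanity in recognition results (assumption: "true" disables the filter).
YSKSpeechKit.sharedInstance().setParameter(YSKDisableAntimat, withValue: "true")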

You don't have to stop voice activation before starting speech recognition. This significantly reduces the startup time for recognition.

Text-to-speech. New voices and languages
Voice activation
  • Better quality of command recognition (the acoustic model has been improved).
  • The model can be switched without stopping YSKPhraseSpotter.
Other improvements
Migrating from version 2.2 to version 2.5
  1. Replace the constant names for the models:

    YSKRecognitionModelFreeform →  YSKRecognitionModelNotes

    YSKRecognitionModelGeneral →  YSKRecognitionModelQueries

    Example:

    _recognizer = [[YSKRecognizer alloc] initWithLanguage:_recognizerLanguage model:YSKRecognitionModelGeneral];
    // Replace with:
    _recognizer = [[YSKRecognizer alloc] initWithLanguage:_recognizerLanguage model:YSKRecognitionModelQueries];
  2. Add the -recognizerDidDetectSpeechEnd: method to the class implementing the YSKRecognizerDelegate protocol (see the sketch after this list).
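A minimal Objective-C sketch of the new delegate method, matching the example above; the parameter is assumed to be the recognizer instance, so check the YSKRecognizerDelegate reference for the exact signature.

- (void)recognizerDidDetectSpeechEnd:(YSKRecognizer *)recognizer {
    // Called when the end of speech is detected (assumed signature).
}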

Version 2.2

Version 2.2 reference (30 October 2014)
List of changes
Added:
  • Voice activation.
  • Text-to-speech.
  • Dictation mode.

Version 2.1

Version 2.1 reference (9 April 2014)
List of changes
Added:
  • Support for ARM-64.
  • Geolocation management.
Fixed:
  • Improved stability on different devices.