About the reference guide

SpeechKit is a multi-platform library for adding speech functionality to your mobile apps with minimal effort.

The ultimate goal of SpeechKit is to provide users with virtually the entire range of speech functionality available to Yandex.

In this release, Yandex SpeechKit offers server-side speech recognition functionality for short voice queries (only Russian and Turkish are supported).

Download the distribution

Components

To simplify integration while keeping detailed control over the functionality, Yandex.SpeechKit offers two versions of the API. The first is a simplified API (YSKSpeechRecognitionViewController) with a default GUI. The second is an advanced API (YSKRecognizer and YSKInitializer) with extended management of the library functionality.

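The simplified API consists of just one class, YSKSpeechRecognitionViewController, a view controller that provides speech recognition with a default GUI.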

The advanced API provides the following main classes:

  • YSKRecognizer — An interface for accessing the speech recognition function.

  • YSKInitializer — An interface for controlling the initialization process.

Regardless of which API you choose, SpeechKit must be configured using:

  • YSKSpeechKit — An interface for configuring the library and controlling overall operation.

SpeechKit

YSKSpeechKit is a tool for configuring and managing SpeechKit.

Before using any of the SpeechKit functionality, you must configure SpeechKit using configureWithAPIKey: with your API key.

YSKSpeechRecognitionViewController

This class is an iOS view controller designed to simplify integrating speech recognition into applications. YSKSpeechRecognitionViewController returns the string uttered by the user and handles any problems that occur along the way. It manages the entire recognition process, including the user interface for speech recognition, management of the YSKRecognizer and YSKInitializer objects, and so on.

YSKRecognizer

YSKRecognizer is the central component of speech recognition in SpeechKit. YSKRecognizer is intended for single sessions of speech recognition. It manages the entire recognition process, including recording audio, detecting speech activity, communicating with the server, and so on. YSKRecognizer uses the YSKRecognizerDelegate interface for notification of important events in the recognition process, returning recognition results, and notification of errors.

The recognition result is represented by the YSKRecognition class, which is the “N-best list” of recognition hypotheses, sorted by confidence in descending order. A recognition hypothesis, in turn, is represented by the YSKRecognitionHypothesis class.
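
The N-best list described above can be illustrated with a minimal Swift sketch. The Hypothesis and Recognition types below are hypothetical stand-ins for YSKRecognitionHypothesis and YSKRecognition; their property names and the sample confidence values are assumptions made for illustration, not the SpeechKit API.

```swift
// Hypothetical stand-ins for YSKRecognitionHypothesis and YSKRecognition.
// Property names are assumptions, not the SpeechKit API.
struct Hypothesis {
    let text: String
    let confidence: Double
}

struct Recognition {
    // The N-best list: hypotheses in descending order of confidence.
    let hypotheses: [Hypothesis]

    init(unsorted: [Hypothesis]) {
        hypotheses = unsorted.sorted { $0.confidence > $1.confidence }
    }

    // The most likely transcription, or nil if nothing was recognized.
    var best: Hypothesis? { hypotheses.first }
}

// Sample data invented for the example.
let recognition = Recognition(unsorted: [
    Hypothesis(text: "whether in Moscow", confidence: 0.31),
    Hypothesis(text: "weather in Moscow", confidence: 0.92),
    Hypothesis(text: "weather in most cow", confidence: 0.07),
])
```

Because the list is ordered by confidence, the first element is the natural default to show the user, while the remaining hypotheses can be offered as alternatives.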

Errors that occur during the recognition process are described using the standard NSError mechanism.
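
The delegate-based notification flow described in this section can be sketched roughly as follows. RecognizerDelegate and MockRecognizer are hypothetical stand-ins for YSKRecognizerDelegate and YSKRecognizer; the method names, the error domain, and the error code are assumptions chosen for the example. Only NSError itself is the standard Foundation class.

```swift
import Foundation

// Hypothetical stand-in for YSKRecognizerDelegate; method names are assumptions.
protocol RecognizerDelegate: AnyObject {
    func recognizerDidStartRecording()
    func recognizerDidFinish(bestText: String)
    func recognizerDidFail(error: NSError)
}

// Hypothetical stand-in for YSKRecognizer: one instance per recognition session.
final class MockRecognizer {
    weak var delegate: RecognizerDelegate?

    // Simulates a session that either yields a text or fails with an NSError.
    func run(simulating outcome: Result<String, NSError>) {
        delegate?.recognizerDidStartRecording()
        switch outcome {
        case .success(let text): delegate?.recognizerDidFinish(bestText: text)
        case .failure(let error): delegate?.recognizerDidFail(error: error)
        }
    }
}

// Records every notification so the order of events can be inspected.
final class EventLog: RecognizerDelegate {
    private(set) var events: [String] = []
    func recognizerDidStartRecording() { events.append("started") }
    func recognizerDidFinish(bestText: String) { events.append("result: \(bestText)") }
    func recognizerDidFail(error: NSError) {
        events.append("error: \(error.domain) \(error.code)")
    }
}

let log = EventLog()
let recognizer = MockRecognizer()
recognizer.delegate = log
recognizer.run(simulating: .success("weather in Moscow"))

// A fresh recognizer for the next session; the domain and code are made up.
let failing = MockRecognizer()
failing.delegate = log
failing.run(simulating: .failure(NSError(domain: "example.speech", code: 1, userInfo: nil)))
```

The sketch creates a new recognizer for each session, mirroring the statement that YSKRecognizer is intended for single sessions of speech recognition.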

YSKInitializer

Initialization is the internal process in which SpeechKit prepares its internal mechanisms for use. It may require lengthy read operations from persistent storage or network access, and generally takes a significant amount of time. For this reason, the YSKInitializer class lets you perform initialization at a moment that is convenient for the user.

In the current implementation, YSKInitializer sends a request to the server (the “startup request”) and gets a response with a set of parameters and configurations (for example, the confidence thresholds), which are then used during speech recognition.

Note. Users do not have to perform initialization explicitly. If it has not yet been done, SpeechKit initializes itself automatically when the first request for speech recognition is received. So YSKInitializer is used mainly in order to speed up the execution of the first request.
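
The difference between explicit and automatic initialization can be modeled with a small Swift sketch. The Engine class is a hypothetical model of the behavior described in the note, not part of SpeechKit; the counter simply stands in for the slow startup request.

```swift
// Hypothetical model of the behavior described in the note; not the SpeechKit API.
final class Engine {
    private(set) var isInitialized = false
    private(set) var startupRequests = 0

    // Explicit warm-up, analogous to what YSKInitializer is described as doing.
    func initializeIfNeeded() {
        guard !isInitialized else { return }
        startupRequests += 1      // stands in for the slow "startup request"
        isInitialized = true
    }

    // A recognition request initializes implicitly if no warm-up has happened.
    func recognize(_ audio: String) -> String {
        initializeIfNeeded()
        return "recognized(\(audio))"
    }
}

let cold = Engine()
_ = cold.recognize("query")        // pays the initialization cost on the first request

let warm = Engine()
warm.initializeIfNeeded()          // cost paid early, for example at app launch
_ = warm.recognize("query")        // first request can start right away
```

In both cases initialization happens exactly once; the only difference is whether the user waits for it during the first recognition request.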

YSKInitializer uses the YSKInitializerDelegate interface to notify you when initialization starts and when it finishes (with or without errors).

Errors that occur during the initialization process are described using the standard NSError mechanism.