Response format

The response contains a list of speech recognition hypotheses.

A recognition hypothesis is the recognition system's guess about what was said. The response may contain several hypotheses, up to a maximum of five.

The recognition system sorts the hypotheses in order of decreasing confidence. The hypothesis at the top of the list is the result that the recognition system considers most plausible.

The recognized text is processed before it is sent: some punctuation is added (such as hyphens), and numbers are written as digits. This converted text is the final recognition result that is passed in the response body.

The data in the response body is in XML format.

Examples of recognition results for a speech fragment in Russian when using different language models:

<recognitionResults success="1">
     <variant confidence="0.69">твой номер 212-85-06</variant>
     <variant confidence="0.7">твой номер 213-85-06</variant>
</recognitionResults>

Example of failed recognition (no hypotheses):

<recognitionResults success="0"/>

The value of the confidence attribute is not a numeric measure of the confidence or accuracy of speech recognition, and it is not used for sorting the list of hypotheses. To find the most plausible hypothesis, rely on the order of the variant elements, not on this value.
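
The response format described above can be read with a few lines of Python. This is only a minimal sketch, assuming the response body is already available as bytes; the function name parse_recognition_results is hypothetical and not part of the service. It keeps the hypotheses in the order the server returned them and does not re-sort them by the confidence attribute.

```python
import xml.etree.ElementTree as ET


def parse_recognition_results(body):
    """Return the recognized texts in the order the server listed them.

    Hypothetical helper: `body` is the XML response body (bytes or str).
    An empty list means recognition failed (success="0").
    """
    root = ET.fromstring(body)
    if root.tag != "recognitionResults":
        raise ValueError("unexpected root element: " + root.tag)
    # success="0" means there are no hypotheses in the response.
    if root.get("success") != "1":
        return []
    # Keep document order; the confidence attribute is not a sort key.
    return [(variant.text or "").strip() for variant in root.findall("variant")]


sample = (
    '<recognitionResults success="1">'
    '<variant confidence="0.69">твой номер 212-85-06</variant>'
    '<variant confidence="0.7">твой номер 213-85-06</variant>'
    '</recognitionResults>'
).encode("utf-8")

hypotheses = parse_recognition_results(sample)
# The first element is the hypothesis at the top of the list.
best = hypotheses[0] if hypotheses else None
```

Returning an empty list for a failed recognition lets the caller handle both cases with one check instead of catching an exception.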


Example of a successful response:

HTTP/1.0 200 OK
Connection: close
Content-Length: 172
Content-Type: text/xml; charset=utf-8
Date: Tue, 15 Oct 2013 12:19:37 MSK
Server: YaVoiceProxy2
X-YaReqFinish: 1466143124.231587
X-YaRequestId: 8670a4aa-3572-11e3-98d2-266473e8d061
X-YaUuid: 01ae13cb744628b58fb536d496daa1e6

<?xml version="1.0" encoding="utf-8"?>
<recognitionResults success="1">
     <variant confidence="1.2">твой номер 212-85-06</variant>
</recognitionResults>

When you contact technical support, include the value of the X-YaRequestId header. This header contains the unique request ID assigned by the server.
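
To keep the request ID available for support requests, a client can log it for every response. The sketch below assumes the raw header block is available as text; the helper name extract_request_id is hypothetical. Most HTTP client libraries already expose response headers as a mapping, in which case a plain lookup is enough.

```python
from email.parser import Parser


def extract_request_id(raw_headers):
    """Return the X-YaRequestId value from a raw header block, or None.

    Hypothetical helper: `raw_headers` is the header lines only,
    without the HTTP status line.
    """
    # The email parser treats header names case-insensitively.
    headers = Parser().parsestr(raw_headers, headersonly=True)
    return headers.get("X-YaRequestId")


raw = (
    "Content-Type: text/xml; charset=utf-8\r\n"
    "Server: YaVoiceProxy2\r\n"
    "X-YaRequestId: 8670a4aa-3572-11e3-98d2-266473e8d061\r\n"
    "\r\n"
)
request_id = extract_request_id(raw)
```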