Speech analytics

In data streaming mode, you can get both the speech recognition results and information about certain characteristics (biometrics) of the recognized speech. Currently, the speech recognition system can estimate the speaker's gender and age and detect the language being spoken.

Attention.

Speech analytics can be used only with the Addresses (maps) and Short queries (queries) speech recognition models.

Transmitting parameters

To analyze speech, send a ConnectionRequest message with the AdvancedASROptions field set. The AdvancedASROptions message has the following structure:

message AdvancedASROptions
{
  optional bool partial_results = 1 [default = true];

  optional string biometry = 24;
}
Field descriptions:
biometry

Characteristics to identify during speech analysis.

You can detect:

  • gender — The speaker's gender.

  • group — The speaker's age group (such as child or adult).

  • language — The language being spoken.

To request several characteristics at once, list them in the biometry field separated by commas, with no spaces. For example, to get the speaker's gender and age, specify:

biometry=group,gender
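The biometry value is a plain comma-separated string, so it is easy to get wrong if it is assembled by hand. Below is a minimal Python sketch that builds the value from a list of characteristic names before you set it on AdvancedASROptions. The helper name build_biometry_param is hypothetical and is not part of the SpeechKit API. The set of allowed names is taken from the list above.

```python
# Characteristics listed in this section; build_biometry_param is a hypothetical helper.
SUPPORTED_BIOMETRICS = ("gender", "group", "language")


def build_biometry_param(characteristics):
    """Return the biometry field value: names joined by commas, no spaces."""
    names = [c.strip() for c in characteristics]
    unknown = [c for c in names if c not in SUPPORTED_BIOMETRICS]
    if unknown:
        raise ValueError(f"unsupported biometrics: {unknown}")
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(names))
    return ",".join(unique)


print(build_biometry_param(["group", "gender"]))  # group,gender
```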

Evaluating results

The analysis results are returned together with the final speech recognition result, in BiometryResult messages:

message BiometryResult
{
   required string classname = 1;

   required float confidence = 2;

   optional string tag = 3;
}
Field descriptions:
classname

Name of the indicator within the biometric.

For example, the gender biometric has two indicators: male (the voice is male) and female (the voice is female).

All possible values of the classname field are listed below.

Gender. Indicators for the gender biometric:

  • male — The speaker is male.

  • female — The speaker is female.

Age group. Indicators for the group biometric:

  • c — Child (estimated age is under 14 years old).

  • ym — Young male (14–20 years old).

  • yf — Young female (14–20 years old).

  • am — Adult male (20–55 years old).

  • af — Adult female (20–55 years old).

  • sm — Male over 55.

  • sf — Female over 55.

Note.

The “c” (child) indicator does not differentiate between the genders.
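The group indicators combine gender and age range in one code. The following Python sketch is a lookup table that mirrors the list above and turns a code into a readable label. The table and the function name describe_group are illustrative and are not part of the API. Gender is left as None for "c", because that indicator does not distinguish between genders.

```python
# Age-group indicators from the list above: code -> (gender, age range).
# Gender is None for "c", because the child indicator does not distinguish genders.
AGE_GROUPS = {
    "c": (None, "under 14"),
    "ym": ("male", "14-20"),
    "yf": ("female", "14-20"),
    "am": ("male", "20-55"),
    "af": ("female", "20-55"),
    "sm": ("male", "over 55"),
    "sf": ("female", "over 55"),
}


def describe_group(classname):
    """Return a readable label for a group indicator (hypothetical helper)."""
    gender, age = AGE_GROUPS[classname]
    who = gender if gender is not None else "child (gender not determined)"
    return f"{who}, {age}"


print(describe_group("yf"))  # female, 14-20
```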

Language. Indicators for the language biometric:

  • de — German

  • en — English

  • fr — French

  • ru — Russian

  • tr — Turkish

  • uk — Ukrainian

Each indicator comes with a value in the confidence field. Use these values to determine the final result.

confidence

Value assigned to the indicator during evaluation.

The speech recognition system evaluates each biometric against all of its indicators (the indicator names are listed under classname) and returns a value for each indicator in the confidence field. As a result, you get two or more values for each biometric analyzed.

To determine the speaker's gender, subtract the female value from the male value. If the result is a positive number, this is a male voice; if it is a negative number, this is a female voice.

The larger the absolute difference, the more reliable the prediction.
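The gender rule above can be written as a small Python function. This is a sketch only. It assumes the gender results have already been collected into a dict mapping classname to confidence, and the function name decide_gender is hypothetical. The section does not say what to do when the two values are equal, so the sketch returns None in that case. The absolute difference is returned alongside the label so that callers can judge how reliable the prediction is.

```python
def decide_gender(scores):
    """scores maps classname -> confidence for the gender biometric only."""
    diff = scores["male"] - scores["female"]
    if diff > 0:
        label = "male"
    elif diff < 0:
        label = "female"
    else:
        label = None  # equal values: the rule gives no answer
    # abs(diff) is the margin; a larger margin means a more reliable prediction.
    return label, abs(diff)


# Illustrative values, not real service output.
print(decide_gender({"male": 0.75, "female": 0.25}))  # ('male', 0.5)
```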

To determine the speaker's age group, take the indicator with the largest value. The same applies to the language biometric.
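For group and language, the decision rule is to pick the indicator with the largest value. Below is a minimal Python sketch, assuming the values for one biometric are in a dict mapping classname to confidence. The function name decide_by_max is hypothetical. If two indicators have exactly the same value, Python's max returns the first one it encounters, and the section does not define a tie-breaking rule.

```python
def decide_by_max(scores):
    """Return the classname with the largest confidence (group or language biometric)."""
    if not scores:
        raise ValueError("no indicators for this biometric")
    return max(scores, key=scores.get)


# Illustrative values, not real service output.
print(decide_by_max({"c": 0.1, "am": 0.6, "af": 0.3}))  # am
```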

Note.

Confidence values are calculated with a different mathematical model for each biometric, so don't compare indicator values across biometrics (for example, female and yf).

tag

The biometric that the indicator belongs to, such as group or gender.

An AddDataResponse message contains BiometryResult bioResult only when endOfUtt=true, which marks the end of an utterance or the end of the recording. Speech is analyzed independently for each utterance, without taking previous results into account.
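Because values from different biometrics must not be compared, a practical first step is to group the results of one final response by tag. The Python sketch below assumes that the response has already been converted to a plain dict whose keys mirror the field names used in this section (endOfUtt, bioResult, classname, confidence, tag). This shape is an assumption made for illustration and is not how the protobuf message itself is accessed. The function name group_by_biometric is hypothetical.

```python
from collections import defaultdict


def group_by_biometric(response):
    """Return {tag: {classname: confidence}} for a final response, else None.

    `response` is a plain dict standing in for an AddDataResponse message
    (an assumption for this sketch).
    """
    if not response.get("endOfUtt"):
        return None  # biometry results arrive only at the end of an utterance
    grouped = defaultdict(dict)
    for result in response.get("bioResult", []):
        # tag is optional in BiometryResult; results without it are grouped under None.
        grouped[result.get("tag")][result["classname"]] = result["confidence"]
    return dict(grouped)
```

Each inner dict can then be passed to the decision rule for that biometric, and each utterance is handled on its own.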

For short utterances (one or two words), results are still returned, but they are usually not very reliable: the absolute difference between indicator values is small.
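One way to act on this is to measure the gap between the two highest values of a biometric and to treat the result as unreliable when the gap is small. For gender, this gap equals the absolute difference between the male and female values. The Python sketch below is an assumption rather than documented behavior: the function names are hypothetical, and the threshold must be chosen by your application because the section does not specify one.

```python
def margin(scores):
    """Gap between the two highest confidence values of one biometric."""
    top = sorted(scores.values(), reverse=True)
    if len(top) < 2:
        raise ValueError("need at least two indicators")
    return top[0] - top[1]


def is_reliable(scores, min_margin):
    # min_margin is an application-chosen threshold, not a documented value.
    return margin(scores) >= min_margin
```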