Asterisk Generic Speech API

Asterisk Generic Speech Recognizer API Description


The generic speech recognition engine is implemented in the res_speech.so module.
This module connects through the API to speech recognition software that is
not included in the module.

To use the API, you must load the res_speech.so module before any connectors.
For your convenience, there is a preload line commented out in the modules.conf
sample file.

Useful links


LumenVox Asterisk FAQ
Digium hosted speech-rec mailing list
LumenVox Pizza Demo example

Documentation


* Dialplan Applications:


The dialplan API is based around a single speech utilities application file,
which exports many applications to be used for speech recognition. These include
applications to prepare for speech recognition, activate a grammar, and play back a
sound file while waiting for the person to speak. By combining these applications,
you can add speech recognition to a dialplan without worrying about which
speech recognition engine is being used.

- SpeechCreate(Engine Name):

This application creates the speech object used by all the other applications.
It must be called before doing any speech recognition activities such as activating a
grammar. It takes the name of the engine to use as its argument; if none is specified, the
default engine is used.

If an error occurs and you are not able to create an object, the variable ERROR will be
set to 1. You can then exit your speech recognition specific context and play back an
error message, or resort to a DTMF based IVR.
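
As a rough illustration of the fallback described above, the following extensions.conf
sketch checks ERROR right after SpeechCreate. The context name speech-entry and the sound
file speech-unavailable are hypothetical placeholders, not names taken from Asterisk or
from this page.

```
[speech-entry]
; Hypothetical context. Try to create the speech object with the default engine.
exten => s,1,SpeechCreate()
; ERROR is set to 1 when the speech object could not be created.
exten => s,n,GotoIf($["${ERROR}" = "1"]?nospeech)
exten => s,n,NoOp(Speech object created)
; ... grammar activation and recognition would go here ...
exten => s,n,Hangup()
; Fallback path: play an error message (placeholder sound file) and stop.
exten => s,n(nospeech),Playback(speech-unavailable)
exten => s,n,Hangup()
```

Instead of hanging up, the nospeech branch could just as well jump to a DTMF based IVR.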

- SpeechLoadGrammar(Grammar Name|Path):

Loads a grammar locally on a channel. Note that the grammar is only available as long as the
channel exists, and you must call SpeechUnloadGrammar when you are finished with it or you may
cause a memory leak. The first argument is the name the grammar will be loaded as, and the
second argument is the path to the grammar.

- SpeechUnloadGrammar(Grammar Name):

Unloads a locally loaded grammar and frees any memory used by it. The only argument is the
name of the grammar to unload.

- SpeechActivateGrammar(Grammar Name):

This activates the specified grammar to be recognized by the engine. A grammar tells the
speech recognition engine what to recognize, and how to present the result to you in the
dialplan. The grammar name is the only argument to this application.

- SpeechStart():

Tells the speech recognition engine that it should start trying to get results from the audio
being fed to it. It takes no arguments.

- SpeechBackground(Sound File|Timeout):

This application plays a sound file and waits for the person to speak. Once they start
speaking, playback of the file stops and silence is heard. Once they stop talking, the
processing sound is played to indicate that the speech recognition engine is working. Note that
it is possible to have more than one result. The first argument is the sound file and the second
is the timeout. Note that the timeout only starts once the sound file has stopped playing.

- SpeechDeactivateGrammar(Grammar Name):

This deactivates the specified grammar so that it is no longer recognized. The
only argument is the grammar name to deactivate.

- SpeechProcessingSound(Sound File):

This changes the processing sound that SpeechBackground plays back when the speech
recognition engine is processing and working to get results. It takes the sound file as the
only argument.

- SpeechDestroy():

This destroys the speech object used by all the other speech recognition applications.
If you call this application but end up wanting to recognize more speech, you must call
SpeechCreate again before calling any other application. It takes no arguments.
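
The grammar loading and cleanup applications above can be combined roughly as follows.
This is only a sketch: the context name, the grammar path and the please-wait sound file
are hypothetical placeholders. It uses the | argument separator shown in the signatures
on this page; Asterisk 1.6 and later separate application arguments with commas instead.

```
[speech-grammar-demo]
; Hypothetical context. Create the speech object first.
exten => s,1,SpeechCreate()
; Load the grammar under the name company-directory (placeholder path).
exten => s,n,SpeechLoadGrammar(company-directory|/path/to/company-directory.gram)
; Replace the processing sound (placeholder sound file name).
exten => s,n,SpeechProcessingSound(please-wait)
; ... activate the grammar and run recognition here ...
; Unload the grammar when finished, then release the speech object.
exten => s,n,SpeechUnloadGrammar(company-directory)
exten => s,n,SpeechDestroy()
```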


* Getting Result Information:

The speech recognition utilities module exports several dialplan functions that you can use to
examine results.

- ${SPEECH(status)}:

Returns 1 if SpeechCreate has been called. This uses the same check that the applications use to
see if a speech object is set up. If it returns 0, you cannot use the other speech applications.

- ${SPEECH(spoke)}:

Returns 1 if the speaker spoke something, or 0 if they were silent.

- ${SPEECH(results)}:

Returns the number of results that are available.

- ${SPEECH_SCORE(result number)}:

Returns the score of a result.

- ${SPEECH_TEXT(result number)}:

Returns the recognized text of a result.

- ${SPEECH_GRAMMAR(result number)}:

Returns the matched grammar of the result.

- SPEECH_ENGINE(name)=value

Sets an engine-specific attribute on the speech engine.
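
A minimal sketch of how these functions might be used together is shown below. The context
name, the sound file, and the attribute name example-attribute are hypothetical, and the
grammar company-directory is assumed to be known to the engine already. Result number 0 is
assumed here to refer to the first result.

```
[speech-results-demo]
exten => s,1,SpeechCreate()
; Bail out if no speech object is available.
exten => s,n,GotoIf($["${SPEECH(status)}" = "0"]?done)
; Engine-specific setting. Name and value are placeholders.
exten => s,n,Set(SPEECH_ENGINE(example-attribute)=example-value)
exten => s,n,SpeechActivateGrammar(company-directory)
exten => s,n,SpeechStart()
exten => s,n,SpeechBackground(who-would-you-like-to-dial)
; Skip result handling if the caller stayed silent.
exten => s,n,GotoIf($["${SPEECH(spoke)}" = "0"]?cleanup)
exten => s,n,NoOp(Results available: ${SPEECH(results)})
exten => s,n,NoOp(First result text: ${SPEECH_TEXT(0)})
exten => s,n,NoOp(First result score: ${SPEECH_SCORE(0)})
exten => s,n,NoOp(First result grammar: ${SPEECH_GRAMMAR(0)})
exten => s,n(cleanup),SpeechDeactivateGrammar(company-directory)
exten => s,n,SpeechDestroy()
exten => s,n(done),Hangup()
```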


* Dialplan Flow:

1. Create a speech recognition object using SpeechCreate()
2. Activate your grammars using SpeechActivateGrammar(Grammar Name)
3. Call SpeechStart() to indicate you are going to do speech recognition immediately
4. Play back your audio and wait for recognition using SpeechBackground(Sound File|Timeout)
5. Check the results and do things based on them
6. Deactivate your grammars using SpeechDeactivateGrammar(Grammar Name)
7. Destroy your speech recognition object using SpeechDestroy()
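
The seven steps above can be sketched in extensions.conf as follows. This is an
illustration under stated assumptions, not a reference dialplan: the context name
dial-by-name, the sound file who-would-you-like-to-dial, the variable SPOKEN and the
timeout value 5 are hypothetical, and the grammar company-directory is assumed to be
available to the engine. The | separator follows the signatures on this page; Asterisk 1.6
and later use a comma.

```
[dial-by-name]
; 1. Create the speech object (default engine).
exten => s,1,SpeechCreate()
; 2. Activate the grammar to recognize.
exten => s,n,SpeechActivateGrammar(company-directory)
; 3. Tell the engine that recognition is about to begin.
exten => s,n,SpeechStart()
; 4. Play the prompt and wait for speech (illustrative timeout value).
exten => s,n,SpeechBackground(who-would-you-like-to-dial|5)
; 5. Keep the first result text before the speech object is destroyed.
exten => s,n,Set(SPOKEN=${SPEECH_TEXT(0)})
; 6. Deactivate the grammar.
exten => s,n,SpeechDeactivateGrammar(company-directory)
; 7. Destroy the speech object.
exten => s,n,SpeechDestroy()
exten => s,n,NoOp(Caller asked for ${SPOKEN})
```

Copying the result into a channel variable in step 5 is a deliberate choice: the SPEECH
functions read from the speech object, so the value is saved before SpeechDestroy runs.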

* Dialplan Examples:


This example is pretty cheeky in that it does not confirm the results. Also, the
grammar is written so that it returns the person's extension instead of their name, so we can
just do a Goto based on the result text.

- Grammar: company-directory.gram

#ABNF 1.0;
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
root $company_directory;
$josh = Joshua

  • Asterisk LumenVox via PHPAGI
  • Asterisk with Nuance using UniMRCP
  • Created by: Josh208, Last modification: Fri 26 of Nov, 2010 (14:44 UTC) by atheos