The shape of space: July 2008

Asterisk Speech Recognition API:

Architecture:

res_speech.c  <- Recognition engine
     |
     v
app_speech_utils.c
     |
     v
Dialplan functions

You must allocate and configure the recognition engine:

struct ast_speech_engine {
    /*! Name of speech engine */
    char *name;
    /*! Set up the speech structure within the engine */
    int (*create)(struct ast_speech *speech);
    /*! Destroy any data set on the speech structure by the engine */
    int (*destroy)(struct ast_speech *speech);
    /*! Load a local grammar on the speech structure */
    int (*load)(struct ast_speech *speech, char *grammar_name, char *grammar);
    /*! Unload a local grammar */
    int (*unload)(struct ast_speech *speech, char *grammar_name);
    /*! Activate a loaded grammar */
    int (*activate)(struct ast_speech *speech, char *grammar_name);
    /*! Deactivate a loaded grammar */
    int (*deactivate)(struct ast_speech *speech, char *grammar_name);
    /*! Write audio to the speech engine */
    int (*write)(struct ast_speech *speech, void *data, int len);
    /*! Signal DTMF was received */
    int (*dtmf)(struct ast_speech *speech, const char *dtmf);
    /*! Prepare engine to accept audio */
    int (*start)(struct ast_speech *speech);
    /*! Change an engine specific setting */
    int (*change)(struct ast_speech *speech, char *name, const char *value);
    /*! Change the type of results we want back */
    int (*change_results_type)(struct ast_speech *speech, enum ast_speech_results_type results_type);
    /*! Try to get results */
    struct ast_speech_result *(*get)(struct ast_speech *speech);
    /*! Accepted formats by the engine */
    int formats;
    AST_LIST_ENTRY(ast_speech_engine) list;
};

Override the function pointers in struct ast_speech_engine with your own implementations. Then add the recognition engine you have created to the engine list, or set default_engine to your engine.
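A minimal sketch of that step, under some assumptions: the types below are reduced stand-ins so the code compiles outside the Asterisk tree, and the engine name "my_engine", the my_* callbacks and the register_engine() and find_engine() helpers are hypothetical names used only for illustration. In a real module you would include asterisk/speech.h, fill in every callback of the struct above, and let res_speech.c keep the engine list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for the types in asterisk/speech.h (assumption: only
 * the members this sketch touches are kept). */
struct ast_speech {
    int state;   /* recognizer state */
    void *data;  /* engine-private data */
};

struct ast_speech_engine {
    char *name;
    int (*create)(struct ast_speech *speech);
    int (*destroy)(struct ast_speech *speech);
    struct ast_speech_engine *next; /* stand-in for AST_LIST_ENTRY */
};

/* Hypothetical engine list and default engine, imitating the idea only. */
static struct ast_speech_engine *engines = NULL;
static struct ast_speech_engine *default_engine = NULL;

/* Hypothetical callback: a real engine would set up its decoder here. */
static int my_create(struct ast_speech *speech)
{
    static int engine_private; /* placeholder for real engine data */

    assert(speech != NULL);
    speech->data = &engine_private;
    return 0;
}

/* Hypothetical callback: drop whatever my_create() attached. */
static int my_destroy(struct ast_speech *speech)
{
    assert(speech != NULL);
    speech->data = NULL;
    return 0;
}

/* The engine itself: the struct with our functions plugged in. */
static struct ast_speech_engine my_engine = {
    .name = "my_engine",
    .create = my_create,
    .destroy = my_destroy,
};

/* Add an engine to the list; the first one registered becomes the default. */
static int register_engine(struct ast_speech_engine *engine)
{
    if (!engine || !engine->name || !engine->create || !engine->destroy)
        return -1; /* refuse incomplete engines */
    engine->next = engines;
    engines = engine;
    if (!default_engine)
        default_engine = engine;
    return 0;
}

/* Look an engine up by name; an empty or NULL name selects the default. */
static struct ast_speech_engine *find_engine(const char *name)
{
    struct ast_speech_engine *e;

    if (!name || !*name)
        return default_engine;
    for (e = engines; e; e = e->next)
        if (!strcmp(e->name, name))
            return e;
    return NULL;
}
```

With this in place, register_engine(&my_engine) makes the engine reachable both by its name and as the default when no name is given.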

For the recognizer to be ready, you must change its state (speech->state) to AST_SPEECH_STATE_READY. You control the recognizer states: when it starts (ready to receive audio), when it is waiting for results, when it has received the results, and when it must stop. app_speech_utils.c works based on these state flags.

I implemented a recognition module for the company I work for, and the API works better with the SpeechBackground application (see the speech_background() function in app_speech_utils.c). SpeechStart (speech_start() in app_speech_utils.c) is only a basic implementation and does not start some of the resources that speech_background() does. SpeechBackground is more complete and convenient: you call it and change the recognizer states as needed.

If you don't want background playback, you can use an empty audio file. (Playback at this point can cause echo problems, depending on the telephony card; echo cancellation fixes them if the card supports it.) See the references to AST_SPEECH_STATE_READY inside speech_background() in app_speech_utils.c and you will see the solution.

The hierarchical tree is:

app_speech_utils
     |
     v
res_speech

app_speech_utils.c implements functions that call the functions in res_speech.c. Pay attention to the explanation of the recognizer states. For a basic implementation, start with AST_SPEECH_STATE_READY, AST_SPEECH_STATE_NOT_READY and AST_SPEECH_STATE_DONE, then add the other states.
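Here is a rough sketch of what such a basic three-state engine could look like. It is not my real module: the types are reduced stand-ins so it compiles on its own, the my_* names and struct my_recognizer are hypothetical, and the rule "an utterance is finished after UTTERANCE_BYTES bytes" is an invented placeholder for a real decoder deciding that it has a result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the state values declared in asterisk/speech.h. */
enum ast_speech_states {
    AST_SPEECH_STATE_NOT_READY = 0,
    AST_SPEECH_STATE_READY,
    AST_SPEECH_STATE_WAIT,
    AST_SPEECH_STATE_DONE
};

/* Reduced stand-in for struct ast_speech. */
struct ast_speech {
    int state;   /* recognizer state */
    void *data;  /* engine-private data */
};

/* Hypothetical: pretend an utterance is complete after this many bytes. */
#define UTTERANCE_BYTES 640

/* Hypothetical engine-private data. */
struct my_recognizer {
    int bytes_seen;
};

/* create: attach private data; the recognizer cannot take audio yet. */
static int my_create(struct ast_speech *speech)
{
    struct my_recognizer *rec = calloc(1, sizeof(*rec));

    if (!rec)
        return -1;
    speech->data = rec;
    speech->state = AST_SPEECH_STATE_NOT_READY;
    return 0;
}

/* destroy: release private data. */
static int my_destroy(struct ast_speech *speech)
{
    free(speech->data);
    speech->data = NULL;
    speech->state = AST_SPEECH_STATE_NOT_READY;
    return 0;
}

/* start: reset the counters and announce that audio may be written. */
static int my_start(struct ast_speech *speech)
{
    struct my_recognizer *rec = speech->data;

    assert(rec != NULL);
    rec->bytes_seen = 0;
    speech->state = AST_SPEECH_STATE_READY;
    return 0;
}

/* write: accept audio only while READY; switch to DONE when "recognized". */
static int my_write(struct ast_speech *speech, void *data, int len)
{
    struct my_recognizer *rec = speech->data;

    (void)data; /* a real engine would pass the samples to its decoder */
    if (speech->state != AST_SPEECH_STATE_READY)
        return -1;
    rec->bytes_seen += len;
    if (rec->bytes_seen >= UTTERANCE_BYTES)
        speech->state = AST_SPEECH_STATE_DONE;
    return 0;
}
```

The point of the sketch is only the order of the transitions: NOT_READY after create, READY after start, DONE once the engine has something to report, and READY again only after another start.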

The API only works with signed linear audio, as shown in this code from speech_create() in app_speech_utils.c:

speech = ast_speech_new(data, AST_FORMAT_SLINEAR);

See these parts of the documentation (pay attention to the recognizer state information):

-----

ast_speech_start(speech);

This essentially tells the speech recognition engine that you will be feeding audio to it from then on. It MUST be called every time before you start feeding audio to the speech structure.

- Send audio to be recognized:

int ast_speech_write(struct ast_speech *speech, void *data, int len)

res = ast_speech_write(speech, fr->data, fr->datalen);

This writes audio to the speech structure that will then be recognized. It must be written signed linear only at this time. In the future other formats may be supported.

- Checking for results:

The way the generic speech recognition API is written is that the speech structure will undergo state changes to indicate progress of recognition. The states are outlined below:

AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to accept audio
AST_SPEECH_STATE_READY - You may write audio to the speech structure
AST_SPEECH_STATE_WAIT - No more audio should be written, and results will be available soon.
AST_SPEECH_STATE_DONE - Results are available and the speech structure can only be used again by calling ast_speech_start

It is up to you to monitor these states. Current state is available via a variable on the speech structure. (state)
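To make the state monitoring concrete, here is a small sketch of the caller's side. It is an illustration under assumptions, not the code from app_speech_utils.c: ast_speech_start() and ast_speech_write() are replaced by stand-ins with the same signatures so the example runs alone, the frames_written member and fake_engine_finish() are hypothetical, and the fake engine simply moves to WAIT after two frames.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the state values declared in asterisk/speech.h. */
enum ast_speech_states {
    AST_SPEECH_STATE_NOT_READY = 0,
    AST_SPEECH_STATE_READY,
    AST_SPEECH_STATE_WAIT,
    AST_SPEECH_STATE_DONE
};

/* Reduced stand-in for struct ast_speech. */
struct ast_speech {
    int state;          /* recognizer state, as in the real struct */
    int frames_written; /* hypothetical bookkeeping for the fake engine */
};

/* Stand-in: the real function asks the engine to prepare for audio. */
static int ast_speech_start(struct ast_speech *speech)
{
    speech->frames_written = 0;
    speech->state = AST_SPEECH_STATE_READY;
    return 0;
}

/* Stand-in: the fake engine wants no more audio after two frames. */
static int ast_speech_write(struct ast_speech *speech, void *data, int len)
{
    (void)data;
    (void)len;
    if (++speech->frames_written >= 2)
        speech->state = AST_SPEECH_STATE_WAIT;
    return 0;
}

/* Hypothetical: the fake engine finishes recognition. */
static void fake_engine_finish(struct ast_speech *speech)
{
    speech->state = AST_SPEECH_STATE_DONE;
}

/* What the caller should do next, decided from the current state. */
enum next_step { CALL_START, KEEP_FEEDING, KEEP_WAITING, FETCH_RESULTS };

/* Called for every incoming audio frame: look at the state first. */
static enum next_step handle_frame(struct ast_speech *speech, void *data, int len)
{
    assert(speech != NULL);
    switch (speech->state) {
    case AST_SPEECH_STATE_READY:
        /* Only in this state may audio be written. */
        if (ast_speech_write(speech, data, len))
            return CALL_START;
        return KEEP_FEEDING;
    case AST_SPEECH_STATE_WAIT:
        /* Drop the frame; results will be available soon. */
        return KEEP_WAITING;
    case AST_SPEECH_STATE_DONE:
        /* Results are available; start again before writing more. */
        return FETCH_RESULTS;
    default:
        /* NOT_READY: ast_speech_start() has to be called first. */
        return CALL_START;
    }
}
```

A caller would run handle_frame() for each frame read from the channel and act on the returned step, for example fetching the results on FETCH_RESULTS and calling ast_speech_start() again before the next utterance.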

- SpeechBackground(Sound File|Timeout):

This application plays a sound file and waits for the person to speak. Once they start speaking, playback of the file stops and silence is heard. Once they stop talking, the processing sound is played to indicate the speech recognition engine is working. Note it is possible to have more than one result. The first argument is the sound file and the second is the timeout. Note the timeout will only start once the sound file has stopped playing.

Wednesday, July 02, 2008
