
Version 1 (D. Klotz, 06/27/2011 07:23 PM) → Version 2/3 (D. Klotz, 06/27/2011 07:25 PM)


h1. ISR

ISR (short for Incremental Speech Recognition) is a speech recognition component based on the "ESMERALDA framework":

h2. Interfacing ISR through RSB

ISR exposes several input and output interfaces through RSB, each under a sub-scope of the top-level scope @/isr@. The top-level scope itself is configurable, as described below.

h3. Building and starting ISR with RSB-support

To build Esmeralda/ISR with RSB support, pass the following option to CMake:
<pre>cmake [...] -D WITH_RSB=1 [...]</pre>

To start ISR with RSB support, pass the following options to the @isr@ binary:
<pre>isr [...] -o rsb:isr -m rsb [...]</pre>
The @-o rsb:isr@ option tells ISR to use RSB for communication and to publish and listen on everything under the top-level scope @/isr@. To use a different top-level scope, supply it as @-o rsb:ScopeName@. The rest of this document assumes @/isr@ as the top-level scope; if you use another one, adjust the sub-scopes accordingly.
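Because every interface lives under the top-level scope chosen with @-o@, a client can derive all scopes from that single value. The following Python sketch assumes the option value always has the form @rsb:<name>@; the function name @isr_scopes@ and its dictionary keys are hypothetical, and only the sub-scopes @hyp@, @status@ and @param@ are taken from this page.

```python
from typing import Dict


def isr_scopes(output_option: str) -> Dict[str, str]:
    """Derive the RSB scopes used by ISR from the value of its -o option.

    Hypothetical helper: assumes values of the form "rsb:<name>", as in
    "-o rsb:isr", which maps to the top-level scope "/isr".
    """
    prefix = "rsb:"
    if not output_option.startswith(prefix):
        raise ValueError(f"expected 'rsb:<name>', got {output_option!r}")
    name = output_option[len(prefix):].strip("/")  # tolerate "rsb:/isr"
    if not name:
        raise ValueError("empty top-level scope name")
    top = "/" + name
    return {
        "top": top,
        "hyp": top + "/hyp",        # recognition hypotheses (XML strings)
        "status": top + "/status",  # signal status, only if enabled with -x
        "param": top + "/param",    # acoustic energy thresholds read by ISR
    }


print(isr_scopes("rsb:isr")["hyp"])  # prints "/isr/hyp"
```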

*TODO:* I'm not sure what the @-m rsb@ option is actually doing. Sebastian? Lars?

h3. Speech recognition hypotheses

ISR publishes its recognition results under the scope @/isr/hyp@ as XML strings in the format shown in the example below. If ISR is configured to work incrementally, intermediate results are published there as well; the final, stable hypotheses are those whose @speech_hyp@ element has the attribute @stable="1"@.
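To consume only final results, a client can ignore every message on @/isr/hyp@ whose @speech_hyp@ element is not marked @stable="1"@. The Python sketch below assumes each message arrives as one complete XML string (receiving it via RSB is left out). The function @final_hypotheses@ and the two sample messages are hypothetical; in particular, the sample assumes incremental results carry @stable="0"@, although the code skips any value other than @1@.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List


def final_hypotheses(messages: Iterable[str]) -> Iterator[List[str]]:
    """Yield the word sequence of every stable hypothesis in `messages`.

    Each message is assumed to be one complete <speech_hyp> XML string as
    published on /isr/hyp; all other messages are skipped.
    """
    for msg in messages:
        root = ET.fromstring(msg)
        if root.tag != "speech_hyp" or root.get("stable") != "1":
            continue  # not (yet) a stable hypothesis
        yield [p.text or "" for p in root.iter("part") if p.get("type") == "word"]


# Hypothetical stream: one incremental result followed by the final one.
stream = [
    '<speech_hyp stable="0"><part id="0" type="word">hello</part></speech_hyp>',
    '<speech_hyp stable="1"><part id="0" type="word">hello</part>'
    '<part id="1" type="word">robot</part></speech_hyp>',
]
for words in final_hypotheses(stream):
    print(" ".join(words))  # prints "hello robot"
```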

*Example data:*

The following message shows the recognition result for the utterance "hello robot", together with its grammar tree (for a specific, predefined grammar), as it would be published under @/isr/hyp@:

<pre>
<speech_hyp [...] generator="isr" origin="grm_track_xcf" stable="1" update="0" bxml:id="29">
  <part begin="330" end="630" id="0" type="word">hello</part>
  <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
  <part begin="630" end="1030" id="1" type="word">robot</part>
  <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
  <grammartree cancel="0" fault="0" skip="0">
    <nonterminal name="$S">
      <nonterminal name="Greeting">
        <terminal refid="0" />
        <nonterminal name="$RobotName">
          <terminal refid="1" />
        </nonterminal>
      </nonterminal>
    </nonterminal>
  </grammartree>
</speech_hyp>
</pre>
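In the example above, each @terminal@ in the grammar tree appears to point via @refid@ to the @id@ of a word @part@, so the words covered by a nonterminal can be recovered by following those references. The Python sketch below parses a simplified copy of that example: the elided @[...]@ attributes and the @bxml:id@ attribute are dropped (ElementTree rejects an undeclared @bxml@ prefix), and the closing tags implied by the nesting are added. It also assumes that a @score@ belongs to the @part@ with the same @id@; the names @parse_hypothesis@ and @covered_words@ are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the example message: "[...]" and bxml:id removed,
# closing tags added according to the nesting.
SAMPLE = """\
<speech_hyp generator="isr" origin="grm_track_xcf" stable="1" update="0">
<part begin="330" end="630" id="0" type="word">hello</part>
<score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
<part begin="630" end="1030" id="1" type="word">robot</part>
<score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
<grammartree cancel="0" fault="0" skip="0">
<nonterminal name="$S">
<nonterminal name="Greeting">
<terminal refid="0" />
<nonterminal name="$RobotName">
<terminal refid="1" />
</nonterminal>
</nonterminal>
</nonterminal>
</grammartree>
</speech_hyp>
"""


def parse_hypothesis(xml_text):
    """Return the root element, word parts by id, and combined scores by id."""
    root = ET.fromstring(xml_text)
    # begin/end are kept as plain integers; their unit is not documented here.
    parts = {
        p.get("id"): (p.text or "", int(p.get("begin")), int(p.get("end")))
        for p in root.findall("part")
    }
    # Assumption: a score's id matches the id of the part it belongs to.
    scores = {s.get("id"): float(s.get("combined")) for s in root.findall("score")}
    return root, parts, scores


def covered_words(nonterminal, parts):
    """Words below a nonterminal, resolved via terminal refid -> part id."""
    return [parts[t.get("refid")][0] for t in nonterminal.iter("terminal")]


root, parts, scores = parse_hypothesis(SAMPLE)
for nt in root.iter("nonterminal"):  # document order: $S, Greeting, $RobotName
    print(nt.get("name"), covered_words(nt, parts))
```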

h3. Signal information

*TODO:* Add information about the signal status that is optionally published under the scope @/isr/status@ (if enabled via the isr option @-x@).

h3. Changing the acoustic energy thresholds

*TODO:* Add information about the signal thresholds read under the scope @/isr/param@.