ISR

ISR (short for Incremental Speech Recognition) is a speech recognition component based on the ESMERALDA framework.

Interfacing ISR through RSB

ISR exposes several input and output interfaces through RSB. Each one lives under a sub-scope of the top-level scope /isr (or of another top-level scope, which can be configured as described below).

Building and starting ISR with RSB support

To build Esmeralda/ISR with RSB support, pass a configuration argument like the following to cmake:

cmake [...] -D WITH_RSB=1 [...]

To start ISR with RSB support, pass the following options to the isr binary:

isr [...] -o rsb:isr -m rsb [...]

The -o rsb:isr option tells ISR to communicate via RSB and to publish and listen under the top-level scope /isr. To use a different top-level scope, supply its name instead: -o rsb:ScopeName. The rest of this document assumes /isr as the top-level scope, so adjust the sub-scopes accordingly if you change it.
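As a small illustration of how the sub-scopes follow from the top-level scope, the sketch below derives the three sub-scopes mentioned in this document from the name given to -o rsb:<name>. The helper isr_scopes is hypothetical and not part of ISR or RSB, and it assumes that a name such as ScopeName maps to the scope /ScopeName in the same way that isr maps to /isr.

```python
def isr_scopes(top_level="isr"):
    """Return the sub-scopes used in this document for a given top-level scope.

    Assumption: "-o rsb:<name>" results in the top-level scope "/<name>",
    as it does for "-o rsb:isr" and "/isr".
    """
    root = "/" + top_level.strip("/")
    # hyp: recognition results, status: signal status, param: threshold requests
    return {name: f"{root}/{name}" for name in ("hyp", "status", "param")}


scopes = isr_scopes("isr")
print(scopes["hyp"])
```

A client that is started against a differently named ISR instance could call isr_scopes with that name instead of hard-coding /isr.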

TODO: I'm not sure what the -m rsb option is actually doing. Sebastian? Lars?

Speech recognition hypotheses

ISR publishes its recognition results under the scope /isr/hyp. They are sent as XML strings in the format shown in the example below. If ISR is configured to work incrementally, intermediate results are also published there; a final, stable hypothesis is one whose speech_hyp element has the stable attribute set to 1.

Example data:

The following is the recognition result for the sentence "hello robot", including the resulting grammar tree (for a specific pre-defined grammar), as it would be published under /isr/hyp:

<speech_hyp [...] generator="isr" origin="grm_track_xcf" stable="1" update="0" bxml:id="29">
<TIMESTAMP>1290275288296</TIMESTAMP>
<seq>
  <part begin="330" end="630" id="0" type="word">hello</part>
  <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
  <part begin="630" end="1030" id="1" type="word">robot</part>
  <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
</seq>
<grammartree cancel="0" fault="0" skip="0">
  <nonterminal name="$S">
    <nonterminal name="Greeting">
      <terminal refid="0" />
      <nonterminal name="$RobotName">
        <terminal refid="1" />
      </nonterminal>
    </nonterminal>
  </nonterminal>
</grammartree>
</speech_hyp>
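A receiver might process such a message roughly as in the following Python sketch, which uses only the standard library. It checks the stable attribute, collects the recognized words, and looks up the words below a given nonterminal. The function names are hypothetical, and the sketch assumes that the refid of a terminal refers to the id of a part, as the example suggests. The sample string leaves out the elided attributes and the bxml:id attribute, because the example does not show a declaration for the bxml prefix and a namespace-aware parser rejects undeclared prefixes.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the example above (elided attributes and bxml:id omitted).
SAMPLE = """<speech_hyp generator="isr" origin="grm_track_xcf" stable="1" update="0">
<TIMESTAMP>1290275288296</TIMESTAMP>
<seq>
  <part begin="330" end="630" id="0" type="word">hello</part>
  <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
  <part begin="630" end="1030" id="1" type="word">robot</part>
  <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
</seq>
<grammartree cancel="0" fault="0" skip="0">
  <nonterminal name="$S">
    <nonterminal name="Greeting">
      <terminal refid="0" />
      <nonterminal name="$RobotName">
        <terminal refid="1" />
      </nonterminal>
    </nonterminal>
  </nonterminal>
</grammartree>
</speech_hyp>"""


def is_stable(hyp):
    """True if the hypothesis is final, i.e. stable="1"."""
    return hyp.get("stable") == "1"


def words(hyp):
    """Recognized words in the order they appear in <seq>."""
    return [p.text for p in hyp.findall("./seq/part") if p.get("type") == "word"]


def words_under(hyp, nonterminal_name):
    """Words covered by the first nonterminal with the given name.

    Assumption: a terminal's refid refers to the id of a <part>.
    """
    by_id = {p.get("id"): p.text for p in hyp.findall("./seq/part")}
    for node in hyp.iter("nonterminal"):
        if node.get("name") == nonterminal_name:
            return [by_id[t.get("refid")] for t in node.iter("terminal")]
    return []


hyp = ET.fromstring(SAMPLE)
if is_stable(hyp):
    print(" ".join(words(hyp)), words_under(hyp, "$RobotName"))
```

In incremental mode, a consumer that only cares about final results can simply skip every message for which is_stable returns False.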

Signal information

TODO: Add information about the signal status optionally published at the scope /isr/status (if enabled via the isr option -x).

Changing the acoustic energy thresholds

The ISR component also listens under the scope /isr/param for requests to change its signal energy thresholds. This lowers or raises its sensitivity, e.g. to effectively disable ISR while speech synthesis is running. Requests must be XML-formatted strings with the syntax shown in the following example.

Example data:

<sr db_start="45" db_utt="46" />

In this example, db_start specifies the start-utterance energy threshold and db_utt the within-utterance energy threshold (in dB). They correspond to the -s and the -u command line options for isr, respectively.
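Such a request string can be assembled programmatically, as in the following Python sketch. The function threshold_request is hypothetical, and the sketch only builds the string; publishing it under /isr/param through RSB is left out. Whether ISR accepts values other than integers is not stated here, so the sketch passes the values through unchanged.

```python
import xml.etree.ElementTree as ET


def threshold_request(db_start, db_utt):
    """Build an <sr .../> request string for the /isr/param scope.

    db_start: start-utterance energy threshold (corresponds to -s)
    db_utt:   within-utterance energy threshold (corresponds to -u)
    """
    elem = ET.Element("sr", {"db_start": str(db_start), "db_utt": str(db_utt)})
    return ET.tostring(elem, encoding="unicode")


# Same values as in the example above.
request = threshold_request(45, 46)
print(request)
```

Building the element with an XML library rather than by string formatting keeps the attribute values correctly quoted and escaped.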