ISR (short for Incremental Speech Recognition) is a speech recognition component based on the ESMERALDA framework.
Interfacing ISR through RSB
ISR exposes several (input and output) interfaces through RSB, all of them under some sub-scope of the top-level scope
/isr (or another top-level scope that can be configured as described below).
Building and starting ISR with RSB support
To build Esmeralda/ISR with RSB-support, you can supply a configuration argument like the following to cmake:
cmake [...] -D WITH_RSB=1 [...]
To start ISR with RSB-support, you should supply the following options to the isr binary:
isr [...] -o rsb:isr -m rsb [...]
The -o rsb:isr option tells ISR to use RSB for communication and to publish and listen for everything under the superscope /isr. If you want a different top-level scope, you can supply it like this: -o rsb:ScopeName.
The rest of this document assumes /isr as the top-level scope, so you may need to adjust the sub-scopes accordingly.
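As a small illustration of adjusting the sub-scopes, the following Python sketch derives the scopes mentioned in this document from a top-level scope name. The helper name isr_scopes is hypothetical, and the mapping from -o rsb:ScopeName to /ScopeName is an assumption inferred from the rsb:isr example above.

```python
def isr_scopes(top_level="isr"):
    """Return the sub-scopes named in this document for a top-level scope.

    Assumption: '-o rsb:ScopeName' makes ISR use '/ScopeName' as its
    top-level scope, in the same way 'rsb:isr' yields '/isr'.
    """
    root = "/" + top_level.strip("/")
    return {
        "hyp": root + "/hyp",        # recognition hypotheses (published by ISR)
        "param": root + "/param",    # energy threshold requests (read by ISR)
        "status": root + "/status",  # optional signal status
    }


if __name__ == "__main__":
    for name, scope in isr_scopes("myrobot").items():
        print(name, scope)
```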
TODO: I'm not sure what the
-m rsb option is actually doing. Sebastian? Lars?
Speech recognition hypotheses
ISR publishes its recognition results under the scope /isr/hyp. They are sent as XML strings in a format like the one shown in the example below. If isr is configured to work incrementally, incremental (intermediate) results are also published there; the final, stable hypotheses can be identified by the stable attribute of the speech_hyp element (stable="1" in the example below).
This is an example of the recognition result for the sentence "hello robot" and the resulting grammar tree (with a specific pre-defined grammar), as it would have been published under /isr/hyp:
<speech_hyp [...] generator="isr" origin="grm_track_xcf" stable="1" update="0" bxml:id="29">
  <TIMESTAMP>1290275288296</TIMESTAMP>
  <seq>
    <part begin="330" end="630" id="0" type="word">hello</part>
    <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
    <part begin="630" end="1030" id="1" type="word">robot</part>
    <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
  </seq>
  <grammartree cancel="0" fault="0" skip="0">
    <nonterminal name="$S">
      <nonterminal name="Greeting">
        <terminal refid="0" />
        <nonterminal name="$RobotName">
          <terminal refid="1" />
        </nonterminal>
      </nonterminal>
    </nonterminal>
  </grammartree>
</speech_hyp>
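A client receiving such a string could extract the recognized words and the stable flag roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name parse_hypothesis is hypothetical, and the sample is a trimmed copy of the example above that leaves out the elided attributes, the namespaced bxml:id attribute (its namespace declaration is not shown in the example) and the grammar tree. How the string is received from RSB is not covered here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example hypothesis (assumption: the elided attributes,
# bxml:id and the grammar tree are omitted for brevity).
SAMPLE = """<speech_hyp generator="isr" origin="grm_track_xcf" stable="1" update="0">
  <TIMESTAMP>1290275288296</TIMESTAMP>
  <seq>
    <part begin="330" end="630" id="0" type="word">hello</part>
    <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
    <part begin="630" end="1030" id="1" type="word">robot</part>
    <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
  </seq>
</speech_hyp>"""


def parse_hypothesis(xml_text):
    """Return the stable flag and the (word, begin, end) triples of a hypothesis."""
    root = ET.fromstring(xml_text)
    words = [
        (part.text, int(part.get("begin")), int(part.get("end")))
        for part in root.iter("part")
        if part.get("type") == "word"
    ]
    return {"stable": root.get("stable") == "1", "words": words}


result = parse_hypothesis(SAMPLE)
if result["stable"]:
    # Only act on final hypotheses; incremental ones may still change.
    print(" ".join(word for word, _, _ in result["words"]))  # prints "hello robot"
```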
TODO: Add information about the signal status optionally published at the scope
/isr/status (if enabled via the isr option
Changing the acoustic energy thresholds
The ISR component also listens under the scope /isr/param for requests to change its signal energy thresholds. This can be used to reduce or increase its sensitivity, e.g. to "disable" ISR while speech synthesis is running. It expects these requests to be XML-formatted strings with a syntax like the following example:
<sr db_start="45" db_utt="46" />
In this example, db_start specifies the start-utterance energy threshold and db_utt the within-utterance energy threshold (in dB). They correspond to the -s and the -u command line options for isr, respectively.
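Such a request string can be assembled programmatically instead of by hand. The sketch below uses Python's standard xml.etree.ElementTree; the function name build_threshold_message is hypothetical, and sending the resulting string to /isr/param via RSB is left out.

```python
import xml.etree.ElementTree as ET


def build_threshold_message(db_start, db_utt):
    """Build an <sr .../> request string for the given energy thresholds (in dB)."""
    element = ET.Element(
        "sr",
        {"db_start": str(db_start), "db_utt": str(db_utt)},
    )
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(element, encoding="unicode")


message = build_threshold_message(45, 46)
print(message)
```

Building the element through an XML library rather than string formatting keeps the attribute values properly escaped.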