
D. Klotz, 06/27/2011 07:25 PM
Some formatting and a TOC

{{>toc}}

h1. ISR

ISR (short for Incremental Speech Recognition) is a speech recognition component based on the "ESMERALDA framework":http://www.irf.tu-dortmund.de/cms/en/IS/Research/ESMERALDA1/index.html.

h2. Interfacing ISR through RSB

ISR exposes several input and output interfaces through RSB. Each one lives under a sub-scope of the top-level scope @/isr@, or of another top-level scope if one is configured as described below.

h3. Building and starting ISR with RSB support

To build ESMERALDA/ISR with RSB support, pass a configuration argument like the following to cmake:
<pre>cmake [...] -D WITH_RSB=1 [...]</pre>

To start ISR with RSB support, supply the following options to the @isr@ binary:
<pre>isr [...] -o rsb:isr -m rsb [...]</pre>
The @-o rsb:isr@ option tells ISR to use RSB for communication and to publish and listen under the top-level scope @/isr@. To use a different top-level scope, pass it as @-o rsb:ScopeName@. The rest of this document assumes @/isr@ as the top-level scope, so adjust the sub-scopes accordingly if you change it.
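As a rough illustration of how the sub-scopes follow from the configured top-level scope, here is a minimal Python sketch. The helper name @isr_scopes@ is hypothetical and not part of ISR or RSB. It assumes that @-o rsb:ScopeName@ maps to the scope @/ScopeName@ in the same way that @-o rsb:isr@ maps to @/isr@, and that the sub-scope names (@hyp@, @status@, @param@) stay the same under a different top-level scope.

```python
def isr_scopes(top_level: str = "isr") -> dict:
    """Derive the ISR sub-scopes for the name passed via -o rsb:<name>.

    Assumption: the option value becomes the top-level scope "/<name>",
    and the sub-scope names are fixed (hyp, status, param).
    """
    root = "/" + top_level.strip("/")
    return {
        "hypotheses": root + "/hyp",    # recognition results
        "status": root + "/status",     # optional signal status
        "parameters": root + "/param",  # acoustic energy thresholds
    }


# Default configuration, as assumed throughout this page.
print(isr_scopes())

# A different top-level scope, as in "-o rsb:ScopeName".
print(isr_scopes("ScopeName")["hypotheses"])
```

A client that subscribes to recognition results would then use the @hypotheses@ entry instead of a hard-coded @/isr/hyp@.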

*TODO:* I'm not sure what the @-m rsb@ option actually does. Sebastian? Lars?

h3. Speech recognition hypotheses

ISR publishes its recognition results under the scope @/isr/hyp@. They are sent as XML strings in the format shown in the example below. If ISR is configured to work incrementally, incremental results are published there as well; the final, stable hypotheses are those whose @speech_hyp@ element has the @stable@ attribute set to @1@.
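Filtering for final results could look roughly like the following Python sketch. It uses only the standard library and does not touch RSB itself; how the XML strings are received is left out. The function names and the two shortened sample messages are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def is_stable(hyp_xml: str) -> bool:
    """Return True if the message is a speech_hyp element with stable="1"."""
    root = ET.fromstring(hyp_xml)
    return root.tag == "speech_hyp" and root.get("stable") == "1"


def final_hypotheses(messages):
    """Keep only the stable (final) hypotheses from a list of XML strings."""
    return [message for message in messages if is_stable(message)]


# Hypothetical, heavily shortened messages; real ones carry more attributes.
incremental = (
    '<speech_hyp stable="0"><seq>'
    '<part id="0" type="word">hello</part>'
    "</seq></speech_hyp>"
)
final = (
    '<speech_hyp stable="1"><seq>'
    '<part id="0" type="word">hello</part>'
    '<part id="1" type="word">robot</part>'
    "</seq></speech_hyp>"
)

print(len(final_hypotheses([incremental, final])))
```

Comparing against the string @"1"@ rather than an integer keeps the check faithful to the attribute value as it appears in the XML.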

*Example data:*

The following is an example recognition result for the utterance "hello robot", including the grammar tree produced by a specific predefined grammar, as it would be published under @/isr/hyp@:

<pre>
<speech_hyp [...] generator="isr" origin="grm_track_xcf" stable="1" update="0" bxml:id="29">
<TIMESTAMP>1290275288296</TIMESTAMP>
<seq>
  <part begin="330" end="630" id="0" type="word">hello</part>
  <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
  <part begin="630" end="1030" id="1" type="word">robot</part>
  <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
</seq>
<grammartree cancel="0" fault="0" skip="0">
  <nonterminal name="$S">
    <nonterminal name="Greeting">
      <terminal refid="0" />
      <nonterminal name="$RobotName">
        <terminal refid="1" />
      </nonterminal>
    </nonterminal>
  </nonterminal>
</grammartree>
</speech_hyp>
</pre>
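A message like the one above could be unpacked along the following lines. This is a sketch using Python's standard @xml.etree.ElementTree@ module, not code taken from ISR. The embedded sample leaves out the elided @[...]@ attributes and the @bxml:id@ attribute, because a namespace-aware parser rejects the @bxml:@ prefix unless its declaration is present (presumably it is among the elided attributes). The sketch also assumes that a @terminal@ element's @refid@ refers to the @id@ of a @part@ element, which is what the example suggests. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample based on the example above, minus the elided and namespaced attributes.
SAMPLE = """<speech_hyp generator="isr" origin="grm_track_xcf" stable="1" update="0">
<TIMESTAMP>1290275288296</TIMESTAMP>
<seq>
  <part begin="330" end="630" id="0" type="word">hello</part>
  <score acoustic="362.48" combined="362.48" id="0" lm="0.00" />
  <part begin="630" end="1030" id="1" type="word">robot</part>
  <score acoustic="602.56" combined="602.56" id="1" lm="0.00" />
</seq>
<grammartree cancel="0" fault="0" skip="0">
  <nonterminal name="$S">
    <nonterminal name="Greeting">
      <terminal refid="0" />
      <nonterminal name="$RobotName">
        <terminal refid="1" />
      </nonterminal>
    </nonterminal>
  </nonterminal>
</grammartree>
</speech_hyp>"""


def parse_words(root):
    """Map each word part's id to its text and its begin/end values."""
    return {
        part.get("id"): {
            "word": part.text,
            "begin": int(part.get("begin")),
            "end": int(part.get("end")),
        }
        for part in root.iter("part")
        if part.get("type") == "word"
    }


def resolve_tree(node, words):
    """Convert a grammar tree node into nested (name, children) tuples.

    Terminals are replaced by the word they reference (assumed: refid == part id).
    """
    if node.tag == "terminal":
        return words[node.get("refid")]["word"]
    return (node.get("name"), [resolve_tree(child, words) for child in node])


root = ET.fromstring(SAMPLE)
words = parse_words(root)
tree = resolve_tree(root.find("grammartree")[0], words)

print(" ".join(entry["word"] for entry in words.values()))
print(tree)
```

The first line printed is the recognized word sequence, @hello robot@; the second is the grammar tree with each terminal replaced by its word.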

h3. Signal information

*TODO:* Add information about the signal status optionally published at the scope @/isr/status@ (if enabled via the isr option @-x@).

h3. Changing the acoustic energy thresholds

*TODO:* Add information about the signal thresholds read under the scope @/isr/param@.