MSS - MotionSpeechSync

The MotionSpeechSync (MSS) component provides a way to play back speech with synchronized motions and pauses on the Nao robot.

Dependencies

At the moment, MSS depends on RSB-Python, NAOqi(-Python) and (only if you want to use the XTT input interface) XTT-Python.

Building/Installing

MSS is written in Python. If the prerequisites are installed, building and installing MSS should basically be as simple as:

python setup.py build
python setup.py install --prefix=/wherever/you/want/to/install/it

If your Nao robot has the RSB "cross-image" installed, you should already have a current version of MSS installed on the robot and do not need to install anything (besides maybe RSB and XTT, if your clients want to use them) on your local machine.

Starting

Building/installing via setup.py also generates a launcher script, which by default is called simply mss. If you cannot or do not want to use this generated script (see bug #278 for why it is currently unusable), you can also launch MSS directly with something like python mss/__init__.py
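
For example, to start MSS with both input interfaces and the NAOqi connection settings spelled out explicitly (all options are described in the Usage section below), the invocation could look something like this (or equivalently via python mss/__init__.py):

mss --output=naoqi --input=rsbrpc,xtt --pip=127.0.0.1 --pport=9559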

Starting automatically with NAOqi

TODO

Usage

These are the command-line options at the time of writing, but be sure to check mss --help yourself. They should be more or less self-explanatory:

Usage: mss [options]

Options:
  -h, --help            show this help message and exit
  --output=INTERFACE    Output interface to be used (default naoqi)
  --input=INTERFACE(S)  Input interface(s) to be used, comma separated
                        combination of xtt and rsbrpc (default 'rsbrpc,xtt')

  XTT options:
    Options specific to the XTT input interface.

    --xtt-scope=SCOPE   Scope on which the XTT task server operates (default
                        /mss/xtt)

  RSB RPC options:
    Options specific to the RSB RPC input interface.

    --rpc-scope=SCOPE   Scope on which the RSB RPC server listens (default
                        /mss/rsbrpc)

  NAOqi options:
    Options specific to the NAOqi output interface.

    -b IP               IP of the local broker (for the NaoQi interface,
                        default 0.0.0.0)
    -p PORT             Port of the local broker (for the NaoQi interface,
                        default 9699)
    --pip=IP            The parent broker IP (for the NaoQi interface, default
                        127.0.0.1)
    --pport=PORT        The parent broker port (for the NaoQi interface,
                        default 9559)

Interfaces (XTT/RPC)

MSS currently provides two interfaces for submitting tasks to the program. One is via the eXtensible Task Toolkit (XTT), the other through a simple RSB-based remote procedure call (RPC) server.

The RSB RPC interface provides an RPC server reachable at the configured scope with a single method, say(text). This method returns once the robot has finished saying the text and performing all motions given in the input string. Calling it from Python, for example, is quite easy:

import rsb

mss = rsb.createRemoteServer("/mss/rsbrpc")
mss.say("[bow] Hello, I am Nao!")  # blocks until speech and motion have finished

The XTT interface accepts tasks submitted by task clients (using XTT-Java or XTT-Python) on the configured scope and sets the task state to COMPLETED once the robot is done. The task specification XML document should look like this:

<?xml version="1.0" encoding="utf-8"?>
<MSS>
  <say text="text" />
</MSS>

The text in both interface inputs should follow the rules described below.

Input format

The input strings provided to this component follow a simple text format, where:
  • any normal words (and basic punctuation like '.', ',', '?' or '!') are treated as text to be spoken through Text-to-Speech (e.g. 'I am a talking robot!'),
  • anything in [square brackets] is taken to mean the name of a motion that should start playing at this point in the sentence (e.g. [lookRight])
  • and anything in (parentheses) is taken to be a pause, specified in milliseconds (e.g. (1000)).

The beginning of any motion is synchronized (as closely as the limitations of the NAOqi API allow) to the beginning of the word following the motion in the input string. If there is no such word, the motion is played without any accompanying text.

To give an example:

Hello, World! [bow] I am Nao, (2000) a small humanoid robot. [standardPose]

This translates roughly to the following sequence of events on the robot:
  1. Say "Hello, World!",
  2. Start the motion with the name "bow", synchronized with saying the first word of the following sentence ("I" in this case),
  3. Say "I am Nao" (while in parallel the motion named "bow" executes as long as it takes)
  4. Wait two seconds (2000 ms)
  5. Say "a small humanoid robot"
  6. Do the motion named "standardPose".
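
Sent through the RSB RPC interface described above (assuming the default scope /mss/rsbrpc), the whole example can be submitted in a single call:

import rsb

mss = rsb.createRemoteServer("/mss/rsbrpc")
mss.say("Hello, World! [bow] I am Nao, (2000) a small humanoid robot. [standardPose]")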

It is of course also possible to specify just some text (e.g. 'Hello, World!') or just a motion (e.g. '[lookLeft]'), or basically any combination of any number of the three types (motions, pauses and normal words).
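
To make the format more concrete, here is a minimal, hypothetical Python sketch (not MSS's actual parser) that splits such an input string into its say/motion/pause parts:

import re

# Hypothetical illustration only; this is not MSS's actual parser.
# Splits an input string into ("say", text), ("motion", name) and
# ("pause", milliseconds) events.
TOKEN = re.compile(r"\[(?P<motion>[^\]]+)\]|\((?P<pause>\d+)\)|(?P<text>[^\[(]+)")

def tokenize(line):
    events = []
    for match in TOKEN.finditer(line):
        if match.group("motion") is not None:
            events.append(("motion", match.group("motion")))
        elif match.group("pause") is not None:
            events.append(("pause", int(match.group("pause"))))
        else:
            text = match.group("text").strip()
            if text:
                events.append(("say", text))
    return events

print(tokenize("Hello, World! [bow] I am Nao, (2000) a small humanoid robot. [standardPose]"))
# [('say', 'Hello, World!'), ('motion', 'bow'), ('say', 'I am Nao,'),
#  ('pause', 2000), ('say', 'a small humanoid robot.'), ('motion', 'standardPose')]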

TODO: More examples?!

You can also create a new motion.

Supported Motions

TODO