Dataset Processing Tools

This project contains a collection of scripts used for the recording, post-processing, annotation and use of multi-modal corpora (e.g. in HRI research).

DatasetProcessingTools (a.k.a. the Vernissage toolchain)


This is the toolchain used for recording and post-processing the Vernissage dataset. More information about the dataset and its collection can be found on the dataset homepage and in the following two publications:

Post-processing, conversion and view generation scripts

Most of this process is built on the RSBag toolchain, GStreamer, FFmpeg, Praat and a few other utilities.

An initial overview of the data collection, the conversion, and the generation of views on the collected data (e.g. for annotation tools) can also be found in the presentation slides in this folder: source:talks/2012_03_Dataset_Recording_PostProc_RSB

Here is a brief description of the scripts used in this process (source:scripts/DatasetProcessingTools/src); most of them also contain helpful inline comments.

Scripts for the main conversion / view-generation process:

  • source:scripts/DatasetProcessingTools/src/ Contains many small functions for data conversion (e.g. video/audio format conversion, cutting, calling the bag-tools from Python). These individual small steps are used as the building blocks for the following three larger scripts.
  • source:scripts/DatasetProcessingTools/src/ Step 1 of the conversion process: pre-processes the data without altering its content, i.e. it calculates missing timestamps for unsynchronized data, converts streams to more common formats etc. It takes a folder with .tide recordings and raw video files from external cameras, synchronizes the external videos to the data from the .tide files (using a Praat script to compute cross-correlations on the audio tracks), and converts some videos to more useful formats.
  • source:scripts/DatasetProcessingTools/src/ Step 2 of the process: creates a "view" of a specific temporal part of the data by cutting all video and audio streams to the same stretch of time. It expects the data in the form generated by the previous script. Given the output of step 1 plus an offset and length within one of the original videos (here, the merged video with the audio/video from Nao serves as the reference), it generates smaller video and audio files that all show the same stretch of time.
  • source:scripts/DatasetProcessingTools/src/ Alternate version of step 2: in our original recordings the timestamps of the audio recorded from the Nao robot had some problems, so using them as the reference for synchronization and view generation introduced many errors. This version of the view-generation script does not depend on the robot's audio data; instead it uses the fixed naovideo.avi (without sound) as the reference for the offset/length of the view.
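The audio-based synchronization in step 1 boils down to finding the lag that maximizes the cross-correlation between a reference audio track and an external camera's track. A minimal pure-Python sketch of that idea (the actual toolchain delegates this to a Praat script; the function and variable names here are illustrative only):

```python
def best_offset(ref, ext, max_lag):
    """Return the lag (in samples) of `ext` relative to `ref` that
    maximizes their cross-correlation; a positive lag means the
    content of `ext` occurs earlier than in `ref`."""
    best_score, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Sum ref[i] * ext[i - lag] over the overlapping region.
        lo = max(lag, 0)
        hi = min(len(ref), len(ext) + lag)
        score = sum(ref[i] * ext[i - lag] for i in range(lo, hi))
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag

# Toy example: `ext` contains the same burst as `ref`,
# shifted two samples earlier.
ref = [0, 0, 0, 1, 2, 1, 0, 0]
ext = [0, 1, 2, 1, 0, 0, 0, 0]
lag = best_offset(ref, ext, max_lag=4)  # -> 2
offset_seconds = lag / 48000.0          # divide by the sample rate
```

The resulting offset can then be applied when cutting the external video so that it lines up with the reference stream.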

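Cutting every stream to the same stretch of time, as in step 2, can be done with FFmpeg's `-ss` (seek) and `-t` (duration) options. A sketch of how such a cut command might be assembled from Python; the helper name and file names are hypothetical, not the actual script's API, and `-c copy` seeks only to keyframes, so frame-accurate cuts would require re-encoding:

```python
import subprocess

def cut_command(src, offset_s, length_s, dst):
    """Build an ffmpeg command line that extracts `length_s` seconds
    of `src` starting at `offset_s`, copying the streams without
    re-encoding (fast, but cuts on keyframes only)."""
    return ["ffmpeg", "-ss", str(offset_s), "-i", src,
            "-t", str(length_s), "-c", "copy", dst]

# Cut a 60-second view starting 12.5 s into the reference video.
cmd = cut_command("naovideo.avi", 12.5, 60.0, "view_naovideo.avi")
# subprocess.run(cmd, check=True)  # uncomment to actually perform the cut
```

Running the same helper once per video and audio stream, with offsets corrected by the synchronization lags from step 1, yields a set of files that all show the same stretch of time.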
Other helpful scripts / tools:

KuHa2011 tools


TODO jseele?



TODO Lars?