Dataset Processing Tools

This project contains a collection of scripts used for the recording, post-processing, annotation and use of multi-modal corpora (e.g. in HRI research).

DatasetProcessingTools (a.k.a. the Vernissage toolchain)


This is the toolchain used for recording and post-processing the Vernissage dataset. More information about the dataset and its collection can be found on the dataset homepage and in the following two publications:

Post-processing, conversion and view generation scripts

Most of this process is built on the RSBag toolchain, GStreamer, FFmpeg, Praat and a few other utilities.

An initial overview of the data collection, the conversion, and the generation of views on the collected data (e.g. for annotation tools) can also be found in the presentation slides in this folder: source:talks/2012_03_Dataset_Recording_PostProc_RSB

Here is a brief description of the scripts used in this process (source:scripts/DatasetProcessingTools/src); most of them also contain helpful inline comments.

Scripts for the main conversion / view-generation process:

  • source:scripts/DatasetProcessingTools/src/ Contains many small functions for data conversion (e.g. video/audio format conversion, cutting, calling the bag-tools from Python). These individual small steps are used as the building blocks for the following three larger scripts.
  • source:scripts/DatasetProcessingTools/src/ Step 1 of the conversion process: pre-processes the data without altering its content, i.e. it calculates missing timestamps for unsynchronized data, converts streams to more common formats etc. It takes a folder with .tide recordings and raw video files from external cameras, synchronizes the external videos to the data from the .tide files (using a Praat script to compute cross-correlations on the audio tracks), and converts some videos to more useful formats.
  • source:scripts/DatasetProcessingTools/src/ Step 2 of the process: creates a "view" of a specific temporal part of the data by cutting all video and audio streams to the same stretch of time. It expects the data in the form generated by the previous script. Given the output of step 1 plus an offset and length within one of the original videos (here, the merged video with the audio/video from Nao serves as the reference), it generates smaller video and audio files that all show the same stretch of time.
  • source:scripts/DatasetProcessingTools/src/ Alternate version of step 2: in our original recordings the timestamps of the audio recorded from the Nao robot had some problems, so using them as the reference for synchronization and view generation introduced many errors. This version of the view-generation script does not depend on the robot's audio data; instead it uses the fixed naovideo.avi (without sound) as the reference for the offset/length of the view.
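The audio-based synchronization in step 1 boils down to finding the lag that maximizes the cross-correlation between a reference audio track and an external camera's track. A minimal pure-Python sketch of that idea (the actual toolchain delegates this to a Praat script; the function and variable names here are illustrative only):

```python
def best_offset(ref, ext, max_lag):
    """Return the lag (in samples) of `ext` relative to `ref` that
    maximizes their cross-correlation; a positive lag means the
    content of `ext` occurs earlier than in `ref`."""
    best_score, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Sum ref[i] * ext[i - lag] over the overlapping region.
        lo = max(lag, 0)
        hi = min(len(ref), len(ext) + lag)
        score = sum(ref[i] * ext[i - lag] for i in range(lo, hi))
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag

# Toy example: `ext` contains the same burst as `ref`,
# shifted two samples earlier.
ref = [0, 0, 0, 1, 2, 1, 0, 0]
ext = [0, 1, 2, 1, 0, 0, 0, 0]
lag = best_offset(ref, ext, max_lag=4)  # -> 2
offset_seconds = lag / 48000.0          # divide by the sample rate
```

The resulting offset can then be applied when cutting the external video so that it lines up with the reference stream.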

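Cutting every stream to the same stretch of time, as in step 2, can be done with FFmpeg's `-ss` (seek) and `-t` (duration) options. A sketch of how such a cut command might be assembled from Python; the helper name and file names are hypothetical, not the actual script's API, and `-c copy` seeks only to keyframes, so frame-accurate cuts would require re-encoding:

```python
import subprocess

def cut_command(src, offset_s, length_s, dst):
    """Build an ffmpeg command line that extracts `length_s` seconds
    of `src` starting at `offset_s`, copying the streams without
    re-encoding (fast, but cuts on keyframes only)."""
    return ["ffmpeg", "-ss", str(offset_s), "-i", src,
            "-t", str(length_s), "-c", "copy", dst]

# Cut a 60-second view starting 12.5 s into the reference video.
cmd = cut_command("naovideo.avi", 12.5, 60.0, "view_naovideo.avi")
# subprocess.run(cmd, check=True)  # uncomment to actually perform the cut
```

Running the same helper once per video and audio stream, with offsets corrected by the synchronization lags from step 1, yields a set of files that all show the same stretch of time.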
Other helpful scripts / tools:

KuHa2011 tools


TODO jseele?



TODO Lars?