Wiki

Dependencies

  • Flann (Fast Library for Approximate Nearest Neighbors) library version 1.7
  • FFTW3 (Fast Fourier Transform) library version 3.3
  • libsndfile library version 1.0
  • OpenCV library version 2.3 (minimum version)

Compilation

  • On the INRIA machine (pollux):
    cmake .. -DFLANN_CUSTOM=/home/nao/software/flann-1.7.1-src -DOpenCV_CUSTOM=/home/nao/Work/OpenCV-2.3.0/

Simple gesture recognition application

The application relies on four modules:
  • Visual descriptor module
  • Audio descriptor module
  • AV Features collector module
  • Decision taking module

Accepted Inputs

  • Visual descriptor module Inputs
    • rst.vision.Image: disparity map
    • rst.vision.Faces: faces detected in the left camera image
    • name of the file containing the audio/visual gesture descriptors to match
    • Value ?
    • Input Scope Name: "/SynchFaceDisp" (by default)
    • Output Scope Name: "/nao/VisualDescriptor" (by default)
  • Audio descriptor module Inputs
    • rst.audition.SoundChunk: audio data
    • name of the file containing the audio/visual gesture descriptors to match
    • Input Scope Name: /nao/audio/all (by default)
    • Output Scope Name: /nao/AudioDescriptor (by default)
  • AV Features collector module Inputs
    • rst::math::Vec1DFloat: synchronized audio and visual descriptors
    • rst.vision.Image: left camera image
    • Input Scope Name: /SynchDescriptors (by default)
    • Output Scope Name: /nao/AVDescriptor (by default)
  • Decision taking module Inputs
    • rst::math::Vec1DFloat: matched audio and visual descriptors
    • name of the file containing the model signature
    • Input Scope Name: /nao/AVDescriptor (by default)
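
The default scope names above, together with the rsb_timesync commands in the User Manual, define how data flows between the modules. The following Python sketch is only an illustration and not part of the project: it records that wiring in a table and lists which producers feed a given scope. The scope names are copied from this page; the labels and the `upstream` helper are hypothetical names chosen for the example. Scopes with no producer listed on this page (for example /DisparityImage) are treated as external sources.

```python
# Default scope wiring, copied from the scope names on this page.
# Each entry maps a producer label to (input scopes, output scope).
# The rsb_timesync inputs are the --primscope and --supscope values.
WIRING = {
    "rsb_timesync (faces + disparity)": (
        ["/DisparityImage", "/vision/faces/left"], "/SynchFaceDisp"),
    "visual descriptor": (["/SynchFaceDisp"], "/nao/VisualDescriptor"),
    "audio descriptor": (["/nao/audio/all"], "/nao/AudioDescriptor"),
    "rsb_timesync (descriptors)": (
        ["/nao/VisualDescriptor", "/nao/AudioDescriptor"], "/SynchDescriptors"),
    "AV features collector": (["/SynchDescriptors"], "/nao/AVDescriptor"),
    # No output scope is documented for the decision module.
    "decision taking": (["/nao/AVDescriptor"], None),
}


def upstream(scope, wiring=WIRING):
    """Return the producers that feed `scope`, in dependency order."""
    order = []
    for name, (inputs, output) in wiring.items():
        if output != scope:
            continue
        # First collect everything the producer itself depends on.
        for inp in inputs:
            for dep in upstream(inp, wiring):
                if dep not in order:
                    order.append(dep)
        order.append(name)
    return order


if __name__ == "__main__":
    # Everything that must be running before the decision module gets data.
    for name in upstream("/nao/AVDescriptor"):
        print(name)
```

A table like this makes it easy to check that a non-default scope passed on the command line still has a producer on the other end.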

Provided Outputs

  • Visual descriptor module Outputs
    • rst::math::Vec1DFloat
  • Audio descriptor module Outputs
    • rst::math::Vec1DFloat
  • AV Features collector module Outputs
    • string
  • Decision taking module Output

User Manual

  • Steps to run the application on live data:
  1. Run the time synchronization programs:
    rsb_timesync --outscope "/SynchOutput" --primscope "/nao/vision/LeftRect" --supscope "/nao/vision/RightRect" --strategy approxt &
    rsb_timesync --outscope "/SynchDescriptors" --primscope "/nao/VisualDescriptor" --supscope "/nao/AudioDescriptor" --strategy approxt &
    rsb_timesync --outscope "/SynchFaceDisp" --primscope "/DisparityImage" --supscope "/vision/faces/left" --strategy approxt &
  2. Run the rectification program:
    • rsb_rectify_image -i /nao/vision/0 -o /nao/vision/LeftRect -c $prefixStereoMatching/../test/left_homography_inria.txt -g &
    • rsb_rectify_image -i /nao/vision/1 -o /nao/vision/RightRect -c $prefixStereoMatching/../test/right_homography_inria.txt -g &
  3. Run the disparity map computation program:
    • rsb_stereogcs -i /SynchOutput -r /nao/vision/RightRect -l /nao/vision/LeftRect -mindisp -100 -maxdisp 50 -thr 0.8 &
  4. Run the face detector on the left camera images:
    • rsbfacedetection -i /nao/vision/0 -o /vision/faces/left -m $FACEDETECTIONSRC/3rdparty/eyedea/lib64/eyefacesdk &
  5. Run the visual descriptor:
    • rsb_visual_descriptor -i /SynchFaceDisp -d /DisparityImage -f /nao/faces/left -o /nao/VisualDescriptor -c $prefixRobotGestures/../kmeans -k 500 &
  6. Run the audio descriptor:
    • rsb_auditory_descriptor -i /nao/audio/all -o /nao/AudioDescriptor -c $prefixRobotGestures/../kmeans -k 500 &
  7. Run the AV features collector:
    • rsb_collect_av_features -i /SynchDescriptors -v /nao/VisualDescriptor -a /nao/AudioDescriptor -o /nao/AVDescriptor -l /nao/vision/0 &
  8. Run the decision program:
    • rsb_take_decision -i /SynchDescriptors -a /nao/AudioDescriptor -v /nao/VisualDescriptor -m ../models &
  9. Visualize the disparity map:
    • rsb_videoreceiver -i "/DisparityImage"
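
The steps above start many background processes by hand. As a convenience, they could be collected in a small launcher. The following Python sketch is not part of the project and makes several assumptions: the names `COMMANDS`, `expand` and `launch_all` are hypothetical, only the commands of steps 1 and 2 are included (the remaining steps would be added the same way), and the default is a dry run that only returns the command lines, since the binaries may not be installed. It requires Python 3.8 or later for `shlex.join`.

```python
import os
import shlex
import subprocess

# Commands from steps 1 and 2 above, written as argument lists.
COMMANDS = [
    ["rsb_timesync", "--outscope", "/SynchOutput",
     "--primscope", "/nao/vision/LeftRect",
     "--supscope", "/nao/vision/RightRect", "--strategy", "approxt"],
    ["rsb_timesync", "--outscope", "/SynchDescriptors",
     "--primscope", "/nao/VisualDescriptor",
     "--supscope", "/nao/AudioDescriptor", "--strategy", "approxt"],
    ["rsb_timesync", "--outscope", "/SynchFaceDisp",
     "--primscope", "/DisparityImage",
     "--supscope", "/vision/faces/left", "--strategy", "approxt"],
    ["rsb_rectify_image", "-i", "/nao/vision/0", "-o", "/nao/vision/LeftRect",
     "-c", "$prefixStereoMatching/../test/left_homography_inria.txt", "-g"],
    ["rsb_rectify_image", "-i", "/nao/vision/1", "-o", "/nao/vision/RightRect",
     "-c", "$prefixStereoMatching/../test/right_homography_inria.txt", "-g"],
]


def expand(argv):
    """Expand $VAR references such as $prefixStereoMatching from the environment."""
    return [os.path.expandvars(arg) for arg in argv]


def launch_all(commands, dry_run=True):
    """Start every command in the background.

    With dry_run=True nothing is started and the command lines are
    returned as strings; otherwise the Popen handles are returned.
    """
    if dry_run:
        return [shlex.join(expand(argv)) for argv in commands]
    return [subprocess.Popen(expand(argv)) for argv in commands]


if __name__ == "__main__":
    for line in launch_all(COMMANDS):
        print(line)
```

Calling `launch_all(COMMANDS, dry_run=False)` would start the processes; keeping the returned handles allows them to be terminated together when the session ends.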