Sound recognition

Soundrec detects and recognizes sounds in the audio stream.

Overview

This recognition module is based on supervised learning. First, it needs an offline training phase with labeled audio data. This phase generates data used by the (real-time) recognition phase. The recognition task is divided into three parts:
  • a detection module listens to the audio stream; it cuts out and sends each detected sound event.
  • for each sound, a stabilized auditory image (SAI) is computed and, by dimensionality reduction, a vector representation is generated.
  • this vector representation is passed to a classification module, where it is compared against the learned classes.

Training Phase

The training phase is performed by MATLAB code available in the folder matlab/.

Set the labeled data

Audio data with labels is needed. For each class learned by the classifier, you must provide some audio examples.
Each audio example must:
  • contain only one sound event.
  • be in one-channel WAV format.
  • have a sampling frequency of 48 kHz.
The folder containing the soundbank must follow this structure:
  • each class must have a folder with its name (if you want to classify fingerclap events, you must have a folder fingerclap).
  • in each folder, each audio file must have a name following this rule: class name + id + .wav. The ids are positive integers in ascending order (if you have three files in your fingerclap folder, their names must be fingerclap1.wav, fingerclap2.wav, fingerclap3.wav).

The soundbank's structure will be automatically analysed in the next step.

(!) Under MATLAB, the path of the soundbank must be specified in the variable banksounds_path.
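As an illustration, the naming convention above can be checked with a small shell helper (check_soundbank is a hypothetical script, not part of Soundrec; the soundbank path is passed as an argument):

```shell
# Hypothetical helper: check that a soundbank folder follows the
# "class name + id + .wav" naming rule described above.
check_soundbank() {
    bank="$1"
    for dir in "$bank"/*/; do
        class=$(basename "$dir")
        i=1
        # Count consecutive files named <class><id>.wav starting at id 1.
        while [ -f "${dir}${class}${i}.wav" ]; do
            i=$((i + 1))
        done
        echo "$class: $((i - 1)) examples"
    done
}
```

For example, a folder fingerclap containing fingerclap1.wav and fingerclap2.wav is reported as "fingerclap: 2 examples".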

Compiling AIMCopy

A binary file is needed to compute SAIs within the MATLAB code. It can be compiled from the source code in aimc/.

  • Go into the folder aimc/
  • Run bash scons.sh
  • Take the right release build from build/ and put it at the root of the MATLAB code.

More information about building and the required libraries at: [[http://code.google.com/p/aimc/]].

Launch the training

  • Don't forget to compile AIMCopy first.
  • Run startup.m
  • Run naotraining.m
Parameters:
  1. AIM_version is the name of the binary file which computes the SAI.
  2. K is the dimensionality of the codebooks.

(!) Other parameters will be explained and made configurable in future releases.

This part generates S+2 files:
  • labels.txt: text file.
  • REFSONS.bin: binary file.
  • C{1,...,S}.bin: S binary files.
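For reference, the presence of these training outputs can be verified with a short shell sketch (check_outputs is a hypothetical helper; the output folder and S are passed as arguments):

```shell
# Hypothetical helper: verify that the S+2 training output files exist.
check_outputs() {
    dir="$1"; S="$2"
    [ -f "$dir/labels.txt" ]  || { echo "missing labels.txt"; return 1; }
    [ -f "$dir/REFSONS.bin" ] || { echo "missing REFSONS.bin"; return 1; }
    i=1
    while [ "$i" -le "$S" ]; do
        [ -f "$dir/C$i.bin" ] || { echo "missing C$i.bin"; return 1; }
        i=$((i + 1))
    done
    echo "all $((S + 2)) files present"
}
```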

(/) When a SAI for a new sound is generated, it is recorded on disk in the folder banksounds_path/sai. It will be reused in a later training phase, if needed, to speed up the computation.

(!) K becomes the kmeans parameter in the config file (see below). In the same way, numberref can be computed as sum(categorie_taille) under MATLAB.

(!) Known error: if MATLAB can't find the path of libstdc++, add /usr/lib/ or /usr/lib64/ to LD_LIBRARY_PATH: setenv('LD_LIBRARY_PATH',['/lib64:/usr/lib64:' getenv('LD_LIBRARY_PATH')])

Setup the recognition

Three files must be created before the module can be launched.

config

pathcodebooks=./../../data/codebooks
pathref=./../../data/REFSONS.bin
pathrequest=temp.fv
pathlabels=./../../data/labels.txt
pathconfig=./../../data/SAI_dump.aimcconfig
numberref=852
dimensionnality=4
numbersubspaces=144
kmeans=20

  • pathcodebooks is the relative path of the folder containing the codebook binary files.
  • pathref and pathlabels are the locations of the two other files generated by the training phase.
  • pathrequest is a temporary variable which will be removed in future releases. No need to modify it.
  • pathconfig is the location of the file described below.
  • numberref, dimensionnality, numbersubspaces and kmeans are parameters of the recognition model.
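Since the config file uses a plain key=value syntax, individual values can be read from a shell script, e.g. for sanity checks (get_config is a hypothetical helper, not part of klasifly):

```shell
# Hypothetical helper: read one value from a key=value config file.
get_config() {
    # $1 = config file, $2 = key
    sed -n "s/^$2=//p" "$1"
}
```

For the example file above, get_config config kmeans prints 20.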

SAI_dump.aimcconfig

module1.name = Gammatone
module1.id = gt
module1.child1 = NAP

module2.name = NAP
module2.id = hcl
module2.child1 = Strobes

module3.name = Strobes
module3.id = local_max
module3.parameters = <<<ENDPARAMS
ENDPARAMS
module3.child1 = SAI

module4.name = SAI
module4.id = weighted_sai
module4.child1 = FV

module5.name = FV
module5.id = fv
module5.parameters = <<<ENDPARAMS
ENDPARAMS

This file specifies the chain of modules used to generate a SAI.
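As a quick sanity check on such a file, the module chain can be listed by extracting the moduleN.name entries (print_chain is a hypothetical helper; note the spaces around = in the .aimcconfig syntax):

```shell
# Hypothetical helper: print the processing chain declared in an
# .aimcconfig file, one module name per line.
print_chain() {
    sed -n 's/^module[0-9]*\.name = //p' "$1"
}
```

For the file above it prints Gammatone, NAP, Strobes, SAI and FV, in chain order.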

script.scp

useless.wav useless

(!) script.scp must be placed at the root of ./klasifly.
(!) script.scp is useless but required by a part of the code. It will be removed in future releases.
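Since the file's content does not matter, it can simply be created as follows (a sketch; the line mirrors the example above, and the file should end up at the root of ./klasifly):

```shell
# Create the dummy script.scp required by the current code.
printf 'useless.wav useless\n' > script.scp
```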

Real-time recognition

Offline with testsndbfile

The module can be tested with the binary testsndbfile (available in the audiocues module). The input sound chunks are generated from WAV files.

./testsndlibfile -f location -o /nao/audition/soundchunk/

where location is the relative path of the input audio file (it must be a two-channel WAV file).

./klasifly -i /nao/audition/soundchunk/ -c config

where config is the relative location of the config file presented before.

Online with Nao's head

If Spread is running and RSB is launched on Nao, you can start the classification with:

./klasifly -i /nao/audio/all/ -c config

where config is the relative location of the config file presented before.

The result of each classification is printed on standard output and a SoundEvent is published on RSB.