Bug #808

Problem when receiving played back video data

Added by D. Klotz over 9 years ago. Updated over 9 years ago.

Status:         Rejected
Start date:     01/19/2012
Priority:       Urgent
Due date:
Assignee:       J. Moringen
% Done:
Target version: -


We already had this problem directly after recording, on the machines in the flat, but it is the same here on the normal university machines.

When I play back the video data recorded at the flat...

$ bag-play /vol/humavips/studies/Y2_Vernissage/corpus/1326283451/video.tide spread://localhost:4803/

... and then try to display them (I am only using naoheadcontrol, which is just a modified version of the rsbvideoreceiver, since I am currently somehow unable to build the latter) ...

$ naoheadcontrol

... this works fine for a short time (~15 seconds?), then the video hangs and the video receiver starts printing these error messages:

1326989929457 rsb.spread.ReceiverTask [ERROR]: Error receiving spread message: Spread communication error. Reason: unknown spread receive error
Exception of type: rsb::CommException
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN3rsc5debug15LinuxDebugTools15createBacktraceERKj+0x3e) [0xb77206ac]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsbcore.so.0.5.0(_ZN3rsc5debug10DebugTools13exceptionInfoIN3rsb13CommExceptionEEESsRKT_+0x10a) [0xb757daa8]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsbcore.so.0.5.0(_ZN3rsb6spread12ReceiverTask7executeEv+0xb28) [0xb757c9c6]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN3rsc9threading14RepetitiveTask3runEv+0x153) [0xb7719647]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN3rsc9threading20ThreadedTaskExecutor11executeTaskEN5boost10shared_ptrINS0_4TaskEEERKy+0x5d) [0xb771322b]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN5boost3_bi5list2INS0_5valueINS_10shared_ptrIN3rsc9threading4TaskEEEEENS2_IyEEEclIPFvS7_RKyENS0_5list0EEEvNS0_4typeIvEERT_RT0_i+0x6d) [0xb771892d]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN5boost3_bi6bind_tIvPFvNS_10shared_ptrIN3rsc9threading4TaskEEERKyENS0_5list2INS0_5valueIS6_EENSC_IyEEEEEclEv+0x48) [0xb77188b8]
    /vol/nao/releases/lucid32-1.12-rc1/lib/librsc.so.0.5.0(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvPFvNS_10shared_ptrIN3rsc9threading4TaskEEERKyENS2_5list2INS2_5valueIS8_EENSE_IyEEEEEEE3runEv+0x22) [0xb77186ba]
    /vol/nao/releases/lucid32-1.12-rc1/lib/libboost_thread-mt.so.1.44.0(thread_proxy+0x65) [0xb7209aa5]
    /lib/tls/i686/cmov/libpthread.so.0(+0x596e) [0xb675896e]
    /lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb658aa4e]

I am not sure this really is a problem with the bag-tools, but since the same visualization (naoheadcontrol in this case) worked fine displaying the data live while it was being recorded, some connection seems likely.

This was all with the 0.5 tools and the lucid32-1.12-rc1 module installed in the respective Nao-volume.


#1 Updated by D. Klotz over 9 years ago

I wanted to add at least Johannes (probably also Sebastian) as a watcher, but somehow I am unable to do that in this project.

#2 Updated by J. Moringen over 9 years ago

Added Sebastian and Johannes. Changed David's role: reporter -> developer.

#3 Updated by D. Klotz over 9 years ago

This seems to be more widespread than just tools like the videoreceiver. I just tried a basic bag-play -> bag-record cycle and got the following errors:

$ bag-record -o /tmp/1326283451_video.tide spread://localhost:4803/

# then bag-play is started on another console like in the original description

# after some time, bag-record starts giving errors like this:
[ WARN (#<IN-PUSH spread://localhost:4803/ {CEB4379}> RSB.TP) ]  Failed to receive a notification: Error receiving: NET-ERROR-ON-SESSION

During all this, the spread daemon continues to run without any apparent errors. I just hope there's no real problem with our recorded data...

#4 Updated by J. Wienke over 9 years ago

My original suspicion was that the packets produced by cl-rsb might be too big for Spread.

#5 Updated by J. Moringen over 9 years ago

I just tried something similar to David's experiment and transmitted the whole file from a bag-play process to a CL logger process without errors. I used

  • 64bit machine
  • Spread 4.00.00 Built 29/November/2006
  • (Slightly ahead of) trunk versions of logger and bag-play

Are the failing binaries available somewhere?

#6 Updated by J. Moringen over 9 years ago

I traced the sending of Spread messages in bag-play and found that all message sizes are below 100,003 bytes (100,002 bytes is the maximum message size).
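As a back-of-the-envelope sketch of why the sizes stay under that limit (the constant 100,002 comes from the trace above; the function and variable names are hypothetical, not part of the bag-tools): a raw 640x480 YUV422 frame is several times larger than one Spread message, so the transport has to fragment each frame.

```python
# Sketch: check payload sizes against the Spread message limit traced above.
# SPREAD_MAX_MESSAGE and fits_in_one_message are hypothetical names.
SPREAD_MAX_MESSAGE = 100_002  # bytes, maximum payload per Spread message

def fits_in_one_message(payload: bytes) -> bool:
    """True if a serialized notification fits into a single Spread message."""
    return len(payload) <= SPREAD_MAX_MESSAGE

# A raw 640x480 YUV422 frame (2 bytes per pixel) needs fragmentation:
frame_size = 640 * 480 * 2                        # 614,400 bytes
fragments = -(-frame_size // SPREAD_MAX_MESSAGE)  # ceiling division
print(fragments)  # 7 fragments per frame
```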

David's observation is typical for the case in which the listener cannot keep up with the informer. However, the throughput is rather low in this case:

[jmoringe@azurit ~]$ code/cl-rsb-tools/trunk/build-x86_64/logger -s statistics spread://localhost:4803
Now                             |Rate        |Throughput      |Latency         
2012-01-19T17:46:08.760267+01:00|         N/A|             N/A|…117.574 ± 0.000
2012-01-19T17:46:09.734350+01:00|      14.594|     8968005.605|…117.679 ± 0.034
2012-01-19T17:46:10.734246+01:00|      12.003|     7374938.984|…117.921 ± 0.176
2012-01-19T17:46:11.733952+01:00|      14.006|     8605119.761|…118.399 ± 0.096
2012-01-19T17:46:12.733682+01:00|      20.006|    12292142.856|…118.275 ± 0.082
2012-01-19T17:46:13.734262+01:00|       9.994|     6140684.373|…118.724 ± 0.177
2012-01-19T17:46:14.733924+01:00|      16.006|     9833822.491|…118.945 ± 0.058
2012-01-19T17:46:15.733509+01:00|      11.005|     6761564.612|…119.163 ± 0.112
2012-01-19T17:46:16.734559+01:00|      19.979|    12274891.269|…119.356 ± 0.082
2012-01-19T17:46:17.734345+01:00|      23.007|    14135582.505|…119.262 ± 0.112
2012-01-19T17:46:18.733913+01:00|       9.004|     5532211.366|…119.384 ± 0.182
2012-01-19T17:46:19.733488+01:00|      20.009|    12293569.354|…119.724 ± 0.062
2012-01-19T17:46:20.734066+01:00|      14.992|     9211063.382|…119.566 ± 0.094
2012-01-19T17:46:21.733730+01:00|       9.003|     5531580.485|…120.075 ± 0.174
2012-01-19T17:46:22.734389+01:00|      26.983|    16578505.707|…120.168 ± 0.125
2012-01-19T17:46:23.733971+01:00|      13.006|     7990924.007|…120.192 ± 0.077
2012-01-19T17:46:24.733540+01:00|      26.012|    15981592.195|…120.240 ± 0.113
2012-01-19T17:46:25.734192+01:00|      16.989|    10438568.772|…119.976 ± 0.104
2012-01-19T17:46:26.733776+01:00|      13.006|     7990820.080|…120.399 ± 0.099

#7 Updated by J. Wienke over 9 years ago

The strange thing is that this seems to happen only with bag-play. Can you replay the files with a different version of bag-play without problems?

#8 Updated by J. Moringen over 9 years ago

I can now reproduce the problem. No idea why it happens now but not previously.

#9 Updated by S. Wrede over 9 years ago

Did you ever try if this happens also using the tcp-transport?

#10 Updated by D. Klotz over 9 years ago

Other experiments, with mixed results: the Common Lisp logger basically behaves the same as bag-record in my last experiment, producing the same error message. Starting bag-play with --replay-strategy 'fixed-rate :rate 5' alongside the cl-logger and an instance of naoheadcontrol still had the latter producing the same error messages after a while, but so far the slower rate seems to have fixed the problem at least for the logger.

#11 Updated by J. Wienke over 9 years ago

Were these files recorded in VGA resolution? If so, we never had this much data before.

#12 Updated by J. Moringen over 9 years ago

The images are 640x480 YUV422 (I experimentally exported a series of PNG images in the context of a different experiment) - the data is OK.

The throughput peaks are at ~25 Hz and ~15 MB/s (see previous comment). I thought the Spread transport could handle this, but maybe it cannot.
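For what it is worth, those peak numbers are consistent with raw YUV422 frames; a quick arithmetic sketch (all values taken from the comments above):

```python
# Sketch: check that the observed peaks match raw 640x480 YUV422 frames.
frame_bytes = 640 * 480 * 2        # 2 bytes per pixel -> 614,400 bytes per frame
rate_hz = 25                       # peak event rate from the statistics logger
throughput = frame_bytes * rate_hz # bytes per second
print(throughput)  # 15360000 bytes/s, i.e. ~15 MB/s, matching the observed peaks
```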

#13 Updated by S. Wrede over 9 years ago

  • Priority changed from High to Urgent

First, the correctness of the recorded data is the most important thing right now. With regard to Spread and its maximum throughput, I am not fully convinced, as Johannes just experimented with the new stereo head and reported 2x20 FPS at 640x480. Johannes?

Coming back to my previous comment: could anyone try this with the socket transport?

#14 Updated by J. Wienke over 9 years ago

Also, Spread must have been able to stand this load. Otherwise the data would not have been recorded. ;)

#15 Updated by J. Moringen over 9 years ago

  • Status changed from New to Rejected

Turned out to most likely be the same Spread-related problem as described in #835.
