rd-0001.txt

J. Moringen, 02/14/2014 12:52 PM

RD: 1
Title: TIDE log format
Date: 2013-05-11
Author: Geoffrey Biggs <geoffrey dot biggs at aist.go.jp>, Ingo Lütkebohle <iluetkeb@techfak.uni-bielefeld.de>
Type: General
Content-Type: text/x-rst
Created: 2011-01-14
Raw-text:

    
.. contents:: Table of Contents
.. section-numbering::


Status of this RRFC
===================

This document is a draft of a future Robotics Request for Comment. It
must not be formally referenced.

    
Abstract
========

This RRFC describes the Time-Indexed Data Entries (TIDE) log file
format. This format is based on the Bag format, developed for logging
ROS messages. Files using this format are called *tide files* and
typically have the extension ``.tide`` or ``.tid``. The TIDE format is
designed for fast streaming of data to disc, fast playback and random
access in time of the logged data.

    
Rationale
=========

Logging of data in robotics is useful for applications including
reproduction of experiments, visualisation of sensor data and long-term
logging of robot health data for later diagnostics.

The major features of the TIDE format are:

 - Space to store data type definitions with the data, for long-term
   compatibility.

 - Abstraction away from specific transports, allowing a TIDE file to be
   playable into multiple frameworks and ensuring the data is
   understandable even without a transport.

 - Suitable for streaming data to disc, provided enough memory is available to
   store a reasonable number of indices and channel information.

 - Ability to compress data records, allowing a smaller file size while
   maintaining random-access capability.

 - TIDE files can be introspected and searched in time without needing to load
   or decompress the recorded data.

    
File format
===========

The TIDE format is made up of a set of blocks, each with a specific purpose.
The structure of a block is shown below:

====== ======================= =================================================================
Field  Size                    Purpose
====== ======================= =================================================================
Tag    4 bytes                 Indicates the type of block. Typically a 4-byte character string.
Size   8-byte unsigned integer Size of the block in bytes, excluding the tag and size values.
Data   `variable`              The block's data. The format will depend on the block type.
====== ======================= =================================================================

A TIDE file MUST begin with the `TIDE block`_, which indicates that it is a
TIDE file and gives the format version, as well as some basic information.

The data contained in a block has a fixed format and order. This allows the data
to be read easily. Each block includes the size of the entire block's data. To
support future extensions to the format that may add information to a block,
file readers are REQUIRED to expect the next block to begin after the number of
bytes specified in the size field, regardless of whether this matches the size
of the data they expect. This allows additional fields to be appended to a block
by future minor revisions of the format without preventing older
implementations from reading the file.

All values in all blocks, with the exception of pre-serialised entry and format
data, MUST be little-endian.

All time stamps stored in a TIDE file are in nanoseconds since the POSIX Epoch,
1970-01-01 00:00:00 +0000 (UTC).
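
The rule that readers skip ahead by the declared block size can be sketched
in Python as follows. This is a minimal illustration under stated
assumptions, not a reference implementation: the names ``iter_blocks`` and
``make_block`` and the ``XTRA`` tag are hypothetical and exist only for this
example.

```python
import io
import struct

# 4-byte tag followed by an 8-byte little-endian unsigned size.
BLOCK_HEADER = struct.Struct("<4sQ")


def iter_blocks(stream):
    """Yield (tag, data_offset, size) for each block in a seekable stream."""
    while True:
        header = stream.read(BLOCK_HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        tag, size = BLOCK_HEADER.unpack(header)
        data_offset = stream.tell()
        yield tag, data_offset, size
        # Always jump by the declared size, even if the caller understood
        # fewer fields than the block actually contains.
        stream.seek(data_offset + size)


def make_block(tag, data):
    """Hypothetical helper: serialise one block for the sample below."""
    return BLOCK_HEADER.pack(tag, len(data)) + data


# A TIDE block followed by a block type this reader does not know.
sample = make_block(b"TIDE", bytes([2, 0])) + make_block(b"XTRA", b"future fields")
blocks = list(iter_blocks(io.BytesIO(sample)))
```

Because the reader never interprets the data section itself, unknown block
types and blocks with appended fields are stepped over in the same way.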

    
Blocks
======

This section specifies the format of the blocks that may be found in a TIDE
file.

.. _`TIDE block`:

TIDE
----

The ``TIDE`` block MUST occur only once in the file and MUST occur at the very
beginning of the file.

====================== ======================= ============================================
Field                  Size                    Purpose
====================== ======================= ============================================
Major version number   1 unsigned byte         Major version of the TIDE format used.
Minor version number   1 unsigned byte         Minor version of the TIDE format used.
====================== ======================= ============================================

Implementations are REQUIRED to support any TIDE file with the same major
version as the implementation, although not all features supported by the file
may be available.

For TIDE log files created according to this version of the TIDE
specification, the major version number should be 2 (two) and the
minor version number should be 0 (zero).
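
A version check along these lines might look like the sketch below. The
function name ``parse_tide_block`` and the constant ``SUPPORTED_MAJOR`` are
hypothetical; the sketch assumes the caller has already read the block's data
section.

```python
import struct

SUPPORTED_MAJOR = 2  # major version named by this specification


def parse_tide_block(data):
    """Return (major, minor) from the data section of a TIDE block."""
    if len(data) < 2:
        raise ValueError("TIDE block too short")
    # Only the first two bytes are interpreted; a later minor revision may
    # append further fields, which this reader ignores.
    major, minor = struct.unpack_from("<BB", data)
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported TIDE major version {major}")
    return major, minor


version = parse_tide_block(bytes([2, 0]))
newer_minor = parse_tide_block(bytes([2, 3]) + b"appended")
```

A file with a newer minor version is accepted, matching the requirement to
support any file that shares the implementation's major version.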

    
.. _`META block`:

Meta
----

TODO per-channel meta-data would make sense

Zero or more ``META`` blocks MAY occur at arbitrary positions within the TIDE log file.
TODO

====================== ======================= ============================================
Field                  Size                    Purpose
====================== ======================= ============================================
Metadata count         2-byte unsigned integer Number of metadata entries in this block.
Metadata entries       `variable`              ``Metadata count`` metadata entries.
====================== ======================= ============================================

The variable-length ``Metadata entries`` section contains a sequence
of TODO. Each entry is TODO. The section has exactly ``Metadata
count`` occurrences of the following fields:

====================== ======================= ============================================
Field                  Size                    Purpose
====================== ======================= ============================================
Key size               2-byte unsigned integer Length of the key (in bytes).
Key                    ``Key size`` bytes      Key of the metadata entry.
Value size             4-byte unsigned integer Length of the value (in bytes).
Value                  ``Value size`` bytes    Value of the metadata entry.
====================== ======================= ============================================

    
Dublin Core Metadata
~~~~~~~~~~~~~~~~~~~~

Dublin Core metadata should be encoded as the string :samp:`dc:{KEY}` in
the key field and an appropriate value string in the value field. The
following Dublin Core elements MAY be used:

#. Title
#. Creator
#. Subject
#. Description
#. Publisher
#. Contributor
#. Date
#. Type
#. Format
#. Identifier
#. Source
#. Language
#. Relation
#. Coverage
#. Rights

All elements are optional and may be repeated.

See http://dublincore.org/documents/dces/ and
http://dublincore.org/documents/dcmi-type-vocabulary/index.shtml
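
The key/value layout of the ``META`` block, combined with the ``dc:`` key
convention, can be sketched as follows. Since parts of the ``META`` block
are still marked TODO, this is only an illustration of the field sizes
given in the tables: keys and values are treated as opaque byte strings, and
the names ``parse_meta_block`` and ``build_meta_block`` as well as the
sample values are hypothetical.

```python
import struct


def parse_meta_block(data):
    """Return a list of (key, value) byte-string pairs from a META block's data."""
    (count,) = struct.unpack_from("<H", data, 0)  # Metadata count
    pos = 2
    entries = []
    for _ in range(count):
        (key_size,) = struct.unpack_from("<H", data, pos)
        pos += 2
        key = data[pos:pos + key_size]
        pos += key_size
        (value_size,) = struct.unpack_from("<I", data, pos)
        pos += 4
        value = data[pos:pos + value_size]
        pos += value_size
        entries.append((key, value))
    return entries


def build_meta_block(entries):
    """Hypothetical inverse of parse_meta_block, used to create sample data."""
    out = [struct.pack("<H", len(entries))]
    for key, value in entries:
        out.append(struct.pack("<H", len(key)) + key)
        out.append(struct.pack("<I", len(value)) + value)
    return b"".join(out)


# Elements may be repeated, so the result is kept as a list, not a dict.
pairs = [(b"dc:Title", b"Lab run 7"), (b"dc:Creator", b"A. Researcher"),
         (b"dc:Creator", b"B. Researcher")]
parsed = parse_meta_block(build_meta_block(pairs))
```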

    
.. _`TYPE block`:

Type
----

Zero or more ``TYPE`` blocks MAY occur at arbitrary positions within
the TIDE log file. Each ``TYPE`` block describes ONE data type used
directly or indirectly (as a dependency of some other data type) in
one or more channels (see `CHAN block`_). The structure and attributes
of this block are based on and designed to interoperate well with the
mechanism for referring to data types described in RETF RD 8 [6]_.

.. note::

   TODO(jmoringe): add constraint that ``TYPE`` blocks MUST occur
   before ``CHAN`` blocks which use them and MUST be topologically
   sorted according to the dependency relation?

.. note::

   TODO(jmoringe): is it always SHA1 or do we need the following::

     Hash function size     1 unsigned byte              Length of the identifier of the used hash function.
     Hash function          ``Hash function size`` bytes Identifier of the used hash function.

.. note::

   TODO(jmoringe): do we need ``Hash size`` below, or can we fix the
   size of the hash? Also applies to ``Dependency entries`` and ``Type
   hash entries`` fields.

========================= ================================= =================================================================
Field                     Size                              Purpose
========================= ================================= =================================================================
Hash size                 2-byte unsigned integer           Length of the type hash (in bytes).
Hash                      ``Hash size`` bytes               Hash value identifying the defined data type.

Syntax size               1 unsigned byte                   Length of the syntax (in bytes).
Syntax                    ``Syntax size`` bytes             Name of the syntax in which the data type is defined.

Display name size         2-byte unsigned integer           Length of the display name (in bytes).
Display name              ``Display name size`` bytes       Human-readable name of the data type as a UTF-8 string.

Version size              2-byte unsigned integer           Length of the version string (in bytes).
Version                   ``Version size`` bytes            Human-readable version of the data type as an ASCII string.

Definition size           4-byte unsigned integer           Length of the definition (in bytes).
Definition                ``Definition size`` bytes         Definition of the data type according to the rules of ``Syntax``.

Acceptable source count   1 unsigned byte                   Number of acceptable source entries in this block.
Acceptable source entries `variable`                        ``Acceptable source count`` acceptable source entries.

Dependency count          2-byte unsigned integer           Number of dependency entries in this block.
Dependency entries        `variable`                        ``Dependency count`` dependency entries.
========================= ================================= =================================================================

    
The ``Hash`` field contains an "exact topic" hash value (see [6]_)
identifying the data type. Given the syntax designated by the
``Syntax`` field and the value of the ``Definition`` field, the hash
value MUST be computed as described in the "Proposed Approach" and
"Data Preparation" sections of [6]_.

.. note::

   TODO(jmoringe): we cannot reference RD-8 because of its draft
   status, right?

.. important::

   The base32-encoding and embedding in a URN string described in
   [6]_ are not performed.

The ``Syntax`` field designates the data type definition syntax according
to which the described data type is defined in the ``Definition``
field. Implementations SHOULD refer to a precise version of the
respective data type definition syntax. For example, a ROS [9]_
implementation may refer to a version of the ROS Message description
specification [4]_ in this field.

The value of the ``Definition`` field of the ``TYPE`` block contains
the textual or binary definition of the data type according to the
rules of the syntax designated by the value of the ``Syntax``
field. The value of this field, optionally together with the
``Definition`` fields of ``TYPE`` blocks referenced in ``Dependency
entries``, MUST contain enough information to deserialise the data
stored in the channel, such that it can be reconstructed later without
any extra information from outside the TIDE file. For example, a
ROS implementation may store the message description in this field.

.. note::

   TODO(jmoringe): should we have a ``Documentation link`` field?

The variable-length ``Acceptable source entries`` section contains a
sequence of acceptable source entries. Each entry is an "acceptable
source" URL (see the "as" field in the "Link Format" section of [6]_)
from which the definition of the data type being described can be
obtained. The section has exactly ``Acceptable source count``
occurrences of the following fields:

====================== ================================= ==================================================================================
Field                  Size                              Purpose
====================== ================================= ==================================================================================
Acceptable source size 2-byte unsigned integer           Length of the acceptable source string (in bytes).
Acceptable source      ``Acceptable source size`` bytes  A URL from which the definition of the data type (in ``Syntax``) can be obtained.
====================== ================================= ==================================================================================

The variable-length ``Dependency entries`` section contains a sequence
of dependencies of the data type being described. Each entry is the
"exact topic" hash (see above and [6]_) of the data type to which the
dependency refers. The section has exactly ``Dependency count``
occurrences of the following fields:

========== ======================= =====================================
Field      Size                    Purpose
========== ======================= =====================================
Hash size  2-byte unsigned integer Length of the type hash (in bytes).
Hash       ``Hash size`` bytes     "Exact topic" hash of the dependency.
========== ======================= =====================================

The ``Hash`` field contains a hash value identifying the data type to
which the dependency refers. The TIDE file MUST contain a ``TYPE``
block whose ``Hash`` field is identical to the value of the
``Hash`` field of the dependency entry.

    
.. _`CHAN block`:

Channel
-------

The ``CHAN`` block stores the meta-data for ONE channel of data.

================== ================================= ============================================================
Field              Size                              Purpose
================== ================================= ============================================================
ID                 4-byte unsigned integer           Channel identification, used to link entries to the channel.
Name size          1 unsigned byte                   Length of the channel name (in bytes).
Name               ``Name size`` bytes               Name of the channel, as a UTF-8 string.
Source name size   4-byte unsigned integer           Length of the source string (in bytes).
Source name        ``Source name size`` bytes        Human-readable source description (encoded as UTF-8).
Source config size 4-byte unsigned integer           Length of the source config data (in bytes).
Source config      ``Source config size`` raw bytes  Raw data describing the source of this channel's data.
Type hash count    1 unsigned byte                   Number of hashes identifying used data types.
Type hash entries  `variable`                        ``Type hash count`` type hash entries.
================== ================================= ============================================================

    
A channel stores a set of data entries of a single type, indexed by time. The
channel is identified internally in the TIDE file by its unique identification,
stored in the ``ID`` field. The channel may be identified externally by its ID
or by its unique human-readable name, stored in the ``Name`` field.

.. note::

   TODO(jmoringe): The original channel definition had this ``Type`` field:

     The ``Type`` field indicates the type of source and format in use
     by this channel. It should indicate both the connection/transport
     type and serialisation type, and will vary by these. For example,
     a logging tool for the OpenRTM-aist architecture may specify
     ``openrtm-cdr`` or ``openrtm-ros`` to indicate that the source
     connection was an OpenRTM-aist port using either the CDR
     serialisation or the ROS serialisation scheme.

   In the example, we could no longer describe "the
   connection/transport type" and the fact that "the source connection
   was an OpenRTM-aist port".

The ``Source name`` field provides a human-readable form of the source
information. This allows TIDE file introspection tools to describe the
file contents more completely. For example, this field could contain
the human-readable path of a ROS topic that provided the data.

The ``Source config`` field provides space for describing the source
of the channel's data in a machine-readable form. Typically this will
describe the connection that was recorded. For example, an
implementation for ROS may store the connection header in this field,
while an implementation for OpenRTM-aist may store the name of the
source port and the port's properties.

The variable-length ``Type hash entries`` section contains references
to data types describing the data stored in the channel. The section
has exactly ``Type hash count`` occurrences of the following fields:

========== ========================== =================================================================================
Field      Size                       Purpose
========== ========================== =================================================================================
Hash size  2-byte unsigned integer    Length of the hash (in bytes).
Hash       ``Hash size`` bytes        Hash identifying the `TYPE block`_ defining one of the types used by the channel.
========== ========================== =================================================================================

The ``Hash`` field contains a hash value identifying a data type used
in the channel. The TIDE file MUST contain a ``TYPE`` block whose
``Hash`` field is identical to the value of the ``Hash``
field of the entry.

If there is a single entry, the referenced ``TYPE`` block defines the
type of the data contained in the channel. For example, the ROS
middleware would reference a single ``TYPE`` block containing the ROS
Message Description [4]_ of the topic or service recorded in the
channel.

Multiple entries may describe the data types of multiple "frames"
within each entry of the channel. This situation arises with
transports supporting multiple "frames" in each message such as ØMQ
[8]_.

Finally, multiple entries may describe "outer", containing data types
and the contained "inner" data types. For example, the RSB middleware
[7]_ would reference a sequence of data types:

#. The first referenced data type would describe the "envelope" or
   "notification" data type.

#. The second referenced data type would describe the payload data
   type.
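
The ``CHAN`` layout, including the length-prefixed type hash entries, can be
read with a small cursor over the block's data, as in the sketch below. The
``Reader`` class, ``parse_chan_block`` and ``lp`` are hypothetical names, and
the 20-byte hash in the sample is only a stand-in value, since the hash size
is still an open question in this draft.

```python
import struct


class Reader:
    """Tiny cursor over a bytes object; all integers are little-endian."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def uint(self, fmt):
        (value,) = struct.unpack_from("<" + fmt, self.data, self.pos)
        self.pos += struct.calcsize("<" + fmt)
        return value

    def raw(self, size):
        chunk = self.data[self.pos:self.pos + size]
        if len(chunk) != size:
            raise ValueError("field runs past end of block")
        self.pos += size
        return chunk


def parse_chan_block(data):
    """Decode the data section of a CHAN block into a dictionary."""
    r = Reader(data)
    chan = {"id": r.uint("I")}
    chan["name"] = r.raw(r.uint("B")).decode("utf-8")
    chan["source_name"] = r.raw(r.uint("I")).decode("utf-8")
    chan["source_config"] = r.raw(r.uint("I"))  # opaque, transport-specific
    count = r.uint("B")
    chan["type_hashes"] = [r.raw(r.uint("H")) for _ in range(count)]
    return chan


def lp(fmt, payload):
    """Hypothetical helper: a field preceded by its length."""
    return struct.pack("<" + fmt, len(payload)) + payload


sha = bytes(range(20))  # stand-in for a type hash
sample = (struct.pack("<I", 7) + lp("B", "/camera/image".encode("utf-8"))
          + lp("I", b"ROS topic /camera/image") + lp("I", b"\x00\x01")
          + struct.pack("<B", 1) + lp("H", sha))
chan = parse_chan_block(sample)
```

A reader would then look up each returned hash among the loaded ``TYPE``
blocks to find the definitions needed for deserialisation.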

    
.. _`INDX block`:

Index
-----

Each ``INDX`` block provides an index for random access in time to the data
of one channel stored in the file. It links time stamps with the data values
stored in the chunk blocks. A TIDE file may have zero, one or multiple ``INDX``
blocks, at any position.

This block has a fixed-length section and a variable-length section. The
fixed-length section comes first:

========== ======================= =========================================
Field      Size                    Purpose
========== ======================= =========================================
Channel ID 4-byte unsigned integer Points to the channel this block indexes.
Count      4-byte unsigned integer Number of indices in this block.
========== ======================= =========================================

The variable-length section contains the indices. It has exactly ``Count``
occurrences of the following fields:

=========== ======================= ================================================================
Field       Size                    Purpose
=========== ======================= ================================================================
Chunk start 8-byte unsigned integer Starting position of the chunk the referenced entry is in.
Time stamp  8-byte unsigned integer Time stamp of the entry.
Offset      4-byte unsigned integer Offset (in bytes) in the chunk's uncompressed data of the entry.
=========== ======================= ================================================================

The offset gives the position of the entry's data relative to the start of
its chunk. If the chunk is compressed, the offset refers to the data once it
has been uncompressed.
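
Random access in time then reduces to a search over the loaded indices. The
sketch below parses one ``INDX`` block and finds the first entry at or after
a requested time stamp with a binary search. It assumes the indices within a
block are sorted by time stamp, which this document does not state; the
function names and sample values are hypothetical.

```python
import bisect
import struct

INDEX_HEADER = struct.Struct("<II")   # Channel ID, Count
INDEX_ENTRY = struct.Struct("<QQI")   # Chunk start, Time stamp, Offset


def parse_indx_block(data):
    """Return (channel_id, [(chunk_start, time_stamp, offset), ...])."""
    channel_id, count = INDEX_HEADER.unpack_from(data, 0)
    entries = [
        INDEX_ENTRY.unpack_from(data, INDEX_HEADER.size + i * INDEX_ENTRY.size)
        for i in range(count)
    ]
    return channel_id, entries


def find_first_at_or_after(entries, time_ns):
    """Return the first index entry with time_stamp >= time_ns, or None."""
    stamps = [e[1] for e in entries]  # assumption: sorted by time stamp
    i = bisect.bisect_left(stamps, time_ns)
    return entries[i] if i < len(entries) else None


raw = INDEX_HEADER.pack(7, 3) + b"".join(
    INDEX_ENTRY.pack(*e) for e in [(14, 1_000, 0), (14, 2_000, 48), (900, 5_000, 0)]
)
channel_id, entries = parse_indx_block(raw)
hit = find_first_at_or_after(entries, 1_500)
```

Only the index is touched during the search; the chunk named by the hit is
read, and if necessary decompressed, afterwards.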

    
.. _`CHNK block`:

Chunk
-----

``CHNK`` blocks store the recorded data entries.

This block has a fixed-length section and a variable-length section. The
fixed-length section comes first:

=========== ======================= =====================================================
Field       Size                    Purpose
=========== ======================= =====================================================
Chunk ID    4-byte unsigned integer Chunk identification, used to link entries to chunks.
Count       4-byte unsigned integer Number of entries in this chunk.
Start       8-byte unsigned integer Time stamp of the first entry in this chunk.
End         8-byte unsigned integer Time stamp of the last entry in this chunk.
Compression 1 unsigned byte         Indicates the compression used on the entries.
=========== ======================= =====================================================

The value of the ``Compression`` field must be one of the following values:

0
  No compression.

1
  gzip compression.

2
  bzip2 compression.

The variable-length section contains the data entries. It has exactly ``Count``
occurrences of the following fields:

========== ======================= ============================================
Field      Size                    Purpose
========== ======================= ============================================
Channel ID 4-byte unsigned integer Points to the channel this entry belongs to.
Time stamp 8-byte unsigned integer Time stamp of the entry.
Size       4-byte unsigned integer Size of the following serialised data.
Entry      ``Size`` bytes          Serialised entry data.
========== ======================= ============================================
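
Decoding a chunk could look like the sketch below. Two points are
assumptions rather than statements of this specification: that the whole
variable-length section is compressed as a single stream, and that "gzip
compression" means the gzip container format handled by Python's ``gzip``
module. The names ``parse_chnk_block`` and ``build_chnk`` are hypothetical.

```python
import bz2
import gzip
import struct

CHUNK_HEADER = struct.Struct("<IIQQB")  # Chunk ID, Count, Start, End, Compression
ENTRY_HEADER = struct.Struct("<IQI")    # Channel ID, Time stamp, Size
DECOMPRESSORS = {0: lambda b: b, 1: gzip.decompress, 2: bz2.decompress}


def parse_chnk_block(data):
    """Return (chunk_id, (start, end), [(channel_id, time_stamp, payload), ...])."""
    chunk_id, count, start, end, compression = CHUNK_HEADER.unpack_from(data, 0)
    try:
        decompress = DECOMPRESSORS[compression]
    except KeyError:
        raise ValueError(f"unknown compression value {compression}") from None
    # Assumption: everything after the fixed-length section is one stream.
    body = decompress(data[CHUNK_HEADER.size:])
    entries, pos = [], 0
    for _ in range(count):
        channel_id, stamp, size = ENTRY_HEADER.unpack_from(body, pos)
        pos += ENTRY_HEADER.size
        entries.append((channel_id, stamp, body[pos:pos + size]))
        pos += size
    return chunk_id, (start, end), entries


def build_chnk(chunk_id, entries, compression):
    """Hypothetical helper producing sample chunk data for the call below."""
    body = b"".join(ENTRY_HEADER.pack(c, t, len(p)) + p for c, t, p in entries)
    compress = {0: lambda b: b, 1: gzip.compress, 2: bz2.compress}[compression]
    header = CHUNK_HEADER.pack(chunk_id, len(entries), entries[0][1],
                               entries[-1][1], compression)
    return header + compress(body)


records = [(7, 1_000, b"first"), (7, 2_000, b"second")]
decoded = [parse_chnk_block(build_chnk(1, records, c)) for c in (0, 1, 2)]
```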

    
Block ordering
==============

Other than the requirement for the file to begin with a ``TIDE`` block, no
ordering is required of the blocks in the file. Some orderings may be easier to
read or write than others; this is considered an implementation choice and must
be balanced with other concerns such as file integrity.

    
Implementation examples
=======================

This section gives some examples of how TIDE files may be read or written.
Implementors should remember that these are examples to demonstrate *simple*
methods of implementing the TIDE format for illustrative purposes (for example,
the writing method will lead to truncated, unreadable files in the event of an
error).

Writing without compression
---------------------------

#. Open a new file and write the ``TIDE`` block. Leave the number of channels
   and chunks at zero; they will be updated when the file is closed.

#. Write the fixed-length part of the first ``CHNK`` block. Leave the count,
   start time stamp and end time stamp at zero; they will be updated when the
   chunk is finalised. Make a note of the position in the file that this chunk
   began at.

#. When a new channel is added, assign it an identifier:

   * When the data type of the new channel has not previously been
     used by any other channel, write ``TYPE`` blocks for the data
     type of the channel and its transitive dependencies.

   * Write a ``CHAN`` block storing the channel information and
     referencing the respective ``TYPE`` block.

#. As entries are received, write them at the end of the file, which will be
   the current chunk. Make a note of where each entry is recorded for the
   ``INDX`` blocks.

#. When a chunk is finished, go back to its start position and fill in the
   fields that were left empty previously.

#. When the file is finished, finalise the current chunk. Then, write the
   ``CHAN`` blocks to describe all channels that were recorded. Finally, write
   the ``INDX`` blocks.
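
The seek-back technique in these steps can be sketched as below for a single
channel and a single uncompressed chunk. Several simplifications are
assumptions of the sketch: the channel is known before the chunk is opened,
so its ``CHAN`` block is written first with no type hashes; the ``TIDE``
block carries only the two version bytes listed in its table; the block's
``Size`` field is back-filled together with the chunk header; and index
offsets are measured from the first entry of the chunk. The function names
and the channel name ``demo`` are hypothetical.

```python
import io
import struct

BLOCK_HEADER = struct.Struct("<4sQ")    # Tag, Size
CHUNK_HEADER = struct.Struct("<IIQQB")  # Chunk ID, Count, Start, End, Compression
ENTRY_HEADER = struct.Struct("<IQI")    # Channel ID, Time stamp, Size
INDEX_ENTRY = struct.Struct("<QQI")     # Chunk start, Time stamp, Offset


def write_block(out, tag, data):
    out.write(BLOCK_HEADER.pack(tag, len(data)) + data)


def write_log(out, channel_id, records):
    """Write non-empty (time_stamp_ns, payload) records as one chunk plus index."""
    write_block(out, b"TIDE", bytes([2, 0]))
    name = b"demo"
    chan = struct.pack("<IB", channel_id, len(name)) + name
    chan += struct.pack("<IIB", 0, 0, 0)  # empty source name/config, no hashes
    write_block(out, b"CHAN", chan)

    chunk_start = out.tell()
    # Placeholders, filled in once the chunk is finished.
    out.write(BLOCK_HEADER.pack(b"CHNK", 0))
    out.write(CHUNK_HEADER.pack(0, 0, 0, 0, 0))
    data_start = out.tell()
    index = []
    for stamp, payload in records:
        index.append((chunk_start, stamp, out.tell() - data_start))
        out.write(ENTRY_HEADER.pack(channel_id, stamp, len(payload)) + payload)
    end = out.tell()

    # Go back to the chunk's start and fill in the fields left empty.
    out.seek(chunk_start)
    out.write(BLOCK_HEADER.pack(b"CHNK", end - chunk_start - BLOCK_HEADER.size))
    out.write(CHUNK_HEADER.pack(0, len(records), records[0][0], records[-1][0], 0))
    out.seek(end)

    indx = struct.pack("<II", channel_id, len(index))
    indx += b"".join(INDEX_ENTRY.pack(*e) for e in index)
    write_block(out, b"INDX", indx)
    return index


buf = io.BytesIO()
index = write_log(buf, 7, [(1_000, b"first"), (2_000, b"second")])
```

As the section warns, an error between writing the placeholders and the
seek-back leaves a chunk whose header still reads zero.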

    
Reading without compression
---------------------------

#. Open the file and read the ``TIDE`` block. Confirm that the format version
   is supported.

#. Read through the file, loading ``TYPE``, ``CHAN`` and ``INDX``
   blocks and noting the offsets in the file of the ``CHNK`` blocks.

#. As data is requested, look it up in the loaded indices, and use them to find
   it in the ``CHNK`` blocks. If a form of deserialisation is
   necessary, use the loaded ``TYPE`` blocks.

    
Acknowledgements
================

This document and the format it describes are based on version 2 of
the bag file format created for ROS by Willow Garage [5]_. Several
clarifications and the ``TYPE`` block have been proposed by Jan
Moringen <jmoringe@techfak.uni-bielefeld.de>, Bielefeld University.

    
References
==========

.. [1] CORBA specification 3.1 (retrieved 2011-02-08)
   http://www.omg.org/spec/CORBA/3.1/Interoperability/PDF

.. [2] Lightweight Communications and Marshalling (retrieved 2011-02-08)
   http://code.google.com/p/lcm/

.. [3] Google Protocol Buffers (retrieved 2011-02-08)
   http://code.google.com/p/protobuf/

.. [4] ROS Message Description Specification (retrieved 2011-02-08)
   http://www.ros.org/wiki/msg

.. [5] Bags/Format/2.0 - ROS Wiki (retrieved 2011-01-26)
   http://www.ros.org/wiki/Bags/Format/2.0

.. [6] RETF RD 8: Uniform referencing of Robot Data Message Types
   https://retf.org/TODO

.. [7] Robotics Systems Bus
   http://code.cor-lab.org/projects/rsb

.. [8] ØMQ
   http://www.zeromq.org/

.. [9] Robot Operating System
   http://www.ros.org/wiki/

    
Copyright
=========

This document is licensed under the `Creative Commons Attribution 3.0`_
license.

.. _`Creative Commons Attribution 3.0`:
   http://creativecommons.org/licenses/by/3.0/

    
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: