rd-0001.txt

J. Moringen, 02/14/2014 12:47 PM

RD: 1
Title: TIDE log format
Date: 2013-05-11
Author: Geoffrey Biggs <geoffrey dot biggs at aist.go.jp>, Ingo Lütkebohle <iluetkeb@techfak.uni-bielefeld.de>
Type: General
Content-Type: text/x-rst
Created: 2011-01-14
Raw-text:

.. contents:: Table of Contents
.. section-numbering::

    
Status of this RRFC
===================

This document is a draft of a future Robotics Request for Comment. It
must not be formally referenced.

    
Abstract
========

This RRFC describes the Time-Indexed Data Entries (TIDE) log file
format. This format is based on the Bag format, developed for logging
ROS messages. Files using this format are called *tide files* and
typically have the extension ``.tide`` or ``.tid``. The TIDE format is
designed for fast streaming of data to disc, fast playback and random
access in time of the logged data.

    
Rationale
=========

Logging of data in robotics is useful for applications including
reproduction of experiments, visualisation of sensor data and long-term
logging of robot health data for later diagnostics.

The major features of the TIDE format are:

 - Space to store data type definitions with the data, for long-term
   compatibility.

 - Abstraction away from specific transports, allowing a TIDE file to be
   playable into multiple frameworks and ensuring the data is
   understandable even without a transport.

 - Suitability for streaming data to disc, provided enough memory is
   available to store a reasonable number of indices and channel information.

 - Ability to compress data records, allowing a smaller file size while
   maintaining random-access capability.

 - Ability to introspect and search TIDE files in time without needing to
   load or decompress the recorded data.

    
File format
===========

The TIDE format is made up of a set of blocks, each with a specific purpose.
The structure of a block is shown below:

====== ======================= =================================================================
Field  Size                    Purpose
====== ======================= =================================================================
Tag    4 bytes                 Indicates the type of block. Typically a 4-byte character string.
Size   8-byte unsigned integer Size of the block in bytes, excluding the tag and size values.
Data   `variable`              The block's data. The format will depend on the block type.
====== ======================= =================================================================

A TIDE file MUST begin with the `TIDE block`_, which indicates that it is a
TIDE file and gives the format version, as well as some basic information.

The data contained in a block has a fixed format and order. This allows the
data to be read easily. Each block includes the size of the entire block's
data. To support future extensions to the format that may add information to a
block, file readers are REQUIRED to expect the next block to begin after the
number of bytes specified in the size field, regardless of whether this matches
the size of the data they expect. This allows additional fields to be appended
to a block by future minor revisions of the format without preventing older
implementations from reading the file.
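
As an illustration of the block layout and the size rule, the following Python
sketch walks over the blocks of a file by reading each tag and size and then
seeking past the data. It is not part of the specification: ``iter_blocks``
and ``make_block`` are hypothetical helper names introduced here, and a
seekable stream is assumed.

```python
import io
import struct

# 4-byte tag followed by a little-endian 8-byte unsigned size (12 bytes).
HEADER = struct.Struct("<4sQ")


def iter_blocks(stream):
    """Yield (tag, data_offset, size) for every block in a seekable stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated block header")
        tag, size = HEADER.unpack(header)
        data_offset = stream.tell()
        yield tag, data_offset, size
        # Continue at data_offset + size, no matter how much of the block
        # the caller understood or consumed.
        stream.seek(data_offset + size)


def make_block(tag, data):
    """Frame raw block data with its tag and size (hypothetical helper)."""
    return HEADER.pack(tag, len(data)) + data
```

Because the reader always seeks to ``data_offset + size``, a block that carries
extra trailing fields from a later minor revision is skipped correctly.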

    
All values in all blocks, with the exception of pre-serialised entry and format
data, MUST be little-endian.

All time stamps stored in a TIDE file are in nanoseconds since the POSIX Epoch,
1970-01-01 00:00:00 +0000 (UTC).
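
A small Python sketch of the time stamp convention follows. The function names
are hypothetical. Integer arithmetic is used because a floating-point number of
seconds cannot represent current time stamps with nanosecond precision.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_tide_timestamp(dt):
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    delta = dt - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


def from_tide_timestamp(ns):
    """Return (UTC datetime truncated to whole seconds, remaining nanoseconds)."""
    seconds, nanos = divmod(ns, NS_PER_SECOND)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos
```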

    
Blocks
======

This section specifies the format of the blocks that may be found in a TIDE
file.

.. _`TIDE block`:

TIDE
----

The ``TIDE`` block MUST occur only once in the file and MUST occur at the very
beginning of the file.

====================== ======================= ============================================
Field                  Size                    Purpose
====================== ======================= ============================================
Major version number   1 unsigned byte         Major version of the TIDE format used.
Minor version number   1 unsigned byte         Minor version of the TIDE format used.
====================== ======================= ============================================

Implementations are REQUIRED to support any TIDE file with the same major
version as the implementation, although not all features supported by the file
may be available.

For TIDE log files created according to this version of the TIDE
specification, the major version number should be 2 (two) and the
minor version number should be 0 (zero).
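
The version rule can be sketched in Python as follows. The function names are
hypothetical, and the sketch assumes an implementation of major version 2. It
accepts any minor version and ignores bytes after the two version fields.

```python
import struct

SUPPORTED_MAJOR = 2  # major version described by this document


def parse_tide_block(data):
    """Return (major, minor) from the data of a TIDE block."""
    if len(data) < 2:
        raise ValueError("TIDE block too short")
    # Bytes after the two version fields are ignored so that files written
    # according to later minor revisions remain readable.
    return struct.unpack_from("<BB", data)


def check_version(major, minor):
    """Accept any file whose major version matches the implementation."""
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported TIDE version {major}.{minor}")
    return True
```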

    
.. _`TYPE block`:

Type
----

Zero or more ``TYPE`` blocks MAY occur at arbitrary positions within
the TIDE log file. Each ``TYPE`` block describes ONE data type used
directly or indirectly (as a dependency of some other data type) in
one or more channels (see `CHAN block`_). The structure and attributes
of this block are based on and designed to interoperate well with the
mechanism for referring to data types described in RETF RD 8 [6]_.

.. note::

   TODO(jmoringe): add constraint that ``TYPE`` blocks MUST occur
   before ``CHAN`` blocks which use them and MUST be topologically
   sorted according to the dependency relation?

.. note::

   TODO(jmoringe): is it always SHA1 or do we need the following::

     Hash function size     1 unsigned byte              Length of the identifier of the used hash function.
     Hash function          ``Hash function size`` bytes Identifier of the used hash function.

.. note::

   TODO(jmoringe): do we need ``Hash size`` below, or can we fix the
   size of the hash? Also applies to ``Dependency entries`` and ``Type
   hash entries`` fields.

    
========================= ================================= =================================================================
Field                     Size                              Purpose
========================= ================================= =================================================================
Hash size                 2-byte unsigned integer           Length of the type hash (in bytes).
Hash                      ``Hash size`` bytes               Hash value identifying the defined data type.

Syntax size               1 unsigned byte                   Length of the syntax (in bytes).
Syntax                    ``Syntax size`` bytes             Name of the syntax in which the data type is defined.

Display name size         2-byte unsigned integer           Length of the display name (in bytes).
Display name              ``Display name size`` bytes       Human-readable name of the data type as a UTF-8 string.

Version size              2-byte unsigned integer           Length of the version string (in bytes).
Version                   ``Version size`` bytes            Human-readable version of the data type as an ASCII string.

Definition size           4-byte unsigned integer           Length of the definition (in bytes).
Definition                ``Definition size`` bytes         Definition of the data type according to the rules of ``Syntax``.

Acceptable source count   1 unsigned byte                   Number of the acceptable source entries in this block.
Acceptable source entries `variable`                        ``Acceptable source count`` acceptable source entries.

Dependency count          2-byte unsigned integer           Number of dependency entries in this block.
Dependency entries        `variable`                        ``Dependency count`` dependency entries.
========================= ================================= =================================================================

    
The ``Hash`` field contains an "exact topic" hash value (see [6]_)
identifying the data type. Given the syntax designated by the
``Syntax`` field and the value of the ``Definition`` field, the hash
value MUST be computed as described in the "Proposed Approach" and
"Data Preparation" sections of [6]_.

.. note::

   TODO(jmoringe): we cannot reference RD-8 because of its draft
   status, right?

.. important::

   The base32 encoding and embedding in a URN string described in
   [6]_ are not performed.

    
The ``Syntax`` field designates the data type definition syntax
according to which the described data type is defined in the
``Definition`` field. Implementations SHOULD refer to a precise
version of the respective data type definition syntax. For example, a
ROS [9]_ implementation may refer to a version of the ROS Message
description specification [4]_ in this field.

The value of the ``Definition`` field of the ``TYPE`` block contains
the textual or binary definition of the data type according to the
rules of the syntax designated by the value of the ``Syntax``
field. The value of this field, optionally together with the
``Definition`` fields of ``TYPE`` blocks referenced in ``Dependency
entries``, MUST contain enough information to deserialise the data
stored in the channel, such that it can be reconstructed later without
any information external to the TIDE file. For example, a ROS
implementation may store the message description in this field.

.. note::

   TODO(jmoringe): should we have a ``Documentation link`` field?

    
The variable-length ``Acceptable source entries`` section contains a
sequence of acceptable source entries. Each entry is an "acceptable
source" URL (see the "as" field in the "Link Format" section of [6]_)
from which the definition of the data type being described can be
obtained. The section has exactly ``Acceptable source count``
occurrences of the following fields:

====================== ================================= ==================================================================================
Field                  Size                              Purpose
====================== ================================= ==================================================================================
Acceptable source size 2-byte unsigned integer           Length of the acceptable source string (in bytes).
Acceptable source      ``Acceptable source size`` bytes  A URL from which the definition of the data type (in ``Syntax``) can be obtained.
====================== ================================= ==================================================================================

The variable-length ``Dependency entries`` section contains a sequence
of dependencies of the data type being described. Each entry is the
"exact topic" hash (see above and [6]_) of the data type to which the
dependency refers. The section has exactly ``Dependency count``
occurrences of the following fields:

========== ======================= =====================================
Field      Size                    Purpose
========== ======================= =====================================
Hash size  2-byte unsigned integer Length of the type hash (in bytes).
Hash       ``Hash size`` bytes     "Exact topic" hash of the dependency.
========== ======================= =====================================

The ``Hash`` field contains a hash value identifying the data type to
which the dependency refers. The TIDE file MUST contain a ``TYPE``
block whose ``Hash`` field is identical to the value of the ``Hash``
field of the dependency entry.
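
The field layout of the ``TYPE`` block can be decoded with a simple sequential
reader. The following Python sketch is illustrative only: ``Cursor``,
``TypeBlock`` and ``parse_type_block`` are hypothetical names, and decoding the
``Syntax`` and ``Acceptable source`` fields as UTF-8 is an assumption, since
this document does not state their encoding.

```python
from dataclasses import dataclass


class Cursor:
    """Sequential little-endian reader over the data of one block."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, length):
        chunk = self.data[self.pos:self.pos + length]
        if len(chunk) != length:
            raise ValueError("truncated block data")
        self.pos += length
        return chunk

    def uint(self, width):
        return int.from_bytes(self.take(width), "little")

    def sized(self, width):
        """Read a width-byte length prefix, then that many bytes."""
        return self.take(self.uint(width))


@dataclass
class TypeBlock:
    type_hash: bytes
    syntax: str
    display_name: str
    version: str
    definition: bytes
    sources: list
    dependencies: list


def parse_type_block(data):
    cur = Cursor(data)
    type_hash = cur.sized(2)
    syntax = cur.sized(1).decode("utf-8")  # encoding assumed, not specified
    display_name = cur.sized(2).decode("utf-8")
    version = cur.sized(2).decode("ascii")
    definition = cur.sized(4)  # textual or binary, so kept as bytes
    # Count fields come first, followed by that many length-prefixed entries.
    sources = [cur.sized(2).decode("utf-8") for _ in range(cur.uint(1))]
    dependencies = [cur.sized(2) for _ in range(cur.uint(2))]
    return TypeBlock(type_hash, syntax, display_name, version,
                     definition, sources, dependencies)
```

A reader would typically store the result in a dictionary keyed by
``type_hash`` so that dependency entries and channels can be resolved later.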

    
.. _`CHAN block`:

Channel
-------

The ``CHAN`` block stores the meta-data for ONE channel of data.

================== ================================= ============================================================
Field              Size                              Purpose
================== ================================= ============================================================
ID                 4-byte unsigned integer           Channel identification, used to link entries to the channel.
Name size          1 unsigned byte                   Length of the channel name (in bytes).
Name               ``Name size`` bytes               Name of the channel, as a UTF-8 string.
Source name size   4-byte unsigned integer           Length of the source string (in bytes).
Source name        ``Source name size`` bytes        Human-readable source description (encoded as UTF-8).
Source config size 4-byte unsigned integer           Length of the source config data (in bytes).
Source config      ``Source config size`` raw bytes  Raw data describing the source of this channel's data.
Type hash count    1 unsigned byte                   Number of hashes identifying used data types.
Type hash entries  `variable`                        ``Type hash count`` type hash entries.
================== ================================= ============================================================

    
A channel stores a set of data entries of a single type, indexed by time. The
channel is identified internally in the TIDE file by its unique identification,
stored in the ``ID`` field. The channel may be identified externally by its ID
or by its unique human-readable name, stored in the ``Name`` field.

.. note::

   TODO(jmoringe): The original channel definition had this ``Type`` field:

     The ``Type`` field indicates the type of source and format in use
     by this channel. It should indicate both the connection/transport
     type and serialisation type, and will vary by these. For example,
     a logging tool for the OpenRTM-aist architecture may specify
     ``openrtm-cdr`` or ``openrtm-ros`` to indicate that the source
     connection was an OpenRTM-aist port using either the CDR
     serialisation or the ROS serialisation scheme.

   In the example, we could no longer describe "the
   connection/transport type" and the fact that "the source connection
   was an OpenRTM-aist port".

    
The ``Source name`` field provides a human-readable form of the source
information. This allows TIDE file introspection tools to describe the
file contents more completely. For example, this field could contain
the human-readable path of a ROS topic that provided the data.

The ``Source config`` field provides space for describing the source
of the channel's data in a machine-readable form. Typically this will
describe the connection that was recorded. For example, an
implementation for ROS may store the connection header in this field,
while an implementation for OpenRTM-aist may store the name of the
source port and the port's properties.

    
The variable-length ``Type hash entries`` section contains references
to data types describing the data stored in the channel. The section
has exactly ``Type hash count`` occurrences of the following fields:

========== ========================== =================================================================================
Field      Size                       Purpose
========== ========================== =================================================================================
Hash size  2-byte unsigned integer    Length of the hash (in bytes).
Hash       ``Hash size`` bytes        Hash identifying the `TYPE block`_ defining one of the types used by the channel.
========== ========================== =================================================================================

The ``Hash`` field contains a hash value identifying a data type used
in the channel. The TIDE file MUST contain a ``TYPE`` block whose
``Hash`` field is identical to the value of the ``Hash`` field of the
entry.

    
If there is a single entry, the referenced ``TYPE`` block defines the
type of the data contained in the channel. For example, the ROS
middleware would reference a single ``TYPE`` block containing the ROS
Message Description [4]_ of the topic or service recorded in the
channel.

Multiple entries may describe the data types of multiple "frames"
within each entry of the channel. This situation arises with
transports supporting multiple "frames" in each message such as ØMQ
[8]_.

Finally, multiple entries may describe "outer", containing data types
and the contained "inner" data types. For example, the RSB middleware
[7]_ would reference a sequence of data types:

#. The first referenced data type would describe the "envelope" or
   "notification" data type.

#. The second referenced data type would describe the payload data
   type.
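
A possible decoder for the data of a ``CHAN`` block is sketched below in
Python. The function name and the dictionary keys are hypothetical. The type
hashes are returned in file order, which matters when several entries describe
frames or an envelope and its payload.

```python
import struct


def parse_chan_block(data):
    """Decode the data of one CHAN block into a plain dictionary."""
    offset = 0

    def number(fmt):
        # Read one fixed-size little-endian integer and advance.
        nonlocal offset
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        return value

    def sized(fmt):
        # Read a length prefix of the given format, then that many bytes.
        nonlocal offset
        length = number(fmt)
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("truncated CHAN block")
        offset += length
        return chunk

    channel_id = number("<I")
    name = sized("<B").decode("utf-8")
    source_name = sized("<I").decode("utf-8")
    source_config = sized("<I")  # opaque, transport-specific bytes
    type_hashes = [sized("<H") for _ in range(number("<B"))]
    return {
        "id": channel_id,
        "name": name,
        "source_name": source_name,
        "source_config": source_config,
        "type_hashes": type_hashes,
    }
```

For a channel recorded from a middleware that uses an envelope, such as the RSB
example above, ``type_hashes[0]`` would identify the envelope type and
``type_hashes[1]`` the payload type.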

    
.. _`INDX block`:

Index
-----

The ``INDX`` block(s) provide(s) an index for random access in time to the data
of one channel stored in the file. They link time stamps with the data values
stored in the chunk blocks. A TIDE file may have zero, one or multiple ``INDX``
blocks, at any position.

This block has a fixed-length section and a variable-length section. The
fixed-length section comes first:

========== ======================= =========================================
Field      Size                    Purpose
========== ======================= =========================================
Channel ID 4-byte unsigned integer Points to the channel this block indexes.
Count      4-byte unsigned integer Number of indices in this block.
========== ======================= =========================================

The variable-length section contains the indices. It has exactly ``Count``
occurrences of the following fields:

=========== ======================= ================================================================
Field       Size                    Purpose
=========== ======================= ================================================================
Chunk start 8-byte unsigned integer Starting position of the chunk the referenced entry is in.
Time stamp  8-byte unsigned integer Time stamp of the entry.
Offset      4-byte unsigned integer Offset (in bytes) in the chunk's uncompressed data of the entry.
=========== ======================= ================================================================

The offset points to the specific position in the data of the entry relative to
the start of its chunk. If the chunk is compressed, the offset refers to the
data once it has been uncompressed.
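
The following Python sketch decodes an ``INDX`` block and looks up the first
entry at or after a requested time. The function names are hypothetical. This
document does not state that indices are stored in time order, so the sketch
sorts them before searching; that is a defensive assumption, not a requirement
of the format.

```python
import bisect
import struct

FIXED = struct.Struct("<II")   # Channel ID, Count
ENTRY = struct.Struct("<QQI")  # Chunk start, Time stamp, Offset


def parse_indx_block(data):
    """Return (channel_id, [(chunk_start, timestamp, offset), ...])."""
    channel_id, count = FIXED.unpack_from(data, 0)
    entries = [
        ENTRY.unpack_from(data, FIXED.size + i * ENTRY.size)
        for i in range(count)
    ]
    return channel_id, entries


def first_at_or_after(entries, timestamp):
    """Return the earliest index entry whose time stamp is >= timestamp."""
    ordered = sorted(entries, key=lambda entry: entry[1])
    stamps = [entry[1] for entry in ordered]
    position = bisect.bisect_left(stamps, timestamp)
    return ordered[position] if position < len(ordered) else None
```

A real reader would sort each channel's indices once after loading rather than
on every lookup.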

    
.. _`CHNK block`:

Chunk
-----

``CHNK`` blocks store the recorded data entries.

This block has a fixed-length section and a variable-length section. The
fixed-length section comes first:

=========== ======================= =====================================================
Field       Size                    Purpose
=========== ======================= =====================================================
Chunk ID    4-byte unsigned integer Chunk identification, used to link entries to chunks.
Count       4-byte unsigned integer Number of entries in this chunk.
Start       8-byte unsigned integer Time stamp of the first entry in this chunk.
End         8-byte unsigned integer Time stamp of the last entry in this chunk.
Compression 1 unsigned byte         Indicates the compression used on the entries.
=========== ======================= =====================================================

The value of the ``Compression`` field must be one of the following values:

0
  No compression.

1
  gzip compression.

2
  bzip2 compression.

The variable-length section contains the data entries. It has exactly ``Count``
occurrences of the following fields:

========== ======================= ============================================
Field      Size                    Purpose
========== ======================= ============================================
Channel ID 4-byte unsigned integer Points to the channel this entry belongs to.
Time stamp 8-byte unsigned integer Time stamp of the entry.
Size       4-byte unsigned integer Size of the following serialised data.
Entry      ``Size`` bytes          Serialised entry data.
========== ======================= ============================================
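
A Python sketch of decoding a ``CHNK`` block follows. It rests on two
assumptions that this document does not spell out: the whole variable-length
section is compressed as a single stream, and value 1 denotes the gzip
container format handled by Python's ``gzip`` module. The function name is
hypothetical.

```python
import bz2
import gzip
import struct

FIXED = struct.Struct("<IIQQB")      # Chunk ID, Count, Start, End, Compression
ENTRY_HEADER = struct.Struct("<IQI")  # Channel ID, Time stamp, Size

DECOMPRESSORS = {
    0: lambda raw: raw,   # no compression
    1: gzip.decompress,   # assumed: gzip container
    2: bz2.decompress,
}


def parse_chnk_block(data):
    """Return (chunk_id, (start, end), [(channel_id, timestamp, payload)])."""
    chunk_id, count, start, end, compression = FIXED.unpack_from(data, 0)
    if compression not in DECOMPRESSORS:
        raise ValueError(f"unknown compression value {compression}")
    # Assumption: everything after the fixed section is one compressed stream.
    payload = DECOMPRESSORS[compression](data[FIXED.size:])
    entries = []
    offset = 0
    for _ in range(count):
        channel_id, timestamp, size = ENTRY_HEADER.unpack_from(payload, offset)
        offset += ENTRY_HEADER.size
        entries.append((channel_id, timestamp, payload[offset:offset + size]))
        offset += size
    return chunk_id, (start, end), entries
```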

    
Block ordering
==============

Other than the requirement for the file to begin with a ``TIDE`` block, no
ordering is required of the blocks in the file. Some orderings may be easier to
read or write than others; this is considered an implementation choice and must
be balanced with other concerns such as file integrity.

    
Implementation examples
=======================

This section gives some examples of how TIDE files may be read or written.
Implementors should remember that these are examples to demonstrate *simple*
methods of implementing the TIDE format for illustrative purposes (for example,
the writing method will lead to truncated, unreadable files in the event of an
error).

    
Writing without compression
---------------------------

#. Open a new file and write the ``TIDE`` block. Leave the number of channels
   and chunks at zero; they will be updated when the file is closed.

#. Write the fixed-length part of the first ``CHNK`` block. Leave the count,
   start time stamp and end time stamp at zero; they will be updated when the
   chunk is finalised. Make a note of the position in the file that this chunk
   began at.

#. When a new channel is added, assign it an identifier:

   * When the data type of the new channel has not previously been
     used by any other channel, write ``TYPE`` blocks for the data
     type of the channel and its transitive dependencies.

   * Write a ``CHAN`` block storing the channel information and
     referencing the respective ``TYPE`` block.

#. As entries are received, write them at the end of the file, which will be
   the current chunk. Make a note of where each entry is recorded for the
   ``INDX`` blocks.

#. When a chunk is finished, go back to its start position and fill in the
   fields that were left empty previously.

#. When the file is finished, finalise the current chunk. Then, write the
   ``CHAN`` blocks to describe all channels that were recorded. Finally, write
   the ``INDX`` blocks.
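
A much simplified Python sketch of these steps is given below. It makes several
assumptions that go beyond the text: only one chunk is written, ``CHAN`` block
data is supplied already encoded by the caller and written when the file is
closed, ``TYPE`` blocks are omitted (they would be written in the same way),
the ``TIDE`` block holds only the two version fields defined in this document,
the block size field of the chunk is patched together with the other fields,
and index offsets are counted from the first byte after the chunk's
fixed-length section. All names are hypothetical.

```python
import io
import struct

BLOCK_HEADER = struct.Struct("<4sQ")
CHNK_FIXED = struct.Struct("<IIQQB")
ENTRY_HEADER = struct.Struct("<IQI")
INDEX_ENTRY = struct.Struct("<QQI")


def block(tag, data):
    return BLOCK_HEADER.pack(tag, len(data)) + data


class SimpleTideWriter:
    """Single-chunk, uncompressed writer; not safe against crashes."""

    def __init__(self, stream):
        self.stream = stream
        self.channels = {}  # channel ID -> pre-encoded CHAN block data
        self.index = {}     # channel ID -> [(time stamp, offset in chunk)]
        self.stamps = []    # time stamps in the order they were written
        stream.write(block(b"TIDE", bytes([2, 0])))
        # Reserve the chunk header; the real values are patched in close().
        self.chunk_start = stream.tell()
        stream.write(BLOCK_HEADER.pack(b"CHNK", 0))
        stream.write(CHNK_FIXED.pack(0, 0, 0, 0, 0))
        self.entries_start = stream.tell()

    def add_channel(self, channel_id, chan_data):
        self.channels[channel_id] = chan_data
        self.index[channel_id] = []

    def write_entry(self, channel_id, timestamp, payload):
        offset = self.stream.tell() - self.entries_start
        header = ENTRY_HEADER.pack(channel_id, timestamp, len(payload))
        self.stream.write(header + payload)
        self.index[channel_id].append((timestamp, offset))
        self.stamps.append(timestamp)

    def close(self):
        end = self.stream.tell()
        size = end - self.chunk_start - BLOCK_HEADER.size
        first = self.stamps[0] if self.stamps else 0
        last = self.stamps[-1] if self.stamps else 0
        # Go back and fill in the fields that were left at zero.
        self.stream.seek(self.chunk_start)
        self.stream.write(BLOCK_HEADER.pack(b"CHNK", size))
        self.stream.write(CHNK_FIXED.pack(0, len(self.stamps), first, last, 0))
        self.stream.seek(end)
        for chan_data in self.channels.values():
            self.stream.write(block(b"CHAN", chan_data))
        for channel_id, entries in self.index.items():
            data = struct.pack("<II", channel_id, len(entries))
            for timestamp, offset in entries:
                data += INDEX_ENTRY.pack(self.chunk_start, timestamp, offset)
            self.stream.write(block(b"INDX", data))
```

As the section introduction warns, a crash before ``close`` leaves a chunk
whose size and count are still zero, so the file cannot be read back.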

    
Reading without compression
---------------------------

#. Open the file and read the ``TIDE`` block. Confirm that the format version
   is supported.

#. Read through the file, loading ``TYPE``, ``CHAN`` and ``INDX``
   blocks and noting the offsets in the file of the ``CHNK`` blocks.

#. As data is requested, look it up in the loaded indices, and use them to find
   it in the ``CHNK`` blocks. If a form of deserialisation is
   necessary, use the loaded ``TYPE`` blocks.
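
The reading steps can be sketched in Python as follows. The names ``scan`` and
``read_entry`` are hypothetical. The sketch handles uncompressed chunks only
and assumes that ``Chunk start`` is the file position of the chunk's tag and
that ``Offset`` is counted from the first byte after the chunk's fixed-length
section.

```python
import io
import struct

BLOCK_HEADER = struct.Struct("<4sQ")
CHNK_FIXED = struct.Struct("<IIQQB")
ENTRY_HEADER = struct.Struct("<IQI")


def scan(stream):
    """Load metadata blocks and note CHNK positions without reading entries."""
    tag, size = BLOCK_HEADER.unpack(stream.read(BLOCK_HEADER.size))
    if tag != b"TIDE":
        raise ValueError("not a TIDE file")
    major, minor = stream.read(2)
    if major != 2:
        raise ValueError(f"unsupported TIDE version {major}.{minor}")
    stream.seek(BLOCK_HEADER.size + size)

    metadata = {b"TYPE": [], b"CHAN": [], b"INDX": []}
    chunk_offsets = []
    while True:
        start = stream.tell()
        header = stream.read(BLOCK_HEADER.size)
        if len(header) < BLOCK_HEADER.size:
            break  # end of file
        tag, size = BLOCK_HEADER.unpack(header)
        if tag in metadata:
            metadata[tag].append(stream.read(size))
        elif tag == b"CHNK":
            chunk_offsets.append(start)
        # Unknown tags and chunk data are skipped using the size field.
        stream.seek(start + BLOCK_HEADER.size + size)
    return metadata, chunk_offsets


def read_entry(stream, chunk_start, offset):
    """Fetch one entry of an uncompressed chunk from an index triple."""
    stream.seek(chunk_start + BLOCK_HEADER.size + CHNK_FIXED.size + offset)
    header = stream.read(ENTRY_HEADER.size)
    channel_id, timestamp, size = ENTRY_HEADER.unpack(header)
    return channel_id, timestamp, stream.read(size)
```

The raw ``TYPE``, ``CHAN`` and ``INDX`` data returned by ``scan`` would then be
decoded according to the tables in the `Blocks`_ section.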

    
Acknowledgements
================

This document and the format it describes are based on version 2 of
the bag file format created for ROS by Willow Garage [5]_. Several
clarifications and the ``TYPE`` block have been proposed by Jan
Moringen <jmoringe@techfak.uni-bielefeld.de>, Bielefeld University.

    
References
==========

.. [1] CORBA specification 3.1 (retrieved 2011-02-08)
   http://www.omg.org/spec/CORBA/3.1/Interoperability/PDF

.. [2] Lightweight Communications and Marshalling (retrieved 2011-02-08)
   http://code.google.com/p/lcm/

.. [3] Google Protocol Buffers (retrieved 2011-02-08)
   http://code.google.com/p/protobuf/

.. [4] ROS Message Description Specification (retrieved 2011-02-08)
   http://www.ros.org/wiki/msg

.. [5] Bags/Format/2.0 - ROS Wiki (retrieved 2011-01-26)
   http://www.ros.org/wiki/Bags/Format/2.0

.. [6] RETF RD 8: Uniform referencing of Robot Data Message Types
   https://retf.org/TODO

.. [7] Robotics Service Bus
   http://code.cor-lab.org/projects/rsb

.. [8] ØMQ
   http://www.zeromq.org/

.. [9] Robot Operating System
   http://www.ros.org/wiki/

    
Copyright
=========

This document is licensed under the `Creative Commons Attribution 3.0`_
license.

.. _`Creative Commons Attribution 3.0`:
   http://creativecommons.org/licenses/by/3.0/

    
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: