RD: 1
Title: TIDE log format
Date: 2013-05-11
Author: Geoffrey Biggs, Ingo Lütkebohle
Type: General
Content-Type: text/x-rst
Created: 2011-01-14
Raw-text:

.. contents:: Table of Contents

.. section-numbering::


Status of this RRFC
===================

This document is a draft of a future Robotics Request for Comment.
It must not be formally referenced.


Abstract
========

This RRFC describes the Time-Indexed Data Entries (TIDE) log file
format. The format is based on the bag format developed for logging
ROS messages [5]_. Files using this format are called *tide files*
and typically have the extension ``.tide`` or ``.tid``.

The TIDE format is designed for fast streaming of data to disc, fast
playback, and time-based random access to the logged data.


Rationale
=========

Logging data is useful in robotics for applications such as
reproducing experiments, visualising sensor data and recording robot
health data over the long term for later diagnostics.

The major features of the TIDE format are:

- Space to store data type definitions alongside the data, for
  long-term compatibility.
- Independence from specific transports, allowing a TIDE file to be
  played back into multiple frameworks and ensuring that the data can
  be understood even without a transport.
- Suitability for streaming data to disc, provided enough memory is
  available to hold a reasonable number of index entries and the
  channel information.
- Optional compression of data records, reducing file size while
  maintaining random-access capability.
- Introspection and time-based search of TIDE files without loading
  or decompressing the recorded data.


File format
===========

The TIDE format is made up of a set of blocks, each with a specific
purpose.
The structure of a block is shown below:

====== ======================= ======================================================================
Field  Size                    Purpose
====== ======================= ======================================================================
Tag    4 bytes                 Indicates the type of block. Typically a 4-byte character string.
Size   8-byte unsigned integer Size of the block in bytes, excluding the tag and size fields.
Data   `variable`              The block's data. Its format depends on the block type.
====== ======================= ======================================================================

A TIDE file MUST begin with the `TIDE block`_, which identifies the
file as a TIDE file and gives the format version.

Each block type has a fixed data format and field order, which makes
blocks easy to read.

Each block records the size of its data. To support future
extensions of the format that may add information to a block, file
readers are REQUIRED to expect the next block to begin after the
number of bytes specified in the ``Size`` field, regardless of
whether this matches the size of the data they expect. This allows
future minor revisions of the format to append fields to a block
without preventing older implementations from reading the file.

All values in all blocks, with the exception of pre-serialised entry
and format data, MUST be little-endian.

All time stamps stored in a TIDE file are in nanoseconds since the
POSIX Epoch, 1970-01-01 00:00:00 +0000 (UTC).


Blocks
======

This section specifies the format of the blocks that may be found in
a TIDE file.

.. _`TIDE block`:

TIDE
----

The ``TIDE`` block MUST occur exactly once in the file and MUST be
located at the very beginning of the file.
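Because every block starts with the same ``Tag``/``Size`` header and
the ``TIDE`` block must come first, a reader can validate a file and
walk its blocks without understanding their contents. The following
Python sketch illustrates this under stated assumptions: the ``Tag``
field is taken to hold the block name as ASCII bytes (for example
``b"TIDE"``), and the helpers ``make_block``, ``iter_blocks`` and
``read_tide_version`` are hypothetical names, not part of any
existing library. The two version bytes it reads are the fields
tabulated below for this block.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

# Common block header: 4-byte tag, then the data size as an 8-byte
# little-endian unsigned integer ("<" disables padding: 12 bytes total).
BLOCK_HEADER = struct.Struct("<4sQ")
SUPPORTED_MAJOR = 2  # major version defined by this specification


def make_block(tag: bytes, data: bytes) -> bytes:
    """Serialise one block (hypothetical helper, used here for testing)."""
    if len(tag) != 4:
        raise ValueError("block tags are exactly 4 bytes")
    return BLOCK_HEADER.pack(tag, len(data)) + data


def iter_blocks(stream: BinaryIO) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (tag, data_offset, data_size) for every block in the stream.

    The next block is always taken to start `data_size` bytes after the
    header, whatever this reader expects the block to contain, so fields
    appended by later minor revisions are skipped rather than misread.
    """
    end = stream.seek(0, io.SEEK_END)
    pos = 0
    while pos < end:
        stream.seek(pos)  # callers may have moved the stream between yields
        header = stream.read(BLOCK_HEADER.size)
        if len(header) < BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        tag, size = BLOCK_HEADER.unpack(header)
        data_offset = pos + BLOCK_HEADER.size
        if data_offset + size > end:
            raise ValueError(f"truncated {tag!r} block")
        yield tag, data_offset, size
        pos = data_offset + size


def read_tide_version(stream: BinaryIO) -> Tuple[int, int]:
    """Check that the file begins with a supported TIDE block; return (major, minor)."""
    first: Optional[Tuple[bytes, int, int]] = next(iter_blocks(stream), None)
    if first is None or first[0] != b"TIDE":
        raise ValueError("file does not begin with a TIDE block")
    _, data_offset, size = first
    if size < 2:
        raise ValueError("TIDE block too short")
    stream.seek(data_offset)
    major, minor = stream.read(2)  # iterating over bytes yields two ints
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported TIDE major version {major}")
    return major, minor


if __name__ == "__main__":
    # A version 2.0 TIDE block with one extra trailing byte (as a later
    # minor revision might append), followed by an empty block with the
    # hypothetical tag "XTRA".
    blob = make_block(b"TIDE", bytes([2, 0, 0xFF])) + make_block(b"XTRA", b"")
    stream = io.BytesIO(blob)
    print(read_tide_version(stream))                   # (2, 0)
    print([tag for tag, _, _ in iter_blocks(stream)])  # [b'TIDE', b'XTRA']
```

Skipping by the stored ``Size`` rather than by the expected payload
length is what lets this reader accept the extra trailing byte in the
``TIDE`` block and step over the unknown ``XTRA`` block.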

====================== ======================= ======================================
Field                  Size                    Purpose
====================== ======================= ======================================
Major version number   1 unsigned byte         Major version of the TIDE format used.
Minor version number   1 unsigned byte         Minor version of the TIDE format used.
====================== ======================= ======================================

Implementations are REQUIRED to support any TIDE file with the same
major version as the implementation, although not all features used
by the file may be available.

For TIDE log files created according to this version of the TIDE
specification, the major version number should be 2 (two) and the
minor version number should be 0 (zero).

.. _`META block`:

Meta
----

TODO: per-channel meta-data would make sense.

Zero or more ``META`` blocks MAY occur at arbitrary positions within
the TIDE log file. TODO

====================== ======================= ==========================================
Field                  Size                    Purpose
====================== ======================= ==========================================
Metadata count         2-byte unsigned integer Number of metadata entries in this block.
Metadata entries       `variable`              ``Metadata count`` metadata entries.
====================== ======================= ==========================================

The variable-length ``Metadata entries`` section contains a sequence
of TODO. Each entry is TODO.
The section has exactly ``Metadata count`` occurrences of the
following fields:

========== ======================= ===============================
Field      Size                    Purpose
========== ======================= ===============================
Key size   2-byte unsigned integer Length of the key (in bytes).
Key        ``Key size`` bytes      Metadata key.
Value size 4-byte unsigned integer Length of the value (in bytes).
Value      ``Value size`` bytes    Metadata value.
========== ======================= ===============================

Dublin Core Metadata
~~~~~~~~~~~~~~~~~~~~

Dublin Core metadata should be encoded as the string
:samp:`dc:{KEY}` in the ``Key`` field and an appropriate value string
in the ``Value`` field. The following Dublin Core elements MAY be
used:

#. Title
#. Creator
#. Subject
#. Description
#. Publisher
#. Contributor
#. Date
#. Type
#. Format
#. Identifier
#. Source
#. Language
#. Relation
#. Coverage
#. Rights

All elements are optional and may be repeated. See:

- http://dublincore.org/documents/dces/
- http://dublincore.org/documents/dcmi-type-vocabulary/index.shtml

.. _`TYPE block`:

Type
----

Zero or more ``TYPE`` blocks MAY occur at arbitrary positions within
the TIDE log file. Each ``TYPE`` block describes ONE data type used
directly or indirectly (as a dependency of some other data type) in
one or more channels (see `CHAN block`_). The structure and
attributes of this block are based on, and designed to interoperate
with, the mechanism for referring to data types described in RETF
RD 8 [6]_.

.. note::

   TODO(jmoringe): add constraint that ``TYPE`` blocks MUST occur
   before ``CHAN`` blocks which use them and MUST be topologically
   sorted according to the dependency relation?

.. note::

   TODO(jmoringe): is it always SHA1 or do we need the following::

     Hash function size  1 unsigned byte               Length of the identifier of the used hash function.
     Hash function       ``Hash function size`` bytes  Identifier of the used hash function.

.. note::

   TODO(jmoringe): do we need ``Hash size`` below, or can we fix the
   size of the hash? Also applies to ``Dependency entries`` and
   ``Type hash entries`` fields.

========================= ================================= ==================================================================
Field                     Size                              Purpose
========================= ================================= ==================================================================
Hash size                 2-byte unsigned integer           Length of the type hash (in bytes).
Hash                      ``Hash size`` bytes               Hash value identifying the defined data type.
Syntax size               1 unsigned byte                   Length of the syntax name (in bytes).
Syntax                    ``Syntax size`` bytes             Name of the syntax in which the data type is defined.
Display name size         2-byte unsigned integer           Length of the display name (in bytes).
Display name              ``Display name size`` bytes       Human-readable name of the data type as a UTF-8 string.
Version size              2-byte unsigned integer           Length of the version string (in bytes).
Version                   ``Version size`` bytes            Human-readable version of the data type as an ASCII string.
Definition size           4-byte unsigned integer           Length of the definition (in bytes).
Definition                ``Definition size`` bytes         Definition of the data type according to the rules of ``Syntax``.
Acceptable source count   1 unsigned byte                   Number of acceptable source entries in this block.
Acceptable source entries `variable`                        ``Acceptable source count`` acceptable source entries.
Dependency count          2-byte unsigned integer           Number of dependency entries in this block.
Dependency entries        `variable`                        ``Dependency count`` dependency entries.
========================= ================================= ==================================================================

The ``Hash`` field contains an "exact topic" hash value (see [6]_)
identifying the data type. Given the syntax designated by the
``Syntax`` field and the value of the ``Definition`` field, the hash
value MUST be computed as described in the "Proposed Approach" and
"Data Preparation" sections of [6]_.

.. note::

   TODO(jmoringe): we cannot reference RD-8 because of its draft
   status, right?

.. important::

   The base32 encoding and the embedding in a URN string described in
   [6]_ are not performed.

The ``Syntax`` field designates the data type definition syntax in
which the data type described in the ``Definition`` field is defined.
Implementations SHOULD refer to a precise version of the respective
data type definition syntax. For example, a ROS [9]_ implementation
may refer to a version of the ROS Message Description Specification
[4]_ in this field.

The ``Definition`` field of the ``TYPE`` block contains the textual
or binary definition of the data type according to the rules of the
syntax designated by the ``Syntax`` field. The value of this field,
optionally together with the ``Definition`` fields of the ``TYPE``
blocks referenced in ``Dependency entries``, MUST contain enough
information to deserialise the data stored in the channel, such that
the data can be reconstructed later without any information from
outside the TIDE file. For example, a ROS implementation may store
the message description in this field.

.. note::

   TODO(jmoringe): should we have a ``Documentation link`` field?

The variable-length ``Acceptable source entries`` section contains a
sequence of acceptable source entries. Each entry is an "acceptable
source" URL (see the "as" field in the "Link Format" section of
[6]_) from which the definition of the data type being described can
be obtained. The section has exactly ``Acceptable source count``
occurrences of the following fields:

====================== ================================= ================================================================================
Field                  Size                              Purpose
====================== ================================= ================================================================================
Acceptable source size 2-byte unsigned integer           Length of the acceptable source string (in bytes).
Acceptable source      ``Acceptable source size`` bytes  A URL from which the definition of the data type (in ``Syntax``) can be obtained.
====================== ================================= ================================================================================

The variable-length ``Dependency entries`` section contains a
sequence of dependencies of the data type being described. Each
entry is the "exact topic" hash (see above and [6]_) of the data
type to which the dependency refers. The section has exactly
``Dependency count`` occurrences of the following fields:

========= ======================= =====================================
Field     Size                    Purpose
========= ======================= =====================================
Hash size 2-byte unsigned integer Length of the type hash (in bytes).
Hash      ``Hash size`` bytes     "Exact topic" hash of the dependency.
========= ======================= =====================================

The ``Hash`` field contains a hash value identifying the data type to
which the dependency refers. The TIDE file MUST contain a ``TYPE``
block whose ``Hash`` field is identical to the ``Hash`` field of the
dependency entry.

.. _`CHAN block`:

Channel
-------

The ``CHAN`` block stores the meta-data for ONE channel of data.

================== ================================= ============================================================
Field              Size                              Purpose
================== ================================= ============================================================
ID                 4-byte unsigned integer           Channel identifier, used to link entries to the channel.
Name size          1 unsigned byte                   Length of the channel name (in bytes).
Name               ``Name size`` bytes               Name of the channel, as a UTF-8 string.
Source name size   4-byte unsigned integer           Length of the source name (in bytes).
Source name        ``Source name size`` bytes        Human-readable source description (encoded as UTF-8).
Source config size 4-byte unsigned integer           Length of the source config data (in bytes).
Source config      ``Source config size`` raw bytes  Raw data describing the source of this channel's data.
Type hash count    1 unsigned byte                   Number of hashes identifying the data types used.
Type hash entries  `variable`                        ``Type hash count`` type hash entries.
================== ================================= ============================================================

A channel stores a set of data entries of a single type, indexed by
time. The channel is identified internally in the TIDE file by its
unique identifier, stored in the ``ID`` field. The channel may be
identified externally by its ID or by its unique human-readable name,
stored in the ``Name`` field.

.. note::

   TODO(jmoringe): The original channel definition had this ``Type``
   field:

      The ``Type`` field indicates the type of source and format in
      use by this channel. It should indicate both the
      connection/transport type and serialisation type, and will vary
      by these. For example, a logging tool for the OpenRTM-aist
      architecture may specify ``openrtm-cdr`` or ``openrtm-ros`` to
      indicate that the source connection was an OpenRTM-aist port
      using either the CDR serialisation or the ROS serialisation
      scheme.

   In the example, we could no longer describe "the
   connection/transport type" and the fact that "the source
   connection was an OpenRTM-aist port".

The ``Source name`` field provides a human-readable form of the
source information. This allows TIDE file introspection tools to
describe the file contents more completely. For example, this field
could contain the human-readable path of a ROS topic that provided
the data.

The ``Source config`` field provides space for describing the source
of the channel's data in a machine-readable form. Typically this
will describe the connection that was recorded.
For example, an implementation for ROS may store the connection
header in this field, while an implementation for OpenRTM-aist may
store the name of the source port and the port's properties.

The variable-length ``Type hash entries`` section contains references
to the data types describing the data stored in the channel. The
section has exactly ``Type hash count`` occurrences of the following
fields:

========= ======================= ======================================================================================
Field     Size                    Purpose
========= ======================= ======================================================================================
Hash size 2-byte unsigned integer Length of the hash (in bytes).
Hash      ``Hash size`` bytes     Hash identifying the `TYPE block`_ that defines one of the types used by the channel.
========= ======================= ======================================================================================

The ``Hash`` field contains a hash value identifying a data type used
in the channel. The TIDE file MUST contain a ``TYPE`` block whose
``Hash`` field is identical to the ``Hash`` field of the entry.

If there is a single entry, the referenced ``TYPE`` block defines the
type of the data contained in the channel. For example, the ROS
middleware would reference a single ``TYPE`` block containing the ROS
Message Description [4]_ of the topic or service recorded in the
channel.

Multiple entries may describe the data types of multiple "frames"
within each entry of the channel. This situation arises with
transports that support multiple frames per message, such as
ØMQ [8]_.

Finally, multiple entries may describe an "outer" (containing) data
type together with the "inner" (contained) data types. For example,
the RSB middleware [7]_ would reference a sequence of data types:

#. The first referenced data type would describe the "envelope" or
   "notification" data type.
#. The second referenced data type would describe the payload data
   type.

.. _`INDX block`:

Index
-----

Each ``INDX`` block provides an index for time-based random access
to the data of one channel stored in the file.
Index entries link time stamps to the data entries stored in
``CHNK`` blocks. A TIDE file may contain any number of ``INDX``
blocks, including none, at any position.

This block has a fixed-length section and a variable-length section.
The fixed-length section comes first:

========== ======================= =============================================
Field      Size                    Purpose
========== ======================= =============================================
Channel ID 4-byte unsigned integer ID of the channel this block indexes.
Count      4-byte unsigned integer Number of index entries in this block.
========== ======================= =============================================

The variable-length section contains the index entries. It has
exactly ``Count`` occurrences of the following fields:

=========== ======================= ======================================================================
Field       Size                    Purpose
=========== ======================= ======================================================================
Chunk start 8-byte unsigned integer Starting position in the file of the chunk containing the entry.
Time stamp  8-byte unsigned integer Time stamp of the entry.
Offset      4-byte unsigned integer Offset (in bytes) of the entry within the chunk's uncompressed data.
=========== ======================= ======================================================================

The ``Offset`` field gives the position of the entry's data relative
to the start of its chunk. If the chunk is compressed, the offset
refers to the data after it has been decompressed.

.. _`CHNK block`:

Chunk
-----

``CHNK`` blocks store the recorded data entries. This block has a
fixed-length section and a variable-length section. The fixed-length
section comes first:

=========== ======================= =====================================================
Field       Size                    Purpose
=========== ======================= =====================================================
Chunk ID    4-byte unsigned integer Chunk identifier, used to link entries to chunks.
Count       4-byte unsigned integer Number of entries in this chunk.
Start       8-byte unsigned integer Time stamp of the first entry in this chunk.
End         8-byte unsigned integer Time stamp of the last entry in this chunk.
Compression 1 unsigned byte         Compression applied to the entries.
=========== ======================= =====================================================

The value of the ``Compression`` field must be one of the following:

===== ==================
Value Meaning
===== ==================
0     No compression.
1     gzip compression.
2     bzip2 compression.
===== ==================

The variable-length section contains the data entries. It has
exactly ``Count`` occurrences of the following fields:

========== ======================= =================================================
Field      Size                    Purpose
========== ======================= =================================================
Channel ID 4-byte unsigned integer ID of the channel this entry belongs to.
Time stamp 8-byte unsigned integer Time stamp of the entry.
Size       4-byte unsigned integer Size (in bytes) of the following serialised data.
Entry      `variable`              Serialised entry data.
========== ======================= =================================================


Block ordering
==============

Other than the requirement that the file begin with a ``TIDE`` block,
no ordering of the blocks in the file is required. Some orderings
may be easier to read or write than others; this is considered an
implementation choice and must be balanced against other concerns
such as file integrity.


Implementation examples
=======================

This section gives some examples of how TIDE files may be read or
written. Implementors should remember that these examples
demonstrate *simple* methods of implementing the TIDE format for
illustrative purposes only; for example, the writing method below
produces a truncated, unreadable file if an error occurs.

Writing without compression
---------------------------

#. Open a new file and write the ``TIDE`` block.
#. Write the fixed-length part of the first ``CHNK`` block. Leave the
   block size, count, start time stamp and end time stamp at zero;
   they will be updated when the chunk is finalised. Make a note of
   the position in the file at which this chunk began.
#. When a new channel is added, assign it an identifier, then:

   * When the data type of the new channel has not previously been
     used by any other channel, write ``TYPE`` blocks for the data
     type of the channel and its transitive dependencies.
   * Write a ``CHAN`` block storing the channel information and
     referencing the respective ``TYPE`` blocks.

#. As entries are received, write them at the end of the file, which
   will be the current chunk. Make a note of where each entry is
   recorded for the ``INDX`` blocks.
#. When a chunk is finished, go back to its start position and fill
   in the fields that were left empty previously.
#. When the file is finished, finalise the current chunk. Then, write
   the ``CHAN`` blocks to describe all channels that were recorded.
   Finally, write the ``INDX`` blocks.

Reading without compression
---------------------------

#. Open the file and read the ``TIDE`` block. Confirm that the format
   version is supported.
#. Read through the file, loading the ``TYPE``, ``CHAN`` and ``INDX``
   blocks and noting the file offsets of the ``CHNK`` blocks.
#. As data is requested, look it up in the loaded indices and use
   them to locate it in the ``CHNK`` blocks. If the data must be
   deserialised, use the definitions from the loaded ``TYPE`` blocks.


Acknowledgements
================

This document and the format it describes are based on version 2 of
the bag file format created for ROS by Willow Garage [5]_. Several
clarifications and the ``TYPE`` block have been proposed by Jan
Moringen, Bielefeld University.


References
==========

.. [1] CORBA specification 3.1 (retrieved 2011-02-08)
   http://www.omg.org/spec/CORBA/3.1/Interoperability/PDF

.. [2] Lightweight Communications and Marshalling (retrieved 2011-02-08)
   http://code.google.com/p/lcm/

.. [3] Google Protocol Buffers (retrieved 2011-02-08)
   http://code.google.com/p/protobuf/

.. [4] ROS Message Description Specification (retrieved 2011-02-08)
   http://www.ros.org/wiki/msg

.. [5] Bags/Format/2.0 - ROS Wiki (retrieved 2011-01-26)
   http://www.ros.org/wiki/Bags/Format/2.0

.. [6] RETF RD 8: Uniform referencing of Robot Data Message Types
   https://retf.org/TODO

.. [7] Robotics Service Bus
   http://code.cor-lab.org/projects/rsb

.. [8] ØMQ
   http://www.zeromq.org/

.. [9] Robot Operating System
   http://www.ros.org/wiki/


Copyright
=========

This document is licensed under the
`Creative Commons Attribution 3.0`_ license.

.. _`Creative Commons Attribution 3.0`:
   http://creativecommons.org/licenses/by/3.0/


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: