J. Moringen, 07/21/2014 09:24 PM

h1. TIDE log Benchmark

%{color:red}*The information on this page is obsolete*%

h2. Chunk Sizes and Indices

We simulated "scanning" large TIDE log files with varying numbers of entries per chunk and varying entry sizes. By "scanning" we mean seeking to every @CHNK@ block, reading its block header, and reading the headers of all entries in that @CHNK@.
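
The scanning procedure can be illustrated with a minimal sketch. It uses a made-up toy layout, not the real TIDE format: the header fields, the @write_toy_log@ and @scan@ helpers and the in-memory stream are all assumptions chosen for illustration. It only shows the access pattern: one header read per chunk, and one header read plus one seek per entry, while payloads are skipped.

```python
import io
import struct

# Hypothetical toy layout, NOT the real TIDE format:
#   chunk header: 4-byte tag b"CHNK" + uint32 number of entries
#   entry header: uint32 payload size, followed by the payload bytes
CHUNK_HEADER = struct.Struct("<4sI")
ENTRY_HEADER = struct.Struct("<I")


def write_toy_log(stream, num_chunks, entries_per_chunk, entry_size):
    """Write chunks of fixed-size dummy entries to a binary stream."""
    payload = bytes(entry_size)
    for _ in range(num_chunks):
        stream.write(CHUNK_HEADER.pack(b"CHNK", entries_per_chunk))
        for _ in range(entries_per_chunk):
            stream.write(ENTRY_HEADER.pack(entry_size))
            stream.write(payload)


def scan(stream):
    """Read every chunk header and entry header, skipping all payloads.

    Returns the number of chunks seen and the number of entry headers read.
    """
    stream.seek(0)
    chunks = entries = 0
    while True:
        raw = stream.read(CHUNK_HEADER.size)
        if len(raw) < CHUNK_HEADER.size:
            break  # end of file
        tag, count = CHUNK_HEADER.unpack(raw)
        if tag != b"CHNK":
            raise ValueError(f"unexpected block tag {tag!r}")
        chunks += 1
        for _ in range(count):
            (size,) = ENTRY_HEADER.unpack(stream.read(ENTRY_HEADER.size))
            stream.seek(size, io.SEEK_CUR)  # skip the payload without reading it
            entries += 1
    return chunks, entries


log = io.BytesIO()
write_toy_log(log, num_chunks=3, entries_per_chunk=4, entry_size=100)
print(scan(log))
```

With an in-memory stream the seeks cost almost nothing. On a real multi-gigabyte file each skipped payload can turn into a separate disk access, which is the cost the benchmark measures.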

|_. Number of Chunks |_. Number of Entries per Chunk |_. Size of one Entry [bytes] |_. File Size [GB] |_. Seeks and Reads |_. "Scan" Time [s] |
|>.              100 |>.                         100 |>.                 1,000,000 |>.          ~ 9.4 |>.          10,000 |>.         < 0.002 |
|>.              100 |>.                      10,000 |>.                    10,000 |>.          ~ 9.4 |>.       1,000,000 |>.         < 0.200 |
|>.              100 |>.                   1,000,000 |>.                       100 |>.         ~ 11.0 |>.     100,000,000 |>.       < 200.000 |

The number of seek and read operations grows linearly with the number of entry headers that have to be read. In the first two cases, the runtime reflects this clearly: a hundredfold increase in operations comes with a hundredfold increase in the bound on the scan time. In the third case, another hundredfold increase in operations raises that bound by a factor of 1,000, so the overhead starts dominating and performance worsens accordingly. Almost the entire runtime consists of I/O wait in this case (~ 15 s user, ~ 10 s system, ~ 175 s iowait).
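
The arithmetic behind this reading of the table can be checked with a short sketch. The operation counts and times are copied from the table. The times are upper bounds, so the per-operation figures are upper bounds as well. The helper name @microseconds_per_operation@ is made up for this example.

```python
# Rows from the table: (seek and read operations, upper bound on scan time in s)
measurements = [
    (10_000, 0.002),
    (1_000_000, 0.200),
    (100_000_000, 200.000),
]


def microseconds_per_operation(operations, seconds):
    """Average time per seek-and-read operation in microseconds."""
    return seconds / operations * 1e6


for operations, seconds in measurements:
    cost = microseconds_per_operation(operations, seconds)
    print(f"{operations:>11,} operations: < {cost:.1f} microseconds each")

# Share of the third run spent waiting for I/O, using the approximate
# user/system/iowait figures quoted in the text.
user, system, iowait = 15, 10, 175
iowait_share = iowait / (user + system + iowait)
print(f"iowait share of the third run: {iowait_share:.1%}")
```

The per-operation bound is the same for the first two rows and ten times higher for the third, and roughly seven eighths of the third run is spent in iowait.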

Conclusion
* For high-frequency data with small individual entries, scanning all entries of a file is unacceptably slow
** Indexes (via @INDX@ blocks) are unavoidable and should probably be made mandatory, at least for certain file structures
* Since all @CHNK@ blocks have to be visited even with indexes, it is important to accumulate enough data in each chunk
** On the other hand, periodically writing complete chunks to disk may be desirable for data safety and error recovery
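
To illustrate why an index removes the need to read every entry header, here is a minimal sketch of a timestamp lookup. The layout of real @INDX@ blocks is not described on this page, so the index here is simply two sorted in-memory lists. The names @build_index@ and @find_offset@ and the entry sizes are assumptions made for the example.

```python
import bisect

# Hypothetical in-memory index: two parallel lists sorted by timestamp.
# The on-disk layout of real INDX blocks is not described here.


def build_index(entries):
    """Build an index from (timestamp, file_offset) pairs sorted by timestamp."""
    timestamps = [timestamp for timestamp, _ in entries]
    offsets = [offset for _, offset in entries]
    return timestamps, offsets


def find_offset(index, timestamp):
    """Return the file offset of the first entry at or after `timestamp`.

    Returns None if every indexed entry is older than `timestamp`.
    """
    timestamps, offsets = index
    position = bisect.bisect_left(timestamps, timestamp)
    if position == len(timestamps):
        return None
    return offsets[position]


# One million entries of 116 bytes each (16-byte header + 100-byte payload,
# both made-up numbers), one entry every 10 time units.
entries = [(10 * i, 116 * i) for i in range(1_000_000)]
index = build_index(entries)
print(find_offset(index, 25))  # offset of the entry with timestamp 30
```

The binary search inspects about 20 index elements for one million entries, whereas a scan would have to seek to and read one million entry headers.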

Observation
* @TIDE.num_chunks@ and @TIDE.num_channels@ are never really used, since the whole file has to be scanned anyway
* If data integrity is a concern, a hash of the file's content could be stored instead and verified only when explicitly requested
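
The hash idea could look like the following sketch. The choice of SHA-256, the helper names @content_digest@ and @verify@, and the block size are assumptions. Where such a digest would be stored in a TIDE file is left open here.

```python
import hashlib


def content_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read block by block so that large log files need not fit in memory.
        for block in iter(lambda: stream.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_digest):
    """Check a file against a previously stored digest; only run on request."""
    return content_digest(path) == expected_digest
```

Verification reads the whole file once and sequentially, so it avoids the per-entry seeks of a scan but is still expensive enough to keep optional.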