Apache IoTDB: A New Leader in Time Series Databases
Processing and storing immense time series data streams - such as sensor data, financial data, user & event logs or energy tracking - will become a challenge for many industries and their IT systems.
In our well-known database ranking we also analyze, among other things, the performance of database management solutions specialized in time series, the so-called time series databases.
As of last week, there is a new leader in the TimeSeries ranking - Apache IoTDB.
We discuss the results.

Key Insights about Apache IoTDB
- Apache IoTDB is an open source time series database. After passing through the Apache Incubator, IoTDB is one of Apache's top-level projects.
- IoTDB is currently available as a self-hosted solution, though there are efforts to make it available as a managed database-as-a-service solution in the future.
- In both scaling sizes used in the ranking, Apache IoTDB sets new records for write throughput - also in comparison with well-known time series databases such as InfluxDB, TimescaleDB or QuestDB.
- By tuning the database settings, adapted to the load pattern, all performance values could be increased even further.
- The read throughput and the read latency are also excellent.
- Likewise, the measured data compression, which reduces storage requirements, is in the leading range.

About Apache IoTDB
- Apache IoTDB is an emerging time-series database management system that provides all the necessary features to process large amounts of data for industrial IoT applications, including:
  - the processing of millions of writes per second
  - the compression of data to reduce storage requirements
  - analytics capabilities for this data.
 
- Apache IoTDB supports an SQL-like query language for analyzing the data.
- Apache IoTDB offers easy integrations to Hadoop, Spark or Grafana.
- Our performance measurements show very good values for write request processing, data compression and read throughput as well as read latency.
During our benchmarks, deploying and working with Apache IoTDB went without complications. The source code and documentation can be found on GitHub and on the official Apache page of IoTDB.
Professional support is offered in Europe by Timecho Europe GmbH, on whose behalf we performed our measurements independently.
Benchmarking: Scenarios & Methodology
For the database ranking of the time series databases, we use standardized scenarios that allow comparability of the measurement results. These are described in detail below the Ranking.
Workload
The load pattern for our benchmarks is generated by the open-source TSBS benchmarking suite and reflects a "time series DevOps" workload. These types of workloads occur in system monitoring, for example.
The TSBS benchmarking suite was configured for all measurements as follows:
- Software-Version: TSBS benchANT fork (https://github.com/benchANT/tsbs)
- Scale-Flag: 1,000
- Query type: single-groupby-1-1-1
- Data set size: 3 days
- Batch insert size: 1,000
- Runtime query phase: 100,000 queries
- Repetitions: 1 execution
- Parallel threads:
  - xSmall scaling: 50
  - small scaling: 100
 
The software was run on a Virtual Machine (VM) with 16 cores at AWS (m5.4xlarge).
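The settings listed above can be sketched as an argument list for the TSBS query generator. This is only an illustration: the use case name `devops`, the format name `iotdb`, the seed-free invocation and the start date are assumptions that do not come from the measurements, and the helper `tsbs_generate_queries_args` is hypothetical. The sketch only builds the arguments and does not run TSBS.

```python
from datetime import datetime, timedelta, timezone


def tsbs_generate_queries_args(
    scale: int,
    days: int,
    queries: int,
    query_type: str,
    use_case: str = "devops",   # assumption: not stated in the article
    db_format: str = "iotdb",   # assumption: format name in the benchANT fork
    start: datetime = datetime(2022, 1, 1, tzinfo=timezone.utc),  # arbitrary start date
) -> list[str]:
    """Build an argument list for tsbs_generate_queries from the listed settings."""
    end = start + timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return [
        "tsbs_generate_queries",
        f"--use-case={use_case}",
        f"--scale={scale}",
        f"--timestamp-start={start.strftime(fmt)}",
        f"--timestamp-end={end.strftime(fmt)}",
        f"--queries={queries}",
        f"--query-type={query_type}",
        f"--format={db_format}",
    ]


# Values taken from the configuration list: scale 1,000, 3 days, 100,000 queries.
args = tsbs_generate_queries_args(
    scale=1000, days=3, queries=100_000, query_type="single-groupby-1-1-1"
)
print(" ".join(args))
```

The number of parallel threads (50 or 100) and the batch insert size (1,000) are passed to the load and query runners rather than to the generator, so they are not part of this sketch.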
Scenarios
Based on this workload, 2 scenarios with different scaling were defined, which differ in resource size and number of parallel accesses, i.e. workload intensity. Horizontal scaling is not included in these scenarios.
Scaling: xSmall
- IaaS: AWS EC2
- VM type: m5.large
- Region: eu-central-1
- VM size: 2 vCPUs / 8 GiB RAM
- Cluster size: 1
- Replication factor: 1
- Workload threads: 50
Scaling: Small
- IaaS: AWS EC2
- VM type: m5.xlarge
- Region: eu-central-1
- VM size: 4 vCPUs / 16 GiB RAM
- Cluster size: 1
- Replication factor: 1
- Workload threads: 100
Database Configuration
In addition to the different scaling sizes, the configuration of Apache IoTDB was also varied in the course of the investigation. Initially, the standard configuration (vanilla, untuned) was used here. In a further series, a configuration optimized for write throughput was then used.
The main differences in the tuning are as follows:
| Parameter and Description | Tuned Value | Default Value |
|---|---|---|
| enable_last_cache: Whether to enable LAST cache. | false | true |
| wal_mode: The write mode of wal. For DISABLE mode, the system will disable wal. For SYNC mode, the system will submit wal synchronously, write request will not return until its wal is fsynced to the disk successfully. For ASYNC mode, the system will submit wal asynchronously, write request will return immediately no matter its wal is fsynced to the disk successfully. | ASYNC | ASYNC |
| wal_async_mode_fsync_delay_in_ms: Duration a wal flush operation will wait before calling fsync in the async mode. | 3000 | 3000 |
| max_wal_nodes_num: Max number of wal nodes, each node corresponds to one wal directory. The default value 0 means the number is determined by the system. | 9 | 0 |
| wal_buffer_queue_capacity: Blocking queue capacity of each wal buffer. | 5000 | 5000 |
| degree_of_query_parallelism: documentation pending | 1 | 0 |
| max_number_of_points_in_page: The maximum number of data points (timestamps - valued groups) contained in a page. | 360 | 10,000 |
| time_partition_interval: Time partition interval of data when ConfigNode allocate data. | 60,480,000,000 (100 weeks) | 604,800,000 (one week) |
| data_region_group_extension_policy: The extension policy of DataRegionGroup. | CUSTOM | AUTO |
| default_data_region_group_num_per_database: The number of DataRegionGroups that each Database has when using the CUSTOM-DataRegionGroup extension policy. The least number of DataRegionGroups that each Database has when using the AUTO-DataRegionGroup extension policy. | 1 | 2 |
| config_node_consensus_protocol_class: Consensus protocol of ConfigNode replicas, only support RatisConsensus. | SimpleConsensus | RatisConsensus |
| schema_region_consensus_protocol_class: Consensus protocol of schema replicas, SimpleConsensus could only be used in 1 replica, larger than 1 replicas could only use RatisConsensus. | SimpleConsensus | RatisConsensus |
| data_region_consensus_protocol_class: Consensus protocol of data replicas, SimpleConsensus could only be used in 1 replica, larger than 1 replicas could use IoTConsensus or RatisConsensus. | SimpleConsensus | IoTConsensus |
| dn_metric_level: documentation pending | DO_NOTHING | CORE |
| MAX_HEAP_SIZE: The maximum heap memory size that IoTDB can use. | 500M | 1/4 of the memory (effective: ~2GB) |
| HEAP_NEWSIZE: The minimum heap memory size that IoTDB can use at startup. | 500M | min{cores * 100M, one quarter of MAX_HEAP_SIZE} (effective: ~200MB) |
| MAX_DIRECT_MEMORY_SIZE: The max direct memory that IoTDB could use. | 500M | Equal to the MAX_HEAP_SIZE (effective: ~2GB) |
| MAX_HEAP_SIZE: The maximum heap memory size that IoTDB can use. | 5G | 1/4 of the memory (effective: ~2GB) |
| HEAP_NEWSIZE: The minimum heap memory size that IoTDB will use when startup. | 5G | min{cores * 100M, one quarter of MAX_HEAP_SIZE} (effective: ~200MB) |
| MAX_DIRECT_MEMORY_SIZE: The max direct memory that IoTDB could use. | 1G | Equal to MAX_HEAP_SIZE (effective: ~2GB) |
| IOTDB_JMX_OPTS: additional flags for JVM. | -XX:+UseParallelGC (Enables the Parallel Copying Collector for young generation garbage collection. It is Multi-threaded Garbage Collector tuned for large (gigabyte size) heaps optimized for minimizing pauses.) | -XX:+UseParallelGC |
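Some of the values in the table follow from simple arithmetic, which the following minimal sketch reproduces. It assumes that time_partition_interval is given in milliseconds (which is consistent with 604,800,000 being one week) and that the "effective" defaults refer to the xSmall VM with 2 vCPUs and 8 GiB RAM. The function names are illustrative and not part of IoTDB.

```python
MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000  # milliseconds in one week


def weeks_to_ms(weeks: int) -> int:
    """Convert a number of weeks to milliseconds."""
    return weeks * MS_PER_WEEK


def default_max_heap_mb(total_ram_mb: int) -> int:
    """Default rule from the table: one quarter of the available memory."""
    return total_ram_mb // 4


def default_heap_newsize_mb(cores: int, max_heap_mb: int) -> int:
    """Default rule from the table: min{cores * 100M, MAX_HEAP_SIZE / 4}."""
    return min(cores * 100, max_heap_mb // 4)


# Assumption: xSmall VM with 2 vCPUs and 8 GiB (8192 MiB) RAM.
max_heap = default_max_heap_mb(8192)
print("default time partition (ms):", weeks_to_ms(1))
print("tuned time partition (ms):  ", weeks_to_ms(100))
print("default MAX_HEAP_SIZE (MB): ", max_heap)
print("default HEAP_NEWSIZE (MB):  ", default_heap_newsize_mb(2, max_heap))
```

Under these assumptions the results line up with the table: one week and 100 weeks in milliseconds, roughly 2 GB of default heap and roughly 200 MB of default new size.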
Performance Results: Write Throughput
Write throughput is considered the main metric for time series database performance. It indicates how many write operations per second can be processed by the database.


- For both scaling sizes, the write throughput of Apache IoTDB is at the top of the measurement results, both in the vanilla and in the tuned configuration.
- The previous peak values of TimescaleDB and QuestDB are exceeded by more than 43% in the small scaling.
- Tuning leads to an increase of 6% (xSmall) and 15% (small) for Apache IoTDB.
- Apache IoTDB delivers strong write throughput values in these unclustered measurements.
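The percentages above are relative increases over a baseline. As a minimal sketch of how such a figure is derived, the following helper computes the relative change between two throughput values. The numbers used in the example are placeholders chosen for illustration, not measured results from the ranking.

```python
def relative_increase_percent(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Placeholder values (ops/s), NOT measured results.
vanilla_ops = 1_000_000
tuned_ops = 1_150_000
print(f"tuning gain: {relative_increase_percent(vanilla_ops, tuned_ops):.1f}%")
```

Read the other way round, an increase of over 100% means that the value has more than doubled.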
Performance Results: Read Throughput
The read throughput is crucial for the analysis of the stored time series data. The results are strongly dependent on the complexity of the queries. The queries used here have a medium complexity. Depending on the use case, the read throughput is more important or less important.


- At the "xSmall" scaling, the read throughput of the vanilla IoTDB ranks second behind VictoriaMetrics, but far ahead of the other time series databases.
- The tuned version delivers a read throughput increase of over 100%.
- These ratios can be seen at both scaling sizes.
Performance Results: Read Latency
Read latency is an important factor in the analysis of stored data and describes how long it takes the database to return the results. Similar to read throughput, the importance of read latency is highly dependent on the use case.


- Apache IoTDB shows very low read latencies, which only VictoriaMetrics can match. All other databases are at least a factor of 20 higher.
- The tuning of the IoTDB has a significant influence on the read latency here. Interestingly, the read latency decreases in the xSmall case while it increases in the small case.
Performance Results: Data Compression
Storing large amounts of data is not only an operationally important issue, but also a cost issue. For this reason, many time series databases have built-in compression logic. Data compression is an important functionality when data volumes in the terabyte range have to be kept.


- For both the xSmall and Small cases, about 70 GB of data was written in the benchmark. This is clearly visible with TimescaleDB, which is built on PostgreSQL and does not use native data compression.
- Apache IoTDB reduces this amount of data to about 2-3 GB, depending on the scenario. This equates to a 96-97.5% storage savings without sacrificing performance (see above).
- This puts Apache IoTDB in the top group around InfluxDB and VictoriaMetrics, which also achieve excellent storage savings of up to 97%, but at the cost of latency.
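The storage savings follow from the ratio of stored to written data. The following minimal sketch uses the rounded figures from the text (about 70 GB written, about 2-3 GB stored). Because these inputs are rounded, the result only approximates the reported range of 96-97.5%.

```python
def storage_savings_percent(raw_gb: float, stored_gb: float) -> float:
    """Share of the raw data volume saved by compression, in percent."""
    if raw_gb <= 0:
        raise ValueError("raw_gb must be positive")
    return (1.0 - stored_gb / raw_gb) * 100.0


RAW_GB = 70.0  # approximate amount of data written in the benchmark
for stored_gb in (2.0, 3.0):  # approximate on-disk size reported for IoTDB
    saved = storage_savings_percent(RAW_GB, stored_gb)
    print(f"{stored_gb:.0f} GB on disk -> {saved:.1f}% saved")
```

With these rounded inputs the savings come out at roughly 95.7% to 97.1%, which is in line with the reported range given that both sizes are stated as approximate.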
Conclusion
- The team behind the open source Apache project for IoTDB is doing a good job in terms of the most important performance metrics.
- In all important performance KPIs, IoTDB either sets new records or ranks in the upper third.
- The write throughput sets a new record for the "Small" scenario.
- The read throughput is far above that of many other time series databases; only VictoriaMetrics can deliver comparable numbers here.
- The read latency even sets a new benchmark for time series databases and moves into regions that are only known from leading NoSQL databases.
- Storage utilization or data compression delivers top values and is in the same order of magnitude as InfluxDB and VictoriaMetrics.
- Performance measurements with horizontal scaling were not performed in this test, so no statements can be made about this.
- Once again, this demonstrates the usefulness of database benchmarks not only for technology comparison, but also in particular for database optimization.
About benchANT and the Database Ranking
benchANT is a data infrastructure IT consulting firm focused on performance analysis and independent performance cost evaluations. We support our clients in architecture planning, technology selection, data infrastructure modernization and optimization, and database migrations.
Our performance measurements are automated, transparent and reproducible using our scientifically proven benchmarking framework.
With the freely accessible database ranking, we have created an independent, transparent and standardized comparison platform for database performance results, which is constantly extended and updated.
We have also created a freely accessible technical analysis tool for Database-as-a-Service products, the DBaaS Navigator.
Disclaimer
Performance results are only representative of the workload, resources and database settings used. The results should not be generalized or applied to other scenarios. Specific measurements are recommended.