benchANT Homepage

Apache IoTDB: A New Leader in Time Series Databases

Processing and storing immense time series data streams - such as sensor data, financial data, user & event logs or energy tracking - will become a challenge for many industries and their IT systems.

In our well-known database ranking we also analyze, among other things, the performance of database management solutions specialized in time series, the so-called time series databases.

As of last week, there is a new leader in the TimeSeries ranking - Apache IoTDB.

We discuss the results.

TimeSeries Database Ranking with Apache IoTDB

Key Insights about Apache IoTDB

  • Apache IoTDB is an open source time series database. After passing through the Apache Incubator, IoTDB is one of Apache's top-level projects.
  • IoTDB is currently available as a self-hosted solution, though there are efforts to make it available as a managed database-as-a-service solution in the future.
  • In both scaling sizes used in the ranking, Apache IoTDB sets new records for write throughput - also in comparison with well-known time series databases such as InfluxDB, TimescaleDB or QuestDB.
  • By tuning the database settings, adapted to the load pattern, all performance values could be increased even further.
  • Read throughput and read latency are also excellent.
  • The measured data compression, which reduces storage requirements, is likewise among the best in the ranking.
Apache IoTDB Throughput Overview

About Apache IoTDB

  • Apache IoTDB is an emerging time-series database management system that provides all the necessary features to process large amounts of data for industrial IoT applications, including
    • The processing of millions of writes per second
    • The compression of data to reduce storage requirements
    • Analytics on this data.
  • Apache IoTDB supports an SQL-like query language for analyzing the data.
  • Apache IoTDB offers easy integrations to Hadoop, Spark or Grafana.
  • Our performance measurements show very good values for write request processing, data compression and read throughput as well as read latency.

During our benchmarks, deployment and working with Apache IoTDB ran without complications. The source code and documentation can be found on GitHub and on the official Apache page of IoTDB.

Professional support is offered in Europe by Timecho Europe GmbH, on whose behalf we performed our measurements independently.

Benchmarking: Scenarios & Methodology

For the database ranking of the time series databases, we use standardized scenarios that allow comparability of the measurement results. These are described in detail below the Ranking.

Workload

The load pattern for our benchmarks is generated by the open-source TSBS benchmarking suite and reflects a "time series DevOps" workload. These types of workloads occur in system monitoring, for example.

The TSBS benchmarking suite was configured for all measurements as follows:

  • Software version: TSBS benchANT fork (https://github.com/benchANT/tsbs)
  • Scale flag: 1,000
  • Query type: single-groupby-1-1-1
  • Data set size: 3 days
  • Batch insert size: 1,000
  • Runtime query phase: 100,000 queries
  • Repetitions: 1 execution
  • Parallel threads
    • xSmall scaling: 50
    • small scaling: 100

The software was run on a Virtual Machine (VM) with 16 cores at AWS (m5.4xlarge).

Scenarios

Based on this workload, 2 scenarios with different scaling were defined, which differ in resource size and number of parallel accesses, i.e. workload intensity. Horizontal scaling is not included in these scenarios.

Scaling: xSmall

  • IaaS: AWS EC2
  • VM type: m5.large
  • Region: eu-central-1
  • VM size: 2 vCPUs / 8 GiB RAM
  • Cluster size: 1
  • Replication factor: 1
  • Workload threads: 50

Scaling: Small

  • IaaS: AWS EC2
  • VM type: m5.xlarge
  • Region: eu-central-1
  • VM size: 4 vCPUs / 16 GiB RAM
  • Cluster size: 1
  • Replication factor: 1
  • Workload threads: 100
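
The two scenarios can be summarized in a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, and the per-thread query count assumes that the 100,000 queries are split evenly across the workload threads, which the description above does not state.

```python
from dataclasses import dataclass

TOTAL_QUERIES = 100_000  # "Runtime query phase" from the TSBS configuration


@dataclass(frozen=True)
class Scenario:
    """One benchmark scaling; the field names are hypothetical."""
    name: str
    vm_type: str
    vcpus: int
    ram_gib: int
    workload_threads: int

    @property
    def threads_per_vcpu(self) -> float:
        # Workload intensity relative to the available compute.
        return self.workload_threads / self.vcpus

    @property
    def queries_per_thread(self) -> float:
        # Assumption: queries are distributed evenly across threads.
        return TOTAL_QUERIES / self.workload_threads


SCENARIOS = [
    Scenario("xSmall", "m5.large", vcpus=2, ram_gib=8, workload_threads=50),
    Scenario("small", "m5.xlarge", vcpus=4, ram_gib=16, workload_threads=100),
]

for s in SCENARIOS:
    print(s.name, s.threads_per_vcpu, s.queries_per_thread)
```

Both scalings work out to 25 workload threads per vCPU, so the load intensity grows in proportion to the resources.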

Database Configuration

In addition to the different scaling sizes, the configuration of Apache IoTDB was also varied in the course of the investigation. Initially, the standard configuration (vanilla, untuned) was used here. In a further series, a configuration optimized for write throughput was then used.

The main differences in the tuning are as follows:

IoTDB Database configurations

Each entry lists the parameter and its description, followed by the tuned value and the default value.

  • enable_last_cache: Whether to enable LAST cache.
    • Tuned value: false
    • Default value: true
  • wal_mode: The write mode of wal. For DISABLE mode, the system will disable wal. For SYNC mode, the system will submit wal synchronously, write request will not return until its wal is fsynced to the disk successfully. For ASYNC mode, the system will submit wal asynchronously, write request will return immediately no matter its wal is fsynced to the disk successfully.
    • Tuned value: ASYNC
    • Default value: ASYNC
  • wal_async_mode_fsync_delay_in_ms: Duration a wal flush operation will wait before calling fsync in the async mode.
    • Tuned value: 3000
    • Default value: 3000
  • max_wal_nodes_num: Max number of wal nodes, each node corresponds to one wal directory. The default value 0 means the number is determined by the system.
    • Tuned value: 9
    • Default value: 0
  • wal_buffer_queue_capacity: Blocking queue capacity of each wal buffer.
    • Tuned value: 5000
    • Default value: 5000
  • degree_of_query_parallelism: documentation pending
    • Tuned value: 1
    • Default value: 0
  • max_number_of_points_in_page: The maximum number of data points (timestamps - valued groups) contained in a page.
    • Tuned value: 360
    • Default value: 10,000
  • time_partition_interval: Time partition interval of data when ConfigNode allocate data.
    • Tuned value: 60,480,000,000 (100 weeks)
    • Default value: 604,800,000 (one week)
  • data_region_group_extension_policy: The extension policy of DataRegionGroup.
    • Tuned value: CUSTOM
    • Default value: AUTO
  • default_data_region_group_num_per_database: The number of DataRegionGroups that each Database has when using the CUSTOM-DataRegionGroup extension policy. The least number of DataRegionGroups that each Database has when using the AUTO-DataRegionGroup extension policy.
    • Tuned value: 1
    • Default value: 2
  • config_node_consensus_protocol_class: Consensus protocol of ConfigNode replicas, only support RatisConsensus.
    • Tuned value: SimpleConsensus
    • Default value: RatisConsensus
  • schema_region_consensus_protocol_class: Consensus protocol of schema replicas, SimpleConsensus could only be used in 1 replica, larger than 1 replicas could only use RatisConsensus.
    • Tuned value: SimpleConsensus
    • Default value: RatisConsensus
  • data_region_consensus_protocol_class: Consensus protocol of data replicas, SimpleConsensus could only be used in 1 replica, larger than 1 replicas could use IoTConsensus or RatisConsensus.
    • Tuned value: SimpleConsensus
    • Default value: IoTConsensus
  • dn_metric_level: documentation pending
    • Tuned value: DO_NOTHING
    • Default value: CORE
  • MAX_HEAP_SIZE: The maximum heap memory size that IoTDB can use.
    • Tuned value: 500M
    • Default value: 1/4 of the memory (effective: ~2GB)
  • HEAP_NEWSIZE: The minimum heap memory size that IoTDB can use at startup.
    • Tuned value: 500M
    • Default value: min{cores * 100M, one quarter of MAX_HEAP_SIZE} (effective: ~200MB)
  • MAX_DIRECT_MEMORY_SIZE: The max direct memory that IoTDB could use.
    • Tuned value: 500M
    • Default value: Equal to the MAX_HEAP_SIZE (effective: ~2GB)
  • MAX_HEAP_SIZE: The maximum heap memory size that IoTDB can use.
    • Tuned value: 5G
    • Default value: 1/4 of the memory (effective: ~2GB)
  • HEAP_NEWSIZE: The minimum heap memory size that IoTDB will use when startup.
    • Tuned value: 5G
    • Default value: min{cores * 100M, one quarter of MAX_HEAP_SIZE} (effective: ~200MB)
  • MAX_DIRECT_MEMORY_SIZE: The max direct memory that IoTDB could use.
    • Tuned value: 1G
    • Default value: Equal to MAX_HEAP_SIZE (effective: ~2GB)
  • IOTDB_JMX_OPTS: additional flags for JVM.
    • -XX:+UseParallelGC: Enables the Parallel Copying Collector for young generation garbage collection. It is Multi-threaded Garbage Collector tuned for large (gigabyte size) heaps optimized for minimizing pauses.
    • Value listed: -XX:+UseParallelGC
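
As a rough illustration of how the tuned configuration differs from the defaults, the sketch below holds a subset of the parameters listed above in two dictionaries, prints only the settings that changed, and checks the time partition arithmetic. Several points are assumptions: the split of tuned and default values follows the column order of the table, the `key=value` output format is only a guess at how such overrides might be written down, and the millisecond unit is inferred from 604,800,000 corresponding to one week.

```python
# One week in milliseconds: 7 d * 24 h * 60 min * 60 s * 1000 ms
MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000

# Subset of the parameters above; values are kept as strings.
DEFAULTS = {
    "enable_last_cache": "true",
    "wal_mode": "ASYNC",
    "max_wal_nodes_num": "0",
    "max_number_of_points_in_page": "10000",
    "time_partition_interval": str(MS_PER_WEEK),
    "data_region_group_extension_policy": "AUTO",
    "data_region_consensus_protocol_class": "IoTConsensus",
}

TUNED = {
    "enable_last_cache": "false",
    "wal_mode": "ASYNC",
    "max_wal_nodes_num": "9",
    "max_number_of_points_in_page": "360",
    "time_partition_interval": str(100 * MS_PER_WEEK),
    "data_region_group_extension_policy": "CUSTOM",
    "data_region_consensus_protocol_class": "SimpleConsensus",
}


def changed_settings(defaults: dict, tuned: dict) -> dict:
    """Return only the tuned entries that differ from the defaults."""
    return {k: v for k, v in tuned.items() if defaults.get(k) != v}


overrides = changed_settings(DEFAULTS, TUNED)
for key, value in overrides.items():
    print(f"{key}={value}")
```

Settings such as `wal_mode`, which are identical in both configurations, drop out of the result, which makes it easier to see what the tuning actually touched.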

Performance Results: Write Throughput

Write throughput is considered the main metric for time series database performance. It indicates how many write operations per second can be processed by the database.

Write Throughput of Apache IoTDB
  • For both scaling sizes, the write throughput of Apache IoTDB is at the top of the measurement results, both in the vanilla and in the tuned configuration.
  • The previous peak values of TimescaleDB and QuestDB are exceeded by more than 43% in the small scaling.
  • Tuning leads to an increase of 6% (xSmall) and 15% (small) for Apache IoTDB.
  • Apache IoTDB delivers strong write throughput in these unclustered (single-node) measurements.
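
The percentages above are relative changes against a baseline. A minimal sketch of that calculation follows; the function name is hypothetical, and the inputs are unitless placeholders chosen to reproduce gains of 15% and 43%, not measured throughput values.

```python
def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Placeholder inputs (NOT benchmark results): a baseline normalized to 100.
tuning_gain = relative_gain_percent(100.0, 115.0)
lead_over_previous_best = relative_gain_percent(100.0, 143.0)

print(round(tuning_gain, 1), round(lead_over_previous_best, 1))
```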

Performance Results: Read Throughput

The read throughput is crucial for the analysis of the stored time series data. The results are strongly dependent on the complexity of the queries. The queries used here have a medium complexity. Depending on the use case, the read throughput is more important or less important.

Read throughput of Apache IoTDB
  • At the "xSmall" scaling, the read throughput of vanilla IoTDB ranks second behind VictoriaMetrics, but far ahead of the other time series databases.
  • The tuned version delivers a read throughput increase of over 100%.
  • These ratios can be seen at both scaling sizes.

Performance Results: Read Latency

Read latency is an important factor in the analysis of stored data and describes how long it takes the database to return the results of a query. Similar to read throughput, the importance of read latency is highly dependent on the use case.

Read Latency of Apache IoTDB
  • Apache IoTDB shows very low read latencies, which only VictoriaMetrics can match. All other databases are at least a factor of 20 higher.
  • Tuning IoTDB has a significant influence on the read latency. Interestingly, the read latency decreases in the xSmall case while it increases in the small case.

Performance Results: Data Compression

Storing large amounts of data is not only an operationally important issue, but also a cost issue. For this reason, many time series databases have built-in compression logic. Data compression is an important functionality when data volumes in the terabyte range have to be kept.

Storage consumption of Apache IoTDB
  • For both the xSmall and Small cases, about 70 GB of data was written in the benchmark. This can be seen well with TimescaleDB, which does not use native data compression because it is built on PostgreSQL.
  • Apache IoTDB reduces this amount of data to about 2-3 GB, depending on the scenario. This equates to a 96-97.5% storage savings without sacrificing performance (see above).
  • This puts Apache IoTDB in the top group around InfluxDB and VictoriaMetrics, which also achieve excellent storage savings of up to 97%, but at the cost of latency.
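
The storage savings follow directly from the raw and stored data volumes. The sketch below recomputes them from the rounded figures given above (about 70 GB written, about 2 to 3 GB stored); the function names are hypothetical.

```python
def storage_savings_percent(raw_gb: float, stored_gb: float) -> float:
    """Share of the raw data volume saved by compression, in percent."""
    return (1.0 - stored_gb / raw_gb) * 100.0


def compression_ratio(raw_gb: float, stored_gb: float) -> float:
    """How many times smaller the stored data is than the raw data."""
    return raw_gb / stored_gb


RAW_GB = 70.0  # approximate data volume written in the benchmark

for stored_gb in (2.0, 3.0):
    savings = storage_savings_percent(RAW_GB, stored_gb)
    ratio = compression_ratio(RAW_GB, stored_gb)
    print(f"{stored_gb:.0f} GB stored: {savings:.1f}% saved, ratio {ratio:.1f}:1")
```

With these rounded inputs the savings come out at roughly 95.7% to 97.1%, in line with the 96-97.5% quoted above, which presumably derives from the unrounded measurements.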

Conclusion

  • The team behind the open source Apache project for IoTDB is doing a good job in terms of the most important performance metrics.
  • In all important performance KPIs, new records are set, or the values are found in the upper third.
  • The write throughput sets a new record for the "Small" scenario.
  • The read throughput is far above that of many other time series databases; only VictoriaMetrics delivers comparable numbers.
  • The read latency even sets a new benchmark for time series databases and moves into regions that are only known from leading NoSQL databases.
  • Storage utilization or data compression delivers top values and is in the same order of magnitude as InfluxDB and VictoriaMetrics.
  • Performance measurements with horizontal scaling were not performed in this test, so no statements can be made about this.
  • Once again, this demonstrates the usefulness of database benchmarks not only for technology comparison, but also in particular for database optimization.

About benchANT and the Database Ranking

benchANT is a data infrastructure IT consulting firm focused on performance analysis and independent performance cost evaluations. We support our clients in architecture planning, technology selection, data infrastructure modernization and optimization, and database migrations.

Our performance measurements are automated, transparent and reproducible using our scientifically proven benchmarking framework.

With the freely-accessible database ranking we have created an independent, transparent and standardized comparison platform for database performance results, which is constantly extended and updated.

We have also created a freely-accessible technical analysis tool for Database-as-a-Service products, the DBaaS Navigator.

Disclaimer

Performance results are only representative of the workload, resources and database settings used. The results should not be generalized or applied to other scenarios. Specific measurements are recommended.