Navigating the Database Performance Landscape: A Comprehensive Guide to Database Benchmarking Suites (2024)
In the dynamic database management systems market, performance is a key differentiator.
Database benchmarking suites provide the tool to assess performance, guide us through the vast landscape of options, and help developers, engineers, and decision-makers make informed choices.
This article dives into the various benchmarking suites for different workload domains and provides an overview of their origin, implementations, and unique features.
Join us on a journey through YCSB, TSBS, TPC-C, and others.
Table of Contents
- Introduction to Database Benchmarking Suites
- OLTP Database Benchmarking Suites
- Sysbench
- TPC-C (BenchBase)
- HammerDB – TPROC-C
- TPC-E
- Twitter (BenchBase)
- TATP (BenchBase)
- Wikipedia (BenchBase)
- AuctionMark (BenchBase)
- Epinions.com (BenchBase)
- Resource Stresser (BenchBase)
- SEATS (BenchBase)
- SIBench (BenchBase)
- SmallBank (BenchBase)
- Voter (BenchBase)
- YCSB (BenchBase)
- PgBench
- Cockroach-PKG
- Deprecated OLTP Database Benchmarking Suites
- OLAP Database Benchmarking Suites
- HTAP Database Benchmarking Suites
- NoSQL Database Benchmarking Suites
- Time-Series Database Benchmarking Suites
- Vector Benchmarking
- Getting Involved
Introduction to Database Benchmarking Suites
At their core, database benchmark suites are standardized sets of tests crafted to assess the performance of database management systems (DBMS). These suites simulate real-world scenarios, emulating the diverse and demanding conditions databases encounter in production environments.
Each database benchmarking suite comes with a predefined data model that emulates a target application domain, including a varying number of configuration options such as total data size, query concurrency or query distribution. Usually, the data set is synthetic data that is modelled based on real-world application data but there are also a few benchmark suites that support real-world traces.
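The configuration options mentioned above can be pictured with a small sketch. The `WorkloadConfig` class, its field names, and `make_key_chooser` are hypothetical and do not belong to any specific suite; they only illustrate how data size and query distribution might shape which records a synthetic workload touches.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class WorkloadConfig:
    """Hypothetical knobs; real suites name and scope these differently."""
    record_count: int = 1_000      # total data size, in records
    threads: int = 4               # query concurrency (not used by the key chooser)
    distribution: str = "uniform"  # "uniform" or "zipfian"
    zipf_exponent: float = 1.0     # skew strength, only used for "zipfian"


def make_key_chooser(cfg: WorkloadConfig, seed: int = 42):
    """Return a function that picks the record key targeted by the next query."""
    rng = random.Random(seed)
    if cfg.distribution == "uniform":
        return lambda: rng.randrange(cfg.record_count)
    if cfg.distribution == "zipfian":
        # The key at rank r gets weight 1 / r^s, so low-numbered keys are "hot".
        weights = (1.0 / rank ** cfg.zipf_exponent
                   for rank in range(1, cfg.record_count + 1))
        cumulative = list(itertools.accumulate(weights))
        keys = range(cfg.record_count)
        return lambda: rng.choices(keys, cum_weights=cumulative, k=1)[0]
    raise ValueError(f"unknown distribution: {cfg.distribution}")


uniform = make_key_chooser(WorkloadConfig(distribution="uniform"))
zipfian = make_key_chooser(WorkloadConfig(distribution="zipfian"))
uniform_hits = sum(uniform() == 0 for _ in range(10_000))
zipfian_hits = sum(zipfian() == 0 for _ in range(10_000))
print(f"key 0 chosen {uniform_hits} times (uniform) vs {zipfian_hits} times (zipfian)")
```

With a skewed distribution, a few hot records receive most of the requests. This tends to stress caching and locking differently than a uniform access pattern does.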
During the benchmark execution, the benchmarking suite captures key performance indicators such as throughput or latency. The collected data forms the basis for comprehensive assessments, enabling stakeholders to make informed decisions about the suitability of a particular DBMS for their unique use cases. For information about executing database benchmarks please read this guide about database benchmarking.
Benefits of Database Benchmarking Suites:
- Performance Validation: Benchmark suites validate a DBMS's performance under varying workloads, ensuring it meets the specified requirements and can handle the anticipated user load.
- Comparative Analysis: By facilitating side-by-side comparisons, benchmarking suites empower decision-makers to evaluate different database solutions objectively, aiding in the selection of the most suitable option for their specific needs, as we have done in our Database Ranking.
- Scalability Assessment: Organizations can gauge a DBMS's scalability by examining its performance under increasing data volumes and user loads, allowing them to plan for future growth.
- Resource Optimization: Identifying bottlenecks and resource-intensive operations helps optimise database configurations, enhancing overall system efficiency and reducing operational costs.
- Strategic Decision-Making: Armed with accurate performance data, businesses can make strategic decisions regarding infrastructure investments, technology adoption, and long-term database strategies.
In essence, database benchmark suites are the linchpin in the pursuit of an optimized, resilient, and high-performing database ecosystem. As we explore the most influential benchmarking suites, we'll uncover the nuances that distinguish them.
We have grouped the numerous existing database benchmarking suites by the workload type they generate. The origin of some benchmarking suites is not always clear, so the provided links may point to only one of several possible sources.
OLTP Database Benchmarking Suites
At the core of many database systems, OLTP (Online Transaction Processing) is the beating heart that powers the transactional interactions vital for daily business operations.
OLTP refers to a class of database workloads characterized by their emphasis on quick, low-latency transactions. These transactions typically involve frequent inserts, updates, and deletes of relatively small amounts of data. In essence, OLTP systems cater to the swift and concurrent processing of numerous transactions, mirroring the dynamic nature of applications like e-commerce platforms, banking systems, and airline reservation systems.
Key Metrics in OLTP Benchmarking:
- Transaction Throughput: Measures the number of transactions processed per unit of time, indicating the system's ability to handle a high transaction load.
- Response Time/Latency: Evaluates the time taken to complete a transaction, emphasizing the system's responsiveness to user requests.
- Concurrency Scalability: Assesses how well the database system handles an increasing number of concurrent transactions without compromising performance or data integrity.
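As a rough illustration of how these three metrics can be derived from raw measurements, here is a minimal sketch. The function names and the nearest-rank percentile method are assumptions made for this example; individual benchmarking suites may aggregate their results differently. The input is assumed to be a non-empty list of per-transaction latencies.

```python
import math
import statistics


def summarize_run(latencies_ms, wall_clock_s):
    """Derive throughput and latency figures from per-transaction latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: smallest sample with at least p% at or below it.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    return {
        "throughput_tps": n / wall_clock_s,
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
    }


def scaling_efficiency(tps_by_threads):
    """Compare measured throughput with ideal linear scaling from the smallest run."""
    base_threads = min(tps_by_threads)
    per_thread = tps_by_threads[base_threads] / base_threads
    return {t: tps / (t * per_thread) for t, tps in sorted(tps_by_threads.items())}


# Hypothetical measurements: 100 transactions in 10 seconds, then three thread counts.
print(summarize_run(range(1, 101), wall_clock_s=10))
print(scaling_efficiency({1: 100.0, 2: 190.0, 4: 300.0}))
```

An efficiency close to 1.0 means throughput grows almost linearly with the number of concurrent clients, while falling values hint at contention.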
Benchmarking Suite | Use Cases | Last Official Version | Forks | Github Stars | benchANT Integration |
---|---|---|---|---|---|
Sysbench | Microbenchmark | 2020 | 1,000 | 5,500 | yes |
TPC-E | Finance | 2023 | n/a | n/a | no |
TPC-C (BenchBase) | eCommerce | 2023 | 144 | 343 | yes |
Twitter (BenchBase) | Social | 2023 | 144 | 343 | yes |
TATP (BenchBase) | Tele-communications | 2023 | 144 | 343 | yes |
Wikipedia (BenchBase) | Social, Website | 2023 | 144 | 343 | no |
AuctionMark (BenchBase) | Website, eCommerce | 2023 | 144 | 343 | no |
Epinions.com (BenchBase) | Website, Social | 2023 | 144 | 343 | no |
Resource Stresser (BenchBase) | Microbenchmark | 2023 | 144 | 343 | no |
SEATS (BenchBase) | Web, eCommerce | 2023 | 144 | 343 | no |
SIBench (BenchBase) | Microbenchmark | 2023 | 144 | 343 | no |
SmallBank (BenchBase) | Finance | 2023 | 144 | 343 | no |
Voter (BenchBase) | Social | 2023 | 144 | 343 | no |
YCSB (BenchBase) | Social, Logging, Caching | 2023 | 144 | 343 | no |
Cockroach-PKG | Multi-Benchmark | 2023 | n/a | n/a | no |
PgBench | Microbenchmark | 2023 | n/a | n/a | no |
HammerDB - TPROC-C | eCommerce | 2020 | 108 | 475 | no |
Sysbench
- Origin: Open-source, see GitHub
- Purpose: Sysbench serves as a versatile benchmarking tool for assessing system performance under various workloads. Originally developed for MySQL, it has evolved to support multiple database engines.
- Real-world Use-Cases: Sysbench can be understood as a microbenchmark, which allows the generation of several different queries for benchmarking and stress-testing databases, making it a go-to tool for database administrators and developers. It's ideal for assessing general system performance and database scalability.
- Key Features: Flexible and extensible, Sysbench allows users to create custom test scenarios. It also includes CPU, memory, file I/O, and database benchmarks, making it a comprehensive solution for system-level testing.
- benchANT Integration: Yes
TPC-C (BenchBase)
- Origin: Transaction Processing Performance Council (TPC), now integrated into the BenchBase framework
- Purpose: TPC-C evaluates the performance of OLTP systems by simulating the workload of an order-entry environment with multiple users. It is one of the first and perhaps the most famous TPC benchmark. It was integrated into the BenchBase framework by Carnegie Mellon University, which also added several other database integrations. We recommend using this branch instead of the original TPC version.
- Real-world Use-Cases: Ideal in scenarios mirroring order processing systems, such as those found in retail and manufacturing industries.
- Key Features: TPC-C generates a mix of transactions, including new order creation, order status checks, and delivery updates. It stresses the system by simulating a busy, multi-user environment with varying transaction types.
- benchANT Integration: Yes
HammerDB - TPROC-C
- Origin: Transaction Processing Performance Council (TPC), GitHub HammerDB Project
- Purpose: HammerDB, with its TPROC-C benchmark, assesses the performance of database systems with a workload derived from the TPC-C specification. It simulates an order-entry environment with multiple users and complex transactions.
- Real-world Use-Cases: Valuable for organizations evaluating databases in scenarios resembling order processing and other transaction-intensive environments.
- Key Features: The TPROC-C benchmark within HammerDB stresses databases with a mix of transactions, including order creation, order status checks, and delivery updates.
- benchANT Integration: No
TPC-E
- Origin: Transaction Processing Performance Council (TPC), see TPC.org
- Purpose: TPC-E benchmarks measure the performance of online transaction processing (OLTP) systems by simulating the activities of a brokerage firm. It focuses on complex, realistic transactions.
- Real-world Use-Cases: TPC-E is relevant for handling financial transactions, providing insights into how well a system can handle demanding OLTP workloads.
- Key Features: TPC-E emphasizes transactional complexity, including intricate order processing and financial calculations. This benchmark provides a more modern and challenging alternative to its predecessor, TPC-C.
- benchANT Integration: No
Twitter (BenchBase)
- Origin: BenchBase
- Purpose: Part of the BenchBase suite, the Twitter benchmark simulates the workload of a social media platform, focusing on tweet creation, retrieval, and user interactions.
- Real-world Use-Cases: Relevant for assessing the performance of databases powering social media platforms, providing insights into how well a system handles high-volume, short-duration transactions.
- Key Features: The Twitter benchmark emulates user activities such as tweeting, retweeting, and following, simulating the dynamic and real-time nature of social media interactions.
- benchANT Integration: Yes
TATP (BenchBase)
- Origin: BenchBase, originally developed at Nokia under the name TM1
- Purpose: TATP assesses the performance of databases handling telecommunications transaction processing.
- Real-world Use-Cases: Relevant for telecommunications scenarios, where high-throughput, low-latency transaction processing is crucial.
- Key Features: TATP benchmarks focus on complex transactional operations typical in telecommunications, including subscriber data updates and call detail record generation.
- benchANT Integration: Yes
Wikipedia (BenchBase)
- Origin: BenchBase, originally from OLTPbench
- Purpose: The Wikipedia benchmark, part of the BenchBase suite, simulates a scenario based on a collaborative content creation platform, stressing the database with various read and write operations.
- Real-world Use-Cases: Suitable for content management systems and collaborative platforms, such as wikis.
- Key Features: The Wikipedia benchmark simulates user interactions like page edits, searches, and revisions, providing insights into the efficiency of databases handling content-centric workloads.
- benchANT Integration: No
AuctionMark (BenchBase)
- Origin: BenchBase
- Purpose: AuctionMark, within the BenchBase suite, emulates an online auction platform's workload, stressing the database with bid submissions, auction status checks, and user interactions.
- Real-world Use-Cases: Relevant for evaluating the performance of databases supporting online auction platforms and e-commerce systems with dynamic bidding processes.
- Key Features: AuctionMark generates transactional scenarios including bidding, auction creation, and result retrieval, reflecting the complex interactions of online auction environments.
- benchANT Integration: No
Resource Stresser (BenchBase)
- Origin: BenchBase
- Purpose: The Resource Stresser benchmark is a microbenchmark for extreme resource contention scenarios, such as high CPU or memory loads.
- Real-world Use-Cases: Valuable for understanding how a database responds and maintains stability under resource-intensive conditions, aiding in capacity planning and resource optimization.
- Key Features: Resource Stresser applies extreme stress to a database system, evaluating its resilience and resource management capabilities under challenging conditions.
- benchANT Integration: No
SEATS (BenchBase)
- Origin: BenchBase
- Purpose: SEATS, a part of the BenchBase suite, simulates a scenario based on an airline reservation system.
- Real-world Use-Cases: Relevant for (airline) reservation and booking systems.
- Key Features: SEATS generates transactional scenarios, including seat reservations, ticket purchases, and flight status checks, offering insights into database efficiency in the context of airline operations.
- benchANT Integration: No
SIBench (BenchBase)
- Origin: BenchBase
- Purpose: SIBench, within the BenchBase suite, is a microbenchmark designed to explore how a DBMS behaves under snapshot isolation (SI).
- Real-world Use-Cases: Rather than modeling a specific application domain, it is applicable for evaluating how a database handles concurrent reads and writes on the same data under snapshot isolation.
- Key Features: SIBench uses a single key/value table and two transaction types: one reads the minimum value of a column across all records, the other increments the value of a single record.
- benchANT Integration: No
SmallBank (BenchBase)
- Origin: BenchBase, originally developed by Michael J. Cahill at Oracle
- Purpose: SmallBank, part of the BenchBase suite, emulates the workload of a small banking application, stressing the database with various financial transactions, account operations, and user interactions.
- Real-world Use-Cases: Suitable for assessing the performance of databases supporting small-scale banking applications, offering insights into transactional efficiency and data integrity.
- Key Features: SmallBank generates transactions like deposits, withdrawals, and fund transfers, simulating the transactional nature of small-scale banking operations.
- benchANT Integration: No
Voter (BenchBase)
- Origin: BenchBase, originally created from a VoltDB app
- Purpose: The Voter benchmark, within the BenchBase suite, simulates a workload based on an election voting system.
- Real-world Use-Cases: Relevant for assessing the performance of databases supporting election systems, providing insights into transactional efficiency during peak voting periods.
- Key Features: The Voter benchmark generates scenarios including voter registrations, ballot submissions, and result queries, simulating the transactional nature of election systems.
- benchANT Integration: No
YCSB (BenchBase)
- Origin: original by Yahoo! Research, integrated and adjusted in BenchBase
- Purpose: YCSB (Yahoo! Cloud Serving Benchmark), part of the BenchBase suite, is designed for benchmarking and comparing the performance of cloud-based and distributed databases. In comparison to the original YCSB, which we will explain in the NoSQL section, the operations of this implementation are packed into transactions.
- Real-world Use-Cases: Widely used for evaluating the efficiency of databases in cloud environments, helping organizations choose suitable solutions for distributed and scalable applications.
- Key Features: YCSB supports a variety of workloads, including read and write-intensive operations, providing a standardized approach for assessing the performance of cloud databases under different scenarios.
- benchANT Integration: No. benchANT uses its own YCSB fork, which is derived from the original non-transactional YCSB and provides updated and many additional database integrations.
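One way to picture the difference between issuing operations individually and packing them into a transaction is the following sketch. It uses Python's built-in `sqlite3` module purely for illustration. The `apply_ops` helper and the simplified two-column table are assumptions for this example and are not taken from the BenchBase code.

```python
import sqlite3


def apply_ops(conn, ops, as_transaction):
    """Run (kind, key, value) operations; return the values that were read."""
    reads = []
    try:
        for kind, key, value in ops:
            if kind == "read":
                row = conn.execute(
                    "SELECT field0 FROM usertable WHERE ycsb_key = ?", (key,)
                ).fetchone()
                reads.append(row[0] if row else None)
            elif kind == "update":
                conn.execute(
                    "UPDATE usertable SET field0 = ? WHERE ycsb_key = ?", (value, key)
                )
            elif kind == "insert":
                conn.execute(
                    "INSERT INTO usertable (ycsb_key, field0) VALUES (?, ?)", (key, value)
                )
            else:
                raise ValueError(f"unknown operation: {kind}")
            if not as_transaction:
                conn.commit()  # every operation becomes durable on its own
        conn.commit()  # transactional mode: all operations commit together
    except Exception:
        conn.rollback()  # discard whatever has not been committed yet
        raise
    return reads


def fresh_db():
    """Create an in-memory database with one pre-loaded record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usertable (ycsb_key TEXT PRIMARY KEY, field0 TEXT)")
    conn.execute("INSERT INTO usertable VALUES ('user1', 'old')")
    conn.commit()
    return conn


# The second operation fails on a duplicate key, which exposes the difference.
ops = [("update", "user1", "new"), ("insert", "user1", "dup")]
for as_transaction in (False, True):
    conn = fresh_db()
    try:
        apply_ops(conn, ops, as_transaction)
    except sqlite3.IntegrityError:
        pass
    value = conn.execute(
        "SELECT field0 FROM usertable WHERE ycsb_key = 'user1'"
    ).fetchone()[0]
    print(f"as_transaction={as_transaction}: field0 = {value}")
```

In the non-transactional run the update survives the later failure, whereas in the transactional run both operations are rolled back together.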
PgBench
- Origin: PostgreSQL Community
- Purpose: PgBench is a benchmarking tool that comes bundled with the PostgreSQL database. It is designed to simulate various transactional workloads to assess the performance of PostgreSQL.
- Real-world Use-Cases: Widely used for benchmarking PostgreSQL installations, helping database administrators and developers understand the system's capabilities and identify areas for optimization.
- Key Features: PgBench supports multiple built-in scenarios, including simple read and write operations, making it a valuable tool for evaluating PostgreSQL performance in diverse usage scenarios.
- benchANT Integration: No
Cockroach-PKG
- Origin: Cockroach Labs
- Purpose: Cockroach-PKG, developed by Cockroach Labs, is a benchmarking suite specifically tailored for the CockroachDB distributed SQL database. It evaluates performance under workloads taken from existing benchmarks such as YCSB and TPC-C. The benchmarking suite is integrated into the CockroachDB source code.
- Real-world Use-Cases: Applicable for organizations considering or utilizing CockroachDB as their distributed database solution, providing insights into its performance characteristics for several use cases, based on the selected benchmarking suite.
- Key Features: Cockroach-PKG includes tests for transactional workloads, schema changes, and distributed consistency, allowing users to assess CockroachDB's capabilities in a variety of scenarios.
- benchANT Integration: No
Deprecated OLTP Database Benchmarking Suites
In recent years, numerous benchmarking suites have been created, particularly from university research, but were discontinued after a short time. The following benchmarking suites are no longer current or actively maintained. Running these benchmarks should be done with caution and is only possible for older database versions and drivers.
- FinTime: Benchmarking suite for OLTP finance applications for Oracle and DB2 developed by NYU around the year 2000.
- BenchFoundry: A suite developed in 2017 for consistency benchmarks and TPC-C-like benchmarks with workload traces.
OLAP Database Benchmarking Suites
Online Analytical Processing (OLAP) workloads play a pivotal role in evaluating the performance of database systems when handling complex analytical queries and reporting tasks. Unlike OLTP (Online Transaction Processing) workloads, which focus on short-running transactions with read, write and update operations, OLAP workloads are designed to support complex, read-intensive operations.
These workloads involve complex queries that demand the processing of vast amounts of historical data to derive meaningful insights. Examples of OLAP queries include aggregating sales data over time, analyzing product performance across different regions, or generating financial reports.
Compared to OLTP workloads, the queries are far more complex, which results in longer query times. As a consequence, the reported metrics change: transactions per second become transactions per hour, and latencies are often measured in seconds rather than milliseconds.
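The shift in units can be made concrete with a short calculation. The helper below and the runtimes are purely hypothetical. The helper simply scales the throughput of one sequential query stream to an hour and is not the official metric of any TPC benchmark.

```python
def queries_per_hour(query_seconds):
    """Throughput of a sequential query stream, scaled to one hour."""
    return len(query_seconds) * 3600 / sum(query_seconds)


stream = [12.5, 48.0, 7.5, 112.0]  # hypothetical per-query runtimes in seconds
print(f"{queries_per_hour(stream):.1f} queries per hour")
print(f"slowest query: {max(stream):.1f} s")
```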
Benchmarking Suite | Use Cases | Last Official Version | Forks | Github Stars | benchANT Integration |
---|---|---|---|---|---|
TPC-DS | Analytics | 2023 | n/a | n/a | no |
TPC-H (BenchBase) | Analytics | 2023 | 144 | 343 | yes |
ClickBench | Web, Analytics, Logging | 2023 | 109 | 484 | no |
HammerDB - TPROC-H | Analytics | 2020 | 108 | 475 | no |
Telemetry Logs Analytics Benchmark | IoT | 2022 | 4 | 11 | no |
TPC-H (BenchBase)
- Origin: Transaction Processing Performance Council (TPC), now integrated into the BenchBase framework
- Purpose: TPC-H, within the BenchBase suite, is a decision support benchmark that evaluates the performance of database systems in handling complex analytical queries and reporting tasks.
- Real-world Use-Cases: Relevant for assessing databases in scenarios requiring analytical processing, such as business intelligence applications. It simulates queries involving aggregations and large-scale data analysis.
- Key Features: TPC-H includes a set of 22 queries with varying complexity, focusing on tasks like sales volume analysis, market share calculations, and customer segmentation. It provides a standardized measure for evaluating the efficiency of databases in OLAP scenarios.
- benchANT Integration: Yes
TPC-DS
- Origin: Transaction Processing Performance Council (TPC)
- Purpose: TPC-DS is a decision support benchmark that measures the performance of database systems in complex analytical scenarios, similar to TPC-H but with additional features.
- Real-world Use-Cases: Applicable for assessing databases handling large-scale analytical workloads, such as data warehousing environments. It simulates diverse decision support queries and complex data analysis.
- Key Features: TPC-DS includes 99 queries, covering a broad range of business scenarios. It evaluates database performance in tasks like customer behaviour analysis, inventory management, and trend forecasting. TPC-DS provides a comprehensive benchmark for organizations dealing with extensive analytical data.
- benchANT Integration: No
HammerDB - TPROC-H
- Origin: HammerDB Project
- Purpose: HammerDB, with its TPROC-H benchmark, assesses the performance of database systems with a workload derived from the TPC-H specification. It focuses specifically on decision support workloads.
- Real-world Use-Cases: Valuable for organizations dealing with analytical processing and decision support tasks. It simulates queries involving complex aggregations and data analysis typical in business intelligence scenarios.
- Key Features: The TPROC-H benchmark within HammerDB stresses databases with a set of TPC-H-like queries, assessing their performance in handling analytical workloads. It also has 22 analytical queries, but it does not comply with the TPC-H standard.
- benchANT Integration: No
ClickBench
- Origin: ClickHouse, see GitHub repo
- Purpose: ClickBench is a benchmarking suite developed by ClickHouse. It focuses on assessing the performance of analytical databases and is used in the ClickHouse Benchmark Ranking.
- Real-world Use-Cases: This benchmark represents typical workload in the following areas: clickstream and traffic analysis, web analytics, machine-generated data, structured logs, and events data. It covers the typical queries in ad-hoc analytics and real-time dashboards.
- Key Features: ClickBench includes a set of analytical queries in one large table. It evaluates performance in tasks such as aggregations, filtering, and other analytical operations.
- benchANT Integration: No
Telemetry Logs Analytical Benchmark
- Origin: Developed by Microsoft, see GitHub Repo
- Purpose: The Telemetry Logs Benchmark evaluates the performance of database systems in handling large-scale structured, semi-structured, and unstructured telemetry data.
- Real-world Use Cases: Relevant for industries requiring real-time analysis of telemetry data, such as aerospace, automotive, and infrastructure monitoring.
- Key Features: The queries of this benchmarking suite represent common diagnostic and monitoring scenarios, according to Microsoft. It also allows massive-scale tests with up to several billion records.
- benchANT Integration: No
HTAP Database Benchmarking Suites
Hybrid Transactional/Analytical Processing (HTAP) workloads emerge as a convergence of OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing).
HTAP benchmarks execute simultaneous transactional processing and analytical queries. This stands in contrast to OLAP, which traditionally involves periodic batch processing for analytical tasks.
Consequently, HTAP workloads are mixed workloads that combine high levels of transactional activity with business analytics queries executed directly on production data.
Benchmarking Suite | Use Cases | Last Official Version | Forks | Github Stars | benchANT Integration |
---|---|---|---|---|---|
CH-benCHmark (BenchBase) | eCommerce, Website, Analytics | 2023 | 144 | 343 | no |
HATtrick | eCommerce, Analytics | 2022 | 5 | 29 | no |
CH-benCHmark (BenchBase)
- Origin: BenchBase framework, but developed by the Technical University of Munich
- Purpose: CH-benCHmark, within the BenchBase suite, is a benchmark specifically designed to create an HTAP workload based on the BenchBase TPC-C implementation, extended with the 22 analytical queries of TPC-H.
- Real-world Use-Cases: Relevant for organizations that run simultaneous OLTP and OLAP workloads on the same database system in eCommerce-like scenarios, for example selling goods and analyzing the resulting orders.
- Key Features: The CH-benCHmark allows the parallel execution of transactional and analytical queries to the same database and database tables, simulating a hybrid workload which can often be seen in monolithic applications.
- benchANT Integration: No
HATtrick
- Origin: University of Wisconsin, based on TPC-C and the SSB benchmark
- Purpose: HATtrick was created by researchers at the University of Wisconsin to define a new HTAP benchmark which covers the gap between pure OLTP and OLAP workloads.
- Real-world Use-Cases: The HTAP workload that is created by the HATtrick benchmark is similar to eCommerce applications with integrated analytical queries to generate a mixture of OLTP and OLAP.
- Key Features: A unique feature of this benchmark is the “freshness” KPI for the analytical queries, which describes how up to date the results of the analytical queries are relative to the transactional write queries.
- benchANT Integration: No
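To give an intuition for a freshness-style KPI, the sketch below measures how long the oldest committed write had been waiting when an analytical query started without seeing it. This is a simplified reading for illustration only and an assumption of this example. The function name, the inputs, and the exact formula are not taken from the HATtrick implementation.

```python
def staleness_seconds(query_start, commit_times, seen_txn_ids):
    """Age of the oldest committed write that the analytical query did not see.

    commit_times maps a transaction id to its commit timestamp in seconds.
    Writes committed after the query started are ignored, since the query
    could not have been expected to see them.
    """
    missed = [
        committed_at
        for txn_id, committed_at in commit_times.items()
        if committed_at <= query_start and txn_id not in seen_txn_ids
    ]
    return query_start - min(missed) if missed else 0.0


# Hypothetical timeline: three writes, and a query starting at t = 14 s
# that only reflects the first of them.
commits = {1: 10.0, 2: 12.0, 3: 15.0}
print(staleness_seconds(14.0, commits, seen_txn_ids={1}))
print(staleness_seconds(14.0, commits, seen_txn_ids={1, 2}))
```

A value of zero means the query saw every write committed before it started, while larger values indicate increasingly stale analytical results.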
Deprecated HTAP Database Benchmarking Suites
- HTAPBench: A parallelized workload from TPC-C and TPC-H to create an HTAP workload, which was developed in 2019 in a university environment, but has not been updated recently. It follows a similar approach as the CH-benCHmark.
NoSQL Database Benchmarking Suites
Unlike the previous categories, which were classified by workload type, the NoSQL Benchmarking Suites category is more of a catch-all category for benchmarks for NoSQL databases. An alternative name, but not always 100% accurate, for this category would have been "CRUD Database Benchmarking Suites".
CRUD stands for Create, Read, Update, and Delete, representing the fundamental operations for managing data in a database system. NoSQL databases excel in CRUD workloads due to their flexible schema and distributed architecture, making them suitable for applications requiring rapid data ingestion, retrieval, and manipulation.
Still, these NoSQL workloads can have various facets, from simple CRUD operations to complex database-specific operations, which is why we named the category "NoSQL Database Benchmarking Suites".
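A CRUD workload is typically defined by the proportion of each operation type. The following minimal sketch applies such a mix to an in-memory dictionary that stands in for a database. The function name, the proportions, and the integer keys are assumptions chosen for this example and do not reflect the configuration format of any particular suite.

```python
import random
from collections import Counter


def run_crud_mix(store, proportions, operations, seed=7):
    """Apply a weighted random mix of CRUD operations to a dict-backed store."""
    rng = random.Random(seed)
    kinds = list(proportions)
    weights = [proportions[kind] for kind in kinds]
    executed = Counter()
    next_key = max(store, default=-1) + 1  # keys are assumed to be integers
    for _ in range(operations):
        kind = rng.choices(kinds, weights=weights, k=1)[0]
        if kind == "create":
            store[next_key] = "new"
            next_key += 1
        elif not store:
            continue  # nothing to read, update, or delete yet
        else:
            key = rng.choice(list(store))  # O(n) per pick, acceptable for a sketch
            if kind == "read":
                _ = store[key]
            elif kind == "update":
                store[key] = "changed"
            elif kind == "delete":
                del store[key]
            else:
                raise ValueError(f"unknown operation: {kind}")
        executed[kind] += 1
    return executed


store = {key: "initial" for key in range(100)}
mix = {"read": 0.5, "update": 0.45, "create": 0.05}  # hypothetical proportions
print(run_crud_mix(store, mix, operations=1_000))
print(f"records after the run: {len(store)}")
```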
Benchmarking Suite | Use Cases | Last Official Version | Forks | Github Stars | benchANT Integration |
---|---|---|---|---|---|
YCSB | Social, Logging, Caching, Comparison | 2019 | 2,200 | 4,700 | yes |
Cassandra-stress | Testing | 2023 | n/a | n/a | no |
MongoDB Genny | Testing | 2023 | 85 | 45 | yes |
NoSQLBench | Social, eCommerce, Analytics, IoT, Comparison | 2023 | 69 | 158 | no |
NdBench | Testing | 2021 | 105 | 361 | no |
TLP-Stress | Testing | 2021 | 28 | 53 | no |
Redis-benchmark | Testing | 2023 | n/a | n/a | no |
Memtier-benchmark | Testing | 2023 | 200 | 828 | no |
Elastic-benchmark | Testing, Search | 2023 | 2 | 15 | no |
FTSB | Testing, Search | 2023 | 3 | 17 | no |
UCSB | Testing, Caching | 2023 | 5 | 42 | no |
Aerospike-benchmark | Testing | 2023 | 11 | 14 | no |
YCSB (Yahoo! Cloud Serving Benchmark)
- Origin: Yahoo! Research
- Purpose: YCSB is the de-facto standard for NoSQL benchmarking. It is widely used and has been forked for several purposes and updates. There has been no new official version since 2019, and many pull requests and issues remain open. We recommend using these updated forks:
  - benchANT YCSB: Updated and additional bindings for many databases; used and tested frequently.
  - YCSB GO: A rewrite in Go by PingCAP, a database company, with additional database bindings.
  - YCSB Big Query: This YCSB fork is used in a whitepaper by Altoros for benchmarking different NoSQL DBaaS offerings.
- Real-world Use Cases: Widely used for evaluating databases in cloud environments, baseline technology comparison and stress tests. It has 5 standard scenarios and is highly adjustable for nearly any simple CRUD workload, which makes it possible to test nearly any database technology for basic operations.
- Key Features: YCSB supports a variety of simple CRUD workloads and has 45 database bindings. An extensive article about the YCSB can be found here.
- benchANT Integration: Yes, support for the original and the benchANT fork
NoSQLBench
- Origin: Community-driven project, sponsored by DataStax
- Purpose: NoSQLBench is a community-driven benchmarking tool focused on evaluating the performance of NoSQL databases across different use cases and workloads.
- Real-world Use Cases: The built-in scenarios and workloads cover several topics from stress-testing different basic database operations up to social, media and IoT scenarios.
- Key Features: NoSQLBench offers flexibility in workload customization, supporting different types of operations, data distributions, and concurrency levels. It provides a unified platform for benchmarking multiple NoSQL databases, facilitating comparative analysis.
- benchANT Integration: No
Cassandra-stress
- Origin: Apache Cassandra Project
- Purpose: Cassandra-stress is a stress testing tool specifically designed for Apache Cassandra, simulating various read and write operations to evaluate cluster performance.
- Real-world Use Cases: Ideal for assessing the performance of Apache Cassandra clusters in scenarios involving high-throughput data ingestion and retrieval, common in distributed systems.
- Key Features: Cassandra-stress is a built-in utility tool in the Cassandra source code and is specifically designed for Cassandra and Cassandra-derived databases.
- benchANT Integration: No
MongoDB Genny
- Origin: MongoDB, see GitHub
- Purpose: MongoDB Genny is a workload generator and benchmarking tool for MongoDB, designed to simulate real-world use cases and assess MongoDB's performance.
- Real-world Use Cases: Valuable for evaluating MongoDB deployments in scenarios such as data ingestion, complex query processing, and data manipulation tasks. It comes with over 30 simple built-in workloads.
- Key Features: MongoDB Genny supports customizable workload profiles, enabling users to simulate various data access patterns and concurrency levels. It provides detailed performance metrics and supports integration with monitoring tools for comprehensive analysis.
- benchANT Integration: Yes
NdBench
- Origin: Netflix, see GitHub
- Purpose: NdBench is a benchmarking tool developed by Netflix to evaluate the performance of their internal data storage systems, focusing on scalability and reliability.
- Real-world Use Cases: Primarily used by Netflix for assessing the performance of their data storage infrastructure in handling high-throughput data processing and storage.
- Key Features: NdBench supports customizable workloads and data generation, allowing users to simulate various data access patterns and query types. It emphasizes scalability testing and provides metrics for evaluating system performance under load.
- benchANT Integration: No
Aerospike-benchmark
- Origin: Aerospike, see GitHub
- Purpose: Aerospike-benchmark is an open-source benchmarking tool for Aerospike, designed to measure database performance by simulating various read and write operations.
- Real-world Use Cases: Ideal for evaluating the performance of Aerospike instances in scenarios requiring high-speed data caching, session management, and real-time analytics.
- Key Features: Aerospike-benchmark supports customizable workload parameters, enabling users to assess Aerospike performance under different usage patterns and data access scenarios.
- benchANT Integration: No
Redis-benchmark
- Origin: Redis, see GitHub
- Purpose: Redis-benchmark is a built-in benchmarking tool for Redis, designed to measure Redis server performance by simulating various read and write operations.
- Real-world Use Cases: Ideal for evaluating the performance of Redis instances in scenarios requiring high-speed data caching, session management, and real-time analytics.
- Key Features: Redis-benchmark supports multiple benchmark modes and customizable workload parameters.
- benchANT Integration: No
Memtier-benchmark
- Origin: Redis Labs, see GitHub
- Purpose: Memtier-benchmark is a benchmarking tool designed for Redis, Memcached and other key-value stores.
- Real-world Use Cases: Relevant for evaluating the performance of Redis and Memcached instances in scenarios involving high-throughput data caching and session management.
- Key Features: Memtier-benchmark supports customizable workload profiles, including read and write operations with varying data sizes and concurrency levels. It provides detailed performance metrics and supports multi-threaded benchmarking for accurate assessment of system capabilities.
- benchANT Integration: No
Elastic-benchmark
- Origin: Elasticsearch, see GitHub
- Purpose: Elastic-benchmark is a benchmarking tool designed for Elasticsearch and OpenSearch, focusing on evaluating cluster performance and scalability under a search-related workload.
- Real-world Use Cases: Valuable for assessing the performance of Elasticsearch and OpenSearch clusters in scenarios involving full-text search, log analytics, and real-time data processing.
- Key Features: Elastic-benchmark supports customizable workload configurations, including indexing, querying, and data retrieval operations.
- benchANT Integration: No
FTSB
- Origin: Redis, see GitHub
- Purpose: ftsb (Full Text Search Benchmark) is a benchmarking tool designed to evaluate the performance of full-text search engines, focusing on indexing and query processing efficiency.
- Real-world Use Cases: Relevant for assessing the performance of full-text search engines in applications such as document management systems, content indexing, and search-based applications.
- Key Features: ftsb supports customizable workload profiles for indexing and querying large volumes of textual data. It provides metrics for evaluating indexing speed, query response time, and resource utilization of full-text search engines. Currently, only RediSearch is supported.
- benchANT Integration: No
UCSB
- Origin: Unum, see GitHub
- Purpose: UCSB is a complete YCSB reimplementation in C++, supporting additional advanced features like workload isolation, concurrency and bulk operations.
- Real-world Use Cases: Performance testing of embedded databases such as RocksDB, LevelDB, WiredTiger, LMDB, and UDisk, as well as Redis and MongoDB. It comes with 8 default workload scenarios.
- Key Features: UCSB supports customizable workload profiles, allowing users to simulate different types of operations, data distributions, and concurrency levels.
- benchANT Integration: No
Deprecated NoSQL Database Benchmarking Suites
- TLP-Stress: The TLP-stress benchmarking suite was developed by The Last Pickle, a Cassandra consulting company. It was written as a replacement for the Cassandra-stress benchmarking suite, providing better documentation, a rich command-line user interface, and a more understandable code base. Since The Last Pickle was bought by DataStax in 2021, no further updates to the code have been shipped. Instead, DataStax supports the development of the NoSQLBench suite.
Time Series Database Benchmarking Suites
Time series workloads involve the management and analysis of data that is time-stamped or sequenced in time order. This type of data has become important in various industrial fields because it captures changes over time, providing insights that are not evident from static data. Time series workloads are characterized in particular by a high ingest rate, so write throughput is one of the most important KPIs for time series databases. Besides ingestion, analytical queries on this data and data compression can also be important characteristics of time series workloads.
Key Use Cases:
- IoT (Internet of Things): IoT devices generate vast amounts of time-stamped data through sensors and devices. Time series databases are adept at handling this continuous influx, supporting real-time monitoring and decision-making.
- Logging/Monitoring: Systems, applications, and network devices produce time-stamped logs that are used to monitor system health, detect anomalies, and ensure compliance with security policies.
- Financial Transactions: Brokers, banks, and crypto applications also produce large amounts of time-stamped information for every financial transaction.
The following time series benchmarking suites allow you to generate time series workloads for different use cases.
Benchmarking Suite | Use Cases | Last Official Version | Forks | GitHub Stars | benchANT Integration |
---|---|---|---|---|---|
TSBS | DevOps, IoT | 2021 | 280 | 1200 | yes |
IoT-benchmark | IoT | 2023 | 86 | 174 | yes |
Telegraf-TS | Logging | 2024 | 0 | 3 | yes |
TPCx-IoT | IoT | 2022 | 0 | 0 | no |
TSBS (Time Series Benchmark Suite)
- Origin: Originally developed by InfluxDB, continued by TimescaleDB, see GitHub
- Purpose: TSBS is designed to simulate real-world scenarios for time series data, such as IoT sensor data logging and financial market data analysis.
- Key Features: It offers extensive customization options for queries and data loading, making it highly adaptable to various time series use cases. TSBS supports multiple database systems, enabling broad comparative analysis. It is one of the most widely used benchmarking suites for time series databases.
- benchANT Integration: Yes, but we use our own fork due to open pull requests and outdated database bindings in the original suite.
Telegraf-TS
- Origin: Developed by benchANT, based on YCSB, see GitHub and Blog.
- Purpose: The Telegraf-TS benchmark simulates analytical queries on a typical monitoring system.
- Key Features: The Telegraf-TS benchmarking suite is based on the YCSB benchmarking suite with significant changes. It omits the LOAD phase and assumes that the databases have been loaded using a number of Telegraf instances. In the run phase, a subset of up to 20 queries is executed iteratively.
- benchANT Integration: Yes
IoT-benchmark
- Origin: Developed at the THU Lab Beijing (China) by the founders of Apache IoTDB, see GitHub.
- Purpose: The IoT-benchmark suite generates various IoT application workloads.
- Key Features: The benchmark focuses on the simulation of real-time data streams from multiple IoT devices, with several operation modes to simulate different scenarios. IoT-benchmark supports various time series databases, including Apache IoTDB, InfluxDB, QuestDB, and VictoriaMetrics.
- benchANT Integration: Yes
TPCx-IoT
- Origin: Transaction Processing Performance Council (TPC), see TPC.org
- Purpose: The TPCx-IoT benchmark from the well-established TPC council is based on the YCSB benchmark, with significant changes for data ingestion and concurrent queries. It simulates the workload of typical IoT gateway systems with data from an electric power station.
- Key Features: TPCx-IoT tests both data ingestion and real-time analytics, providing a holistic view of system performance under IoT scenarios. It offers a standardized metric for comparing IoT platform capabilities across different technology stacks.
- benchANT Integration: No
Deprecated Time Series Benchmarking Suites
- TSDBBench is a benchmarking suite developed at the University of Stuttgart (Germany) in 2016, along with an extensive but now outdated comparison of time series databases. It is based on YCSB.
- SmartBench is a benchmarking suite developed at the University of California, Irvine (UCI) that offers several configuration options and supports a few time series databases.
Vector Benchmarking
Vector data types are relevant in many different fields of Artificial Intelligence (AI), where so-called models are represented as high-dimensional vectors. Use cases for vector databases therefore include all domains where high-dimensional vector data needs to be stored, processed, and retrieved, including image and video recognition, natural language processing, recommendation systems, similarity analysis, and many others. As with time-series workloads, the database market is split between specialized vector databases on the one hand and extensions to general-purpose database management systems (DBMS) on the other.
Key Metrics in Vector Benchmarking: In principle, the same fundamental metrics apply to vector benchmarking as to OLTP: throughput, latency, and scalability. Yet, as many vector databases make use of Approximate Nearest Neighbors (ANN) search, at least one further metric becomes important: recall. Recall denotes the fraction of relevant results that were returned for a query. Recall is usually combined with performance metrics to answer questions such as the maximum throughput achievable for an envisaged minimum recall.
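To make the recall metric more concrete, here is a minimal Python sketch. It is not taken from any of the suites listed in this article; the function names `recall_at_k` and `best_throughput_at_recall` are hypothetical, and the result IDs and run figures are made-up illustrative values. It assumes that the exact nearest neighbors for a query are known (the ground truth) and compares them with what an ANN search returned.

```python
from typing import Dict, List, Optional, Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true k nearest neighbors that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_top_k = set(ground_truth[:k])
    if not true_top_k:
        return 0.0
    hits = len(true_top_k.intersection(retrieved[:k]))
    return hits / len(true_top_k)


def best_throughput_at_recall(runs: List[Dict], min_recall: float) -> Optional[Dict]:
    """Pick the run with the highest throughput among runs that reach min_recall."""
    eligible = [run for run in runs if run["recall"] >= min_recall]
    return max(eligible, key=lambda run: run["qps"]) if eligible else None


if __name__ == "__main__":
    # Illustrative IDs only: exact neighbors vs. what an ANN index returned.
    exact = [7, 3, 9, 1, 4]
    approximate = [7, 9, 2, 3, 8]
    print(recall_at_k(approximate, exact, k=5))  # 3 of the 5 true neighbors were found

    # Made-up runs with different index settings (not measured results).
    runs = [
        {"name": "fast", "recall": 0.80, "qps": 4000.0},
        {"name": "balanced", "recall": 0.95, "qps": 2500.0},
        {"name": "accurate", "recall": 0.99, "qps": 900.0},
    ]
    print(best_throughput_at_recall(runs, min_recall=0.9))
```

The second function reflects the typical question behind a vector benchmark: among all tested configurations, which one delivers the highest throughput while still meeting the recall target?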
Benchmarking Suite | Last Tag/Commit | Forks | GitHub Stars | benchANT Integration |
---|---|---|---|---|
ANN Benchmarks | ---/2024 | 702 | 4,674 | no |
Big ANN Benchmarks | 2024/2024 | 101 | 301 | no |
Qdrant vector-db-benchmark | 2022/2024 | 59 | 233 | no |
VectorDBBench | 2024/2024 | 101 | 425 | yes |
NoSQLBench | 2023/2024 | 69 | 162 | no |
pgvectorbench | ---/2024 | 0 | 4 | no |
ANN Benchmarks
- Origin: Built by Erik Bernhardsson from Better Inc. with contributions from Martin Aumüller and Alexander Faithfull from IT University of Copenhagen, Denmark. An academic background paper was published at SISAP 2017. The source code is available on GitHub.
- Purpose and features: Originally, ANN Benchmarks was built to evaluate properties of different ANN algorithms. For that purpose, ANN Benchmarks ships with a Python/container-based benchmarking engine and an extension mechanism to plug in different algorithms. It also contains a set of purpose-built data sets. Over time, contributors have also provided plug-ins that evaluate full-fledged (yet single-node) databases instead of mere algorithms.
- benchANT Integration: No
Big-ANN Benchmarks
- Origin: Big ANN Benchmarks is basically an evaluation framework for a data challenge. Originally organized as a track of the Conference on Neural Information Processing Systems (NeurIPS), it has turned into an ongoing competition. The source code is available on GitHub.
- Purpose and features: The challenge started off to encourage the development of indexing data structures and search algorithms for practical variants of the Approximate Nearest Neighbor (ANN) or Vector search problem for different scenarios. Even though it is aimed at Big ANN, the datasets used in 2023 were limited to about 10 million points. Comparable to ANN Benchmarks, Big-ANN Benchmarks is focussed on algorithms and data structures and not so much on search engines / databases per se. Also, this framework itself seems to limit the computational power to one server node and hence, does not capture distribution aspects.
- benchANT Integration: No
Qdrant vector-db-benchmark
- Origin: Developed by Qdrant Solutions GmbH. The source code is available on GitHub.
- Purpose and features: Qdrant developed the benchmark mostly to compare how Qdrant performs against other vector search engines. Despite this primary goal, they implemented the software as a general, extensible framework with multiple data sets and scenarios. Like ANN Benchmarks, it also supports instantiating the entire benchmarking set-up, including clients and data backend. Yet, just like ANN Benchmarks, it seems to be limited to single-node set-ups and does not support true distribution.
- benchANT Integration: No
VectorDBBench
- Origin: Developed by Zilliz Inc. The source code is available on GitHub.
- Purpose and features: VectorDBBench is a results dashboard, configurator, and benchmarking engine for vector data all in one. In contrast to ANN Benchmarks and the Qdrant framework, it does not take care of deploying the database instances, but assumes these are provided by an out-of-band mechanism (just as many other non-vector benchmarking suites do). As a consequence, it is not limited to single-node DBMS instances. VectorDBBench ships with hand-crafted data sets that can be used either in performance mode or in load mode. In performance mode, VectorDBBench first loads the data set into the database and then issues queries with increasing parallelism. In load mode, the load phase is executed repeatedly with an increasing number of data points until the database collapses. It also comes with built-in scoring rules and pricing data, which are needed for visualization purposes.
- benchANT Integration: Yes
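The idea of issuing queries with increasing parallelism can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not VectorDBBench's actual implementation: `run_level`, `ramp`, and `fake_query` are hypothetical names, and `fake_query` merely sleeps as a stand-in for a real search request against a database client.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def run_level(query_fn: Callable[[], None], concurrency: int, total_queries: int) -> float:
    """Issue total_queries calls with `concurrency` worker threads; return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(query_fn) for _ in range(total_queries)]
        for future in futures:
            future.result()  # re-raises any error that occurred in a query
    elapsed = time.perf_counter() - start
    return total_queries / elapsed


def ramp(query_fn: Callable[[], None], levels: Iterable[int],
         queries_per_level: int) -> List[Tuple[int, float]]:
    """Run the same query workload once per concurrency level, in the given order."""
    return [(level, run_level(query_fn, level, queries_per_level)) for level in levels]


def fake_query() -> None:
    """Hypothetical stand-in for a vector search request (assumption: about 1 ms each)."""
    time.sleep(0.001)


if __name__ == "__main__":
    for concurrency, qps in ramp(fake_query, levels=[1, 2, 4, 8], queries_per_level=200):
        print(f"concurrency={concurrency:>2}  throughput={qps:,.0f} queries/s")
```

In a real measurement, the point at which throughput stops growing (or latency starts to climb) as the concurrency level rises indicates the saturation point of the system under test.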
NoSQLBench
- Origin: Community-driven project, sponsored by DataStax
- Purpose and features: NoSQLBench is a community-driven benchmarking tool focused on evaluating the performance of NoSQL databases across different use cases and workloads (see above). NoSQLBench has been used in the past to run vector benchmarks, e.g. by GigaOm. Yet, the documentation does not reveal which benchmarking scenarios and data sets actually ship with the harness.
- benchANT Integration: No
pgvectorbench
- Origin: Community-driven project
- Purpose and features: pgvectorbench is a benchmarking tool specifically designed for the performance evaluation and optimization of pgvector. It makes use of the data sets curated by Zilliz for VectorDBBench. It supports the same benchmark steps as other tools, but allows each step to be run individually, which is useful when it is embedded in a benchmarking process such as benchANT's.
- benchANT Integration: No
Getting Involved
This article spans a wide field and provides an overview of established and emerging database benchmarks for many different workload domains, including OLTP, HTAP, OLAP, time-series, and vector workloads. It also collects several benchmarks targeted at NoSQL database management systems.
Are you missing something that should be here, be it a benchmark suite or a workload domain? Do not hesitate to get in touch and start a discussion! Also get in touch for custom, tailor-made benchmarking.