benchANT Homepage

Database Benchmarking

Learn all about the basics of database benchmarking!

This article briefly explains the what, the why and the how.

In addition, an exclusive technical chapter gives you deep insight into the subject.

Let's get started!

What is Database Benchmarking?

Database benchmarking is a clearly defined method for analysing, measuring and comparing performance metrics of database management systems.

Since the end of the 1980s, it has gained a lot of importance. However, since the 2010s, database benchmarking has been applied less and less in practice, and often only incompletely, to measure database performance and make quantitative decisions.

There were a few reasons for this:

  • Many new, heterogeneous database technologies with different drivers, functions and schemas.
  • The architectural shift from single-node to distributed database systems.
  • The change in mindset from "One-DBMS-Fits-All" to Polyglot Persistence.

These factors increased the effort and complexity of database benchmarking. At the same time, the previous technical benchmarking solutions did not evolve.

Why is Database Benchmarking Important?

In short: database benchmarking enables well-founded quantitative decisions based on measured values for the selection and optimisation of databases.

Why is this important? Every database has strengths and weaknesses due to its design and implementation. Every software application has its own read/write workload that is processed in the database.

Is the database able to process this workload effectively and efficiently? Database benchmarking provides an answer to this before the database is implemented. It also enables efficient optimisation of the database configuration.

What is the Goal of Database Benchmarking?

Jim Gray, a founding father of database benchmarking, described the goal of database benchmarking in 1993 as follows:

Which computer (server) should I buy?

The system that does the job and has the lowest cost-of-ownership!

Database benchmarking is about efficiency and the performance/cost ratio. This is still one of the most important KPIs and combines technical requirements with business aspects.

Database benchmarking results according to Jim Gray

Note: At that time, databases and servers were only available as a bundle. With this business model, IBM, among others, became a global corporation. There were huge benchmark lists containing all server performance data. Many were produced by the "Transaction Processing Performance Council" (TPC). This performance consortium still exists and regularly publishes new benchmarks on modern workload profiles.

How to Benchmark a Database?

Benchmarking follows this generic process:

  1. define the framework (process, boundaries, objectives)
  2. identify all relevant entities (products, suppliers, solutions)
  3. measure all options
  4. compare results

This process can be transferred 1:1 to database benchmarking.

1. framework & goals

What is the objective of benchmarking?

  • The database with the best performance/cost ratio?
  • Cost optimisation?
  • Performance tuning?

And what constraints exist?

  • SLAs that have to be met
  • Technical constraints due to expertise, requirements, ...
  • Economic constraints due to budget, licence agreements, ...

2. identify entities

Which databases exist, and in which configurations and variants?

  • Data models: Relational, NoSQL, NewSQL?
  • Database systems:
    • Relational: MySQL, PostgreSQL
    • NoSQL: MongoDB, Cassandra
    • NewSQL: CockroachDB, MemSQL

STOP! NO! NOT ONLY!

The crucial thing is to identify your own workload!

  • Which "user types" generate the workload?
    • Users
    • Applications
    • Sensors
    • Cronjobs
  • Distribution of the CRUD workload
    • Create - Read - Update - Delete
    • Time distribution over the day / week / year
  • Initial number of data sets
  • Size of the individual data sets
  • Future development of data sets

What the workload is and why it plays a decisive role will be explained in Contribution 1-4.
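The workload questions above can be captured in a small, machine-readable description. The following is a minimal sketch, not a prescribed format: the class `WorkloadSpec`, its field names and all numbers are hypothetical and only illustrate how an operation mix and data set sizes might be written down before any measurement starts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSpec:
    """Hypothetical description of an application workload."""
    read: float             # share of read operations
    create: float           # share of insert operations
    update: float           # share of update operations
    delete: float           # share of delete operations
    initial_records: int    # number of data sets loaded before the run
    record_size_bytes: int  # size of a single data set

    def __post_init__(self):
        # The four shares describe one distribution, so they must sum to 1.
        total = self.read + self.create + self.update + self.delete
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"operation shares must sum to 1, got {total}")

    def sample_operations(self, n, seed=0):
        """Draw n operation names according to the configured mix."""
        rng = random.Random(seed)
        names = ["read", "create", "update", "delete"]
        weights = [self.read, self.create, self.update, self.delete]
        return rng.choices(names, weights=weights, k=n)


# Placeholder values for illustration only; replace them with numbers
# observed in your own application.
spec = WorkloadSpec(read=0.7, create=0.1, update=0.15, delete=0.05,
                    initial_records=1_000_000, record_size_bytes=1024)
ops = spec.sample_operations(1000, seed=42)
```

The time distribution over the day, week or year and the future growth of the data are left out of this sketch; they would need additional fields.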

Benchmarking questions

3. measure performance

Set up a measurement procedure to run the workload on the databases and measure performance.

And how does this work?

The crucial component is the benchmark itself, which makes it possible to model a workload and apply it to a database.

The setup for measuring and storing performance quantities such as throughput and latency must first be implemented. There are existing frameworks and tools that support at least classic non-cloud database benchmarking.
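To make the measurement step concrete, here is a minimal sketch of a harness that runs operations against a database and records throughput and latency. It is an illustration under assumptions, not one of the existing frameworks mentioned above: it uses SQLite from the Python standard library as a stand-in database, and the function `run_benchmark`, the table `kv` and the operation counts are hypothetical.

```python
import sqlite3
import statistics
import time


def run_benchmark(conn, operations):
    """Execute (sql, params) pairs and record per-operation latency."""
    latencies = []
    start = time.perf_counter()
    for sql, params in operations:
        t0 = time.perf_counter()
        conn.execute(sql, params).fetchall()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "operations": len(latencies),
        "throughput_ops_per_s": len(latencies) / elapsed,
        "latency_median_ms": statistics.median(latencies) * 1000,
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
        "latency_p95_ms": statistics.quantiles(latencies, n=100)[94] * 1000,
    }


# In-memory SQLite database as a stand-in for the system under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

writes = [("INSERT INTO kv (k, v) VALUES (?, ?)", (i, "x" * 100))
          for i in range(1000)]
reads = [("SELECT v FROM kv WHERE k = ?", (i,)) for i in range(1000)]

write_result = run_benchmark(conn, writes)
read_result = run_benchmark(conn, reads)
print(write_result)
print(read_result)
```

A single-threaded loop against an embedded database says little about a distributed system under concurrent load; the sketch only shows which quantities are captured and how they are derived from raw timings.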

4. compare results

The measured results are stored in a database or in files. For a benchmark analysis, this data must now be transformed into a visualisation that enables decision-making. If costs are relevant in addition to performance, these must be researched separately and integrated into the performance measurement results.

Common forms of visualisation are bar charts, time series diagrams or box plots.

In the end, for example, you have a bar chart for each performance metric with the results of the different database setups.

  • Throughput
  • Read latency
  • Write latency
  • Cost ratio
  • Throughput-Cost ratio

TPC-C database benchmarking results
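The comparison step can be sketched in a few lines: combine the measured performance with separately researched costs, drop setups that violate a constraint, and rank the rest by throughput-cost ratio. The class `Result`, the function `rank`, the setup names and all numbers below are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Result:
    setup: str
    throughput_ops_per_s: float
    read_latency_ms: float
    write_latency_ms: float
    monthly_cost: float  # researched separately, in any consistent currency

    @property
    def throughput_per_cost(self):
        """Throughput-cost ratio: operations per second per cost unit."""
        return self.throughput_ops_per_s / self.monthly_cost


def rank(results, max_read_latency_ms):
    """Keep setups that meet the latency constraint, best ratio first."""
    eligible = [r for r in results if r.read_latency_ms <= max_read_latency_ms]
    return sorted(eligible, key=lambda r: r.throughput_per_cost, reverse=True)


# Placeholder values for illustration only.
results = [
    Result("setup-a", 12000, 4.0, 6.0, 400),
    Result("setup-b", 20000, 9.0, 7.5, 500),
    Result("setup-c", 15000, 3.0, 5.0, 750),
]

ranking = rank(results, max_read_latency_ms=5.0)
for r in ranking:
    print(f"{r.setup}: {r.throughput_per_cost:.1f} ops/s per cost unit")
```

In this made-up example, the setup with the highest raw throughput is excluded because it misses the read latency constraint, which is why constraints and costs belong in the comparison alongside the performance numbers.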

Influence Parameters of Database Benchmarking

Databases are highly configurable systems: every modern database now offers over 100 configuration options for tuning it to specific use cases.

These configuration options have a direct influence on performance, but also on consistency, partition tolerance and availability guarantees.

However, no database can be optimised with respect to each of these properties without sacrificing others. Eric Brewer formulated this trade-off as the CAP theorem in his keynote at the PODC conference in 2000, and it is still generally valid today.

A reconsideration in 2012 refines the CAP theorem in that the decision between consistency, availability and partition tolerance need not be binary, as modern database systems offer fine-granular configuration options for each of these properties.

Our scientific findings show, however, that even adjusting a single configuration parameter, such as applying a stricter consistency guarantee, can reduce throughput by up to 90%.

Consistency influence on database throughput
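The effect of a configuration change is usually reported as a relative loss against a baseline run. A minimal sketch of that arithmetic follows; the function name `throughput_loss_percent` and the input values are hypothetical and chosen only to illustrate the calculation.

```python
def throughput_loss_percent(baseline_ops_per_s, configured_ops_per_s):
    """Relative throughput loss of a configuration versus a baseline, in percent."""
    if baseline_ops_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100 * (baseline_ops_per_s - configured_ops_per_s) / baseline_ops_per_s


# Placeholder values for illustration only: a baseline run and a run
# with a stricter consistency setting.
loss = throughput_loss_percent(10_000, 1_000)
print(f"throughput loss: {loss:.0f}%")
```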

Conclusion

In this article you have learned the basics of database benchmarking and you now know the goal and the benchmarking process.

You can find a more detailed article on database benchmarking here in our blog.

The next post will delve deeper into database benchmarking and adapt it to distributed systems like the cloud.

Do you have any questions?