benchANT Homepage

Database-as-a-Service: What, when, how?

In November 2023 the founders of benchANT released an article in iX, a renowned German IT Journal. Here, we summarize the originally German article in an English version. The iX article includes excerpts from benchANT’s DBaaS study and other articles we have recently published about DBaaS.

This article revisits the DBaaS market and its types of providers, discusses when using DBaaS may pay off compared to running a self-managed DBMS, and finally illustrates with an example how benchmarking can help with selecting the right DBaaS solution.

Key Insights

  • For cloud-native applications to fully exploit the cloud advantages of scalability and resilience, it is recommended to separate data storage and computing power (compute).
  • DBaaS (Database as a Service) has emerged as a popular approach to database management in the cloud, with numerous offerings based on well-known open-source projects as well as proprietary solutions from hyper-scalers such as AWS, Azure and Google Cloud.
  • Despite higher direct costs compared to IaaS, DBaaS offers economic advantages through automated services such as installation, monitoring, and maintenance, which reduces labour costs and time.
  • Benchmarking helps users make data-driven decisions when choosing a DBaaS offering.

Introduction: The DBaaS Market

Since AWS DynamoDB was released in 2012, a variety of DBaaS offerings have appeared, many of them in the last five years. DBaaS operators are database manufacturers, cloud providers and third-party providers, the so-called brokers. On a technical level, the market covers a wide range of different database technologies. PostgreSQL, MariaDB and MySQL are particularly popular in the relational area. In the document-based environment, there are a variety of offers for MongoDB and derivatives, but occasionally also Couchbase. When it comes to wide-column DBMS, many services are based on Apache Cassandra.

Almost every database manufacturer now provides its own DBaaS offering. Examples include MongoDB Atlas, Couchbase Capella, MariaDB SkySQL or ScyllaDB Cloud. Most of these products are based on the IaaS of the three hyper-scalers AWS, MS Azure or GCP. Almost all European cloud providers also have DBaaS in their portfolio, including Open Telekom Cloud, StackIT, OVHCloud and others. The transition between DBaaS and managed services is fluid at this point.

DBaaS is often based on popular open-source projects such as PostgreSQL, MySQL or Apache Cassandra. In other cases, providers license enterprise versions of MongoDB or Couchbase. Hyperscalers also offer their own systems (Google Spanner or Alibaba PolarDB) or offer tailored versions of open-source projects (AWS Aurora, Google AlloyDB for PostgreSQL). Just like cloud providers, DBaaS brokers such as Aiven, Scalegrid and Instaclustr usually offer a variety of different DBMS. In contrast to cloud providers, brokers do not maintain their own infrastructure and operate their DBMS instances with different cloud providers as Infrastructure as a Service (IaaS) or, more recently, Container as a Service (CaaS).

DBaaS? Yes, but Which of Them?

At first glance, the per-minute prices of DBaaS offers seem very high compared to IaaS. Nevertheless, there are good reasons to use DBaaS instead of running a DBMS yourself, be it on premises or on IaaS. Take AWS RDS as an example: a non-replicated DBaaS instance on an AWS virtual machine (m6i.4xlarge) with 16 cores, 64 GB of RAM and 1 TB of persistent GP3 storage costs approximately $1,375 per month. The underlying IaaS resources, on the other hand, only cost around $640.

This means that the DBaaS service is more than twice as expensive as pure cloud resources. In return, however, customers receive a range of services that they would otherwise have to realize themselves through working hours and tooling. In particular, the offer includes:

  • Automatic installation and configuration of DBMS instance
  • Monitoring with troubleshooting
  • Maintaining and updating the database
  • Support services and tuning recommendations

In the best case, this saves the costs of a database administrator. With estimated personnel costs of $10,000 per month, DBaaS can pay off from an economic point of view for small and medium-sized database clusters. A managed cloud project for a 100-node Cassandra cluster, however, is likely to be significantly more expensive than a self-managed solution. Besides paying off, DBaaS services also make sense from a strategic point of view: using DBaaS may free up human resources and enable a stronger focus on product development. DBaaS also opens easy access to database technologies for which no in-house expertise is available yet.
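
The break-even reasoning above can be sketched as a small calculation. It uses only the figures quoted in this section ($1,375 for DBaaS, $640 for IaaS, $10,000 per month for an administrator). The assumptions that exactly one administrator is replaced and that the surcharge grows linearly with the number of instances are simplifications made for illustration; they are not part of the original study.

```python
# Figures quoted in the text (USD per month).
DBAAS_MONTHLY = 1375.0   # non-replicated RDS instance, 16 cores / 64 GB / 1 TB
IAAS_MONTHLY = 640.0     # underlying IaaS resources
DBA_MONTHLY = 10_000.0   # estimated personnel cost of one administrator


def dbaas_surcharge(dbaas: float, iaas: float) -> float:
    """Monthly premium paid per instance for the managed service."""
    return dbaas - iaas


def break_even_instances(dba_cost: float, surcharge: float) -> float:
    """Number of instances at which the summed surcharge equals the DBA cost.

    Assumption (illustrative only): the surcharge scales linearly and one
    administrator would otherwise be needed regardless of cluster size.
    """
    return dba_cost / surcharge


surcharge = dbaas_surcharge(DBAAS_MONTHLY, IAAS_MONTHLY)
price_ratio = DBAAS_MONTHLY / IAAS_MONTHLY
break_even = break_even_instances(DBA_MONTHLY, surcharge)

print(f"surcharge per instance: ${surcharge:.0f}/month")
print(f"DBaaS / IaaS price ratio: {price_ratio:.2f}")
print(f"break-even at about {break_even:.1f} instances")
```

Under these simplified assumptions, the accumulated surcharge stays below the cost of one administrator only for a limited number of instances of this size, which matches the direction of the argument: small and medium-sized clusters can benefit, while very large clusters are likely cheaper to run self-managed.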

Ultimately, as with many other cloud computing services, DBaaS increases flexibility, but it also introduces additional dependencies on third-party providers because of the lock-in effects they create. It is therefore important to choose the most suitable service right from the start.

Performance Evaluation with Benchmarking

This raises the question of how performance can be determined, especially if, as in greenfield projects, the applications to be used do not yet exist. In the database area, benchmarking has a long tradition of determining the capabilities of different technologies and different configurations. There are several suites available that create an idealized workload on a DBMS instance to evaluate performance. Different workloads with different requirements prevail in different application scenarios (OLTP, OLAP, caching, time series, ...). Accordingly, there are specialized benchmarks for many scenarios. Those of the Transaction Processing Performance Council (TPC) are widely known. TPC-C and TPC-E focus on OLTP systems, TPCx-HS and TPCx-BB are intended for big data applications, TPC-H and others for the area of analytics. Many more exist for a wide range of other applications.

Findings obtained for one type of workload with its matching benchmark cannot, however, be transferred to another type of workload. Therefore, it is essential to select the right suite straight away. Many suites also allow parameterization to gradually approximate the real workload.

Workload Modelling

The first step in analyzing performance and costs is to map the real workload to one or more benchmark suites and their configurations. This step is called workload modelling. In many scenarios, multiple workload intensities are used to evaluate the scalability of the DBaaS instance more finely.
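
As an illustration of what a workload model might look like in practice, the following minimal sketch describes one benchmark suite, one size parameter and several workload intensities. The class name, field names and all values are hypothetical and chosen only for this example; real suites expose their own parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadModel:
    """One benchmark suite plus parameters approximating the real load."""
    suite: str                        # which suite fits is a modelling decision
    scale_factor: int                 # data set size parameter (hypothetical)
    client_threads: tuple[int, ...]   # workload intensities to step through


def intensity_steps(model: WorkloadModel) -> list[str]:
    """Expand a model into one labelled benchmark run per intensity."""
    return [
        f"{model.suite}-sf{model.scale_factor}-t{threads}"
        for threads in model.client_threads
    ]


# Hypothetical model for a read-heavy, latency-critical application.
model = WorkloadModel(suite="TATP", scale_factor=100, client_threads=(8, 16, 32, 64))
for label in intensity_steps(model):
    print(label)
```

Stepping through several intensities for the same suite is what allows the scalability of an instance to be observed rather than a single operating point.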

In addition to determining the workloads, another step is to define the DBaaS configurations to be analyzed: type of block storage, size of the virtual machines, configuration of the cluster, etc. Before running the benchmarks, testers must define the methodology, which includes the structure and level of automation of the benchmarks. Only near-complete automation ensures comparability of the results and enables re-execution under changed initial conditions. In addition, the number of repetitions should be fixed in advance to obtain statistically reliable results.
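
To illustrate why repetitions matter, the sketch below summarizes repeated runs by mean, sample standard deviation and coefficient of variation. The configuration names and throughput values are invented for this example and are not measurements from the study; the choice of these three statistics is likewise an assumption, not the study's documented methodology.

```python
import statistics

# Hypothetical throughput results (operations per second) of five
# repetitions for two made-up configurations; not measured values.
runs = {
    "config-a": [10_200.0, 10_050.0, 9_980.0, 10_310.0, 10_120.0],
    "config-b": [8_400.0, 9_900.0, 7_200.0, 9_100.0, 8_000.0],
}


def summarize(samples: list[float]) -> dict[str, float]:
    """Return mean, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # requires at least two repetitions
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


for name, samples in runs.items():
    s = summarize(samples)
    print(f"{name}: mean={s['mean']:.0f} ops/s, cv={s['cv']:.1%}")
```

A high coefficient of variation, as in the second made-up configuration, would suggest that a single run is not representative and that more repetitions or a closer look at the setup are needed.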

When preparing the results, a visual representation of the data is helpful to arrive at a final assessment. Since the results usually only determine performance parameters, interested parties still must relate these to the costs of the respective configuration. Here too, a visual preparation is beneficial.

Performance-cost analysis for PostgreSQL DBaaS

Based on the previous information, what follows is an exemplary, limited comparison of DBaaS offers for PostgreSQL from the two hyper-scalers AWS and MS Azure as well as the European providers Open Telekom Cloud (OTC) and OVHCloud. The benchANT DBaaS navigator summarizes selected aspects regarding deployment and management. We do not go into a detailed discussion of these aspects here. In brief, the providers offer comparable, but not identical support levels and SLAs. Further, the offers differ particularly in cluster topology, but also in the largest possible configurations and the number of disk types available. Table 1 provides a high-level comparison of the providers along those lines.

Table 1: Overview of selected provider features

| | Open Telekom Cloud PostgreSQL | AWS RDS PostgreSQL | Azure Database PostgreSQL | OVH Managed PostgreSQL |
|---|---|---|---|---|
| Cluster types | Single, HA with 2 nodes | Single, HA with 2 nodes, HA with 3 nodes and load balancing | Single, HA with 2 nodes | Single, HA with 2 nodes, HA with 3 nodes |
| Largest instance | 60 vCPUs, 512 GB RAM, 4 TB disk | 128 vCPUs, 4096 GB RAM, 65 TB disk | 64 vCPUs, 432 GB RAM, 16 TB disk | 32 vCPUs, 120 GB RAM, 5 TB disk |
| Storage types | 2 | 3 | 1 | 1 |
| Auto scaling (compute) | no | no | no | no |
| Auto scaling (storage) | no | yes | no | no |
| Auto tuning | no | no | yes | no |
| Auto upgrade | no | yes | yes | no |
| Interfaces | Web UI, REST API, CLI, SDK, Terraform, Ansible | Web UI, REST API, CLI, SDK, Terraform, Ansible | Web UI, REST API, CLI, SDK, Terraform | Web UI, REST API, CLI, SDK, Terraform |
| Backup | Scheduled, Manual | Scheduled | Continuous | Scheduled |
| GDPR | Company headquartered in EU, EU-GDPR policy available | Company headquartered in USA (Cloud Act), EU-GDPR policy available | Company headquartered in USA (Cloud Act), EU-GDPR policy available | Company headquartered in EU, EU-GDPR policy available |
| Certificates | 20+ | 30+ | 30+ | 10+ |

All offers follow the resource-oriented cost model. Accordingly, a cost comparison is only possible if both the type of cluster and the size of the virtual machines and storage space are known. The case study therefore uses virtual machines with 16 cores and 64 GB of RAM wherever possible. At the time of creating the study, OVHcloud did not offer such a configuration, so it is not included in our evaluation. All configurations are based on the block storage the vendors provide for I/O-intensive use. The specific configurations are summarized in Table 2 along with the monthly costs.

Table 2: Comparison of the hardware configurations used for the performance comparison

| | OTC RDS | AWS RDS | Azure for PostgreSQL |
|---|---|---|---|
| Virtual machine | db.s1.4xlarge.pg, 16 vCPU, 64 GB RAM | db.m5.4xlarge, 16 vCPU, 64 GB RAM | D16ds_v4, 16 vCPU, 64 GB RAM |
| Storage | 500 GB Ultra High I/O SSD, 20,000 IOPS, 320 MB/s | 500 GB GP3, 12,000 IOPS, 500 MB/s | 500 GB, 18,000 IOPS, 384 MiB/s |
| Monthly cost including backup | 1,333 € | 1,210 € | 1,167 € |

Benchmarking Workloads

For this evaluation, two read-heavy benchmarks are used to generate the workloads: TPC-H and TATP as provided by BenchBase. TPC-H is designed for analytical application scenarios and uses complex read-only queries. It has CPU-intensive characteristics.

The TATP benchmark simulates a subscriber database from the mobile communications sector. Accordingly, with its predominantly read-heavy load, it represents latency-critical scenarios such as those found not only in telecommunications but also in financial services or reservation systems. Similar to TPC-H, TATP can be CPU intensive, but also has high throughput requirements in network and storage backends.

Evaluation and Insights

The following two figures illustrate the throughput of the three offerings with each benchmark (higher values are better than lower values). The scales of the two graphs cannot be directly compared due to the differences in the methodologies of TPC-H and TATP. TPC-H measures transactions (TXN) that span multiple, sometimes longer-running operations, while TATP consists of lighter operations.

TPC-H Throughput
TATP Throughput

The charts provide interesting insights: For both benchmarks, OTC and AWS are at comparable levels, with AWS achieving 87.5% and 97.6% of OTC's throughput, respectively. MS Azure is 24.8% more powerful than OTC in the case of TPC-H, while in the case of TATP it only achieves 49.1% of the performance of OTC.

A detailed comparative performance analysis usually also covers other parameters such as latency and its percentiles or memory requirements. This is omitted here.

The price-performance ratios of the three offers are shown in the following two figures (lower values are better than higher ones). Here too, the units are different. While the TPC-H case shows the cost per 10,000 transactions, the TATP chart shows the cost per one billion operations.

TATP Cost per 1B Operations

Looking at the results for TATP, the price-performance ratios of OTC and AWS are quite close, with a slight advantage for AWS. MS Azure, on the other hand, is almost twice as expensive per billion operations as its competitors.

TPC-H Cost per 10k Transactions

In the case of TPC-H, Azure, the competitor with the highest absolute throughput, is the clear price-performance winner. OTC and AWS are close not only in terms of throughput but also in terms of price-performance, and both are more expensive per transaction than MS Azure.
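
The relationship between throughput and cost can be approximated with the numbers given in this article. The sketch below combines the monthly costs from Table 2 with the relative throughput values stated in the text and derives a cost-per-work index relative to OTC. It assumes that price-performance is simply monthly cost divided by throughput; the exact calculation behind the charts is not given here, so the resulting indices are an approximation and not the study's published values.

```python
# Monthly cost including backup in EUR (Table 2).
monthly_cost = {"OTC": 1333.0, "AWS": 1210.0, "Azure": 1167.0}

# Throughput relative to OTC as stated in the text (OTC = 1.0).
relative_throughput = {
    "TPC-H": {"OTC": 1.0, "AWS": 0.875, "Azure": 1.248},
    "TATP": {"OTC": 1.0, "AWS": 0.976, "Azure": 0.491},
}


def relative_cost_per_work(benchmark: str, provider: str, baseline: str = "OTC") -> float:
    """Cost per unit of work relative to the baseline (lower is better).

    Assumption: price-performance = monthly cost / throughput.
    """
    cost_ratio = monthly_cost[provider] / monthly_cost[baseline]
    throughput = relative_throughput[benchmark]
    throughput_ratio = throughput[provider] / throughput[baseline]
    return cost_ratio / throughput_ratio


for benchmark in relative_throughput:
    for provider in monthly_cost:
        index = relative_cost_per_work(benchmark, provider)
        print(f"{benchmark:5s} {provider:5s} {index:.2f}")
```

Under this assumption, the indices reproduce the ordering described above: for TATP, AWS comes out slightly below OTC and Azure clearly above both, while for TPC-H Azure has the lowest cost per unit of work.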

Conclusions

Although the simplified benchmarking methodology only allows preliminary results, the evaluation clearly shows that there is great variability in the providers' pricing and in the performance of the different offers. A benchmark-based evaluation of the different services proves to be helpful and necessary to make a well-founded, fact-based decision when choosing a DBaaS.