ARM Instances for DBMS: Price-Efficient or Just Cheap?

Data centers and public cloud providers advertise low-cost ARM virtual machines.

But are they suitable for running database management systems?

We looked at fairly small ARM instances for small-scale applications and benchmarked them against comparably sized Intel-based x86 VMs.

The test results are available here!

The Benchmark Setting

ARM chips are conquering tablets, cell phones, and notebooks, and ARM virtual machines are now also available in data centers and from public cloud providers such as AWS EC2.

ARM relies on a scaled-down chip architecture and is meant to be an energy-efficient alternative to x86 processors, which are optimized for peak performance.

But are these ARM VMs suitable for efficiently running a DBMS instance for typical web-oriented workloads?

We have started a small benchmark series to provide some initial insights into this little-studied area.

We have already shown in a whitepaper that even VMs with identical core and memory dimensions can differ in database performance to a degree that should not be underestimated.

What is the situation with ARM machines?

Cloud

In the first part of the ARM series, we use very small VMs with 2 CPUs and 4 GB RAM each on AWS EC2.

a1.large: AWS Graviton processors with 64-bit ARM Neoverse cores.
t3.medium: AWS general-purpose instances with burstable peak-load performance.
c6i.large: AWS instances with Intel Xeon Scalable processors for data processing and compute-intensive workloads.

As single-node instances, virtual machines of this size are suitable for running the database of a small web application, but also as a developer database instance during prototyping and application development.

Database

As a database, we use one of the most popular and widespread NoSQL databases:

MongoDB (version 4.4.6) in the vanilla standard configuration,
as a single-node instance without replication.

This database is installed identically and fully automatically by our platform on all VMs.

Benchmark

As a benchmark, we use the Yahoo Cloud Serving Benchmark in version 0.17.0 with the following parameters:

Record size: 5 KB
Initial data size: 10 GB
Read-Write distribution:
- A) social-media: 50% Read / 50% Write
- B) e-commerce: 90% Read / 10% Write
Read access pattern:
- A) social-media: ZIPFIAN
- B) e-commerce: LATEST

The full parameters can be found in the YCSB command:

The two defined workloads, A) social media and B) e-commerce, reflect typical web-based workloads. These idealized workloads are used to evaluate and compare the performance of the different setups.

The workload is sized to suit the selected instances without being a pure in-memory workload: at 10 GB, the initial data volume is larger than the 4 GB of RAM and cannot be held completely in memory.
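The sizing argument can be checked with a few lines of arithmetic. The following is only a sketch: it assumes that the 5 KB figure is the size of a single record, that decimal units are meant (1 KB = 1,000 bytes), and that the whole 4 GB of RAM were available as cache, which overstates what the database can actually use. The function names are illustrative and not part of YCSB or the benchANT platform.

```python
# Sizing sketch based on the parameters listed above.
# Assumptions (not confirmed by the article): decimal units, 5 KB per record.
RECORD_SIZE_BYTES = 5 * 1000        # 5 KB per record
INITIAL_DATA_BYTES = 10 * 1000**3   # 10 GB initial data size
RAM_BYTES = 4 * 1000**3             # 4 GB RAM per VM


def record_count(total_bytes: int, record_bytes: int) -> int:
    """Number of whole records needed to reach the initial data size."""
    return total_bytes // record_bytes


def data_to_ram_ratio(total_bytes: int, ram_bytes: int) -> float:
    """How many times larger the data set is than the VM's memory."""
    return total_bytes / ram_bytes


records = record_count(INITIAL_DATA_BYTES, RECORD_SIZE_BYTES)
ratio = data_to_ram_ratio(INITIAL_DATA_BYTES, RAM_BYTES)
print(f"{records:,} records, data is {ratio:.1f}x the RAM size")
```

Under these assumptions, the data set is a multiple of the available memory, so a share of the reads has to be served from disk.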

Benchmark execution

The benchmarks were measured in a fully automated manner using the benchANT platform.

benchANT's benchmarking platform is based on over 7 years of research & development in a university context and is a major part of the completed PhD of my colleague Dr. Daniel Seybold.

Each configuration was measured three times, and the data was aggregated at the end. The performance measurement under load ran for 30 minutes per benchmark run.
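The article does not state how the three runs were aggregated. One common approach is to report the mean together with the spread between runs, as sketched below. The run values are hypothetical placeholders and not measured results, and `aggregate_runs` is an illustrative helper, not a benchANT function.

```python
from statistics import mean, stdev


def aggregate_runs(values: list[float]) -> dict[str, float]:
    """Summarize repeated benchmark runs as mean, sample stdev and relative spread."""
    if not values:
        raise ValueError("at least one run is required")
    avg = mean(values)
    spread = stdev(values) if len(values) > 1 else 0.0
    return {
        "mean": avg,
        "stdev": spread,
        # Coefficient of variation: spread relative to the mean
        "cv": spread / avg if avg else 0.0,
    }


# Hypothetical throughput values (ops/s) of three runs, NOT measured data
runs = [4100.0, 4200.0, 4300.0]
summary = aggregate_runs(runs)
print(summary)
```

A small coefficient of variation indicates that the three runs agree well and that the aggregated value is representative.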

As performance KPIs, we consider:

the throughput,
the costs (cloud compute costs),
the throughput/cost ratio,
the read latency (95th percentile), and
the write latency (95th percentile).

For more up-to-date benchmarking data on performance & scalability of MongoDB, see our MongoDB vs Apache Cassandra Study.

Throughput, Costs and Throughput/Cost Ratio

Which VM type is best suited for running a cloud database for web-based workloads?

We look at the throughput (operations / second) and relate this value to the costs in order to make a statement about the economic efficiency of the instances.

Cost

The main reason for selecting an ARM instance is the advertised efficiency in terms of energy and cost. The on-demand prices of the various instances for one month are:

a1.large: $ 47.23 / € 41.85
c6i.large: $ 72.05 / € 63.84
t3.medium: $ 40.37 / € 35.77 (unbursted)

The ARM instance is thus slightly more expensive than a standard t3 burst instance. However, a t3 instance can quickly become many times more expensive once its load exceeds the baseline (> 20% CPU load), which makes this comparison difficult when the baseline is consistently exceeded. That is the case in our benchmarks, where the CPU load is just under 70%, so a t3 instance would not be recommended for such a consistently high load pattern.
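How much burst usage adds to the bill can be estimated with a simple model. The sketch below assumes that the instance runs in "unlimited" mode, that every vCPU-hour above the 20% baseline is billed at a flat surcharge, and that the load is constant over the whole month. The rate of $0.05 per vCPU-hour is an assumption that should be checked against current AWS pricing, and `burst_surcharge_usd` is an illustrative name.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def burst_surcharge_usd(
    avg_cpu_util: float,
    baseline_util: float = 0.20,       # baseline per vCPU, as stated above
    vcpus: int = 2,                    # VM size used in this benchmark
    rate_per_vcpu_hour: float = 0.05,  # ASSUMED surcharge, verify with AWS
    hours: int = HOURS_PER_MONTH,
) -> float:
    """Estimate the monthly surcharge for sustained load above the baseline."""
    excess = max(0.0, avg_cpu_util - baseline_util)
    return excess * vcpus * rate_per_vcpu_hour * hours


# Sustained load of 70%, the approximate CPU load observed in the benchmark
extra = burst_surcharge_usd(0.70)
print(f"Estimated burst surcharge: ${extra:.2f} per month")
```

The result has to be added to the unbursted monthly price and depends heavily on the assumed rate and on how long the load stays above the baseline.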

The c6i instance, which is optimized for data processing, is about 50% more expensive than the ARM VM, but it also promises to be well suited to this kind of workload.

Note: Prices differ slightly depending on the AWS cloud region. In addition, significantly lower monthly prices can be achieved with reserved instances.
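The price differences mentioned above can be reproduced from the listed on-demand prices. This is a minimal sketch using the US dollar prices from the list; `relative_price` is an illustrative helper.

```python
# Monthly on-demand prices in USD, taken from the list above
PRICES_USD = {
    "a1.large": 47.23,
    "c6i.large": 72.05,
    "t3.medium": 40.37,  # unbursted
}


def relative_price(instance: str, baseline: str = "a1.large") -> float:
    """Price difference of `instance` relative to `baseline` (0.5 means +50%)."""
    return PRICES_USD[instance] / PRICES_USD[baseline] - 1.0


for name in ("c6i.large", "t3.medium"):
    print(f"{name}: {relative_price(name):+.1%} compared to a1.large")
```

Positive values mean that the instance is more expensive than the ARM instance, negative values that it is cheaper.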

Throughput

When it comes to throughput, there is a clear performance difference between the ARM and non-ARM instances.

a1.large: 4,181 ops/s (e-commerce) - 4,257 ops/s (social media)
c6i.large: 8,014 ops/s (e-commerce) - 9,150 ops/s (social media)
t3.medium: 7,166 ops/s (e-commerce) - 7,424 ops/s (social media)

A performance difference of this magnitude is unusual. The ARM instance falls far behind, and even the t3 instance cannot keep up with the c6i instance, which specializes in data processing.

Throughput/cost ratio

ARM instances are cheaper than most other VMs of the same size, but they also perform significantly worse. In such a case, the throughput/cost ratio provides a good indication of economic efficiency:

What performance do I get for the same money?

a1.large: 99-101 ops/s per euro of monthly cost
c6i.large: 125-143 ops/s per euro of monthly cost
t3.medium: 198-205 ops/s per euro of monthly cost

Depending on the workload considered, the c6i instance delivers about 25-40% more throughput than the ARM instance for the same money.

The values for the t3 instance should be taken with a grain of salt, as the real costs can be significantly higher. Because the t3 burst instances are so highly utilized, the burst functionality is in constant use, which results in a multiple of the default costs shown here.
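The throughput/cost ratio is simply the measured throughput divided by the monthly price in euros. The sketch below recomputes it from the figures in this article. It assumes that the range of 8,014-9,150 ops/s belongs to c6i.large and the range of 7,166-7,424 ops/s to t3.medium, which is the assignment that matches the ratios listed above. Small deviations from the listed ratios are probably due to rounding.

```python
# Monthly on-demand prices in EUR, taken from the cost section
PRICE_EUR = {"a1.large": 41.85, "c6i.large": 63.84, "t3.medium": 35.77}

# Throughput in ops/s as (e-commerce, social media).
# ASSUMPTION: assignment of the c6i/t3 ranges as implied by the listed ratios.
THROUGHPUT = {
    "a1.large": (4181, 4257),
    "c6i.large": (8014, 9150),
    "t3.medium": (7166, 7424),
}


def ops_per_euro(instance: str) -> tuple[float, ...]:
    """Throughput (ops/s) per euro of monthly cost, one value per workload."""
    price = PRICE_EUR[instance]
    return tuple(ops / price for ops in THROUGHPUT[instance])


for name in PRICE_EUR:
    ecommerce, social = ops_per_euro(name)
    print(f"{name}: {ecommerce:.0f} (e-commerce) - {social:.0f} (social media)")
```

For the t3 instance, the price used here is the unbursted one, so its ratio is an upper bound rather than a realistic value.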

ARM instances small: throughput, cost and throughput/cost ratio

Read Latency and Write Latency

In addition to looking at throughput and cost, for some web applications it is also useful to look at the read & write latency of the cloud database setup to provide a fast response to the end user.

Here, lower latency is always better.

In each case, we look at the 95th percentile, meaning that 95% of all operations completed at least as fast as the specified value.
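To illustrate what such a percentile means, the following sketch computes it with the nearest-rank method on synthetic latencies. This is not necessarily how YCSB derives its percentiles internally, and the sample values are made up for the example.

```python
import math


def percentile_nearest_rank(samples: list[float], p: int) -> float:
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies of 1 ms, 2 ms, ..., 100 ms (not measured values)
latencies_ms = [float(ms) for ms in range(1, 101)]
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"95th percentile: {p95} ms")
```

In this synthetic sample, 95 of the 100 operations are at most as slow as the reported value, while the slowest 5% are ignored. This makes the metric robust against a few extreme outliers.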

Write Latency

The write latency of the ARM instance is significantly worse than that of the x86-based instances. In addition, there are sometimes significant differences between the two load patterns.

a1.large: 19.5 ms (e-commerce) - 21.65 ms (social media)
c6i.large: 7.5 ms (social media) - 12 ms (e-commerce)
t3.medium: 11.7 ms (e-commerce) - 12.7 ms (social media)

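In the social media workload, the write latency of the ARM instance is almost three times as high as that of the c6i instance (21.65 ms vs. 7.5 ms). In the e-commerce workload, it is still about 60% higher (19.5 ms vs. 12 ms).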

The t3 instance is on par with the c6i instance in the e-commerce workload, but noticeably slower in the social media workload.

For web applications, however, the write latency is usually much less important than the read latency.

Read Latency

A completely different picture emerges for the read latency. The ARM instances are more than able to compete here.

a1.large: 19.75 ms (e-commerce) - 25.3 ms (social media)
c6i.large: 24.75 ms (e-commerce) - 38.1 ms (social media)
t3.medium: 15.26 ms (social media) - 22.8 ms (e-commerce)

The read latencies differ greatly depending on the workload and VM type. The c6i in particular shows a clear weakness in the social media workload, where it is considerably slower than the ARM setup.

ARM instances small: read latency and write latency

Conclusion

ARM instances are cheaper than comparable non-burst instances of the same size.

However, they also have a significantly lower throughput and a worse throughput/cost ratio. Only in terms of read latency can they keep up with their alternatives or even partly surpass them.

ARM instances do not seem to fulfill their (marketing) promise and are not an optimal choice for DBMS instances.

However, what does such a small sample size tell us?

In the second part of the series, we will benchmark larger ARM VMs.

About benchANT

benchANT is a spin-off from the University of Ulm. The two technical co-founders have more than 20 years of combined research experience with distributed systems and performance engineering, specifically in the area of database management systems.

With the benchANT platform, they have developed an automated benchmarking tool that enables any IT architect and system/database administrator to run cloud database benchmarks quickly and efficiently and to make decisions based on performance measurements.

In addition, benchANT also advises on on-prem vs. cloud decisions and helps with resource selection and performance optimization - always based on reliable performance measurements.