PostgreSQL DBaaS – In-depth Query Performance Analysis

A few weeks ago we published an article about cost/performance results for PostgreSQL DBaaS products of the three hyperscalers AWS, MS Azure and Google Cloud Platform.

We were surprised by the massive performance differences and decided to take a closer look at the performance of individual queries to identify possible reasons.

This is going to be an in-depth dive into query performance.

Suit up!

Chapter 1: Key Insights
Chapter 2: TATP Query Performance
- Throughput
- Latency
Chapter 3: TPC-H Query Performance
- Throughput
- Latency
Chapter 4: Résumé
Appendix
- Benchmarking Specifications
- Workload Specifications

Key Insights – Query Performance Analysis of PostgreSQL DBaaS

In our PostgreSQL DBaaS Performance/Costs article we were surprised by the huge performance differences. For a simple workload AWS RDS for PostgreSQL beat Azure Database PostgreSQL and Google CloudSQL PostgreSQL by 100%. For the more complex TPC-H workload Azure Database for PostgreSQL outperforms by 42% and 110%.
We promised to take a deeper dive into this. Here we start with the single query performance to see if we can find out differences in tuning or hardware-related issues.
Performance differences for Database-as-a-Service (DBaaS) products for the same database technology can have several technological reasons: CPU, memory, storage, network, database tuning or differences in several software implementation layers
Identifying and understanding the reasons are a multistage process, looking on the issue from different viewpoints and doing fitting measurements.
The TATP throughput results are not helpful on query level because they are grouped together in one transaction whereas the result metric is only measured on transaction level.
The TATP latency results give clear hints for Azure Database for PostgreSQL with worse write latencies and better read latencies than the AWS and GCP DBaaS solutions. This can result from a bottleneck on the storage layer.
The TPC-H throughput metrics per query show some strengths for a few queries, which are expression-strong with a large IN clause. Also we saw some weaknesses for GCP CloudSQL PostgreSQL for queries with complex subqueries.
The latency results for TPC-H are also showing some remarkable results for different queries. But are still not solving the origin of the huge performance differences.
The query analysis gives some hints about possible reasons for the performance differences, but further investigations from different angles are necessary.

TATP Benchmark Query Results for PostgreSQL DBaaS

The TATP benchmark suite from the BenchBase Multi-DBMS SQL Benchmarking Framework simulates a telecommunication application with users, services, subscriptions and location. The workload consists of 7 queries and simulates look-ups for calls and update and deletes for user specific data. It is built as a perfect example for a demanding, high-throughput environment. The queries are very basic SQL statements without many JOINs, aggregation functions or subquery.

The distribution per query can be adjusted freely. We used the default distribution as shown in the table.

The execution of the seven TATP queries are grouped together into one transaction.

Throughput TATP	Default Query Distribution	Query Type	JOINS
Query#1: DeleteCall Forwarding	2%	Read (SELECT) & Write (DELETE)	0
Query#2: GetAccess Data	35%	Read (SELECT)	0
Query#3: GetNew Destination	10%	Read (SELECT)	1
Query#4: GetSubscri berData	35%	Read (SELECT)	0
Query#5: InsertCall Forwarding	2%	Read (SELECT) & Write (INSERT)	0
Query#6: Update Location	14%	Read (SELECT) & Write (UPDATE)	0
Query#7: UpdateSub scriberData	2%	Write (UPDATE)	0

TATP PostgreSQL DBaaS Performance

TATP - PostgreSQL DBaaS - Query Throughput

The TATP benchmark suite capsulate all queries into transactions and allocates the throughput as transaction per second in the result logs.
Therefore, no differences can be seen on query level compared to the aggregated results. Azure Database PostgreSQL (blue) shows a 45% lower throughput than AWS RDS PostgreSQL. Google CloudSQL PostgreSQL shows a 48% lower performance compared to AWS RDS PostgreSQL.
The two read queries - query #2 and query #4 – are weighted at 70% in the default benchmark configuration are mainly responsible for the throughput results of this benchmark.

TPC-H Benchmark Query Results for PostgreSQL DBaaS

The TPC-H Benchmark Suite is an established relational benchmark. It was developed in the early 2000s by the TPC Performance Group, and updated permanently since then. It is well-known and used as an analytical workload simulating business-decision queries about customers, orders, suppliers and parts. Not only that, but it consists of 22 single read queries, which are distributed equally in the default configuration. The queries are complex analytical queries with JOINS, Aggregations, GROUP BYs and subqueries.

Throughput TPC-H	Default Query Distribution	Query Type	JOINS	AGGREGATIONS	GROUP BY	ORDER BY	SUBQUERY
Query#1	4.5%	Read (SELECT)	0	non-KEY: 8	non-KEY: 2	non-KEY: 2	0
Query#2	4.5%	Read (SELECT)	4	0	0	PRIMARY: 1, non-KEY: 3	1
Query#3	4.5%	Read (SELECT)	2	non-KEY: 1	PRIMARY: 1, non-KEY: 2	non-KEY: 1 (limit 10)	0
Query#4	4.5%	Read (SELECT)	0	non-KEY: 1	non-KEY: 1	non-KEY: 1	1
Query#5	4.5%	Read (SELECT)	5	non-KEY: 1	non-KEY: 1	non-KEY: 1	0
Query#6	4.5%	Read (SELECT)	0	non-KEY: 1	0	0	0
Query#7	4.5%	Read (SELECT)	5	non-KEY: 1	non-KEY: 3	non-KEY: 3	1
Query#8	4.5%	Read (SELECT)	7	non-KEY: 1	non-KEY: 1	non-KEY: 1	1
Query#9	4.5%	Read (SELECT)	5	non-KEY: 1	non-KEY: 2	non-KEY: 2	1
Query#10	4.5%	Read (SELECT)	3	non-KEY: 1	PRIMARY: 1, non-KEY: 4	non-KEY: 1	0
Query#11	4.5%	Read (SELECT)	2	non-KEY: 1	PRIMARY: 1 (Having)	non-KEY: 1	1
Query#12	4.5%	Read (SELECT)	1	non-KEY: 2	non-KEY: 1	non-KEY: 1	0
Query#13	4.5%	Read (SELECT)	1	non-KEY: 1	non-KEY: 1	non-KEY: 2	1
Query#14	4.5%	Read (SELECT)	2	non-KEY: 2	0	0	0
Query#15	4.5%	Read (CREATE VIEW)	0	non-KEY: 1	PRIMARY: 1	0	0
Query#16	4.5%	Read (SELECT)	1	PRIMARY: 1	non-KEY: 3	non-KEY: 4	1
Query#17	4.5%	Read (SELECT)	1	non-KEY: 1	0	0	1
Query#18	4.5%	Read (SELECT)	1	non-KEY: 1	PRIMARY: 2, non-KEY: 3	non-KEY: 2	1
Query#19	4.5%	Read (SELECT)	1	non-KEY: 1	0	0	0
Query#20	4.5%	Read (SELECT)	1	0	0	non-KEY: 1	3
Query#21	4.5%	Read (SELECT)	3	non-KEY: 1	non-KEY: 1	non-KEY: 2	3
Query#22	4.5%	Read (SELECT)	0	non-KEY: 2	non-KEY: 1	non-KEY: 1	1

A classification of these queries was done in this scientific research paper by Peter Boncz et al.. A simplified visualization of this classification can be found in the following graphic, for details please read the paper.

TPC-H PostgreSQL DBaaS Throughput

Looking at the individual query results does not provide a clear picture for analysis. Therefore, we have additionally calculated a relative observation using a heat map, where we use the AWS results as baseline.

TPC-H - PostgreSQL DBaaS - Query Throughput

TPC-H query throughput - relativ deviation

The TPC-H performance of Azure Database for PostgreSQL over AWS RDS (+42%) and Google CloudSQL (#110%) was already shown in the former article. But there are also a few queries, which are outperforming even more, and there are a few queries, where the differences are significantly smaller.
The queries #16, #19 and #22 show the most outperformance for Azure DBaaS. Both queries are listed as “expression calculation = strong” and have a large IN Clause in the SQL query.
Google CloudSQL performs, compared to the AWS product, bad on the query #11 which has a correlated subquery in the HAVING clause. Query #7 and #8 also have subqueries in their FROM section.
Besides the few outstanding queries, most of the query performance is in a similar range around the average performance: +42% for Azure / AWS and - 31% GCP / AWS. This is expected, but does not explain the general differences in performance.
It is not possible to explain this results with 100% certainty. Maybe the DBaaS PostgreSQL tuning parameters can give us a hint on that in a follow-up research.

TPC-H PostgreSQL DBaaS Latency

The picture for the latencies is far more diverse than for the throughput.
Azure Database for PostgreSQL show extreme lower latencies for Q#3 and Q#10, both rated as “aggregation performance = strong” with 3 JOINS and a GROUP BY with primary keys.
Cloud SQL for PostgreSQL shows significant higher latencies for Q#13 and Q#19, which are rated “expression calculation = strong”.

Résumé

Analyzing the query performance of the two benchmark suites showed some interesting differences. The latency results of the TATP measurements showed a clear picture for the write operation at Azure Database for PostgreSQL.

But for the TPC-H it is much harder to make any clear statements due to complex, similar read-only queries. Also, the results show a more divers picture.

Sure, we found some hints why the immense performance differences exist, but overall we need to take a different angle to get more insights into that.

Reverse engineering performance issues is a complex, multi-stage approach from different points of view.

Appendix – Benchmarking Specification

DBaaS Configuration
DBaaS Configuration	AWS RDS	Azure PostgreSQL	GCP PostgreSQL
PostgreSQL Version	14.7	14.7	14.7
Cluster Type	single node	single node	single node
Cloud	AWS	Azure	GCP
Region	eu-central-1	northeurope	europe-west3
Instance Type	db.m5.4xlarge	Standard_D16ds_v5	standard
Instance Specs	16 vCores/64 GB RAM	16 vCores/64 GB RAM	16 vCores/64 GB RAM
Storage Type	GP3	GeneralPurpose	SSD
Storage Size	500 GB	512 GB	500 GB
VPC Peerings	not enabled	not enabled	not enabled

Cloud Configuration
Cloud Configuration	Benchmark Instance
Cloud	same as DBaaS
Region	same as DBaaS
Instance Type	general purpose
Instance Specs	16 vCores/32 GB RAM
Storage Type	standard
Storage Size	50 GB
VPC Peerings	not enabled

Appendix – Workload & Database Specification

BenchBase Configuration
BenchBase Configuration	TATP	TPC-H
Version	master	master
Runtime	30m	30m
Terminals	140	10
Scale Factor	40	5