benchANT Homepage
benchANT Homepage

PostgreSQL DBaaS – Performance Tuning Differences of AWS, Azure and GCP

In the last weeks, we identified huge performance differences for the PostgreSQL DBaaS solutions from AWS, GCP and MS Azure.

We already did an in-depth query performance analysis for different queries and are now going to take a deeper look into performance-critical settings of PostgreSQL for these 3 DBaaS products.

We are getting closer!

Table of Contents

Key Insights – PostgreSQL DBaaS Configuration

  • In our initial PostgreSQL DBaaS evaluation of AWS RDS, MS Azure Database and GCP CloudSQL, which was focused on Performance/Cost, we identified huge performance differences for varying workload scenarios. Based on these results we decided to dig deeper.
  • We identified clear performance advantages for AWS RDS when stressed with the TATP workload. In contrast, running the TPC-H benchmark showed significant advantages for MS Azure Database for PostgreSQL.
  • In the first in-depth analysis we investigated the query performance of 30 different SQL queries from the TATP and the TPC-H workload.
  • We identified longer latencies for MS Azure Database at write queries, but better latencies and throughput for complex SQL queries with aggregations and expression calculations as they are found in TPC-H.
  • Looking at the parameters we can see more non-default parameters at the AWS DBaas product, than at the solutions of GCP and Azure.
  • Azure Database for PostgreSQL uses RAM for caching, which is helpful for complex queries like in the TPC-H benchmark. This setting preference backs the good performance results for read-heavy operations.
  • GCP uses a bigger shared_buffer, but has no performance advantages with this setting.

PostgreSQL Configuration - Parameters Overview

PostgreSQL has over 350 configuration, additionally it can be extended with hundreds of extensions.

When it comes to Database-as-a-Service products for PostgreSQL you will often find pre-made configurations diverting from the default configuration shipping with PostgreSQL.

We expect to find answers in the configurations provided by AWS, Azure and GCP that help us to understand the significant performance differences of these 3 PostgreSQL-based managed database solutions.

Not all 350 configuration parameters are performance critical. Many of them are in fact administrative settings to handle backups, logs, DevOps, recovery and many other features. When it comes to performance critical settings, we need to focus on the workload, too. Not all workloads profit from the same configurations or from the same configuration setting.

A good starting point to PostgreSQL configuration performance tuning is PGTune. It allows you to specify your settings, workload type and infrastructure on a high-level and suggests the value for a handful of parameters, which are performance relevant.

PGTune Screenshot

Another good source is Percona’s blog, for example this article about PostgreSQL tuning. Percona is an open-source database consulting company based in the US. They publish technical articles about MySQL, PostgreSQL and MongoDB.

But of course there are many other blog posts about PostgreSQL configuration tuning out there. If you start digging in to that, you can and will find a lot of user stories, best practices and so on.

In the following, we focus our analysis on 20 parameters, introduce you some of them and analyze differences and similarities.

Configuration Parameter Analysis of PostgreSQL DBaaS

In our analysis, we first look into some general statistics of the configurations to get a bigger picture and then focus on 20 important configuration settings.

The config parameter lists of these DBaaS products can be downloaded from the running database servers. They are usually saved in the postgresql.config file.

The full config parameter files can be found in our GitHub Repository.

Statistical Analysis of the Configuration Parameter

AWSAzureGCP
Config Parameters374409396
Identical Parameters349349349
Provider Extension
Parameters
241426
Identical Values266266266
Identical Values
to AWS
NA276277
Identical Values
to Azure
276NA300
Identical Values
to GCP
277300NA
  • Due to different extensions the number of overall parameters is different for each DBaaS service. All of them use the pg_stat_statement extension, provider specific extensions and public libraries.
  • There are 349 parameters which all 3 providers have in common. 266 of these parameters have identical settings for all 3 DBaaS products.
  • GCP and Azure have 300 parameters with identical settings, while AWS only has 276/277 identical to GCP and Azure respectively. This means that AWS has changed more on the default settings than the other DBaaS products did.
  • But how does the picture look like for the most performance relevant parameters?

PostgreSQL Performance Parameters in the Focus

Before we analyze the different values, we take a look into some important configuration parameters:

  • server_version: The PostgreSQL version, which is used on the server side, in this case the DBaaS server. We have shown in several articles that performance differences for different major and minor versions for nearly any database exist, not only PostgreSQL. The performance often is worse for newer versions due to more features and security add-ins. That is the reason why we always recommend performance measurements before upgrading the database version.
  • shared_buffers: PostgreSQL uses its own buffer and also kernel-buffered IO. With the shared_buffer setting you can adjust the dedicated memory to the PostgreSQL built-in buffer. And this is often the most effective tuning parameter. It is recommended to set it to 25% of your total RAM, but it also depends on the workload.
  • wal_buffers: The buffer setting for the Write-Ahead-Log can be set with this parameter. For many concurrent connections , it is recommended to increase this parameter beyond the the default 16MB.
  • effective_cache_size: This parameter gives a guideline for the memory available for disk caching and important for the optimizer’s query plan. For complex queries a higher value is beneficial.
  • min/max_wal_size: These parameters can determine the size of the files/segments for the Write-Ahead-Log checkpoints. For performance, larger timed checkpoints can be beneficial, yet penalizing recovery as discussed in the the EDB Blog.
  • max_connections: The number of concurrent network connections is set by this parameter. Increasing the number of parallel connections can help to increase the performance but can also overload the database if the physical resources cannot handle the connections. Read more about that on CyberTec PostgreSQL Blog.

PostgreSQL Performance Parameters - Detailed Analysis

ParameterDefault ValueAWS RDS
Value
Azure Database
Value
GCP CloudSQL
Value
server_version14.714.914.9
shared_buffers131072 KB202908920971522744192
wal_buffers16384 KB8192163842048
max_connections10050005000800
effective_cache_size4194304 KB405817862914563293088
maintenance_work_mem65536 KB106444039219265536
max_parallel_maintenance_workers22642
max_worker_processes83288
max_parallel_workers8888
max_parallel_workers_per_gather2222
work_mem4096 KB409640964096
min_wal_size80 MB1928080
max_wal_size1024 MB2048256001504
huge_pagestryontrytry
random_page_cost4424
checkpoint_timeout300300600300
default_statistics_target100100100100
effective_io_concurrency1111
synchronous_commitonononon
  • The first obvious difference is the different PostgreSQL server version for AWS RDS. Although all measurements were done in the same week, AWS RDS has an older 14.7 PostgreSQL server version compared to the 14.9 versions of Azure Database and GCP Cloud SQL for PostgreSQL.
  • When it comes to the shared_buffers size, GCP CloudSQL uses 2.7 GB, while AWS RDS and Azure Database are configured to use 2.0 / 2.1 GB. CloudSQL hadn’t the best performance in our measurements, so the 2 GB shared_buffer probably was no performance killer.
  • The max_connections is limited at GCP CloudSQL to 800, while AWS and Azure allow up to 5000 parallel connections. But this has not an impact on the performance results, because we used less parallel clients in our measurements.
  • The effective_cache_size at Azure Database for PostgreSQL is 6.3GB and thus much higher than for AWS and GCP. This can be a reason for the good performance at the complex TPC-H queries.
  • We also see many important performance-critical settings which are at the default value and not tuned by any of the three DBaaS providers.

Summary

In this article, we took a look at the PostgreSQL configurations used in the PostgreSQL DBaaS offerings of the hyper-scalers. Summarizing, we can conclude that there is a correlation between the performance seen for simple and complex read operations and the configuration used.

Yet, this does not necessarily mean man the configurations are the sole reason for the performance differences. Other reasons may lie in differences of the software-hardware stacks (different servers, different CPUs, different hypervisors, different storage connectivity, etc.) and policies such as scheduling and overprovisioning that are used by different providers.

As many of the config parameters can be set by the DBaaS users, it is an interesting question whether aligning the PostgreSQL configs also more closely aligns the performance results. This is something we shall investigate in the upcoming weeks.

In any case, it is never a mistake to closely look at the database configuration you are using, even when buying the DBaaS .