PostgreSQL is one of the most widely used open-source database management systems. High availability is a must-have requirement for almost all applications nowadays. In this article, we will look at how to set up PostgreSQL high availability with Patroni.
Patroni is an open-source tool for deploying and managing highly available PostgreSQL clusters. Its developers call it a template rather than a finished product, because you can extend it in Python to build your very own high-availability solution.
How Does Patroni Work?
To understand how Patroni works, you first need to know which components are at play in a Patroni HA setup. Let’s look at them:
- PostgreSQL – The DBMS
- Patroni Agent – The Patroni process that runs on each PostgreSQL node and manages its local PostgreSQL instance
- DCS – Distributed Configuration Store, a distributed key-value store that holds the cluster state and Patroni configuration. We will be using etcd for this setup.
- Load Balancer – Routes client connections to the current primary node of the cluster. We will be using HAProxy for our setup.
This is the basic workflow for how Patroni manages the PostgreSQL DB HA setup:
- The Patroni agents continuously monitor the cluster and detect when the primary node has failed.
- Patroni then promotes a sufficiently healthy replica node to be the new primary.
- The load balancer routes requests to the new primary node; it discovers which node is primary by checking each node's Patroni REST API, as shown in the quick check after this list.
- If the failed primary comes back online, it rejoins the cluster as a replica node.
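If you want to see this for yourself once the cluster from this guide is running, you can query the Patroni REST API directly. A request to the /primary endpoint returns HTTP 200 on the current primary and 503 on replicas (on older Patroni versions the endpoint is called /master); the IPs and port 8008 below match the setup used later in this article:
$ curl -s -o /dev/null -w "%{http_code}\n" http://10.171.40.1:8008/primary
$ curl -s -o /dev/null -w "%{http_code}\n" http://10.171.40.2:8008/primary
Whichever node currently holds the leader lock answers with 200, which is the same mechanism the HAProxy health check relies on later in this guide.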
How to Set Up PostgreSQL High Availability with Patroni
Prerequisites
We will be using four machines for this setup; however, you can run etcd and HAProxy on the same machine if you are just trying things out. These are the machines we will be using:
| Server | IP          | Application         |
|--------|-------------|---------------------|
| Node1  | 10.171.40.1 | PostgreSQL, Patroni |
| Node2  | 10.171.40.2 | PostgreSQL, Patroni |
| Node3  | 10.171.40.3 | etcd                |
| Node4  | 10.171.40.4 | HAProxy             |
Installing Packages
1. Let’s first install PostgreSQL and other required packages on Node1 and Node2. Run these commands to install them.
$ sudo apt update
$ sudo apt install postgresql postgresql-contrib libpq-dev python3-dev python3-pip -y
Stop the PostgreSQL service on both nodes after the installation. Now create the symbolic links for the PostgreSQL binaries in the /usr/sbin directory, as they will be used by Patroni. Adjust the version number 12 in the path below to match the PostgreSQL version you installed.
$ ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/
2. Now install Patroni and the Python client packages for etcd and PostgreSQL on Node1 and Node2.
$ python3 -m pip install patroni python-etcd psycopg2
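You can quickly confirm that the tools are installed before moving on:
$ patroni --version
$ psql --version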
3. Now install etcd on Node3.
$ apt install etcd -y
4. Finally, install HAProxy on Node4.
$ apt install haproxy -y
Configuring etcd
Edit the etcd configuration file located at /etc/default/etcd and set the configuration as follows:
ETCD_NAME="etcd0"
ETCD_LISTEN_PEER_URLS="http://127.0.0.1:7001,http://10.171.40.3:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://10.171.40.3:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.171.40.3:2380"
ETCD_INITIAL_CLUSTER="etcd0=http://10.171.40.3:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.171.40.3:2379"
ETCD_INITIAL_CLUSTER_TOKEN="node1"
ETCD_INITIAL_CLUSTER_STATE="new"
Note: Don’t forget to replace the IP 10.171.40.3 with your node’s IP address.
Restart the etcd service after making the changes to apply them.
$ systemctl restart etcd
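Before moving on, it is worth confirming that etcd is reachable on the client URL we just configured (replace the IP with your Node3 address). The /version endpoint should return a small JSON document with the etcd server version:
$ curl http://10.171.40.3:2379/version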
Configuring Patroni
Let’s now configure Patroni on the nodes Node1 and Node2, starting with Node1. Create the file /etc/patroni.yml with the following configuration.
scope: postgres-ha
name: node1

restapi:
  listen: 10.171.40.1:8008
  connect_address: 10.171.40.1:8008

etcd:
  host: 10.171.40.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.171.40.1/0 md5
    - host replication replicator 10.171.40.2/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: adminpass
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.171.40.1:5432
  connect_address: 10.171.40.1:5432
  data_dir: /mnt/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres-pass
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
The data_dir config specifies the directory Patroni will use to store the database data. You will have to create this directory and set appropriate permissions for Patroni to be able to use it.
$ mkdir -p /mnt/patroni
$ chown postgres:postgres /mnt/patroni
$ chmod 750 /mnt/patroni
Since database files can grow quite large, you can mount a separate disk on the /mnt directory and use it as Patroni’s data_dir.
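For example, assuming you have a spare disk that shows up as /dev/sdb (a hypothetical device name, replace it with your actual device), you could format and mount it, then re-run the mkdir, chown and chmod steps above on top of the new mount:
$ mkfs.ext4 /dev/sdb    # destroys any data on the disk
$ mount /dev/sdb /mnt
$ echo '/dev/sdb /mnt ext4 defaults 0 2' >> /etc/fstab    # make the mount persistent across reboots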
Repeat the same steps on Node2: create the /etc/patroni.yml configuration file and the data directory with the correct permissions and ownership. However, you will have to use the following Patroni configuration on Node2.
scope: postgres-ha
name: node2

restapi:
  listen: 10.171.40.2:8008
  connect_address: 10.171.40.2:8008

etcd:
  host: 10.171.40.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.171.40.1/0 md5
    - host replication replicator 10.171.40.2/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.171.40.2:5432
  connect_address: 10.171.40.2:5432
  data_dir: /mnt/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres-pass
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
The scope needs to be the same for all Patroni nodes; however, the name must be distinct for every node. Also ensure that all Patroni nodes point to the same etcd server.
Now let’s create a systemd service so we can manage Patroni with systemd. Repeat these steps on both Patroni nodes. Create the service file /etc/systemd/system/patroni.service with the following configuration.
[Unit]
Description=PostgreSQL High Availability Setup with Patroni
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
Now reload the daemon unit files.
$ systemctl daemon-reload
Finally, start the Patroni service and enable it so that it comes up automatically during system boot up.
$ systemctl start patroni
$ systemctl enable patroni
Also, disable the PostgreSQL service and let Patroni manage the database cluster.
$ systemctl disable postgresql
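Once Patroni is running on both nodes, you can check the cluster state from either node with the patronictl utility that ships with Patroni (the exact column layout varies slightly between Patroni versions):
$ patronictl -c /etc/patroni.yml list
One node should be reported as the Leader and the other as a streaming replica.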
Configuring HAProxy
Now let’s set up the load balancer for our PostgreSQL HA setup on Node4. Edit the HAProxy configuration file at /etc/haproxy/haproxy.cfg, remove the default configuration, and replace it with the following:
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_10.171.40.1_5432 10.171.40.1:5432 maxconn 100 check port 8008
    server postgresql_10.171.40.2_5432 10.171.40.2:5432 maxconn 100 check port 8008
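After saving the file, it is a good idea to check it for syntax errors using HAProxy's built-in configuration check mode:
$ haproxy -c -f /etc/haproxy/haproxy.cfg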
If the check passes, restart the HAProxy service to apply the configuration.
$ systemctl restart haproxy
Our load-balanced PostgreSQL high availability cluster is finally up. Let’s now look at the ways you can access it.
Accessing PostgreSQL Patroni HA Setup
1. Web View
You can view your cluster in the browser using your load balancer’s IP and the port specified in the HAProxy configuration. The port is 7000 for the above setup, hence we can access it using the URL http://10.171.40.4:7000.
This page shows the cluster status, with the current primary node highlighted in green and the replicas in red. The replicas appear red because they intentionally fail HAProxy's HTTP health check, which ensures connections are only ever routed to the primary.
2. PostgreSQL Shell
You can access the PostgreSQL shell using the psql command. Go to one of your Patroni nodes and use the psql utility (installed alongside PostgreSQL) to connect to your PostgreSQL HA setup.
You can access the PostgreSQL cluster through the load balancer this way. Notice that the port is 5000, which is the port we bound the postgres listener to in the HAProxy configuration; HAProxy forwards the connection to the current primary Patroni node on port 5432.
$ psql -h 10.171.40.4 -p 5000 -U postgres -W
Note: Use the IP of your load balancer node instead of 10.171.40.4.
If you want to access the PostgreSQL shell using your Patroni node’s IP instead, use port 5432.
$ psql -h 10.171.40.1 -p 5432 -U postgres -W
At the password prompt, enter the password set for the postgres superuser in the /etc/patroni.yml configuration file. This will drop you into the PostgreSQL shell.
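Once connected (or with one-off commands as below), you can verify that the load balancer really did hand you the primary: pg_is_in_recovery() returns f on the primary and t on a replica, and inet_server_addr() shows which backend node you actually landed on. The commands below assume the load balancer IP used throughout this guide:
$ psql -h 10.171.40.4 -p 5000 -U postgres -W -c "SELECT pg_is_in_recovery();"
$ psql -h 10.171.40.4 -p 5000 -U postgres -W -c "SELECT inet_server_addr();"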
Note: The Patroni-managed cluster runs separately from the default PostgreSQL cluster, so your databases will not show up if you open the default PostgreSQL shell. To access the Patroni-managed databases, launch a PostgreSQL shell against the Patroni cluster using one of the commands above.
Conclusion
Patroni is a very handy tool that makes it easy to set up a PostgreSQL HA cluster, and you can get a sizeable cluster running very quickly with it. If you do face any issues, let me know in the comments section below.
I also write articles about Linux and would highly recommend giving them a read.