PostgreSQL is one of the most widely used open-source database management systems, and high availability is a must-have requirement for almost all applications nowadays. In this article, we will look at how to set up PostgreSQL high availability with Patroni.
Patroni is an open-source tool for deploying and managing highly available PostgreSQL clusters. Its developers call it a template rather than a finished product, as you can build your very own high-availability solution by extending it in Python.
How Does Patroni Work?
To understand how Patroni works, you first need to know which components are at play in a Patroni HA setup. Let’s look at them:
- PostgreSQL – The DBMS
- Patroni Agent – The agent process that runs on each PostgreSQL node and manages its local PostgreSQL instance
- DCS – Distributed Configuration Store, a distributed key-value store that holds the cluster state and Patroni configuration. We will be using etcd for this setup.
- Load Balancer – Routes client connections to the current primary of the Patroni cluster. We will be using HAProxy for our setup.
This is the basic workflow for how Patroni manages the PostgreSQL DB HA setup:
- The Patroni agent detects that the primary node has failed.
- The Patroni agent promotes a sufficiently healthy replica node to primary.
- The load balancer now routes requests to the new primary node.
- If the failed primary comes back online, it is added to the cluster as a replica node.
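The role checks behind this workflow are exposed through Patroni’s REST API: each agent serves health endpoints on its REST port, and on recent Patroni versions GET /primary returns HTTP 200 only on the current leader (with /replica doing the same for replicas). This is also what the load balancer’s health checks rely on. As an illustrative sketch, using the node IPs and REST port from the setup below, you could check each node’s role by hand:

```shell
# Ask each Patroni agent whether it is the primary.
# /primary returns HTTP 200 on the leader and 503 elsewhere.
for node in 10.171.40.1 10.171.40.2; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://${node}:8008/primary")
    echo "node ${node}: /primary -> HTTP ${code}"
done
```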
How to Set Up PostgreSQL High Availability with Patroni
We will be using four machines for this setup; however, you can set up etcd and HAProxy on the same machine if you are just trying things out. These are the machines we will be using:
- Node1 – 10.171.40.1 – PostgreSQL + Patroni
- Node2 – 10.171.40.2 – PostgreSQL + Patroni
- Node3 – 10.171.40.3 – etcd
- Node4 – 10.171.40.4 – HAProxy
1. Let’s first install PostgreSQL and the other required packages on Node1 and Node2. Run these commands to install them.

$ sudo apt update
$ sudo apt install postgresql postgresql-contrib libpq-dev python3-dev python3-pip -y
Stop the PostgreSQL service on both nodes after the installation. Then create symbolic links for the PostgreSQL binaries in the /usr/sbin directory, as they will be used by Patroni.

$ sudo systemctl stop postgresql
$ sudo ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/

Note: Adjust the version (12) in the path to match your installed PostgreSQL version.
2. Now install Patroni and the Python packages for etcd and PostgreSQL on Node1 and Node2.

$ sudo python3 -m pip install patroni python-etcd psycopg2
3. Now install etcd on Node3.
$ sudo apt install etcd -y
4. Finally install HAProxy on the Node4.
$ sudo apt install haproxy -y
Edit the etcd configuration file located at /etc/default/etcd and set the configuration as follows (note that etcd expects comma-separated URL lists without spaces):

ETCD_LISTEN_PEER_URLS="http://127.0.0.1:7001,http://10.171.40.3:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://10.171.40.3:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.171.40.3:2380"
ETCD_INITIAL_CLUSTER="etcd0=http://10.171.40.3:2380,"
ETCD_ADVERTISE_CLIENT_URLS="http://10.171.40.3:2379"
ETCD_INITIAL_CLUSTER_TOKEN="node1"
ETCD_INITIAL_CLUSTER_STATE="new"
Note: Don’t forget to replace the IP 10.171.40.3 with your node’s IP address.
Restart the etcd service after making the changes to apply them.
$ sudo systemctl restart etcd
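Before moving on, it is worth confirming that etcd actually came up with the new configuration. A quick sanity check against the client URL configured above:

```shell
# etcd serves a /health endpoint on its client port (2379).
# A healthy member responds with a JSON body reporting "health": "true".
curl -s http://10.171.40.3:2379/health
```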
Let’s now configure Patroni on the nodes Node1 and Node2. We will first start with Node1.
Create a file /etc/patroni.yml with the following configuration.

scope: postgres-ha
name: node1

restapi:
  listen: 10.171.40.1:8008
  connect_address: 10.171.40.1:8008

etcd:
  host: 10.171.40.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.171.40.1/32 md5
    - host replication replicator 10.171.40.2/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: adminpass
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.171.40.1:5432
  connect_address: 10.171.40.1:5432
  data_dir: /mnt/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres-pass
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
The data_dir config specifies the directory Patroni will use to store the database data. You will have to create this directory and set appropriate permissions for Patroni to be able to use it.
$ sudo mkdir -p /mnt/patroni
$ sudo chown postgres:postgres /mnt/patroni
$ sudo chmod 750 /mnt/patroni
You can mount a separate disk at /mnt and use it as Patroni’s data_dir, as database files can grow quite large.
Repeat these same steps on Node2: create the /etc/patroni.yml configuration file and the data directory with the correct permissions and ownership. However, you will have to use this Patroni configuration on Node2.
scope: postgres-ha
name: node2

restapi:
  listen: 10.171.40.2:8008
  connect_address: 10.171.40.2:8008

etcd:
  host: 10.171.40.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.171.40.1/32 md5
    - host replication replicator 10.171.40.2/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: adminpass
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.171.40.2:5432
  connect_address: 10.171.40.2:5432
  data_dir: /mnt/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres-pass
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
The scope needs to be the same for all Patroni nodes, but the name must be distinct for every node. Ensure that all the Patroni nodes point to the same etcd server.
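As a quick sanity check of that rule, you can compare the scope and name fields of the two configuration files. This is just an illustrative sketch using minimal sample files in /tmp; on a real setup you would point the awk commands at the actual /etc/patroni.yml copies from each node:

```shell
# Create two minimal sample configs to stand in for the two nodes' files.
cat > /tmp/node1.yml <<'EOF'
scope: postgres-ha
name: node1
EOF
cat > /tmp/node2.yml <<'EOF'
scope: postgres-ha
name: node2
EOF

# Extract the scope and name fields from each file.
scope1=$(awk '/^scope:/{print $2}' /tmp/node1.yml)
scope2=$(awk '/^scope:/{print $2}' /tmp/node2.yml)
name1=$(awk '/^name:/{print $2}' /tmp/node1.yml)
name2=$(awk '/^name:/{print $2}' /tmp/node2.yml)

# The scopes must match; the names must differ.
[ "$scope1" = "$scope2" ] && echo "scope OK: $scope1"
[ "$name1" != "$name2" ] && echo "names distinct: $name1 / $name2"
```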
Now let’s create a systemd service for Patroni to manage it. Repeat these steps on both Patroni nodes. Create the service file /etc/systemd/system/patroni.service with the following configuration.
[Unit]
Description=PostgreSQL High Availability Setup with Patroni
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
Now reload the daemon unit files.
$ sudo systemctl daemon-reload
Finally, start the Patroni service and enable it so that it comes up automatically during system boot.

$ sudo systemctl start patroni
$ sudo systemctl enable patroni
Also, disable the default PostgreSQL service and let Patroni manage the database cluster.

$ sudo systemctl disable postgresql
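With both nodes running, you can verify the cluster state using patronictl, which is installed along with the patroni pip package:

```shell
# List the cluster members, their roles, and replication state.
# Run this on either Patroni node; -c points at the local config file.
patronictl -c /etc/patroni.yml list
```

One node should be listed with the Leader role and the other as a running replica.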
Now let’s set up the load balancer for our PostgreSQL HA setup on Node4. Edit the HAProxy configuration file at /etc/haproxy/haproxy.cfg, remove the default configuration, and replace it as follows:
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_10.171.40.1_5432 10.171.40.1:5432 maxconn 100 check port 8008
    server postgresql_10.171.40.2_5432 10.171.40.2:5432 maxconn 100 check port 8008
Save the file and restart the HAProxy service to apply the configuration.
$ sudo systemctl restart haproxy
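You can quickly confirm that HAProxy is listening on both of its ports before moving on:

```shell
# The stats page should be bound on port 7000 and the postgres listener on 5000.
ss -tln | grep -E ':(5000|7000)'
```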
Now the PostgreSQL high-availability, load-balanced cluster is finally up. Let’s now look at how you can access it.
Accessing PostgreSQL Patroni HA Setup
1. Web View
You can view your cluster in the browser using your load balancer’s IP and the port specified in the HAProxy configuration. The port is 7000 for the above setup, hence we can access it using the URL http://10.171.40.4:7000.
This page shows the cluster status, with the current primary node highlighted in green and the replica nodes in red.
2. PostgreSQL Shell
You can access the PostgreSQL shell using its psql command. Go to one of your Patroni nodes and use the psql utility to access your PostgreSQL HA setup.
You can access the PostgreSQL cluster through the load balancer this way. Notice the port is 5000, which is the port we bound the postgres listener to in the HAProxy configuration. HAProxy passes the connections on to the Patroni nodes on port 5432.

$ psql -h 10.171.40.4 -p 5000 -U postgres -W
Note: Use the IP of your load balancer node instead of 10.171.40.4.
If you want to access the PostgreSQL shell using your Patroni node’s IP use port 5432.
$ psql -h 10.171.40.1 -p 5432 -U postgres -W
Use the password set for the postgres user in the /etc/patroni.yml configuration file at the password prompt. It will launch you into your PostgreSQL shell.
Note: The Patroni-managed cluster runs separately from the default PostgreSQL cluster, so your databases will not show up in the default PostgreSQL shell. You can access the Patroni databases by launching a PostgreSQL shell managed by Patroni, using one of the commands above.
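If you want to see the failover described earlier in action, a simple drill (sketched here; it briefly interrupts client connections) is to stop Patroni on the current primary and watch the replica take over:

```shell
# On the current primary (say Node1), stop Patroni to simulate a failure.
sudo systemctl stop patroni

# On the surviving node, check the cluster: the replica should be
# promoted to Leader within the ttl window (30 seconds in this setup).
patronictl -c /etc/patroni.yml list

# Bring the old primary back; Patroni rejoins it to the cluster
# as a replica, using pg_rewind if its timeline diverged.
sudo systemctl start patroni
```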
Patroni is a very handy tool that makes it easy to set up a PostgreSQL HA cluster, and you can bring up a large PostgreSQL cluster very quickly with it. It is a very easy-to-use tool; however, if you do face any issues, let me know in the comments section below.
I also write interesting articles about Linux and would highly recommend you give them a read.