This article describes how to set up PostgreSQL replication and Hot Standby on two Amazon EC2 instances running Ubuntu. I spent a good chunk of time learning these tricks over the past two weeks, in order to implement a High Availability solution for the 3DTin database server. The standby node can take over when the main node goes down, the way it did last month during the electrical outage at Amazon EC2's East coast data center. 3DTin was down for 6 hours during the outage, and since then we have been implementing measures to prevent it from happening again. This post may help other web developers build robust cloud backends as well.
The replication technique used here has been available in PostgreSQL only since version 9.1. There are several blog posts and a few books that describe it, but they don't explain everything that is necessary. I found the explanation of these setups in "PostgreSQL 9 Administration Cookbook" very useful. I tried all the replication recipes described in that book on two VMs on my laptop. After going through 7 different database clusters, each used for testing a different recipe, I got a good handle on things. Today we successfully set up this replication solution on the 3DTin production database. With the details fresh in mind, I would like to document all the steps for future reference, in the hope that it may also help others, given that documentation on this topic is not yet abundant.
Setup
The instructions here use an Ubuntu 11.10 32-bit server with PostgreSQL 9.1.4 (Ubuntu 12.04 will do as well, with any PostgreSQL 9.1.x version).
We are going to prepare 2 hosts, one master and one standby. If you are using Amazon EC2 like us, you will want to run these two hosts in two different Availability Zones. That is important if your goal is High Availability: an accidental downtime is not supposed to affect more than one Amazon EC2 Availability Zone simultaneously, so if one of the hosts goes down, you can bring up your application servers in the other zone, where your standby DB server is running. You can also put them in two different regions, but that will cost you more in networking bandwidth.
Preparation
Install the following packages on Ubuntu 11.10/12.04 (see the apt-get command after the list):
postgresql-9.1
postgresql-client-9.1
postgresql-contrib-9.1
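All three are in the standard Ubuntu repositories, so a single apt-get invocation is enough:

sudo apt-get install postgresql-9.1 postgresql-client-9.1 postgresql-contrib-9.1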
postgres user account
Installing these packages will automatically create the postgres user account.
All of the following commands are supposed to be executed as the postgres user. For security reasons this account doesn't have a password (like the root account on Ubuntu). If you are working in a test environment, you can set a password for postgres with

sudo passwd postgres

Then perform all of the following commands after logging in as postgres:

su - postgres

If you are on a production server, though, you may want to leave postgres passwordless. Instead, run all of the following commands with sudo -u postgres prepended. I am going to omit it from every command for convenience.
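For instance, a command written below as psql -c "..." would then be typed on a production box as (the SELECT here is just an illustrative query):

sudo -u postgres psql -c "SELECT version();"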
Password-less ssh logins
The MASTER server should be able to access the STANDBY over ssh without a password. More accurately, the postgres account on MASTER should be able to access the postgres account on STANDBY without a password. This is needed for the rsync operation in the base-backup step (explained below).
You can set this up using the ssh-copy-id utility. On the master run:
ssh-copy-id <IP_OF_STANDBY_SERVER>
You can also specify a user in the above command, but I deliberately omitted it, because it is recommended that on both machines you perform these operations under the same user account, postgres. Omitting the user makes the command log in to the remote machine with the same username you are currently logged in as on this machine (or the one given to sudo -u).
In case of failover, the MASTER and STANDBY servers will switch roles, therefore also run ssh-copy-id from the current STANDBY server, so that it can access the current MASTER server without a password too.
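Note that ssh-copy-id needs an existing key pair; if the postgres account doesn't have one yet, generate it first. A minimal sketch of the whole exchange (run as postgres on each node):

ssh-keygen -t rsa                     # only if ~/.ssh has no key yet; accept the defaults
ssh-copy-id <IP_OF_STANDBY_SERVER>    # on the MASTER
ssh-copy-id <IP_OF_MASTER_SERVER>     # on the STANDBY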
Ubuntu specific PostgreSQL conventions
On Ubuntu a PostgreSQL database instance is called a cluster. Don't confuse it with a multi-node configuration of servers, or with the CLUSTER SQL command. It's simply an instance of the PostgreSQL server that runs on a particular port and saves its data in its own directory (the default being /var/lib/postgresql/9.1/<cluster-name>, which is also assigned to the PGDATA environment variable in various scripts).
Ubuntu comes with the pg_createcluster, pg_ctlcluster and pg_lsclusters CLI commands to help you administer these clusters. You will use pg_ctlcluster to start, stop and reload the database instance.
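For example, to inspect and restart the default 9.1 cluster, which Ubuntu names "main":

pg_lsclusters                     # list clusters with their ports, status and data directories
pg_ctlcluster 9.1 main restart    # arguments are <version> <cluster-name> <action>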
Each cluster also has its own set of configuration files, stored in /etc/postgresql/9.1/<cluster-name>. We will need to modify only two of them: pg_hba.conf and postgresql.conf.
To keep things simple, ensure that $PGDATA is the same on both the MASTER and STANDBY servers. (Side note: on a production server $PGDATA is expected to be mounted on a separate volume.)
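You can always confirm which data directory a running cluster actually uses by asking the server itself:

psql -c "SHOW data_directory;"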
Replication with Streaming Log Shipping and Hot Standby
There are many different configurations you can follow to achieve replication across two PostgreSQL instances, each with its own benefits and pitfalls. The technique we are going to use is known as 'Streaming Log Shipping' and 'Hot Standby' in the Admin cookbook I referred to earlier. This setup seems optimal in terms of immediate replication (i.e. the smallest window of data loss) and minimal network traffic (hence cheaper).
This configuration is of the Master-Slave kind. The Slave, or STANDBY, server can be used to query the database, but you cannot make any writes to it; this is NOT what is known as a "Multi Master" configuration. The Master will accept all read and write queries from your application servers. The Standby server will copy new modifications from the Master with minimal delay, and is only available for read-only queries while in standby mode. When the Master goes down, or we bring it down for other reasons, we can tell the Standby server to become the Master by creating a touch file (explained later). After this point the ex-Standby server stops following the original Master and is ready to accept read-write queries; it is then the new Master.
Let’s start
If you already have a PostgreSQL database running on a production server, that will be your current MASTER server. Note, however, that we are going to configure both the MASTER and STANDBY machines as identically as possible, because you will want to make either of them the MASTER in case the other goes down, and switch back when it comes back up.
Let's start with the current MASTER running and the STANDBY stopped.
On the MASTER create the replicator user. The STANDBY server will log in to the MASTER using this account to read the latest changes that it has to replicate. You should not do this step on the current STANDBY server, even though it may some day become the new Master, because during the base backup step the replicator role will be copied to the STANDBY server automatically.
psql -c "CREATE USER replicator SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'changeme';"
Modify pg_hba.conf on both the MASTER and STANDBY servers by adding the line below. This tells the respective PostgreSQL instance to accept connections from the other node for replication purposes.
host replication replicator <IP_OF_OTHER_HOST>/32 md5
Modify postgresql.conf on both the MASTER and STANDBY servers and add the following lines. (Check whether these options are already set to different values elsewhere in the file.)
hot_standby = on           # allow read-only queries while this node is in standby mode
max_wal_senders = 1        # number of standby servers that may connect for streaming
wal_level = 'hot_standby'  # write enough WAL information to support a hot standby
archive_mode = on          # enable WAL archiving (see archive_command below)
archive_command = 'cd .'   # a no-op that always returns 0; no WAL is actually archived anywhere
listen_addresses = 'localhost,<IP_ADDRESS_OF_THIS_NODE>'   # accept connections from the other node
At this point restart the MASTER. Do NOT start the STANDBY yet.
Next we are going to perform a base backup from the MASTER to the STANDBY:
BACKUP_LABEL="base-backup"
psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"   # tell the MASTER a backup is starting
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/   # copy the data directory, minus WAL
psql -p $PORT -c "select pg_stop_backup();"                   # tell the MASTER the backup is done
It's advisable to put this in a bash script so that it can be run quickly and repeatedly without errors.
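A minimal sketch of such a script, assuming it runs as postgres on the MASTER and takes the standby's IP as an argument (the script name and the PORT/PGDATA defaults are placeholders for your environment):

#!/bin/bash
# base_backup.sh -- hypothetical helper wrapping the base-backup steps above
set -e
STANDBY_IP=${1:?usage: base_backup.sh <standby-ip>}
PORT=${PORT:-5432}
PGDATA=${PGDATA:-/var/lib/postgresql/9.1/main}

psql -p $PORT -c "select pg_start_backup('base-backup');"
# make sure pg_stop_backup runs even if rsync fails midway
trap "psql -p $PORT -c \"select pg_stop_backup();\"" EXIT
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ $STANDBY_IP:$PGDATA/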
On the STANDBY create a recovery.conf file in the $PGDATA directory and add the following lines to it. (Note that recovery.conf does not expand shell variables, so substitute the actual port number for $PORT.)
standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'
Now start the STANDBY server.
At this point you can check the log file (/var/log/postgresql/postgresql-9.1-<cluster-name>.log) to verify things:
- On the STANDBY look for the message "streaming replication successfully connected to primary".
- You should also see a "wal receiver process" on the STANDBY and a "wal sender process" on the MASTER (see the sketch below).
- Finally, run some queries against the STANDBY server to verify it contains the same data as the MASTER.
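A quick way to look for those WAL processes from the shell (the process titles appear in ps output on 9.1):

ps -ef | grep "wal sender"     # on the MASTER
ps -ef | grep "wal receiver"   # on the STANDBY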
If it’s not working, check the firewall rules in Amazon EC2 Security Groups.
How to do Failover/Switchover
Setting up the system for high availability is of no use if you wait for an actual disaster to strike before trying the recovery. You should therefore rehearse the above steps in a test environment and simulate disasters.
If the MASTER is not down, make sure you stop it first, before you tell the STANDBY to take up that role. This prevents the MASTER from processing further queries, which would lead to a split-brain problem.
You can turn the STANDBY into a MASTER by simply touching the trigger file that was mentioned in recovery.conf, /tmp/postgresql.trigger.$PORT.
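That is the entire failover action on the STANDBY:

touch /tmp/postgresql.trigger.$PORT   # again, $PORT stands for the actual port number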
Now that the STANDBY has turned into the MASTER, point your application servers to it. Even if your old MASTER is still running at this point, the new MASTER will not replicate any changes from it; that is why it is necessary to stop the old MASTER before you ask the STANDBY to become the new MASTER.
You can tell that the STANDBY has become the MASTER from the messages in the log that read "archive recovery complete. database system is ready to accept connections."
How to do Switchback
After some downtime or a maintenance period, your old master node is back up again and you want to switch back. You are going to first turn this node into a standby; in that mode it will catch up with the current MASTER, replicating the changes that took place while it was down. So we refer to it as the current STANDBY from now on.
Perform a base backup from the current MASTER to the current STANDBY:
BACKUP_LABEL="base-backup"
psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
psql -p $PORT -c "select pg_stop_backup();"
Create recovery.conf in $PGDATA on the current STANDBY:
standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'
After the catch-up is over, you can turn the current STANDBY into the MASTER by following the switchover procedure above: touch the trigger file.
In addition to being insurance against disasters, the STANDBY server can also be used for load balancing. You can configure the app servers to spread their read queries across the Master and Standby servers; tools like pgpool provide that kind of facility.
Once you are comfortable setting up a basic two-node replication and hot standby scheme, you can move on to more advanced configurations.
If you find any mistakes or have suggestions for improving this article, please leave a comment.
Have you checked out repmgr? (http://www.repmgr.org/)
Although PostgreSQL 9+ supports replicated Hot Standby servers that we can query and/or use for high availability, the user is expected to manage the high-availability part of it.
Yes, I learnt about repmgr during my research, and it is indeed something I would like to use if it makes my life easier in HA management. But first I decided to learn the nitty-gritty details of PostgreSQL administration.
You may want to take a look at pg_basebackup.
Using this binary would fundamentally change (simplify) the above process.
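For reference, the whole base-backup step could collapse to something like the following, run on the STANDBY against an empty $PGDATA (a sketch; flags as in the PostgreSQL 9.1 docs):

pg_basebackup -h <IP_OF_MASTER> -p $PORT -U replicator -D $PGDATA -P -x
# -P shows progress; -x includes the WAL files needed to start the copy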
I get an error right at the top:
su - postgres
ssh-copy-id
> /usr/bin/ssh-copy-id: ERROR: No identities found
You probably don’t have a default SSH key in $HOME/.ssh. Try
ssh-keygen -t rsa
Thanks, Jyro. Linux noob here.
Should the archive_command be 'cp .' instead of 'cd .'?
I think 'cp .' is a wrong command, but I always get a "cannot request XXX; segment has been deleted" error. Can you give me some suggestions?
"cd ." is basically a no-op. This procedure deliberately does nothing in archive_command.
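What matters is only the exit status; "cd ." always succeeds, which you can check from a shell:

cd . ; echo $?   # prints 0, the success status archive_command must return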
Not quite sure in what operations or commands $PGDATA is used implicitly, but doesn't it actually refer to the config directory (which, as you mention, is /etc/postgresql/9.1/*/ on Ubuntu/Debian) rather than the data directory? At least that's what is suggested here: http://archives.postgresql.org/pgsql-docs/2011-10/msg00046.php
Hi, I have Ubuntu 12.04 and PostgreSQL 9.1. I did all the steps mentioned above, but the receiver and sender processes never show up. What am I doing wrong? Also, my $PGDATA variable shows up as empty. Thanks a lot!
Thanks for sharing this. I am preparing to migrate from Postgres 8.3 (single server) to 9.2, and testing replication is my next step once I have a cluster running from my pg data partition.
Gotta give my thanks for this easy-to-follow how-to. Can't believe it was this easy to set up replication. Followed this guide using Mint 14 MATE and PostgreSQL 9.1.
Why do you suggest we create the 'replicator' user when the Streaming Replication feature doesn't even require it? All you have to do is have the correct pg_hba.conf record, which you illustrate correctly, and primary_conninfo = 'host=$FQDN_or_IP' would suffice in most cases (add port= if you use a non-standard one). I'm just curious why you think we need it.
Answering my own question: it is in fact the recommended approach, as explained in section 25.2.5.1 (Authentication) of http://www.postgresql.org/docs/9.1/static/warm-standby.html
Any qualms about me publishing a fabric script based on your instructions on GitHub?
I was needing this exact thing and started with what you have as a reference for the script; I plan on a few additions as I get time. I needed something I could do reproducibly.
Feel free to post a GitHub script based on this article. I would appreciate it if you post the link to your script in the comments; I can then link to it at the end of the article.
I plan on expanding it further, but here is the initial script:
https://github.com/jmspring/postgres/blob/master/master_slave_setup.py
Thanks again.
Hi, I tried to implement the replication at my company. We have many servers running Postgres, and we started with the report server just to try it out. Everything was fine with the replication except for one little problem. Let me explain the case: every day we take a backup of different schemas from the main server and load it onto the report server; we delete the schemas on the report server and then reload them from the backup created on the main server. Every modification that we make on the report server replicates with no problem to report_server_slave (the one we are trying to build), but the problem comes when we reload the different schemas from the main server each day. Because we don't create the backup with regular INSERT statements but with COPY, I think the COPY into a table doesn't get replicated to the slave server; it's as if no change had been committed on the report server.
Any idea how to fix this?
This is a very helpful article. Thanks, admin.
I have one question: what happens to the application's database connection after the slave database is promoted to master? Is it necessary to change the application's DB connection to point to the new master database?
Hello,
I'm getting this error when I try to perform the base backup from MASTER to STANDBY. Can you please help me fix it?
Thanks in advance,
Lissa
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6 ]
NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived
WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed)
HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
Hi, the previous error is already fixed; now I'm facing a new error that occurs while running the base backup script. Here is the error output:
WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (1920 seconds elapsed)
HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
Can you please advise me how to solve this error?
Thanks in advance 🙂
Hi again, I got another error. This one occurs when I start the standby server after performing the base backup. Below is the error output:
LOG: could not bind IPv4 socket: Cannot assign requested address
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for “192.168.1.129”
LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory “pg_log”
Please assist me in solving this problem.
Thanks in advance,
Lissa
Jyro, thanks for taking the time to write such a comprehensive article. It has been really useful to me.
A question, though, if I may. I'd like to understand why you're using "cd ." for the archive_command. My understanding of the setup you describe above is that you're not storing your WAL segment files anywhere else.
Why so? Do you have a separate process for archiving your WAL segment files? If not, is it fair to assume that you don't think it's important to store the segment files in another location?
I'd love to understand your decision a little more.
Thanks, Dan.
archive_command must return 0, as per the documentation. This could also have been set to /bin/true.
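In other words, the equivalent setting would be:

archive_command = '/bin/true'   # another no-op that always exits with status 0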
A question regarding switchovers between master and slave and then back: if you shut down the master prior to making the slave the new master, would it be necessary to do another base backup before making the old master a slave to the new master?
We have some large databases, and having to perform a base backup of the new master and restore it on the old master, before making the old master the master again, is very time-consuming.
Once the primary/standby node has failed or been restarted, what type of notification does the user get?
Do you know how to set up a multi-master scenario on a Windows box? I'd much appreciate the help!
Also, can you confirm something for me? With master-slave, does the master wait until the slave's WAL has been written before it considers the transaction complete? If so, does this setup affect performance?