PostgreSQL Replication and Hot Standby – In Practice

This article describes how to setup PostgreSQL Replication and Hot Standby on two Amazon EC2 instances running Ubuntu. I spent good chunk of time learning these tricks in past two weeks, in order to implement High Availability solution for 3DTin database server. The standby node can take over when main node goes down, the way it did last month during electric outage of Amazon EC2’s East coast data centers. 3DTin was down for 6 hours during the outage and since then we have been implementing measures to avoid it from happening again in future. This post may help other web developers to build robust cloud backends as well.

The replication technique used here has become available in PostgreSQL only since version 9.1. There are several blog posts and few books that describe it, but they don’t explain everything that is necessary. I found the explanation of these setups in “PostgreSQL 9 Administration Cookbook” very useful. I tried all the replication recipes described in that book on two VMs on my laptop. After going through 7 different database clusters, each used for testing different recipe, I got a good handle on the things. Today we’ve successfully set up this replication solution on 3DTin production database. With the details fresh in mind I would like to document all the steps for future reference and in the hope that it may also help others, given that the documentation on this topic is not yet abundant.

Setup

The instructions here use Ubuntu 11.10 32-bit server, with PostgreSQL 9.1.4 (Ubuntu 12.04 will do as well with any PostgreSQL version 9.1.x).

We are going to prepare 2 hosts, one master and another standby. If you are using Amazon EC2 like us, you want to run these two hosts in two different Availability zones. That is important if your goal is High Availability. Any accidental downtime is not supposed to affect more than one Amazon EC2 availability zones simultaneously. Therefore if one of the host goes down, you can bring up your application servers in another zone where your standby DB server is running. You can also put them in two different regions, but that will cost you more in networking bandwidth.

Preparation

Install following packages on Ubuntu 11.10/12.04

postgresql-9.1
postgresql-client-9.1
postgresql-contrib-9.1

postgres user account

Installing these packages will automatically create postgres user account.

All following commands are supposed to be executed as postgres user. For security reasons this account doesn’t have any password (like the root account on Ubuntu). If you are working in test environment you can set a password for postgres with

sudo passwd postgres

Then perform all following commands by logging in as postgres

su - postgres

If you are on production server though, you may want to leave postgres passwordless. Instead run all following commands with sudo -u postgres prepended. I am going to omit it from every command for convenience.

Password-less ssh logins

MASTER server should be able to access STANDBY over ssh without password. More accurately – postgres account on MASTER should be able to access postgres account on STANDBY without password. This is needed for the rsync operation in base-backup step (explained below)

You can do it using ssh-copy-id utility.

On master run
ssh-copy-id <IP_OF_STANDBY_SERVER>

You can also specify user in above command, but I deliberately omitted it because it is recommended that on both machines you do these operations under same user account postgres. Not mentioning the user will make the command login to remote machine with same username as you are currently logged in on this machine (or mentioned in sudo -u).

In case of failover, MASTER server and STANDBY server will be switching roles, therefore run ssh-copy-id from current STANDBY server, so that it can access current MASTER server without password too.

Ubuntu specific PostgreSQL conventions

On ubuntu a PostgreSQL database instance is called cluster. Don’t confuse it with multi-node configuration of servers or even a certain SQL keyword. It’s simply an instance of PostgreSQL server that runs on a particular port and saves its data in its own directory (default being /var/lib/postgresql/9.1/<cluster-name> – also set to PGDATA environment variable in various scripts).

Ubuntu comes with pg_createcluster, pg_ctlcluster, pg_lscluster CLI commands to help you administer these clusters. You will use pg_ctlcluster to start, stop, reload the database instance.

Each cluster also has its own set of configuration files stored in /etc/postgresql/9.1/<cluster-name>. We will need to modify only two of them: pg_hba.conf and postgresql.conf.

For keeping things simple, it’s better if you ensure that $PGDATA is same on both MASTER and STANDBY servers. (Side note: On production server $PGDATA is expected to be mounted on a separate volume)

Replication with Streaming Log Shipping and Hot Standby

There are many different configurations that you can follow to achieve replication across two PostgreSQL instances. Each has its own benefits and pitfalls. The technique we are going to use is known as ‘Streaming Log Shipping’ and ‘Hot Standby’ in the Admin cookbook I referred to earlier. This setup seems most optimal in terms of immediate replication (i.e. smallest window of data loss) and minimum networking traffic (hence cheaper).

This configuration is of Master-Slave kind. The Slave or STANDBY server can be used to query the database, but you cannot make any writes to it. This configuration is NOT what is known as “Multi Master” configuration. The Master will accept all read and write queries from your application servers. The Standby server will copy the new modifications from Master with minimal delay. It will only be available for read-only queries while in Standby mode. When the Master goes down or we bring it down for other reasons, we can tell the Standby server to become the Master by creating a touch file (explained later). After this point the ex-Standby server will stop following the original Master and will now be ready to accept read-write queries, it will then be the new Master.

Let’s start

If you already have a PostgreSQL database running on production server, that will be your current MASTER server. But note that we are going to configure both MASTER and STANDBY machines as identically as possible, because you will want to make any one of them MASTER in case another goes down and switch back when it comes back up.

Let’s start with current MASTER running and STANDBY stopped.

On the MASTER create replicator user. The STANDBY server will login to the MASTER using this account to read latest changes that it has to replicate. You should not do this step on current STANDBY server, even if it may sometime become a new Master. That is because during the base backup step the replicator user role will get copied to STANDBY server automatically.

psql -c "CREATE USER replicator SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'changeme';"

Modify pg_hba.conf on both MASTER and STANDBY server by adding this line. This tells respective PostgreSQL instances to accept connection from other node for replication purposes.

host  replication replicator <IP_OF_OTHER_HOST>/32  md5

Modify postgresql.conf on both MASTER and STANDBY servers and add following lines to it. (Check if these options are already set to different values elsewhere in the file)

hot_standby = on
max_wal_senders = 1
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .'
listen_addresses = 'localhost,<IP_ADDRESS_OF_THIS_NODE>'

At this point restart MASTER. Do NOT start STANDBY yet

Next we are going to perform base backup from MASTER to STANDBY

BACKUP_LABEL="base-backup"

psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
psql -p $PORT -c "select pg_stop_backup();"

It’s advised that you put this in a bash script so that it can be run quickly and repeatedly without any errors.

On STANDBY create a recovery.conf file in $PGDATA directory and add following lines to it.

standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'

Now start the STANDBY server.

At this point you can check the log file (/var/log/postgresql/postgresql-9.1-<cluster-name>.log) to verify things

  • On STANDBY look for a message “streaming replication successfully connected to primary”.
  • Also you should see “wal receiver process” on STANDBY and “wal sender process” on MASTER.
  • Finally run some queries against the STANDBY server to verify it contains same data as MASTER.

If it’s not working, check the firewall rules in Amazon EC2 Security Groups.

How to do Failover/Switchover

Setting up the system for high availability is of no use, if you are going to wait for actual disaster to strike before trying the recovery. Therefore you should try above steps in test environment and simulate disasters.

If MASTER is not down, make sure you stop it first, before you tell STANDBY to take up that role. This is to avoid the MASTER from processing further queries leading to a split-brain problem.

You can turn the STANDBY into a MASTER by simply touching a trigger file that was mentioned in the recovery.conf, /tmp/postgresql.trigger.$PORT.

Now that STANDBY has turned into MASTER, point your application servers to it. Even if your old MASTER is running at this point, the new MASTER is not going to replicate any changes from it. Therefore it is necessary that you stop the old MASTER before you ask the STANDBY to become the new MASTER.

You can tell that STANDBY has become MASTER from the messages in the log that read “archive recovery complete. database system is ready to accept connections.”

How to do Switchback

After some downtime or maintenance period, your master node is back up again and you want to do switchback. You are going to first turn this node into a standby. In this mode it will catch up with the current MASTER replicating the changes that took place while it was down. So we refer to it as current STANDBY now.

Peform Base backup from current MASTER to current STANDBY

BACKUP_LABEL="base-backup"

psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
psql -p $PORT -c "select pg_stop_backup();"

Create recovery.conf in $PGDATA on current STANDBY

standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'

After the catch up is over, you can turn the current STANDBY into MASTER by following above switchover procedure – touch the trigger file.

In addition to being an insurance against disasters, the STANDBY server can also be used for load balancing purposes. You can configure the app servers to spread their read queries across the Master and Standby servers. Tools like pgpool provide that kind of facility.

Once you get comfortable setting up a basic two node replication and hot standby scheme, you can move on to advanced configurations too.

If you find any mistakes or suggestions for improvements in this article please leave a comment.

 

Advertisements
This entry was posted in Development and tagged . Bookmark the permalink.

29 Responses to PostgreSQL Replication and Hot Standby – In Practice

  1. Have you checked out Repmgr. ( http://www.repmgr.org/ ).

    Although PostgreSQL 9+ supports replicated Hot Standby servers which we can query and/or use for high availability., the user is expected to manage the high availability part of it.

    • jyro says:

      Yes, I learnt about repmgr during my research and it is indeed something I would like to use if it makes my life easier in HA management. But first I decided to learn nitty-gritty details of PostgreSQL administration.

  2. flyingmayo says:

    You may want to take a look at pg_basebackup.
    Using this binary would fundamentally change (simplify) the above process.

  3. AndyB says:

    i get an error right at the top;
    su – postgres
    ssh-copy-id
    > /usr/bin/ssh-copy-id: ERROR: No identities found

  4. panyx0718 says:

    should the archive_command be ‘cp .’ instead of ‘cd .’?

  5. jberryman says:

    Not quite sure in what operations or commands $PGDATA is used implicitly, but it actually refers to the config directory (which is as you mention /etc/postgresql/9.1/*/ on ubuntu/debian) rather than the data directory, no? At least that’s what is suggested here: http://archives.postgresql.org/pgsql-docs/2011-10/msg00046.php

  6. Sebastian says:

    hi, i had ubuntu 12.04 and postgresql 9.1, y do all the step mentioned aboved but processes receiver and sender never show up, what i’m doing wrong? In the other hand, my variable $PGDATA it’s shown as empty. Thank a lot!

  7. wxcio says:

    Thanks for sharing this. I am preparing to migrate form Postgres 8.3 (single server) to 9.2 and testing replication is my next step once I have cluster running from my pg data partition.

  8. WellNow says:

    Gotta give my thanks for this easy to follow How-To. Can’t believe it was this easy to set up replication.

    Followed this guide using Mint 14 MATE and PGSQL 9.1

  9. Why do you suggest we create ‘replicator’ user when Streaming Replication feature doesn’t even require it? All you have to do is have correct pg_hba.conf record, which you illustrate correctly, and primary_conninfo = ‘host=$FQDN_or_IP’ would suffice in most cases (add port= if you use non-standard one). I’m just curious why you think/convinced we need it?

  10. JimS says:

    Any qualms about me publishing a fabric script based on your instructions on github?

    Was needing this exact thing and started with what you have as a reference for the script and plan on a few additions as I get time. But, I needed something I could do reproducibly…

  11. Joan Ventura says:

    Hi, I tried to implement the replication in my company, we have many servers with postgres, we started with the report server just to try.. everything was fine with the replication except 1 thing… we have a little problem… let me explain the case… We have the Main server, we do a backup from differents schemas and then we take that backup to the report server.. we do that everyday.. we delete the schemas in the report server and then we reload the schema from the backup created from the Main server.. Every modification that we do in the report server it replicates with no problem to the report_server_slave(The one that we are trying to build) but the problem comes when we reload everyday the differents schemas from the main server… because we dont create the backup with regular INSERT TO tables, we create the backup with COPY, so I think when you do a copy to a table it doesnt replicate that to the slave server.. its like not change has being commited on the report server.

    Any Idea how to fix this.

  12. Truong Pham says:

    This is very helpful article. Thank admin.
    I have one question. How about application database connection after change Slaver database to master database? Is it necessary change the application db connection to new master database?

  13. Lissa says:

    hello,
    i’m getting this error when i’m try to perform base backup from MASTER to STANDBY..can you please help me fix this error..
    thanks in advanced,
    Lissa

    rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6 ]
    NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived
    WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
    HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
    WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed)
    HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.

  14. Lissa says:

    hai, the previous error is already fixed, now i’m facing with the new error where its occur while i’m running the base backup script, here is the error code:

    WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (1920 seconds elapsed)
    HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.

    can you please advise me how to solve this error
    thanks in advanced 🙂

  15. Lissa says:

    hai again, i got another error, this error occur after perform the base backup then i start the standby server. Below is the error code.

    LOG: could not bind IPv4 socket: Cannot assign requested address
    HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
    WARNING: could not create listen socket for “192.168.1.129”
    LOG: redirecting log output to logging collector process
    HINT: Future log output will appear in directory “pg_log”

    please do assist me to solve this problem.
    thanks in advanced
    Lissa

  16. Dan Rough says:

    Jyro, thanks for taking the time to write such a comprehensive article. It has been really useful to me.

    A question though if I may. I’d like to understand why you’re using “cd .” for the archive_command. My understanding of the setup you describe above suggests that you’re not storing your wal segment files elsewhere.

    Why so? Do you have a separate process for archiving your wal segment files? If not, is it fair to assume that you don’t think it’s important to store the segment files in another location?

    I’d love to understand your decision a little more.

    Thanks, Dan.

  17. Allen Shatzer says:

    A question regarding switchovers between Master and Slave and then back. If you shutdown the master prior to making the slave the NEW master, would it be necessary to do another base backup prior to making the OLD master a slave to the NEW master?

    We have some large databases, and having to perform a base backup of the NEW master and restore it on the OLD master prior to making an OLD master the master again is very time consuming.

  18. venkatagurunadh says:

    Once the Primary/Standby node has been failed/restarted what type of notification does the user get

  19. Pingback: PgPool 2 with HA Proxy - DL-UAT

  20. Lewis says:

    Do you know how to set up a multi master scenario on a Windows box? Much appreciate the help!

    Also, can you confirm something for me? With Master-Slave, does the master wait until the Slave WAL has been written before it considers the transaction complete? In which case, does this setup affect performance?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s