Wednesday, September 19, 2007

The Correct Way to Back Up PostgreSQL

Being a full-time DBA can make you realize how important backups are. Enterprise database systems like Oracle and Microsoft's SQL Server provide excellent backup mechanisms for doing "live" backups of the databases that guarantee consistency and allow for point-in-time recovery.

In the open source world, however, we're left with old-school methods of doing backups such as cold backups, or even worse, text-based "dumps" of the databases. I'll let you waste your time reading about how MySQL "recommends" you back up a database, and how PostgreSQL's pg_dump works. These methods are very resource intensive (creating a bunch of "insert" statements and formatting everything as plain text) and can even lock our real working processes (in the case of MySQL) while the backup is running!



Thankfully, the folks at PostgreSQL introduced online backups in version 8.0. In my opinion, this is the single biggest advancement in this release of the database engine. It now handles its backups in a more Oracle-before-RMAN fashion (which worked great for many, many years for Oracle). It basically works like this:

  1. You set up the database instance to generate and keep historical "logs" (WAL for PostgreSQL, "archived redo logs" for Oracle). A contiguous chain of these logs, plus a full backup from some point in the past (more on this in a second), can be used to "replay" every transaction/change in your database. MySQL sort of implements this with its binary logs, but it is not quite the same thing.

  2. You tell the database how to back up these "logs".

  3. You do a "full backup" of the database while it is online, so nobody is blocked from doing any work. In fact, the only work involved in this backup is really the I/O required to do a binary copy of the data files to a backup location.

  4. Once you tell the database engine you are done with the backup, it will start (or continue) backing up the logs to a location of your choosing.
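
To make steps 2 through 4 a bit more concrete, here is a rough Python sketch, not a production script. It assumes the 8.x-era functions pg_start_backup() and pg_stop_backup() and an archive_command setting in postgresql.conf that runs a command for each finished WAL segment. The names archive_wal, base_backup and run_sql are made up for illustration; run_sql stands for whatever you use to execute a SQL statement as a superuser (psql, a driver, and so on).

```python
import os
import shutil
import tarfile


def archive_wal(src_path, file_name, archive_dir):
    """Copy one finished WAL segment into archive_dir (step 2).

    Returns 0 on success and 1 on failure, mirroring the exit status
    PostgreSQL expects from an archive_command. Refuses to overwrite
    a segment that has already been archived.
    """
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        return 1
    tmp = dest + ".tmp"
    try:
        shutil.copy2(src_path, tmp)
        os.rename(tmp, dest)  # expose the file only once fully copied
    except OSError:
        return 1
    return 0


def base_backup(run_sql, data_dir, dest_tar, label="nightly"):
    """Take an online full backup of data_dir (steps 3 and 4).

    run_sql is a hypothetical callable that executes one SQL statement.
    """
    run_sql("SELECT pg_start_backup('%s');" % label)
    try:
        # Plain binary copy of the data files; the database stays online.
        with tarfile.open(dest_tar, "w:gz") as tar:
            tar.add(data_dir, arcname="data")
    finally:
        # Always end backup mode, even if the copy failed.
        run_sql("SELECT pg_stop_backup();")
```

A small command-line wrapper around archive_wal could then be named in archive_command, with %p (path of the segment) and %f (its file name) passed as arguments. Check the documentation for your PostgreSQL version before relying on any of this, since the setup details have changed between releases.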


In the event of a failure or need for recovery, you can take the full backup (step 3) and apply the saved logs (steps 1, 2 and 4) to recover your database. Even cooler than that, you can do point-in-time recovery, which essentially means that if somebody "accidentally" deleted a bunch of data at 1:34:00 PM, you can restore the database up to 1:33:59 PM and get back the lost data!
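
As a sketch of what that recovery setup might look like: in the 8.x releases you restore the full backup, then drop a recovery.conf file into the data directory before starting the server. The helper below just builds the text of that file. The function name recovery_conf and the archive path are my own placeholders; restore_command and recovery_target_time are the actual setting names, as far as I know, but verify them against the documentation for your version.

```python
def recovery_conf(archive_dir, target_time=None):
    """Build the contents of a recovery.conf file.

    archive_dir is wherever the WAL segments were archived.
    target_time, if given, stops the replay at that timestamp
    (point-in-time recovery); otherwise all archived logs are replayed.
    """
    # %f and %p are filled in by PostgreSQL, so they stay literal here.
    lines = ["restore_command = 'cp %s/%%f %%p'" % archive_dir]
    if target_time is not None:
        lines.append("recovery_target_time = '%s'" % target_time)
    return "\n".join(lines) + "\n"


# Example: recover up to one second before the "accident" at 1:34:00 PM.
print(recovery_conf("/mnt/backup/wal", "2007-09-19 13:33:59"))
```

Leaving out the target time replays every archived log, which is what you want after a plain hardware failure.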

I'll let you read more details on the official PostgreSQL backup page, but I would highly recommend this method of backup and recovery to any PostgreSQL DBA who has critical data to take care of.
