Friday, November 9, 2007

I tend to get myself involved with projects that have a lot of small files -- on the order of a few million. These files are often only useful to me for a short period of time, such as when I'm importing a million XML files into a SQL database for later processing. I've also worked on projects where millions of spam emails pile up in queues to be filtered by humans, which they obviously never will be, and simply need to be removed.

The annoying part is the time and effort it takes to remove all of these files quickly while consuming as few system resources as possible.

In this entry I'm going to show you a couple of ways to set yourself up for quick removal of files that you know (or are pretty sure) will be temporary.


Traditionally, if you had a big directory structure to remove, you would simply use rm -rf to remove the files:

rm -rf /home/mybigpath

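Of course, on millions of files this takes a very long time and chews up all sorts of valuable system resources, because rm has to walk the entire directory tree and issue a separate unlink for every single file. We want our CPUs being used for business logic, not recursive directory searching and unlink operations!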

So how do we avoid doing an unlink on every one of those files? Well, the key here is to set ourselves up from the start knowing that we're going to remove these files. Basically, we're going to create a separate file system specifically for these millions of files, and when we're done with them, we'll simply remove or reformat that file system. There are several ways to do this; to name a few:

  1. We can dedicate whole hard drive partitions (logical or physical) to this temporary location.
  2. We can use something like LVM (or LVM2) to manage these temporary chunks of disk space.
  3. We can use loop-device file mounting to create truly temporary file stores that take a single unlink operation to blow away!

I personally like #3 for things I know are going to be deleted, and #2 for things that I might want to keep around, where I need to be safe.

I'm going to first show you option #3 (loop-device file), as this is the easiest to make use of. The basic steps for this:

  1. Create an "empty" file on your real file system.
  2. Format that file with a file system.
  3. Mount the file using the "-o loop" option.
  4. Create millions/billions/trillions of files within the newly mounted file system, and do whatever processing needs to be done on them.
  5. When you're done with the files, unmount the file and simply delete the large fake file system image, which is a single, very cheap operation for the OS.

Let's dig into the details:



~# cd /mnt
/mnt# mkdir fakepoint
# This creates a 200MB file
/mnt# dd if=/dev/zero of=fake.img bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 6.93631 seconds, 30.2 MB/s

# We're using reiserfs in this example, but any file system should work
/mnt# mkreiserfs -f fake.img
/mnt# mount -o loop /mnt/fake.img /mnt/fakepoint
/mnt# df -h /mnt/fakepoint/
Filesystem            Size  Used Avail Use% Mounted on
/mnt/fake.img         200M   33M  168M  17% /mnt/fakepoint

/mnt# cd /mnt/fakepoint/

# create 1,000,000 dummy files
/mnt/fakepoint# perl -e 'for (1..1_000_000) { open(F, ">", $_) or die "$_: $!"; print F "hello"; close(F) }'

# do a bunch of processing on your files

# what if we need to expand our file
# because we didn't allocate enough?
# (leave the mount point first, or umount will report "device is busy")
/mnt/fakepoint# cd /mnt
/mnt# umount /mnt/fakepoint/

# seek 201mb into the file (so we don't overwrite what we have)
# and add another 200mb
/mnt# dd if=/dev/zero of=fake.img bs=1M count=200 seek=201
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 5.97021 seconds, 35.1 MB/s

# resize the "file system"
/mnt# resize_reiserfs fake.img

# Re-mount
/mnt# mount -o loop /mnt/fake.img /mnt/fakepoint
/mnt# df -h /mnt/fakepoint/
Filesystem            Size  Used Avail Use% Mounted on
/mnt/fake.img         401M  132M  270M  33% /mnt/fakepoint

# all done with these files, we want to blow them away now
# simple as this:
/mnt# umount /mnt/fakepoint/
/mnt# rm fake.img

Notice that we can even re-size this "fake" file system. This allows us nearly total flexibility in how we handle these millions of files.
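If you grow the image often, the dd arithmetic from the transcript can be wrapped in a small helper. This is a minimal sketch assuming GNU dd (for the 1M block-size suffix) and a POSIX shell; grow_image is a hypothetical name, not an existing tool. Instead of hard-coding seek=201, it looks up the image's current size and starts writing at the next whole megabyte, so existing data is never overwritten. You still unmount before calling it and run the file system's own resize tool (resize_reiserfs in the example above) afterwards.

```shell
#!/bin/sh
# grow_image IMAGE ADD_MB
# Hypothetical helper: append ADD_MB megabytes of zeros to the end of IMAGE.
# Unmount the image before calling this, and run the file system's resize
# tool (e.g. resize_reiserfs IMAGE) afterwards.
grow_image() {
    img=$1
    add_mb=$2

    # Current size in bytes, rounded up to whole megabytes so the new
    # data starts on a 1MB boundary at or after the existing end.
    cur_bytes=$(wc -c < "$img")
    cur_mb=$(( (cur_bytes + 1048575) / 1048576 ))

    # seek= skips past the existing data; conv=notrunc keeps dd from
    # truncating the file before it writes.
    dd if=/dev/zero of="$img" bs=1M count="$add_mb" seek="$cur_mb" \
        conv=notrunc 2>/dev/null
}
```

For example, grow_image /mnt/fake.img 200 followed by resize_reiserfs /mnt/fake.img would stand in for the dd and resize steps in the transcript.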

With this technique, you could even "transfer" these files from one Linux system to another by simply copying the file system image (fake.img, in this case) from one computer to another.

This works very well for simple projects and things that are temporary. If you want a more permanent solution, you could use lvm2 to create a real logical volume that can be managed with enterprise-class tools. Most of the concepts stay the same, except you would use lvm2 to do all the disk-size management (the "dd" steps).
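An lvm2 version of the loop-device transcript might look roughly like the sketch below. It assumes root privileges, an existing volume group named vg0 and a mount point of /mnt/scratch (both hypothetical), and it reuses reiserfs to match the example above. The lvcreate, lvextend and lvremove calls take the place of the dd and rm steps. With DRY_RUN=1 set, the run wrapper only prints each command, which is a cheap way to review them before touching a real disk.

```shell
#!/bin/sh
# Sketch: manage a throw-away file system on an LVM logical volume.
# Assumes: root privileges, lvm2 and reiserfsprogs installed, and an
# existing volume group called "vg0" (hypothetical name).
VG=${VG:-vg0}
LV=${LV:-scratch}
MNT=${MNT:-/mnt/scratch}
DEV="/dev/$VG/$LV"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# scratch_create SIZE   (e.g. 200M)
scratch_create() {
    run lvcreate -L "$1" -n "$LV" "$VG"
    run mkreiserfs -f "$DEV"
    run mkdir -p "$MNT"
    run mount "$DEV" "$MNT"
}

# scratch_grow EXTRA   (e.g. 200M)
scratch_grow() {
    run umount "$MNT"
    run lvextend -L "+$1" "$DEV"    # replaces the "dd ... seek=" step
    run resize_reiserfs "$DEV"      # grow the file system to fill the volume
    run mount "$DEV" "$MNT"
}

# scratch_destroy: one volume removal instead of millions of unlinks
scratch_destroy() {
    run umount "$MNT"
    run lvremove -f "$DEV"
}
```

For example, DRY_RUN=1 scratch_create 200M prints the four commands it would run; leaving DRY_RUN unset executes them.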

On a side note, here is a nice tutorial on using loop-devices to create encrypted data blocks. Very cool stuff.


Wednesday, September 19, 2007

The Correct way to Backup PostgreSQL

Being a full-time DBA can make you realize how important backups are. Enterprise database systems like Oracle and Microsoft's SQL Server provide excellent mechanisms for doing "live" backups of their databases that guarantee consistency and allow for point-in-time recovery.

In the open source world, however, we're left with old-school methods of doing backups, such as cold backups or, even worse, text-based "dumps" of the databases. I'll let you waste your time reading about how MySQL "recommends" you back up a database, and how PostgreSQL's pg_dump works. These methods are very resource intensive (generating a bunch of "insert" statements and formatting everything as plain text) and, in the case of MySQL, can even lock out our real working processes while the backup is running!



Thankfully, the folks at PostgreSQL introduced online backups in version 8.0. In my opinion, this is the single biggest advancement in that release of the database engine. It handles backups much like Oracle did before RMAN (an approach that worked great for Oracle for many, many years). It basically works like this:

  1. You set up the database instance to use, and keep, historical "logs" (WAL for PostgreSQL, "archived redo logs" for Oracle). A contiguous chain of these logs, plus a full backup from some point in the past (more on this in a second), can be used to "replay" every transaction/change in your database. MySQL sort of implements this with its binary logs, but it is not quite the same thing.

  2. You tell the database how to back up these "logs".

  3. You do a "full backup" of the database while it is online, so nobody is blocked from doing any work. In fact, the only work involved in this backup is the I/O required to make a binary copy of the data files to a backup location.

  4. Once you tell the database engine you are done with the backup, it will start (or continue) backing up the logs to a location of your choosing.
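Here is one way steps 1 through 4 could be wired together on an 8.x server. It is a sketch, not the official procedure: the archive directory (/mnt/backup/wal), the function names and the "postgres" superuser connection are all assumptions. pg_start_backup() and pg_stop_backup() are the real server functions that bracket step 3, and %p and %f are the placeholders PostgreSQL substitutes into archive_command (the WAL file's path and its file name).

```shell
#!/bin/sh
# Steps 1 and 2: in postgresql.conf (8.3 and later also need archive_mode = on):
#   archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
# archive_wal below is the same logic as a function, so it can be tested
# or extended (e.g. to compress or ship segments off-site).
#
# archive_wal WAL_PATH WAL_NAME ARCHIVE_DIR
# A non-zero exit tells PostgreSQL the segment was NOT archived, so it
# keeps the segment and retries later.
archive_wal() {
    src=$1; name=$2; dir=$3
    # Never overwrite a segment that is already archived.
    if [ -f "$dir/$name" ]; then
        return 1
    fi
    cp "$src" "$dir/$name"
}

# Steps 3 and 4: on-line base backup.
# base_backup DATA_DIR TARBALL [LABEL]
base_backup() {
    data_dir=$1; tarball=$2; label=${3:-base_backup}
    # Tell the server a base backup is starting.
    psql -U postgres -c "SELECT pg_start_backup('$label');" || return 1
    # Plain binary copy of the data files; users keep working meanwhile.
    tar -czf "$tarball" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
    tar_status=$?
    # Always leave backup mode, even if tar failed.
    psql -U postgres -c "SELECT pg_stop_backup();" || return 1
    return $tar_status
}
```

Calling pg_stop_backup() unconditionally matters: if the copy fails and the server is left in backup mode, the next pg_start_backup() call will be refused.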


In the event of a failure, or any other need for recovery, you can take the full backup (step 3) and apply the saved logs (steps 1, 2 and 4) to recover your database. Even cooler than that, you can do point-in-time recovery, which essentially means that if somebody "accidentally" deleted a bunch of data at 1:34:00 PM, you can restore the database up to 1:33:59 PM and get back the lost data!
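On the 8.x releases, that kind of recovery is driven by a recovery.conf file placed in the restored data directory. The sketch below writes one; write_recovery_conf is a hypothetical helper, the archive path and timestamp in the usage note are made-up examples, and it assumes the base backup has already been unpacked into DATA_DIR. restore_command and recovery_target_time are the real recovery.conf settings, with %f and %p filled in by PostgreSQL.

```shell
#!/bin/sh
# write_recovery_conf DATA_DIR ARCHIVE_DIR TARGET_TIME
# Hypothetical helper: prepare a restored 8.x data directory for
# point-in-time recovery up to TARGET_TIME.
write_recovery_conf() {
    data_dir=$1; archive_dir=$2; target_time=$3
    cat > "$data_dir/recovery.conf" <<EOF
# Fetch archived WAL segments back from the archive directory.
restore_command = 'cp $archive_dir/%f %p'
# Stop replaying just before the bad DELETE.
recovery_target_time = '$target_time'
EOF
}
```

For example, write_recovery_conf /var/lib/pgsql/data /mnt/backup/wal '2007-09-19 13:33:59' and then starting the server replays the archived logs up to that moment; when recovery finishes, PostgreSQL renames recovery.conf to recovery.done.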

I'll let you read more details on the official PostgreSQL backup page, but I would highly recommend this method of backup and recovery to any PostgreSQL DBA who has critical data to take care of.


Friday, September 14, 2007

ServAdmin Open Sourced

I've decided to release ServAdmin, a web-hosting application I built from scratch a couple of years ago. You can access the SVN repository on Google Code here: http://servadmin.googlecode.com/svn/trunk/

Some of the features:

  • Single administrative panel to control any number of servers (unlimited)
  • Written from scratch -- does not rely on 3rd party packages
  • Roughly 95% of the code is PHP5.
  • Secure communication between servers using stunnel.
    • This allows you to have servers spread across the Internet where communications between them will happen on open networks, and still maintain a secure environment.
  • OS-independent - Right now I've only written the 'Linux' server pieces, but this could easily be applied to any operating system. It is designed so that you simply have to write the layer for whatever OS you're concerned with.
  • Controls MySQL, Apache, Bind and vpopmail to make a complete hosting environment.

If anybody out there is interested in picking up development, just let me know. It is really only a few polishes away from being a complete product.

