Minimal Backups

Shawn Bayern

Simple machines deserve simple, inexpensive backup plans. Good tape drives, network arrays, and commercial backup software are costly, and for many small but important systems these traditional solutions are overkill.

On the other hand, hard disks are plentiful and cheap, and they function fine as backup devices for systems with relatively modest backup demands. For several of my systems, I've written and deployed a simple, one-script backup program that can use an extra hard drive. The script offers incremental and full backups using only a handful of lines of shell-script code, and like most scripts, it's easily customized for your own environment.

Admittedly, this script is appropriate only for relatively small machines. In the interest of simplicity, it doesn't take advantage of potential optimizations for speed, disk usage, and so on. Furthermore, using an extra disk drive as a destination for backups isn't always appropriate. For example, if you need your backup media to be frequently transported off-site, hard disks are probably not right for your needs. However, my script should work nicely with volumes mounted over NFS, Jaz drives, or other mountable media, so it might still be useful to you if off-site backups are a requirement.

This script (Listing 1) is intended to run once per backup period (e.g., once a day). The DEST_DIR variable names the directory (presumably on a mounted volume separate from the data being backed up) that will receive the backups. BACKUP_TIMESTAMP names a file that matters only for its modification time (mtime), which the backup program manages. If this file doesn't exist, the script performs a full (not incremental) backup; otherwise, it backs up only those files whose mtime is later than that of the timestamp file. (Thus, whenever you want a full backup, you can simply erase the timestamp file.)
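The full-versus-incremental decision described above can be sketched in a few lines of shell. This is a minimal illustration, not a copy of Listing 1: the function names and the default timestamp path are hypothetical, and only BACKUP_TIMESTAMP comes from the article.

```shell
#!/bin/sh
# Sketch of the timestamp logic; not the author's Listing 1.
# BACKUP_TIMESTAMP is the variable named in the article; the default
# path below is a hypothetical placeholder.
BACKUP_TIMESTAMP=${BACKUP_TIMESTAMP:-/var/run/backup.timestamp}

# backup_mode TIMESTAMP_FILE
# Prints "full" if the timestamp file is missing, "incremental" otherwise.
backup_mode() {
    if [ -f "$1" ]; then
        echo incremental
    else
        echo full
    fi
}

# list_candidates SRC_DIR TIMESTAMP_FILE
# Prints the regular files a run should consider: everything under SRC_DIR
# for a full backup, or only files modified after the timestamp file.
list_candidates() {
    if [ "$(backup_mode "$2")" = full ]; then
        find "$1" -type f -print
    else
        find "$1" -type f -newer "$2" -print
    fi
}

# mark_backup_done TIMESTAMP_FILE
# Records the time of this run by creating the file or updating its mtime.
mark_backup_done() {
    touch "$1"
}
```

A run would call list_candidates / "$BACKUP_TIMESTAMP", copy the results, and then call mark_backup_done "$BACKUP_TIMESTAMP". Note that updating the timestamp after the copy finishes can miss files modified while the backup was running; touching a temporary file before the run and renaming it afterward is one way to avoid that gap.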

The script uses find to locate all files modified since the last backup ran, and cp to copy them to a predefined destination directory. The appropriate flags for cp (in this case, -aP) may need to vary depending on your operating system or specific situation. The flags I've demonstrated are appropriate for the version of cp from the Free Software Foundation (as part of GNU fileutils 4.0x), and I've chosen to preserve as much information as possible about the original files (e.g., owner, permissions) in the versions copied to the backup directory. In that release, -P is the short form of --parents, which recreates each file's full path under the destination; later GNU releases reassigned -P to --no-dereference, so on those versions --parents must be spelled out. I recommend the GNU version of cp because it takes a flag, --target-directory, that simplifies the syntax; if your version of cp doesn't, you'll need to modify the script. You can find out more about GNU fileutils, including where to download it, from:

http://prep.ai.mit.edu/gnulist/production/fileutils.html
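As an illustration of the find-and-cp step, here is a minimal sketch under stated assumptions: GNU find, xargs, and cp are available, and --parents is written out in full so that the command does not depend on which meaning -P has in a given release. The function name copy_changed and its arguments are hypothetical, and the sketch copies only regular files, which may be narrower than what Listing 1 does.

```shell
#!/bin/sh
# Sketch of the copy step; assumes GNU find, xargs, and cp.

# copy_changed SRC_DIR TARGET_DIR [TIMESTAMP_FILE]
# Copies regular files under SRC_DIR into TARGET_DIR, recreating each
# file's full path beneath it. If TIMESTAMP_FILE exists, only files
# modified after it are copied; otherwise every file is copied.
copy_changed() {
    src=$1 target=$2 stamp=${3:-}
    mkdir -p "$target" || return 1
    if [ -n "$stamp" ] && [ -f "$stamp" ]; then
        find "$src" -type f -newer "$stamp" -print0
    else
        find "$src" -type f -print0
    fi | xargs -0 -r cp -a --parents --target-directory="$target"
}
```

The NUL-separated pipeline (-print0 with xargs -0) keeps file names containing spaces intact, and xargs -r skips the cp call entirely when nothing has changed. TARGET_DIR should live outside SRC_DIR, or be excluded from the search, so that the backup does not copy itself.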

The backup script separates backup runs by date, creating a subdirectory named for the date of each run under your main backup directory. This makes it easy to find different versions of an incrementally modified file. To find all versions of /usr/local/etc/backup.sh, for example, you could run:

% ls -l $DEST_DIR/*/usr/local/etc/backup.sh

where DEST_DIR has the same value as it does in your copy of the script. Writing scripts to automate file restorations for various environments is, therefore, almost trivial.
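A restore helper along these lines might look like the following sketch. It is not part of the original listing: the function names are hypothetical, and it assumes the dated subdirectories use a name format such as YYYY-MM-DD that sorts chronologically, which the article does not specify.

```shell
#!/bin/sh
# Sketch of a restore helper; not part of the original listing.
# Assumes run directories are named so that they sort by date
# (for example, YYYY-MM-DD).

# list_versions DEST_DIR /absolute/path/to/file
# Prints every backed-up copy of the file, oldest run first.
list_versions() {
    for copy in "$1"/*"$2"; do
        # Skip the unexpanded pattern when no run contains the file.
        [ -e "$copy" ] && echo "$copy"
    done
    return 0
}

# latest_version DEST_DIR /absolute/path/to/file
# Prints the newest backed-up copy, or nothing if there is none.
latest_version() {
    list_versions "$1" "$2" | tail -n 1
}
```

Restoring the most recent copy of a file then becomes a single command, such as cp -a "$(latest_version "$DEST_DIR" /usr/local/etc/backup.sh)" /usr/local/etc/backup.sh, after checking that latest_version printed a path.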

One more piece of the script deserves explanation. The string passed to egrep excludes files in particular directories from backup. I've chosen not to back up /proc, /tmp, and /dev (and, of course, /hd2, which is the destination for backups).

My goal, in addition to showing you a particular script that may simplify an important task, is to demonstrate how straightforward, scripted solutions are both easy to develop and widely useful on UNIX systems. They still have a place in the modern systems administrator's arsenal, amidst expensive and more complex solutions.

Shawn Bayern is a systems programmer in Yale University's ITS Technology and Planning group. As a student and, more recently, a staff member at Yale, he has administered (and written scripts for) systems that serve over 20,000 users. Shawn Bayern can be reached at: [email protected].