Amanda Backup Enhanced with Solaris™ Snapshots
Julian Briggs
This article discusses the development of a system to improve
the reliability of backups, using Amanda backup software (see "Configuring
Amanda", Sys Admin, April 2002, http://www.samag.com/documents/s=7033/sam0204a/sam0204a.htm),
enhanced with Solaris filesystem snapshots (see "Free Snapshots?",
Sys Admin, January 2002, http://www.samag.com/documents/s=1824/sam0201j/0201j.htm).
In my group, we manage a medium-sized (about 400 hosts), heterogeneous
(Linux, Solaris, Windows), academic network. Four Solaris servers
provide central compute power, data storage, and network services.
We back up about 400 GB of data on a 7-day cycle to an LTO 200 tape
drive (nominally 200 GB with hardware compression, which we use) using
Amanda (http://www.amanda.org), an excellent, mature, free-software
backup utility, with a holding disk of about 150 GB. We suffered from
increasing dump errors and restore failures on our large (more than
50 GB) filesystems, which are used overnight by write-intensive research applications.
Backing up active filesystems is dangerous because inodes may
change during the dump. This makes restores unreliable because the
restored filesystem may be corrupt, or the restore may fail, typically
with this error:
changing volumes on pipe input
abort? [yn]
This problem is exacerbated by ufsdump dumping all the
directories first, then (perhaps several hours later on a large
filesystem) dumping the last of the files. Meanwhile, some of those
files may have been deleted so the directory inode on tape has entries
for non-existent files.
The standard recommendation to avoid this is to unmount the filesystem,
or to do the backup in single-user mode. Neither method is practical
on a large, heavily used server. Many sites simply run backups at
quiet times (overnight or weekends) and tolerate occasional dump
errors and restore failures.
The Solution
The solution we adopted was to create a snapshot of the filesystem,
which preserves a static view that we can then dump reliably. Sun
introduced a snapshot utility, fssnap, as a patch in Solaris
8, integrated into Solaris 9. The challenge is to make this work
with Amanda. We tried three approaches:
1. Create snapshots of all filesystems, run Amanda, then delete
the snapshots.
2. Use an executable automount map to create snapshots on demand.
3. Use a wrapper to ufsdump to create, dump, and delete snapshots.
In this article, I'll describe each of these approaches, addressing
both issues arising during implementation and issues remaining unresolved.
I'll discuss these as they emerge, so issues in an early approach
are relevant to later ones, and I'll show the code for each approach.
Create Snapshots, then Run Amanda
This is a simple approach. We start our Amanda run at midnight,
so prior to that, we run a cron job on each server to create snapshots
of all filesystems and mount these under /snap/ (e.g., /snap/_var_mail).
We found the following issues arising during implementation.
Bugs
There are several known bugs in fssnap. /usr/sbin/fssnap
behaves very differently from the man page description (in Solaris
8 and Solaris 9). A workaround is to use /usr/lib/fs/ufs/fssnap
(Sun Bug ID: 4446301. 17 Apr 2001).
fssnap fails to create snapshots of / and /var
if xntpd (the Network Time Protocol daemon) is running.
The problems are that xntpd runs in real-time mode and uses the
/ and /var filesystems; fssnap temporarily
write-locks a filesystem (as lockfs does) when creating a snapshot;
and filesystems used by real-time processes cannot be locked. A
workaround is to stop xntpd, run fssnap, then restart
xntpd (Sun Bug ID: 4699740. 12 Jun 2002). (fssnap
can happily delete snapshots while xntpd is running.)
Snapshots that are in use can produce fssnap errors in /var/adm/messages:
... fssnap: [ID 964769 kern.warning] WARNING: snap_strategy: error
calling snap_getchunk, chunk = 611890, offset = 24576, len = 196,
resid = 196, error = 5
This is "File offset of /dev/fssnap too large from ufsdump read
causes panic" (Sun Bug ID: 4769472. 27 Nov 2002).
There is no workaround, but Sun advises, "Only use the block device,
not the raw device (/dev/fssnap/* vs /dev/rfssnap/*)." Unfortunately,
ufsdump dumps the raw device even if given the block device.
Backing Store
The backing store for a snapshot must be on a separate filesystem,
which cannot itself be snapshotted but can be NFS-mounted. We use an
automounted directory, /share/backingstore/<hostname>.
We manually create and share the hostname directory, restricting
access to a netgroup of our servers, servers:
mkdir /share/backingstore/ivy
share -F nfs -o rw=servers,root=servers /export0/backingstore
We use the fssnap unlink option, which creates a backing store
file (e.g., /share/backingstore/ivy/0), opens it, then unlinks
it, so the backing store is deleted when the snapshot is deleted.
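The unlink option relies on plain UNIX semantics: removing the last link to an open file leaves its blocks allocated until the last descriptor on it is closed. A minimal, OS-generic shell illustration of the create-open-unlink pattern (nothing here is Solaris-specific; the file path is arbitrary):

```shell
#!/bin/sh
# Create a file, open it, then unlink it: the directory entry
# disappears but the data stays readable via the open descriptor.
f=/tmp/unlink-demo.$$
echo "backing store" > "$f"        # create the file
exec 3< "$f"                       # open it on fd 3
rm "$f"                            # unlink: no directory entry left
[ ! -e "$f" ] && echo "no directory entry"
read line <&3 && echo "still readable: $line"
exec 3<&-                          # close fd 3: space is reclaimed now
```

This is why the backing store needs no explicit cleanup: deleting the snapshot closes the descriptor, and the kernel frees the space.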
Mount Points and Path to Snapshot
We create a snapshot of a front filesystem (e.g., /var/mail),
which gives a snapshot device (e.g., /dev/fssnap/3), which
we mount under /snap/ (e.g., /snap/_var_mail). Amanda
lists filesystems to dump, by host, in a file (disklist).
We cannot populate disklist with snapshot devices because
the snapshot device for a given front filesystem may change. For
example, today:
fssnap -o bs=/share/backingstore/ivy,unlink /export0
/dev/fssnap/0
Tomorrow:
fssnap -o bs=/share/backingstore/ivy,unlink /export0
/dev/fssnap/1
We "flatten" the mount point paths by replacing / with _
(e.g., /var/mail becomes _var_mail) to avoid hierarchical
mount point issues (e.g., having to create snapshots of, and mount,
/ before /var before /var/mail). We then populate
disklist with the snapshot mount points corresponding to the front
filesystems to be dumped:
ivy /snap/_
ivy /snap/_export0
ivy /snap/_var
ivy /snap/_var_mail
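The flattening itself is a one-liner; a sketch (the function name flatten is mine, not from the scripts in the listings):

```shell
#!/bin/sh
# Flatten a mount point path for use under /snap/:
# every "/" becomes "_", so /var/mail -> _var_mail and / -> _.
flatten() {
    printf '%s\n' "$1" | tr '/' '_'
}

flatten /var/mail    # -> _var_mail
flatten /            # -> _
flatten /export0     # -> _export0
```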
Delete Old Snapshots
Snapshots degrade performance, so we want them around only while
dumping, especially with write-intensive filesystems. Every write to
the front filesystem entails reading the old block (from the front
filesystem), writing it (to the backing store), then writing the new
block (to the front filesystem). Also, before creating a snapshot, we
check for and delete any existing snapshot of the front filesystem.
Otherwise, the create fails and we might back up a stale snapshot of
the front filesystem.
Implementation
The prototype script, fssnap.sh (Listing 1), creates snapshots
of all ufs filesystems on a host (excluding those we never back
up) and mounts them. Running fssnap.sh on a host with two
snap-able filesystems (we exclude /export/swap):
ivy# df -k
/dev/dsk/c1t0d0s0 4133838 1781118 2311382 44% /
/dev/dsk/c1t0d0s6 41311843 3397124 37501601 9% /export
/dev/dsk/c1t0d0s7 16526762 2245857 14115638 14% /export/swap
creates two snapshots and mounts them:
ivy# df -k
/dev/dsk/c1t0d0s0 4133838 1781118 2311382 44% /
/dev/dsk/c1t0d0s6 41311843 3397124 37501601 9% /export
/dev/dsk/c1t0d0s7 16526762 2245857 14115638 14% /export/swap
/dev/fssnap/1 4133838 1781108 2311392 44% /snap/_
/dev/fssnap/0 41311843 3397124 37501601 9% /snap/_export
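The selection step at the heart of a script like fssnap.sh can be sketched as follows. This is not Listing 1 itself: the mnttab-style sample input and the exclude list are illustrative, and the actual fssnap/mount calls are omitted since they only run on Solaris.

```shell
#!/bin/sh
# Pick the ufs mount points eligible for snapshots, skipping an
# exclude list (here /export/swap, as in the example above).
EXCLUDE="/export/swap"

list_snapable() {
    # stdin: mnttab-style lines "device mountpoint fstype options ..."
    awk '$3 == "ufs" { print $2 }' |
    while read mnt; do
        skip=no
        for e in $EXCLUDE; do
            [ "$mnt" = "$e" ] && skip=yes
        done
        [ "$skip" = "no" ] && echo "$mnt"
    done
}

# Sample input mimicking /etc/mnttab on the host shown above:
list_snapable <<'EOF'
/dev/dsk/c1t0d0s0 / ufs rw 0
/dev/dsk/c1t0d0s6 /export ufs rw 0
/dev/dsk/c1t0d0s7 /export/swap ufs rw 0
swap /tmp tmpfs rw 0
EOF
```

On this input the sketch prints / and /export, the two filesystems snapshotted in the df output above; the real script would then run fssnap on each and mount the result under /snap/.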
We delete snapshots on each host after the Amanda run by running fssnap.sh
-d. We considered several options for launching this:
- Ssh from dumphost to dumpclient -- This introduces a security
risk because the dumphost runs a command as root on each dumpclient.
- Run a single cron job on each dumpclient -- But when to run
it? We have conflicting requirements. We want to delete snapshots
as soon as the Amanda dump run is finished, typically 4-8am, but
occasionally much later as Amanda overruns, perhaps until noon.
So we must run it late.
Unresolved Issues
When using amrestore, the operator must use a filesystem
name of the form /snap/_var_mail, not the expected /var/mail.
This implementation works, but snapshots exist for much longer than
needed. To avoid this drawback, we next tried using an executable
automount map.
Executable automount Map
Here the automounter manages the mounts under /snap/ using
an executable, indirect map, auto_fssnap. When we access
a directory here (e.g., /snap/_var_mail), the executable
map creates a snapshot and the automounter mounts it. Thus, a snapshot
is only created when it is needed. However, as with the first approach,
we still have difficulty deleting snapshots promptly after use.
Several issues arose during implementation.
Amanda runs ufsdump S on each filesystem to estimate the
size of the dump. Usually it does this several times to get estimates
for several dump levels. This triggers creation and mounting of
snapshots early in an Amanda run.
Knowing when we can delete a snapshot is the main difficulty.
Some options we have explored are:
- Use the automounter to delete the snapshot after umounting
it. Unfortunately, the automounter does not support this, and
executable maps are not run when a filesystem is umounted.
- Find recently unmounted snapshots, by watching automounter logs
(in verbose mode) for umounts of snapshot mounts or by using a
recurrent cron job, and delete them. This fails because Amanda
accesses a filesystem by its mount point (e.g., /snap/_var_mail,
as given in disklist) only for the first size estimate. This
access triggers the automounter. Thereafter, Amanda directly
accesses the raw device associated with that mount point (e.g.,
/dev/rdsk/c0t0d0s4) for later size estimates and for dumps. These
later accesses do not trigger the automounter. Thus, we may
prematurely delete snapshots before (or while) they are
ufsdumped, causing the dump to fail.
- Watch Amanda logs for ufsdump "DUMP DONE" entries using
a cron job. This works but is superseded by the next method.
- Launch a single, background process (fssnapdel, Listing
2) from the automount map for each snapshot to watch the Amanda
log files (every five minutes) and delete the snapshot when the
dump is done.
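The last option can be sketched as follows. This is the idea behind fssnapdel, not Listing 2 itself: the log-message format is illustrative, and the FSSNAP variable is mine, added so the logic can be exercised without Solaris.

```shell
#!/bin/sh
# Poll an Amanda log until the dump of a given filesystem is
# reported done, then delete its snapshot.
FSSNAP=${FSSNAP:-/usr/lib/fs/ufs/fssnap}

watch_and_delete() {
    fs="$1"      # front filesystem, e.g. /var/mail
    log="$2"     # Amanda log file to watch
    while :; do
        # Illustrative match: a success line naming this filesystem.
        if grep SUCCESS "$log" 2>/dev/null | grep -q " $fs "; then
            $FSSNAP -d "$fs"          # dump done: delete the snapshot
            return 0
        fi
        sleep 300                     # poll every five minutes
    done
}
```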
Implementation
We create an executable automount map auto_fssnap (Listing
3) referenced from the NIS auto.master map:
/snap -ro /usr/local/etc/auto_fssnap
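For readers unfamiliar with executable maps: the automounter runs the map with the lookup key as its argument and mounts whatever location the map prints on stdout. A hedged sketch of the idea behind auto_fssnap (not Listing 3 itself; the FSSNAP variable and simplified entry format are mine, added so the key handling can be exercised without Solaris):

```shell
#!/bin/sh
# Executable automount map sketch: invoked as "auto_fssnap _var_mail",
# prints a map entry for the snapshot of the corresponding filesystem.
FSSNAP=${FSSNAP:-/usr/lib/fs/ufs/fssnap}

auto_fssnap() {
    key="$1"
    # Recover the front filesystem path: _var_mail -> /var/mail.
    # (Safe only because our mount points contain no underscores.)
    fs=$(printf '%s\n' "$key" | tr '_' '/')
    # Create the snapshot; fssnap prints the snapshot device name.
    snapdev=$($FSSNAP -o bs=/share/backingstore/$(uname -n),unlink "$fs") \
        || return 1
    # Emit a map entry mounting the snapshot device.
    echo "-fstype=ufs :$snapdev"
}
```

Accessing /snap/_var_mail thus creates the snapshot on demand, and the automounter mounts the device the map printed.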
Unresolved Issues
If several fssnap commands run concurrently, only one succeeds.
(I have logged an RFE with Sun on this.) One of our hosts has 12
filesystems to dump, so occasionally we saw failures as Amanda triggered
the automounter to run several instances of auto_fssnap,
and hence fssnap, concurrently.
fssnapdel could conceivably delete a snapshot just created
by the automount map, before it is mounted. The fssnapdel
processes are vulnerable to being killed, in which case, a snapshot
may not be deleted.
Use a Wrapper to ufsdump
We also explored the use of a wrapper to ufsdump (Listing
4) to create, dump, and delete a snapshot of the front filesystem.
(Early concerns about signal and stream handling turned out to be
largely unfounded.) We encountered the following issues during implementation.
We built Amanda (amadmin, amandad, amgetconf, amrecover, amverify,
etc.) with the ufsdump wrapper (amufsdump) by making global
substitutions in the Amanda source between configure and make:
./configure ...
perl -pi.bak -e 's!/usr/sbin/ufsdump!/usr/local/etc/amufsdump!g' \
config/config.h Makefile */Makefile */*.sh
make ...
The wrapper is suid root because fssnap must be run as root,
which introduces potential security vulnerabilities. To reduce vulnerabilities,
the script does the following:
- Runs under Solaris. This avoids a generic vulnerability of suid
scripts: a race condition in which a script may change between the
time the kernel opens it to identify which interpreter to run and
the time the interpreter reopens it to execute it.
- Is executable only by root and users in group sys:
ls -l amufsdump
-rwsr-x--- 1 root sys 2822 Oct 30 11:01 amufsdump
- Dies unless it is run by the backup user dumpman.
- Runs with the lower privileges of the calling user, dumpman,
except where it must run as root (calling /etc/init.d/xntpd,
fssnap, ufsdump).
- Uses Perl's taint mode to check all input to the script. For the
environment, it sets an empty PATH and sets IFS to a space.
It ensures the script is called with four appropriate arguments,
thus:
($OPTS, $SIZE, $TAPEDEV, $RAWDEV) =
("@ARGV" =~ m!^(\w+) (\d+) (-) (/dev/rdsk/c\d+t\d+d\d+s\d+)$!) or
die "@ARGV. Usage eg: amufsdump 0usf 1048576 - /dev/rdsk/c0t0d0s0";
It checks that external commands are referenced by absolute pathnames
and checks input from the matches.
- Avoids creating a snapshot when we are just getting an estimate
of the dump size, i.e., when amufsdump is called with the S option:
amufsdump 0Ssf 1048576 - /dev/rdsk/c0t0d0s3
- Uses locking to avoid running several instances of fssnap
concurrently; otherwise, all but one will fail.
- Ensures xntpd is stopped while creating snapshots of
/ and /var. Earlier approaches started xntpd after taking a
snapshot; now we restart it only if it was already running.
- Deletes the snapshot immediately after the dump is done.
- Ensures that ufsdump records the front filesystem raw
device (/dev/rdsk/c0t0d0s0) in /etc/dumpdates, rather
than the snapshot device (e.g., /dev/fssnap/3). We use
the flag N to ufsdump (a feature introduced in Solaris
8) to specify the device to record in /etc/dumpdates. amandad
calls amufsdump, as follows:
amufsdump 0usf 1048576 - /dev/rdsk/c0t0d0s0
amufsdump calls ufsdump:
ufsdump N0usf /dev/rdsk/c0t0d0s0 1048576 - /dev/fssnap/0
fssnap ignores kill -15, but kill -9, which
it cannot ignore, hangs the operating system with /
and /var locked. amufsdump terminates on kill -15, leaving
fssnap to complete. Killing amufsdump with kill -9,
which cannot be trapped, passes the kill on to fssnap, with the
risk of hanging the system. Finally, note that /etc/init.d/xntpd
needs /usr/bin in its PATH to call sleep.
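The locking mentioned above can be done portably with mkdir, which is atomic; a sketch of one common approach (the lock path and retry limits are mine, not amufsdump's actual values):

```shell
#!/bin/sh
# Serialize fssnap callers with an atomic mkdir-based lock:
# exactly one caller can create the lock directory at a time.
LOCK=${LOCK:-/tmp/amufsdump.lock}

acquire_lock() {
    tries=0
    until mkdir "$LOCK" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -gt 60 ] && return 1    # give up after ~5 minutes
        sleep 5
    done
}

release_lock() {
    rmdir "$LOCK"
}

# Typical use around the snapshot-creating section:
#   acquire_lock || exit 1
#   ... run fssnap ...
#   release_lock
```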
Overall Evaluation
Using Amanda with a ufsdump wrapper to create, dump, and
delete snapshots works extremely well. Dumps written by amufsdump-enhanced
Amanda cannot be read by vanilla Amanda, and vice versa, because
amdump encodes the UNIX dump program used (ufsdump,
amufsdump, or tar) in the header of the dump file on
tape. Both amrestore and amrecover look for this
and fail with an error if they do not find a program they recognize.
This could be fixed if support for snapshots were integrated into
Amanda. The performance impact is low. The snapshot backing store of
our most write-intensive filesystem grows to only about 1 GB during
a full dump (about three hours). This represents reading 1 GB of
data, writing it over NFS, and writing it locally: a small overhead
on a 50-GB dump. To see the size of the backing store, run fssnap -i
-o backing-store-len.
We now have a very reliable backup system. We dump and restore
without errors. We found no scalability issues in integrating snapshots
into Amanda. The system introduces three new potential vulnerabilities:
1. The suid Perl script amufsdump, run in taint mode.
2. The snapshot devices themselves. fssnap creates these
with permissions:
brw-r----- 1 root sys 199, 0 Oct 9 15:05 /devices/pseudo/fssnap@0:0
Thus, they are no more vulnerable than their corresponding raw devices.
3. The snapshot backing store is NFS-shared, read-write with
root access, to each dump client. fssnap unlinks this file
immediately after creating it, which reduces the risk of cracking.
For tighter security, you can modify amufsdump to use a local
backing store, which requires a dedicated local filesystem.
The enhanced system is transparent to the operator. However, if
an amufsdump process dies, it may leave an unwanted snapshot.
To maintain confidence in our backups, we evaluated several methods
for verifying them. We run amverify after each dump. This
lists the contents of each dump file on tape and gives some confidence
in the readability of the tape. However, it takes several hours, and
it does not identify bad dump files (those written before we
introduced snapshots). We prefer to restore a full or partial dump
regularly; we find that this works well and is often covered by
the steady trickle of requests from users to recover lost files.
Conclusion
Amanda enhanced with Solaris fssnap snapshots provides
an excellent backup system. The system should port simply to other
OSes that support filesystem snapshots. Built-in support for snapshots
in Amanda would further enhance transparency. These developments
are left as an exercise for the reader.
Julian Briggs is Director of IT, Department of Computer Science,
University of Sheffield, UK. He has practiced UNIX systems administration
since the mid 1990s and Buddhist meditation (exploring, debugging,
and enhancing the OS of his "neck-top" computer) since the early
1980s. He enjoys hill walking and is single with no children.