Homebrew
High Availability: Booting Linux from a RAID-1 Device
Drew Smith
Recently, a colleague told me about a trick his company
uses to make Windows NT remote administration easier. His firm provides
professional services for many small server rooms around town, and
the trick involved mirrored IDE hard disks in removable drive bays
-- by mirroring the primary disk, you provide an easier backout
path when doing upgrades. When doing service work, he always removes
the second hard disk, providing an up-to-the-minute backup in case,
for example, the latest Microsoft "Service Pack" does
more harm than good.
My first reaction was to say "Of course, you can do that
in Linux!". However, it brought to mind a few questions --
most importantly, just how would you go about doing this?
I mulled it over and eventually decided to explore it on my own.
I found that, with a few LILO and mkinitrd tricks, it is
possible to boot Linux from a software RAID-1 device, and this
little hack can also give your Linux Web server a performance
boost. In addition to protecting you against the failure of a
single hard disk, a RAID-1 configuration gives your IDE or SCSI
bus a break, providing two different paths from which to read the
disk. Of course, write operations will be slower because the data
must be written to the drive twice, but in many situations (most
commonly Web servers, where read operations are a top priority),
slower writes are not a drawback.
This is a project for anyone with a Linux machine. If you're
new to Linux, you'll need a solid understanding of hard-disk
partitioning and the Linux command line, but it's surprisingly
easy and fun to do.
Getting Started
The machine you choose to work with probably shouldn't be
a production server. As with any project that involves partitioning,
you will run the risk of losing data. The machine should have two
identical hard disks. Although this process will work happily with
different-sized disks, I've decided to stay on the safe side and
use a matched pair. Disks of any size will do, and they can be
either SCSI or IDE. For
this example, I've used two Quantum Fireball 20-GB drives,
and I've installed them as the primary drives on each of the
machine's IDE buses. The machine in the example is a VA Linux
2130 rackmountable server with a single 650-MHz Pentium III, and
I used a stock installation of RedHat 6.2.
The safest and easiest way is to start from a completely fresh
machine, installing the operating system as a part of the process,
but it doesn't necessarily have to be this way. (As of this
writing, RedHat 7.0 has been released, but I don't have a machine
around to re-test with at the moment.) If you're starting from
scratch, you'll end up with a cleaner system if you install
the operating system to the hard disk on the second IDE bus, or
/dev/hdc. You'll probably want to use the original fdisk
rather than DiskDruid or a similar tool. Making a program easier
to use often strips out functionality, and DiskDruid is a prime
example: it wouldn't let me create /dev/hdc1.
Your configuration should look similar to this:
/dev/hdc1 - 20M, mounted on /boot
/dev/hdc2 - <most of the drive>, mounted on /
/dev/hdc3 - 120M, for swap-space
Give /dev/hdc1 a boot flag, because you'll be booting
from it the next time the machine comes up; before that, though,
install the rest of the OS. Normally, security considerations would require
creating more than just these three partitions. A separate partition
for /tmp and /var is always a good idea to prevent an
attacker from filling your root partition with a denial-of-service
attack on your logfiles. Without quotas, a malicious user can do the
same if you don't have a separate partition for /home.
This article will only cover a basic example, but you can take it
to whatever lengths you feel appropriate.
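As an illustration only, a layout along those lines might look
like this (the sizes are arbitrary, and /dev/hdc4 would be the
extended partition containing the logical partitions):
/dev/hdc1 - 20M, mounted on /boot
/dev/hdc2 - 4GB, mounted on /
/dev/hdc3 - 120M, for swap-space
/dev/hdc5 - 4GB, mounted on /var
/dev/hdc6 - 2GB, mounted on /tmp
/dev/hdc7 - <the rest>, mounted on /home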
Install the OS, and be certain to create a bootdisk. This step
should also be done if you're not installing from scratch;
see man mkbootdisk for more information on how to
do this step. Because we are installing from scratch, write the
boot information to the first sector of the boot partition on
/dev/hdc, rather than to the master boot record (MBR). This may
or may not allow you to boot
the system after the install is complete, but we've got a bootdisk
and are far from done anyway.
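If you're working from an existing installation instead, a
bootdisk can be made at any time from a root shell. A minimal
invocation, assuming the stock 2.2.14-5.0 kernel and a floppy
drive at /dev/fd0, looks like this:
[root@tester /etc]# mkbootdisk --device /dev/fd0 2.2.14-5.0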
Finish installing and reboot. If it doesn't come up, use
the bootdisk, and log in as root.
Creating the RAID Devices
Now that we've got a running system, it's time to tackle
that second drive we've put in (the primary drive in the machine).
We'll configure it as a RAID device with two drives involved,
declaring /dev/hdc to be a mirror of /dev/hda. Here's
the really clever part -- we'll declare /dev/hdc
as "failed" until we move the operating system off of
it and onto the new RAID device. Then simply add /dev/hdc
to the RAID as a replacement for the failed disk, and allow it to
rebuild.
Note that Linux handles software RAID with the "md"
driver, which stands for "multiple devices". This driver
can control storage devices in several different
fashions -- RAID-0 through RAID-5, or even combinations of two
or more types. Drives are allocated into a storage array, and when
the array is RAID level 1 or higher, the drives work together
to provide redundancy. Should one drive fail, the RAID subsystem
marks that drive as "failed" and stops sending requests to it.
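Depending on your kernel configuration, the RAID-1 personality
may be a module rather than built in. If the Personalities line
in /proc/mdstat comes up empty later on, you can load the module
by hand (raid1 is the module name in the stock RedHat kernel):
[root@tester /etc]# modprobe raid1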
Partition /dev/hda exactly the same as /dev/hdc.
As root, fdisk -l /dev/hdc provides a listing of the
partitions on that drive, which we'll then match. However,
we won't use the same partition types as before; instead, set
the main data partitions as type fd, or "Linux raid
autodetect". In my case, I set up /dev/hda like
this:
Device     Boot   Start    End      Blocks   Id   System
/dev/hda1   *          1      3       24066   fd   Linux raid autodetect
/dev/hda2              4   2484    19928632+  fd   Linux raid autodetect
/dev/hda3           2485   2498      112455   82   Linux swap
Build these partitions, but don't create filesystems on them
yet. We must first declare the RAID device to the system, using a
configuration file in /etc/, called a "raidtab".
Listing 1 shows my copy of the raidtab file. The format is
fairly straightforward, but larger configurations (e.g., separate
partitions for /var, /tmp, etc.) can become confusing pretty quickly.
There's also a manpage dedicated to this file.
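For reference, a minimal raidtab for this two-disk layout might
look something like the sketch below (Listing 1 is the authoritative
copy). The failed-disk entries are the clever part described above:
they keep the /dev/hdc partitions out of the arrays until we're
ready to add them.
# /dev/md0 will become the root filesystem
raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    failed-disk             1

# /dev/md1 will become /boot
raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdc1
    failed-disk             1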
After you've written an /etc/raidtab file, it's
time to create the actual RAID devices, which is accomplished with
mkraid, a program from the "raidtools". In an uncommon
show of caution, this program will not actually let you create
the new RAID devices without an -f switch to "force" it.
Because the partitions involved are of type "Linux raid autodetect",
mkraid assumes they may already be part of another RAID
device. You do want to force it, however, and the extra warnings
are humorous. Go ahead and create the first RAID device:
[root@tester /etc]# mkraid -f /dev/md0
Linux uses the special /proc filesystem to provide interesting
statistics about running processes and the kernel, and the md
driver is no exception. A special file called /proc/mdstat shows
the current status of any md devices in the system, so cat
/proc/mdstat will display your newly created
RAID device. View that file, then create the second device.
[root@tester /etc]# mkraid -f /dev/md1
Check the /proc/mdstat file again. You should now see both
devices -- both with a disk marked as failed such as:
[root@tester /etc]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hda1[0] 24000 blocks [2/1] [U_]
md0 : active raid1 hda2[0] 19928512 blocks [2/1] [U_]
unused devices: <none>
/dev/md1 will hold the boot information, and /dev/md0
will be the root partition. You'll need to create filesystems
on the new devices before you can use them:
[root@tester /etc]# mke2fs /dev/md0
<stuff>
[root@tester /etc]# mke2fs /dev/md1
<stuff>
With that, two new filesystems are ready to be mounted.
Now What?
You now have a booting system and two working RAID devices, but
how do you move the operating system onto the new devices
and make it boot? Actually, only the "making it boot"
part is difficult; some tricks with LILO and mkinitrd
will help. For now, move the filesystem across with a cp.
First mount the RAID device to an arbitrary directory:
[root@tester /etc]# mkdir -p /feh
[root@tester /etc]# mount /dev/md0 /feh
You should maintain all the permissions and datestamps, etc., so use
the -a switch to make cp treat the operation as an archive:
[root@tester /etc]# cp -a /bin /feh
[root@tester /etc]# cp -a /dev /feh
[root@tester /etc]# cp -a /etc /feh
[root@tester /etc]# cp -a /home /feh
[root@tester /etc]# cp -a /lib /feh
[root@tester /etc]# cp -a /root /feh
[root@tester /etc]# cp -a /sbin /feh
[root@tester /etc]# cp -a /tmp /feh
[root@tester /etc]# cp -a /usr /feh
[root@tester /etc]# cp -a /var /feh
Notice, however, that from a stock RedHat 6.2 system, I've omitted
the /opt, /proc, and /boot directories, as well as /mnt. We'll
create those by hand (including the floppy mount point that the
fstab expects):
[root@tester /etc]# mkdir -p /feh/boot
[root@tester /etc]# mkdir -p /feh/opt
[root@tester /etc]# mkdir -p /feh/proc
[root@tester /etc]# mkdir -p /feh/mnt/floppy
The "lost+found" directory should have already been created
by mke2fs. Mount the other RAID device under /feh/boot,
and copy all of the boot files into it:
[root@tester /etc]# mount /dev/md1 /feh/boot
[root@tester /etc]# cp -a /boot /feh
Now make the necessary changes so that the new root filesystem
mounts the RAID devices correctly after reboot. You must edit the /feh/etc/fstab
file to change the / and /boot entries. The current
/etc/fstab file looks like this:
/dev/hdc2 / ext2 defaults 1 1
/dev/hdc1 /boot ext2 defaults 1 2
/dev/fd0 /mnt/floppy auto noauto,owner 0 0
none /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
/dev/hdc3 swap swap defaults 0 0
We'll only be changing it slightly, pointing it to the new devices:
/dev/md0 / ext2 defaults 1 1
/dev/md1 /boot ext2 defaults 1 2
/dev/fd0 /mnt/floppy auto noauto,owner 0 0
none /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
/dev/hdc3 swap swap defaults 0 0
Nothing else in this file needs to change. We're almost ready
to reboot and try to boot into the new RAID-1 Linux machine for the
first time, but we'll definitely need a bootdisk to start. Grab
another blank floppy and build one:
[root@tester /etc]# mkbootdisk --mkinitrdargs "--preload raid1" 2.2.14-5.0
Notice the --mkinitrdargs switch, and the value afterwards.
The mkinitrd command is an extremely powerful tool for booting
machines with special requirements. At its simplest, an initrd
is an "initial ramdisk", which contains modules to be loaded
before anything else. For example, imagine you're trying to boot
a machine with a non-standard SCSI controller. You will have serious
problems booting if the root filesystem sits on a drive attached
to that controller and the kernel has no driver loaded for it! The
initial ramdisk can hold the module needed to talk to the controller,
removing the need for a boot floppy. In this case, we'll add
the module required to speak to RAID-1 devices to this initrd,
and build a boot floppy accordingly. If you're using SCSI devices,
you may want to add a preload statement for your SCSI controller's
module as well.
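As an illustration only (aic7xxx here stands in for whatever
module your controller actually uses), such a command might look
like this:
[root@tester /etc]# mkbootdisk --mkinitrdargs "--preload aic7xxx --preload raid1" 2.2.14-5.0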
Reboot the machine from the new floppy.
First Boot with RAID-1
As the machine begins to boot from the floppy disk, there will
be a BOOT: prompt displayed for about ten seconds. We want
to type in an argument here. Don't worry about time, because
after you start typing, it'll wait for you to finish.
BOOT: linux root=/dev/md0
This should bring up your machine with the RAID devices as your boot
and root disks! Log in and type df to see something like this:
[root@tester /etc]# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/md0 19615648 415012 18204212 2% /
/dev/md1 23239 2442 19597 11% /boot
[root@tester /etc]#
If you don't see output like this, something's wrong.
Is it your bootdisk? What are the error messages? Did your system
almost come up? How far did the boot get before stopping? Often,
the best step here is to return to your former setup (i.e., remove
the floppy and reboot), repartition the RAID disk, and try again.
Once the system is up, the next step is to make LILO boot from
these new devices, and to make certain that it will boot from
either one.
Making It All Bootable
This section assumes you've got a recent backup of your system.
With that out of the way, repartition /dev/hdc and add it
to the new RAID devices as a replacement for the "failed disks".
Using fdisk, open /dev/hdc and use the "t"
command to change the partition type of the first two partitions from
83 (Linux) to fd (Linux raid autodetect). If you're
absolutely certain that you're not going to get in trouble
for losing any data, you can then use the "w" command
to write the changes to the disk.
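Assuming the stock fdisk, the session should look roughly like
this (the prompts are from util-linux fdisk):
[root@tester /etc]# fdisk /dev/hdc
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)
Command (m for help): w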
Adding the new partitions to the RAID device shows just how easy
it is to work with RAID under Linux:
[root@tester /etc]# raidhotadd /dev/md0 /dev/hdc2
<stuff>
[root@tester /etc]# raidhotadd /dev/md1 /dev/hdc1
<stuff>
Don't worry about the <stuff>, unless there's
something extremely alarming in the messages. If something goes horribly
pear-shaped, delete all the partitions on /dev/hdc and try
again. If that fails, you're hooped -- bring out the backups
and start over.
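You can watch the rebuild happen by checking /proc/mdstat again.
While the resync runs, the output will look something like this
(the progress numbers here are illustrative):
[root@tester /etc]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc1[1] hda1[0] 24000 blocks [2/2] [UU]
md0 : active raid1 hdc2[2] hda2[0] 19928512 blocks [2/1] [U_] recovery=14% finish=22.3min
unused devices: <none>
Once both arrays show [2/2] and [UU], the mirrors are fully in sync.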
At this point, we need a proper system-wide initrd image,
containing all the modules needed to boot the system into RAID-1.
This is like the bootdisk we made earlier, but it will be written
to the hard disk.
[root@tester /]# mkinitrd /boot/initrd-2.2.14-5.0.img \
> --preload raid1 2.2.14-5.0
It's important that you specify your kernel version here. In
my case, the stock RedHat 6.2 kernel is 2.2.14-5.0.
As a final step, use LILO to make the system bootable.
We're going to be a bit sneaky here and use two slightly different
LILO configuration files -- one for the booting of each
drive. This way, either drive can fail and the machine will still
boot.
Listing 2 shows a working lilo.conf.hda configuration file.
Note that I specified the disk as /dev/md0. Also note that
the sectors, heads, and cylinders are included (unlike a standard
LILO config). These numbers can be obtained with fdisk
-l /dev/hd<x> and are extremely important here.
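To give the general shape of such a file, here is a sketch of
what a lilo.conf.hda along the lines of Listing 2 might contain.
The geometry values below match the example drives from the fdisk
listing earlier; be sure to substitute your own.
# lilo.conf.hda -- a sketch only; Listing 2 is the real file
boot=/dev/hda                  # boot record goes on the first drive
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz-2.2.14-5.0
    label=LinuxRAID
    root=/dev/md0              # root filesystem is the RAID device
    initrd=/boot/initrd-2.2.14-5.0.img
    read-only
disk=/dev/md0                  # describe the md device to LILO
    bios=0x80                  # treat it as the first BIOS disk
    sectors=63                 # geometry from fdisk -l /dev/hda
    heads=255
    cylinders=2498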
In the second file (Listing 3), I changed only one parameter --
the boot= flag. After you've written these two files,
run LILO with the -C flag to specify which configuration
file to use:
[root@tester /etc]# lilo -C /etc/lilo.conf.hda
Added LinuxRAID *
[root@tester /etc]# lilo -C /etc/lilo.conf.hdc
Warning: /dev/hdc is not on the first disk
Added LinuxRAID *
Reboot. If all went well, you're booting into a mirrored, high-availability
(well, higher availability) Linux machine!
Conclusion
The benefits of this configuration are obvious: the procedure
is reasonably simple, and the resulting machine is much more
resilient than before. The hard disk, often the only moving part
in a Linux system (with the exception of cooling devices), is
usually the first component to fail. Adding monitoring and paging
capabilities is fairly simple (although beyond the scope of this
article), and for deployments to remote locations, an hour or so
of work could save you the trouble of getting out of bed to fix
a downed system.
Acknowledgements
Thanks to Peter Lincoln for putting the idea into my head and
to Linas Vepstas for writing the HOWTO that pointed me in the right
direction. Also, my girlfriend Erin, for putting up with my near-constant
geeking.
Drew Smith lives in East Vancouver, has blue hair, lives in
a house full of geeks, and works as the UNIX Network Administrator
for a stock trust company. When not geeking, he makes live electronic
music for raves. He can be reached via the geek-house Web site,
at: http://eastvan.bc.ca.