              RAM RAID: Improving Web Access
              Bo Adler
              Using a file cache, such as Squid, is a familiar strategy to increase 
              the throughput to a Web site. It eliminates the overhead of disk 
              access by keeping static HTML content in memory, but fails to address 
              the issue of CGI scripts that need to write data back to disk. Given 
              that written data is often small and infrequent (compared to reads), 
              the OS file-buffering strategy is sufficient to accommodate this 
              load. However, in the case where written data becomes a serious 
              issue, the traditional solution is to implement some form of RAID 
              array to increase the bandwidth of disk access.
               The development group at a company I work with constructed a Web 
               application that stored files within thousands of directories. Their 
               analysis of the application's performance led them to conclude that 
               writes accounted for about one tenth of the accesses, and that the 
               access pattern was such that they were routinely getting cache misses 
               from the file cache for both reads and writes. The ideal solution 
               would be a way to lock the important directory tree into the file 
               cache so that it would never experience a cache miss.
              I explored using the new ramdisk implementation, which is available 
              in Linux 2.4 kernels, and frequently mirroring the data to disk. 
              This turned out to be impractical because it took rsync (and similar 
              tree-walking programs) several hours to run through the 50,000 directories, 
              jeopardizing the data if a reboot were to occur. All told, the filesystem 
              containing these directories could fit into 900 MB, but the sheer 
              size of the directory listing was more than most programs could 
               handle. Clearly, any mirroring solution would have to operate one 
               disk block at a time, rather than try to traverse the directory 
               tree.
              There are several RAID levels that perform mirroring and redundancy 
              at the disk block level. One of the simplest of these is RAID-1 
              with two disk drives. RAID-1 is a straight mirroring approach -- 
              the array is treated as a single "storage device", and 
              any data written to the RAID array is written to both disk drives. 
              This slows down the writing throughput to the RAID array because 
              twice as much data is being written and accounted for. The benefit 
              of RAID-1 is that the two drives can be independently fetching data 
              for separate requests, thereby increasing the throughput for read 
              operations over that of a single drive. It occurred to me that if 
              one of the disk drives in a RAID-1 array could be replaced with 
              a ramdisk, perhaps the speed-up would be comparable to that of just 
              a standalone ramdisk, while still retaining a physical copy of the 
              data in case of reboots.
               I set out to test this theory with a series of benchmarks, using a 
               390-MHz Pentium II running a stock Red Hat 7.1 installation. I chose 
               to use two benchmarking programs: dt, which can measure raw sequential 
               block access to a device, and bonnie++, which measures filesystem 
               accesses. Typically, the benchmarking process requires data sets 
               about four times the size of physical memory, to minimize 
               the effect of the various caches. Since one of my "drives" 
               would be a ramdisk, this wasn't possible, because the whole 
               test had to fit inside the ramdisk. After testing various 
               configurations, I determined that the best way to minimize any kernel 
               caches and buffers would be to allocate as much memory as possible 
               to the ramdisk. This would leave very little memory for the kernel 
               (especially the file cache), and allow me to use fairly small data 
               sets for testing.
              Creating the RAID-1 Array
              By default, the Linux kernel is compiled to create 16-MB ramdisk 
              buffers (named /dev/ram0, /dev/ram1, etc.). To increase 
              the size of the buffer, it is necessary to pass an additional argument 
              to the kernel at boot time. Because my machine had 128 MB of RAM, 
              I created a 100-MB ramdisk buffer by editing the image section of 
              my /etc/lilo.conf file to read as shown in Listing 1. I rebooted 
              after running /sbin/lilo, but a check via /usr/bin/free 
              revealed that the ramdisk hadn't actually been allocated yet, 
              so I allocated it with:
              
             
bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
100+0 records in
100+0 records out
bash# free -t
             total     used     free      shared  buffers   cached
Mem:        126648   124708     1940        0     102748     5072
-/+ buffers/cache:    16888     109760
Swap:       102776        0     102776
Total:      229424   124708     104716
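             Listing 1 is not reproduced here; one way to pass that argument 
             with LILO is an append line setting ramdisk_size (in kilobytes). 
             A sketch of such an image section, in which the kernel image name 
             and root device are placeholders for whatever your system uses, 
             might read:
              
image=/boot/vmlinuz-2.4.2-2
    label=linux
    read-only
    root=/dev/sda2
    append="ramdisk_size=102400"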
 
            Creating a software RAID array is straightforward as well, given the 
            guidance in the Software RAID-HOWTO. My /etc/raidtab file came 
            straight from the howto, with minor modifications (Listing 2). The 
            RAID-1 array can then be created via:
             
             
bash# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sda1, 48163kB, raid superblock at 48064kB
disk 1: /dev/ram, 51200kB, raid superblock at 51136kB
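 
             For reference, an /etc/raidtab along the lines of Listing 2, 
             assuming the same devices that appear in the mkraid output above 
             (/dev/sda1 for the hard-disk partition and /dev/ram for the 
             ramdisk), would look roughly like this:
              
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            4
    device                /dev/sda1
    raid-disk             0
    device                /dev/ram
    raid-disk             1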
 
            This configuration is all that's necessary for simple testing. 
            In a production environment, the /etc/raidtab file must be 
            modified to mark the /dev/ram device as a failed disk, so that 
            the kernel does not try to use it at boot time. The ramdisk can be 
            added back into the array by using dd to allocate the ramdisk 
            buffer, and then using raidhotadd to initiate a reconstruction 
            of /dev/ram based on the data from the hard disk.
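             As a sketch of that setup (again using the device names from above): 
             the ramdisk's raid-disk entry in /etc/raidtab becomes a failed-disk 
             entry, and after each boot the ramdisk is re-allocated and hot-added, 
             at which point the kernel begins reconstructing it from the hard disk:
              
    device                /dev/ram
    failed-disk           1

bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
bash# raidhotadd /dev/md0 /dev/ram
 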
             Caveats
              The "reconstruction" from hard drive to ramdisk was 
              very slow, proceeding at only 100-KB/sec. I did not find the same 
              to be true for reconstruction from one disk to another -- disk-to-disk 
              recovery proceeded at approximately the bandwidth supported by the 
              hard drives. It's been suggested to me that this reconstruction 
              time could be evaded by not using a persistent superblock. In that 
              case, you could just dd the partition from the hard drive 
              to the ramdisk.
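               With a non-persistent superblock, that copy is a single command 
               (using the same device names as before):
              
bash# dd if=/dev/sda1 of=/dev/ram bs=1024k
 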
               More troubling than the slow speed, I found the RAID array to 
              be very touchy during reconstruction. The Software RAID-HOWTO says 
              that the RAID array is available for use right away, even during 
              reconstruction. I found that if the array was used sparingly, this 
              was true, but any significant usage caused lockups in the system. 
              I was able to consistently cause a lockup by using dt to 
              write to the whole array, if I did so before reconstruction was 
              complete. Furthermore, at least once during my testing I ended up 
              in a state where a benchmark of a standalone SCSI drive was producing 
              values half as large as normal; even removing the RAID modules from 
              memory did not correct the problem. Thus, I recommend caution when 
              employing the software RAID modules for valuable data.
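               One way to reduce that risk is to confirm that reconstruction has 
               finished before putting any real load on the array; the kernel 
               reports resync progress for each md device in /proc/mdstat:
              
bash# cat /proc/mdstat
 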
              According to the kernel sources, the memory allocated to a ramdisk 
              can be deleted by sending a BLKFLSBUF ioctl to the 
              appropriate /dev/ram device. Be warned that programs like 
              /usr/bin/free won't show the memory as available until 
              it is actually needed by another program.
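               If the blockdev utility from util-linux is available on your 
               system, it can issue that ioctl from the command line, so the 
               100-MB buffer can be released with:
              
bash# blockdev --flushbufs /dev/ram
 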
              Benchmarking
              When I reached the point where testing could be performed, I ran 
              a series of tests similar to the following:
              
             
bash# ./dt of=/dev/md0 bs=8k limit=30m
[...]
bash# mkdir /mnt/test
bash# mkfs -t ext2 /dev/md0
[...]
bash# mount /dev/md0 /mnt/test
bash# ./bonnie++ -d /mnt/test -s 30 -n 9:9000:10:999 -r 0 -u root
[...]
 
            I chose four configurations, which I thought would offer a suitable 
            comparison between options: standalone ramdisk, standalone SCSI disk, 
            RAID-1 array including both a SCSI disk and a ramdisk, and a RAID-1 
            array using two identical SCSI disks.
              When analyzing Figure 1, we see that the performance of the various 
               configurations is generally as expected. The ramdisk takes the lead 
               in both the writing and reading tests. Since a RAID-1 array has the 
               overhead of writing data to both devices, it makes sense 
               that a single SCSI drive would perform faster than either RAID array 
               (both of which included that same drive). It also makes sense that 
               a RAID array with two disks would be slower than a RAID array made 
               up of one disk and one ramdisk.
              The read test shows a different result, in that a RAID array made 
              up of a SCSI disk and a ramdisk outperformed a standalone SCSI disk. 
              Again, this makes sense when you consider that a RAID-1 implementation 
              will sometimes read from the hard drive and sometimes read from 
              the ramdisk. Each time the RAID implementation chooses the ramdisk 
              to answer a request, a speed-up over a regular disk is realized. 
              (The Linux implementation of RAID-1 tries to balance requests between 
              the devices in the array, so that no single device receives too 
              many requests in a row.)
               The only real surprise in the results of the dt test is 
               how little improvement is imparted by using the ramdisk as part 
               of a RAID array. It should be noted, for both the dt and bonnie++ 
               tests, that a non-threaded benchmark (one that does not generate 
               simultaneous requests) does not show a RAID array at its best. 
               While a hard drive can only answer requests sequentially, a RAID 
               array can parallelize requests by distributing them over multiple 
               devices. In a non-threaded program, requests are always issued sequentially 
               and thus never exercise this advantage of RAID arrays. (See the 
               References section at the end of this article for more information 
               on the dt and bonnie++ test applications.)
              Despite the limitations of sequential benchmarks, the bonnie++ 
              tests in Figure 2 indicate a nice showing for the RAID array that 
              included the ramdisk. Only the standalone ramdisk performed better 
              on each test. (Please note the use of a logarithmic scale on the 
              bonnie++ graphs.) Performing the same benchmarks with memory available 
              for kernel buffers produced the results found in Figure 3. I include 
              it here, because I noticed a change in the relative performance 
              of the various configurations. The RAID array that included the 
              ramdisk saw improvement in only a few of the tests, but the standalone 
              SCSI disk saw several significant improvements, to the point where 
              it outperformed the two RAID configurations.
               The superior performance of a standalone SCSI disk in the presence 
               of plentiful RAM would seem to shoot a hole in the technique of 
               using a ramdisk as part of a RAID-1 array -- any serious 
               enterprise installation is sure to have lots of RAM. One potential 
               explanation was the smaller amount of RAM available for kernel buffering 
               (because 50 MB was used for the ramdisk), but the graph shows that 
               even a normal RAID array made up of two hard drives performed relatively 
               poorly compared to a single hard drive. This leads me to 
               guess that perhaps CPU speed is somehow a limiting factor, but 
               I don't know why this should become relevant in the face of 
               kernel buffering.
               To clear up this question, I ran some additional tests on a 
               dual-processor, 700-MHz Pentium III system running Red Hat 7.1 with 
               512 MB of memory. Under these conditions, many of the tests had 
               immeasurably high results (shown as values of 100,000 in Figure 4), 
               but it is worth noting that the RAM RAID configuration performed 
               as well as or better than a standalone SCSI disk, thus dispelling 
               my fears.
              Conclusion
              These tests show the viability of incorporating a ramdisk into 
              a RAID-1 array. Under benchmarking conditions, there is a measurable 
              advantage over a standalone disk or a software implementation of 
              a RAID-1 array of two disks. While not as fast as a pure ramdisk, 
              such a configuration confers the property of data persistence across 
              reboots without the troublesome (and sometimes impractical) problem 
              of running a data-synchronizing process. Also encountered were two 
              issues that merit further investigation: the slow rate of reconstruction 
              to the ramdisk, and the crash situation when writing to the RAID 
              during reconstruction.
              References
              Software RAID-HOWTO: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html
              Data Test (dt) program: http://www.bit-net.com/~rmiller/dt.html
              bonnie++ program: http://www.coker.com.au/bonnie++/
              Bo Adler is a freelance consultant specializing in network 
              programming and security. He can be contacted at: [email protected].