What to Do When the Server Doesn't Serve -- Using CODA

Brett Lymn

In the previous articles in this series, I looked at NFS failover methods using fairly mainstream free software and the operating system facilities themselves. In some environments, this is probably a good fit for providing some measure of redundancy in your file servers. This time, I shall look at another approach to providing access to files by using a file system called CODA that is specifically designed to handle server outages.

CODA

CODA is an intriguing file system being developed by a group headed by M. Satyanarayanan at Carnegie Mellon University. (More information on CODA can be found at http://coda.cs.cmu.edu/.) It has been designed to keep operating, without disrupting normal work, when the client has been disconnected from the server. CODA is an experimental file system, and the decision to use it in a production environment must be weighed very carefully, because glitches in the software may result in CODA eating your file system and causing loss of data. Having said this, the actual situation is somewhat better than that warning suggests; CODA does work. It is not very forgiving in some circumstances, but if CODA is treated reasonably in terms of system resources, it works well. A major advantage of using CODA as a failover file system is that it is, by design, able to cope gracefully with network outages. This makes it ideal for providing failover file serving.

On a CODA client, the files currently being worked on are cached on the client's local disk, and updates are passed between the client and the server to keep the cached copies in sync with the server. When the client loses contact with the server, the client is said to be running in disconnected mode. In this mode, the client simply keeps a log of the changes made to the cached files. When the server comes back up, the changes made on the client are reintegrated into the server's copies of the files to bring them up to date. Later, I will look at what CODA does when two clients modify the same file while they are disconnected.

Installation and Use

The instructions for installing CODA can be found on the CODA Web site. The installation of CODA is a bit involved and requires some manual setup, but by carefully following the instructions, you should be able to get a server and client running. Note that you can have a CODA server and client running on the same machine, so you can view the CODA file system on the server itself.

Using CODA is quite simple; the client simply runs the client software, which mounts the CODA volume from the server. Once this is done, the CODA file system looks like any other file system, and the files therein can be treated the same as any other file on the system. Initially, when you mount the CODA volume, the volume is in read-only mode. To modify files on the volume, you must authenticate your session to the CODA server by using the clog command. The clog command contacts the server and acquires an authentication token, which is used by the client during transactions with the CODA server. The token has an expiry time of about 25 hours, after which the client must again authenticate to the server. The status of the authentication token can be viewed by using the ctokens command.
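Because the token lapses after roughly a day, a long-running session may want a reminder to run clog again. The following is a minimal sketch, not part of CODA: it assumes a fixed 25-hour lifetime (the approximate figure given above), and the helper name token_seconds_left is hypothetical. The ctokens command remains the authoritative source for the real expiry time.

```shell
#!/bin/sh
# ASSUMPTION: a fixed 25-hour token lifetime, taken from the "about 25 hours"
# figure in the article. The real expiry is whatever ctokens reports.
TOKEN_LIFETIME=$((25 * 3600))

# token_seconds_left AUTH_EPOCH NOW_EPOCH
# Prints the number of seconds before a token obtained at AUTH_EPOCH expires,
# or 0 if it has already expired. Both arguments are seconds since the epoch.
token_seconds_left() {
    auth_epoch=$1
    now_epoch=$2
    left=$((auth_epoch + TOKEN_LIFETIME - now_epoch))
    [ "$left" -gt 0 ] || left=0
    echo "$left"
}

# Possible use (hypothetical workflow): note the time when you run clog,
# then check later whether it is time to authenticate again.
#   clog
#   auth_time=$(date +%s)
#   ...
#   token_seconds_left "$auth_time" "$(date +%s)"
```

The arithmetic is deliberately kept separate from the clog and ctokens commands so that it can be checked without a running CODA client.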

Tools

CODA comes with some tools that allow you to monitor the status of the CODA file system. The developers strongly recommend using them to ensure the CODA file system is running properly. One of these tools is called codacon. Running it produces a continuous log of the actions performed by the CODA client. Some of the output is cryptic and probably only makes sense to a CODA developer, but it does provide a good monitor of the status of the CODA file system -- alerting you to server disconnections, conflicts, and other important information. Another tool that comes with CODA is cmon, which provides a periodically updated display of the status of the CODA servers, including statistics on resource utilization.

The CODA file system can be managed by using the cfs tool, which allows you to change many parameters of the CODA file system. You probably will not use most of the commands available with cfs. One thing you can do with cfs is force the CODA client to check whether the CODA server is up by using the checkserver command (abbreviated cs) like this:

cfs cs

This command is particularly handy when you know the CODA server has just come back up and you want to make the CODA client aware of it. Normally, the CODA client will poll the server periodically when the client is in disconnected mode, and will realize that the server is up when the poll is done; but the checkserver command can be used if you are impatient.
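For the impatient case, a small shell loop can repeat the check until the servers answer. The sketch below is only an illustration: the function name wait_for_servers is hypothetical, and it assumes that cfs cs prints the phrase "All servers up" once every server is reachable. Confirm that against the output of your own CODA release before relying on it.

```shell
#!/bin/sh
# wait_for_servers TRIES DELAY [CHECK_COMMAND...]
# Runs the check command up to TRIES times, sleeping DELAY seconds between
# attempts, and returns 0 as soon as its output contains "All servers up".
# ASSUMPTION: that phrase is what "cfs cs" prints on success; adjust the
# pattern if your release words it differently.
wait_for_servers() {
    tries=$1
    delay=$2
    shift 2
    # Default to the command shown in the article if none was supplied.
    [ $# -gt 0 ] || set -- cfs cs
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" 2>/dev/null | grep -q "All servers up"; then
            return 0
        fi
        i=$((i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use: check every 30 seconds, giving up after 10 attempts.
#   wait_for_servers 10 30 && echo "server is back"
```

Passing the check command as an argument keeps the loop usable with a different command or output pattern, and lets you try it out without a CODA client installed.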

As described earlier, a CODA client that loses contact with the server enters disconnected mode. In this mode, the files in the local cache remain accessible on the client, and modifications are logged locally. When the server is available again, the local change log is reintegrated with the server. If CODA detects that another client has modified the same file in the meantime, this is called a conflict, and I will show later how it is handled. An attempt to access a file that is not in the local cache results in a file error. This is unfortunate, but there is little that can be done if the server is down and the client does not have a copy of the file to work with. Clearly, it is undesirable to find that the files you need are not in the local CODA cache when the server goes down. To address this, the CODA tool set includes a tool called hoard that allows you to build a database of hints to CODA as to which files you wish to be cached locally. You can assign priorities to a file or set of files; the priorities affect which files are discarded from the cache when it is full.

To use CODA as a failover file system, we want the entire shared file system to be available on the client, so we create a hoard command file (say, hoard.conf) with the following contents:

a /coda/file/tree    600:d+

This tells hoard to add the file tree /coda/file/tree with a priority of 600 (the higher the number, the higher the priority). The d+ modifier says that you want to hoard all subdirectories under the given root directory, including any new subdirectories created after the hoard command is run. Once the hoard.conf file is set up, you can feed it into hoard like this:

hoard -f hoard.conf

This loads the hoard database with the commands in hoard.conf. The database need only be set once as it persists across restarts of the CODA client. Once the hoard database is set up, you can run:

hoard walk

which starts filling the local CODA client cache from the server with files, as per the specifications in the hoard database. When the hoard command has completed, the cache on the CODA client contains all the files requested on disk so they will be accessible even if the server gets disconnected.
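If several trees need to be hoarded, the command file can be generated rather than typed by hand. The following is a rough sketch under stated assumptions: the function name hoard_lines and the second tree in the usage comment are hypothetical, and every tree is given the same priority. The line format itself is the one shown in hoard.conf above.

```shell
#!/bin/sh
# hoard_lines PRIORITY TREE...
# Prints one hoard "add" line per tree, in the format used in the article:
#   a <path>    <priority>:d+
# so that each tree is hoarded together with its current and future
# subdirectories.
hoard_lines() {
    priority=$1
    shift
    for tree in "$@"; do
        printf 'a %s    %s:d+\n' "$tree" "$priority"
    done
}

# Possible use (/coda/other/tree is a made-up example path):
#   hoard_lines 600 /coda/file/tree /coda/other/tree > hoard.conf
#   hoard -f hoard.conf
#   hoard walk
```

Writing the file first and loading it as a separate step lets you inspect hoard.conf before the hoard database is changed.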

As mentioned before, if the CODA server is down or disconnected for some reason, an awkward situation can occur if multiple clients update the same file. In CODA parlance, the file is said to be in conflict, and the CODA client stops normal operations until the conflict is resolved by a human. When this happens, the name of the volume that is in conflict is written to the CODA logs. Using this, an administrator can run the repair tool, which presents a list of changes pending on the client. In repair mode, the versions of the files on the server are also viewable as separate entities, so the server versions can be checked against the local copies. Pending changes that do not conflict can be integrated with the server; for a file that does conflict, the administrator can elect either to discard the local changes or to overwrite the server copy. Once all the conflicts have been resolved, the client can mount the CODA volume again and resume normal operation.

In addition to the client-server model, CODA also supports replication between servers, so you can have multiple CODA servers with the data automatically replicated between them. You can use this either to provide redundancy of the CODA server or to place servers on either side of a relatively low-bandwidth link, so that CODA clients bind to their local server but still have up-to-date files. Again, if the servers lose contact because of a network outage, the file modifications are logged and reintegrated when the servers regain connectivity. Resolving server-to-server conflicts uses the same repair tool, and the process is similar in that the changes are reviewed and one version is accepted as the correct one, but it is a bit more complex than resolving a client-server conflict.

By now it should be fairly apparent that failover with CODA happens automatically because it is part of the design of the file system. If you set up some CODA servers and clients and use the hoard command to fill the local client caches, the clients will be able to continue functioning if they lose contact with the server. Server-to-server replication reduces the chances of clients losing contact with a server in the first place.

As good as this may sound, CODA does, unfortunately, have some disadvantages. For a start, it is still an experimental file system, which means that most sites would not even dream of placing CODA into a production system. Also, CODA is not supported by any vendor, so support is totally reliant on answers from a mailing list. These factors mean that CODA will probably not be a serious contender for a while; however, it does show considerable promise. I currently use CODA at home, and although it was a bit challenging to set up initially, I have found it very usable for keeping my home server synchronized with the work I do on my laptop.

Conclusion

This article completes my series on failover, but with a subject this broad, there are bound to be approaches that have not been mentioned. I hope that these articles have given you good food for thought. If you need write access to your files when the server is down, then CODA is the clear winner; none of the other approaches discussed allows write access by the clients. If you are limited to implementing vendor-supported solutions, then the cachefs approach discussed in the first article gives a close approximation to what CODA can do, but only with read-only clients and manual cache filling. If you have a large file tree that is fairly static and needs to be accessed in an ad hoc manner, then implementing redundant servers is probably the best way to go. Here, Sun's automounter has some fairly clear advantages over AMD in that the failover of the file servers on the client is more elegant. Presumably this is because Sun put some smarts into the kernel, allowing the automounter to recover from a server going down part way through a read of a file. Because AMD is implemented purely at the user level, these smarts are not available to it; thus, the failover is not as smooth.

Brett Lymn works for a global company helping to manage a bunch of UNIX machines. He spends entirely too much time fiddling with computers but cannot help himself.