What
to Do When the Server Doesn't Serve -- Using CODA
Brett Lymn
In the previous articles in this series, I looked at NFS failover
methods using fairly mainstream free software and the operating
system facilities themselves. In some environments, this is probably
a good fit for providing some measure of redundancy in your file
servers. This time, I shall look at another approach to providing
access to files by using a file system called CODA that is specifically
designed to handle server outages.
CODA
CODA is an intriguing file system being developed by a group headed
by M. Satyanarayanan at Carnegie Mellon University (More information
on CODA can be found at http://coda.cs.cmu.edu/.) This file
system has been designed to be able to operate when the client has
been disconnected from the server without disrupting normal operations.
The CODA file system is an experimental file system. The decision
to use it in a production environment must be weighed very carefully,
because glitches in the software may result in CODA eating your
file system and causing loss of data. Having said this, the actual
situation is somewhat better; CODA does work. It is not very forgiving
in some circumstances, but if CODA is treated reasonably in terms
of system resources, it works well. A major advantage of using CODA
as a failover file system is that it is, by design, able to gracefully
cope with network outages. This makes it ideal for providing failover
file serving.
On a CODA client, the files currently being worked on are cached
on the local client's disk, and updates to the files are passed
between the client and the server to ensure the cached copies of
the files are kept in sync with the server. If the server goes down,
then the CODA client simply keeps a log of the changes made to the
cached files. When the client loses contact with the server, then
the client is said to be running in disconnected mode. When the
server comes back up, the changes made on the client are reintegrated
into the server's copy of the file to bring it up to date.
Later, I will look at what CODA does when two clients modify the
same file when they are disconnected.
Installation and Use
The instructions for installing CODA can be found on the CODA
Web site. The installation of CODA is a bit involved and requires
some manual setup, but by carefully following the instructions,
you should be able to get a server and client running. Note that
you can have a CODA server and client running on the same machine,
so you can view the CODA file system on the server itself.
Using CODA is quite simple; the client simply runs the client
software, which mounts the CODA volume from the server. Once this
is done, the CODA file system looks like any other file system,
and the files therein can be treated the same as any other file
on the system. Initially, when you mount the CODA volume, the volume
is in read-only mode. To modify files on the volume, you must authenticate
your session to the CODA server by using the clog command.
The clog command contacts the server and acquires an authentication
token, which is used by the client during transactions with the
CODA server. The token has an expiry time of about 25 hours, after
which the client must again authenticate to the server. The status
of the authentication token can be viewed by using the ctokens
command.
Tools
CODA comes with some tools that allow you to monitor the status
of the CODA file system. The developers strongly recommend this
to ensure the CODA file system is running properly. One of these
tools is called codacon. Running this tool produces a running
log of the actions performed by the CODA client. Some of the output
is cryptic and probably only makes sense to a CODA developer, but
it does provide a good monitor of the status of the CODA file system
-- alerting you to server disconnections, conflicts, and other
important information. Another tool that comes with CODA is the
cmon tool, which provides a periodically updating display
of the status of the CODA servers including statistics on resource
utilization.
The CODA file system can be managed by using the cfs tool.
This tool will allow you to change a lot of parameters with the
CODA file system. You probably will not use most of the commands
available with cfs. One of the things you can do with cfs
is force the CODA client to check whether the CODA server is up
or not by using the checkserver cfs command like this:
cfs cs
This command is particularly handy when you know the CODA server has
just come back up and you want to make the CODA client aware of it.
Normally, the CODA client will poll the server periodically when the
client is in disconnected mode, and will realize that the server is
up when the poll is done; but the checkserver command can be used
if you are impatient.
When a CODA client loses contact with the server, the client enters
what is called disconnected mode. In this mode, the local cache
will be accessible on the client, and modifications will be logged
locally. When the server is once again available, the local change
log will be integrated back to the server. If CODA detects that
another client has modified the file, this is called a conflict,
and I will show later how it is handled. An attempt to access a
file not in the local cache will result in a file error. This is
unfortunate, but there is little that can be done if the server
is down and the client does not have a copy of the file to work
with. Clearly, it is undesirable to find that the files you need
are not in the local CODA cache when the server goes down. To address
this, the CODA tool set includes a tool called hoard that
allows you to build a database of hints to CODA as to which files
you wish to be cached locally. You can assign priorities to a file
or set of files that will affect which files are discarded from
cache when the cache is full.
In order to use CODA as a failover file system, we want all the
shared file system to be available on the client and we would create
a hoard command file (say, hoard.conf) with the following
contents:
a /coda/file/tree 600:d+
This tells hoard to add the file tree /coda/file/tree
with a priority of 600 (the higher the number, the higher the priority).
It also says that you want to hoard all subdirectories, including
any new subdirectories created after the hoard command is run,
under the given root directory. Once the hoard.conf file is
set up, you can feed it into hoard like this:
hoard -f hoard.conf
This loads the hoard database with the commands in hoard.conf.
The database need only be set once as it persists across restarts
of the CODA client. Once the hoard database is set up, you can run:
hoard walk
which starts filling the local CODA client cache from the server with
files, as per the specifications in the hoard database. When the hoard
command has completed, the cache on the CODA client contains all the
files requested on disk so they will be accessible even if the server
gets disconnected.
As mentioned before, if the CODA server is down or disconnected
for some reason, an awkward situation can occur if multiple clients
update the same file. In CODA parlance, the file is said to be in
conflict, and the CODA client stops normal operations until the
conflict is resolved by a human. When the client is in conflict,
the name of the volume that is in conflict is logged to the CODA
logs. Using this, an administrator can run the repair tool that
presents a list of changes pending on the client. Also in repair
mode, the versions of the files on the server are viewable as separate
entities, so the server versions of files can be checked against
the local copies. These pending changes can either be integrated
with the server (if there is no conflict), or if the file conflicts,
the administrator can elect to discard the local changes or overwrite
the server copy. Once all the conflicts have been resolved, the
client can mount the CODA volume again and resume normal operation.
In addition to the client-server model, CODA also supports replication
between servers, so you can have multiple CODA servers with the
data automatically replicated between the servers. Thus, you can
either provide redundancy of the CODA server, or you can have multiple
servers separated by a relatively low bandwidth link. Then the CODA
clients bind to the local server but still have up-to-date files.
Again, if the servers lose contact because of a network outage,
the file modifications are logged and reintegrated when the servers
regain connectivity. The process for reintegrating server-to-server
conflicts is a bit different from client-to-server reintegration.
The same repair tool is used, and the process is similar in that
the changes are reviewed and one version is accepted as the correct
one. But, the process is a bit more complex than a client-server
conflict.
By now it should be fairly apparent that the failover when using
CODA happens automatically because it is part of the design of the
file system. By setting up some CODA servers and clients, and by
using the hoard command to fill the local client caches, the clients
will be able to continue functioning if they lose contact with the
server. By using the server-to-server replication, the chances of
clients losing contact with a server can be minimized, as well.
As good as this may sound, CODA does, unfortunately, have some
disadvantages. For a start, it is still an experimental file system,
which means that most places would not even dream of placing CODA
into a production system. Also, CODA is not a solution supported
by any vendor, so the support for CODA is totally reliant on answers
from a mailing list. These factors mean that CODA will probably
not be a serious contender for a while, however, it does show considerable
promise. I currently use CODA at home, and although it was a bit
challenging to set up initially, I have found it very usable for
keeping my home server synchronized with the work I do on my laptop.
Conclusion
This article completes my series on failover, but with a subject
this broad, there are bound to be approaches that have not been
mentioned. I hope that these articles have given you good food for
thought. If you need write access to your files when the server
is down, then CODA is the clear winner here. None of the other approaches
discussed will allow write access by the clients. If you are limited
to implementing vendor-supported solutions, then the cachefs approach
discussed in the first article gives a close approximation to what
CODA can do, but only with read-only clients and manual cache filling.
If you have a large file tree that is fairly static and needs to
be accessed in an ad hoc manner, then implementing redundant servers
is probably the best way to go. Here Sun's automounter has
some fairly clear advantages over AMD in that the failover of the
file servers on the client is more elegant. Presumably this is because
Sun put some smarts into the kernel allowing the automounter to
recover from a server going down part way through a read of a file.
Because AMD is purely done at the user level, these smarts are not
available; thus, the failover is not as smooth.
Brett Lymn works for a global company helping to manage a bunch
of UNIX machines. He spends entirely too much time fiddling with
computers but cannot help himself.
|