Unix Administration with Self-Parsing Shell Scripts
John Kozubik
Unix systems administration often consists of repeating common,
similar operations on different sets of targets. The notion of using,
and stringing together, building-block primitives such as cat,
sort, strings, wc, sed, and awk
is a core tenet of Unix philosophy. Sometimes jobs are too complicated
to solve with a single command-line of pipelined or redirected primitives,
or perhaps the job will be repeated often enough that it makes sense
to create a more permanent script to produce the desired outcome.
The problem that arises is how to allow the script to handle a variety
of related inputs, so that it will be useful in the future when
different data needs to be processed using the same procedures.
The obvious solution is to write a script that operates on data
files so that the script can exist on its own, and be used to process
any number of new data sets as they are introduced in the future.
A good example of this is the rc system used for system initialization
in FreeBSD. In this arrangement, a number of rc scripts (such as
rc.network, rc.sendmail, and rc.diskless[1,2]) are run, each of
which in turn parses a single user-edited configuration file,
/etc/rc.conf. Changes to /etc/rc.conf thus alter the way the rc
scripts run and what they do.
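For example, a typical /etc/rc.conf might contain lines such as the
following (the values shown here are illustrative), each of which is
read at boot by the rc script responsible for that subsystem:

sendmail_enable="YES"
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"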
This solution is useful and elegant, and lends itself to
modular organization and greater overall ease of use --
especially when implemented as well as it is in the FreeBSD rc system.
However, there are more discrete tasks (tasks that do not fit into
some larger framework, such as system initialization) that can be
simplified by removing even the data files on which the script operates.
By incorporating the data into the script itself, you can
create powerful systems administration tools in the form of simple
shell scripts that consist merely of a single file.
The basic organization of such a script is a set of one or more
data items that are commented out, as they are not actual commands,
but commented in such a way that they can be distinguished from
normal comments:
01: #!/bin/sh
02: #
03: # Here is the data set, and perhaps we will add some other comments here
04: #
05: ##DATA var1 var2 var3
06: ##DATA var1 var2 var3
07: ##DATA var1 var2 var3
As you can see, normal comments begin with a single # character,
while data items begin with ##. Not only does this allow us
to parse the script and easily identify which
lines are data lines (as opposed to normal comments), but it also
allows us to quickly disable a data line that we temporarily do not
want to use. Simply remove one of the # characters from the data line
in question; it will not be parsed, because it no longer has a leading
##, but it still starts with #, and thus does not affect the script,
because it remains a comment.
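For example, of the two (hypothetical) data lines below, the first
is active and the second has been disabled by removing one of its
leading # characters:

##DATA /etc/hosts /etc/hosts.orig /tmp/hosts.diff
#DATA /etc/fstab /etc/fstab.orig /tmp/fstab.diff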
The body of the script consists of a variable defining the path
of the script itself (obviously, the script needs to know the path
to itself if it is to parse itself), followed by a while loop that
reads every line of the script but selects (using grep)
only the data lines, which begin with ##DATA:
08: myself=/usr/local/bin/script.sh
09:
10: while read line
11: do
12:
13:   if echo "$line" | grep "^##DATA" > /dev/null
14:   then
15:
16:     var1=`echo "$line" | awk '{print $2}'`
17:     var2=`echo "$line" | awk '{print $3}'`
18:     var3=`echo "$line" | awk '{print $4}'`
19:
20:     diff $var1 $var2 >> $var3
21:
22:   fi
23:
24: done < $myself
Here the script (in line 08) defines the path to itself and uses a
"while read line" construct, the end of which (in line 24)
takes as input the name of the script itself. While the script reads
every line of itself, it selects, through the use of grep,
only the lines that begin with ##DATA. As noted above, disabling a
particular line of data can be done by simply removing one of the
# characters, as it will still be commented out, but will not be selected
as an input line by grep.
What the script above actually does is trivial -- as you can
see, in lines 16, 17, and 18, we simply echo the line and pipe
the output through awk to grab the second, third, or fourth
word in that line (since the first word is ##DATA) and assign it
to a variable. Then we perform a simple operation on the variables
from that line.
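To see this field extraction in isolation, you can run one of the
awk invocations by hand (the data line here is hypothetical):

echo "##DATA /etc/hosts /etc/hosts.orig /tmp/hosts.diff" | awk '{print $2}'

This prints /etc/hosts, the second whitespace-separated word on the
line.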
Because we have three total lines of data in this script (lines
05, 06, and 07), the diff action in line 20 will be run a total
of three times, each time with a different group of data. There
is no limit to the number of data lines we could put in the script.
It should be noted, however, that every line in the entire script
will be parsed and tested by the grep conditional in line 13, regardless
of whether it is a data line.
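If that extra work is a concern, one variation (not used in the
examples that follow) is to let grep select the data lines once and
feed only those lines to the loop:

grep "^##DATA" $myself | while read line
do
    var1=`echo "$line" | awk '{print $2}'`
    var2=`echo "$line" | awk '{print $3}'`
    var3=`echo "$line" | awk '{print $4}'`
    diff $var1 $var2 >> $var3
done

Note, however, that in most shells the loop then runs in a subshell,
so variables set inside it are not visible after the done.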
What are some practical uses for self-parsing shell scripts such
as this? In June 2000, I wrote a simple but powerful backup utility
that, three years later, I now use to back up every system in my
organization. The script, available here:
http://www.kozubik.com/software/kozubik_backup-1.0.tar.gz
contains data lines like this:
##ENTRY /usr/local/etc www7.kozubik.com-usr-local-etc /mnt/backup1
##ENTRY /var/mail www7.kozubik.com-var-mail /mnt/backup4
The first piece of data is a local path in the filesystem, the second
is a label for that backup, and the final piece is the destination
directory in which the backup will be placed.
The body of this backup script is somewhat complicated, but can
be simplified as follows:
00: myself=/usr/local/etc/backup.sh
01: while read line
02: do
03:
04:   if echo "$line" | grep "^##ENTRY" > /dev/null
05:   then
06:
07:     directory=`echo "$line" | awk '{print $2}'`
08:     name=`echo "$line" | awk '{print $3}'`
09:     backupdir=`echo "$line" | awk '{print $4}'`
10:     date=`date '+%y-%m-%d'`
11:
12:     if test -d "$backupdir"
13:     then
14:       if test -d "$directory"
15:       then
16:         tar cvzf $backupdir/$date-$name.tar.gz $directory
17:       else
18:         echo "Directory to back up does not exist: $directory"
19:       fi
20:     else
21:       echo "Target directory does not exist: $backupdir"
22:     fi
23:
24:   fi
25: done < $myself
The end result, given the two data lines above, is the creation of
two gzipped tar files -- one named 03-07-20-www7.kozubik.com-usr-local-etc.tar.gz,
placed in /mnt/backup1, and one named 03-07-20-www7.kozubik.com-var-mail.tar.gz,
placed in /mnt/backup4. Note that different source directories
can be backed up to different target directories, as the target
is simply a variable defined in each data line.
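Because such a script is entirely self-contained, scheduling it is
trivial; for example, an /etc/crontab entry like the following (the
time and path are illustrative) would run the backup at 2:30 every
morning:

30 2 * * * root /usr/local/etc/backup.sh > /dev/null 2>&1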
Another, more complicated (but very useful) script can be found
here:
http://www.kozubik.com/software/kozubik_pulse-1.0.tar.gz
This script contains data lines like this:
##QUERY www.example.com 80 "GET /index.html HTTP/1.0" "HTTP/1.1 \
200 OK" "responds to HTTP requests" "DOES NOT respond to HTTP \
requests" 2
##QUERY www.example.com 25 "QUIT" "221" "responds to SMTP \
requests" "DOES NOT respond to SMTP requests" 2
The first piece of data is an address, and the second is a port
number. The third is a string to send to that socket, and the fourth
is a string we expect to be returned to us. The fifth and sixth are
messages to report on success or failure, respectively. Finally, the
seventh is the number of attempts the script should make to query
the network service before returning a negative report.
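A data line of this sort can be tested by hand before it is added to
the script; for example, the second data line above corresponds to an
exchange like the following (a healthy SMTP server answers QUIT with
a 221 reply):

printf "QUIT\r\n\r\n" | nc -v -w 3 www.example.com 25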
The body of the script is more complicated, as it calls netcat
(nc) to query the services in the data lines of the script, but
again, the basic functionality can be simplified as follows:
00: myself=/usr/local/etc/pulse.sh
01: while read line
02: do
03:
04:   if echo "$line" | grep "^##QUERY" > /dev/null
05:
06:   then
07:
08:     address=`echo "$line" | awk '{print $2}'`
09:     port=`echo "$line" | awk '{print $3}'`
10:     send=`echo "$line" | awk -F\" '{print $2}'`
11:     receive=`echo "$line" | awk -F\" '{print $4}'`
12:     upmsg=`echo "$line" | awk -F\" '{print $6}'`
13:     downmsg=`echo "$line" | awk -F\" '{print $8}'`
14:     retries=`echo "$line" | awk -F\" '{print $9}'`
15:     count=0
16:     successes=0
17:
18:     while test $count -lt $retries
19:     do
20:
21:       output=`printf "$send\r\n\r\n" | nc -v -w 3 $address $port`
22:       count=`expr $count + 1`
23:       if echo "$output" | grep "$receive" > /dev/null
24:       then
25:         successes=`expr $successes + 1`
26:       fi
27:
28:     done
29:
30:     if test $successes -eq 0
31:     then
32:       echo "$address $downmsg on port $port" >> /tmp/kozubik.pulse.fail.$$
33:     else
34:       echo "$address $upmsg on port $port" >> /tmp/kozubik.pulse.succeed.$$
35:     fi
36:
37:   fi
38: done < $myself
In this script, the end result is that each line in the data portion
of the script causes netcat to run with the corresponding destination
and port number, and is given a number of retries within which to
receive a positive response from the server daemon that is presumed
to be running at that remote location. Although we do not show it
here, the file that this script outputs (either failure or success)
can then be emailed to one or more email addresses defined earlier
in the script.
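A minimal sketch of that final step, assuming a recipient address
and the standard mail(1) utility (both are assumptions, not part of
the script above), might look like this:

admin=ops@example.com
if test -s /tmp/kozubik.pulse.fail.$$
then
    mail -s "pulse.sh: service failures" $admin < /tmp/kozubik.pulse.fail.$$
fi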
Note one significant difference between the backup script shown
above and this pulse.sh script. Because the backup script has data
lines whose elements are always single words (such as /usr/local/etc),
we can simply parse the lines with awk and grab words in
the data line by their position. In the pulse.sh script, however,
some of the data elements contain multiple words, such as "responds
to HTTP requests", which means that when we pull individual
fields out of the data lines with awk, we need to specify the quote
character as the field separator and select fields by their position
between the quotes, as shown in lines 10-14:
upmsg=`echo $line | awk -F\" '{print $6}'`
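The effect of the quote separator is easy to see in isolation (the
data line here is abbreviated); with the double-quote character as
the separator, the even-numbered fields are the quoted strings
themselves:

echo '##QUERY www.example.com 80 "GET / HTTP/1.0" "200 OK"' | awk -F\" '{print $4}'

This prints 200 OK, the contents of the second quoted string.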
The original basic template, shown in the first code example above,
combined with the specific examples of a backup script and a remote
monitoring script, shows how a self-parsing shell script can
dramatically simplify common Unix systems administration tasks. Compare
the simplicity of the pulse.sh script above, which monitors remote
server daemons (and which I use to monitor more than 800 servers across
the world), to the complexity of other remote monitoring applications
that require complicated software installations and hundreds
of program and data files. The ability to quickly alter one or more
data lines, to disable lines (while leaving them in place) by simply
removing one of the two leading # characters, and to carry the entire
tool in a single, self-contained file represents a significant
gain in simplicity and ease of use.
I encourage you to download and use the three fully functional
examples at http://www.kozubik.com, and to begin grouping
common tasks of your own into new self-parsing shell scripts,
extending this simplicity and ease of use to tasks I have yet to
write examples for.
John Kozubik has been designing, administering, and programming
Unix-based systems for more than 10 years. John owns and operates
JohnCompanies, one of the largest Unix colocation providers in the
United States.