Unix Administration with Self-Parsing Shell Scripts

John Kozubik
              Unix systems administration often consists of repeating common, 
              similar operations on different sets of targets. The notion of using, 
              and stringing together, building-block primitives such as cat, 
              sort, strings, wc, sed, and awk 
              is a core tenet of Unix philosophy. Sometimes jobs are too complicated 
              to solve with a single command-line of pipelined or redirected primitives, 
              or perhaps the job will be repeated often enough that it makes sense 
              to create a more permanent script to produce the desired outcome. 
              The problem that arises is how to allow the script to handle a variety 
              of related inputs, so that it will be useful in the future when 
              different data needs to be processed using the same procedures.
              The obvious solution is to write a script that operates on data 
              files so that the script can exist on its own, and be used to process 
              any number of new data sets as they are introduced in the future. 
              A good example of this is the rc system used for system initialization 
              in FreeBSD. In this arrangement, a number of rc scripts (such as 
              rc.network, rc.sendmail, and rc.diskless[1,2]) are run, each of 
              which in turn parse through a single user-edited configuration file, 
              /etc/rc.conf. In this configuration, changes to /etc/rc.conf alter 
              the way that rc scripts run and what they perform.
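For example, a few lines in /etc/rc.conf might read (these are standard rc.conf knobs; the hostname is invented):

hostname="www7.example.com"
sshd_enable="YES"
sendmail_enable="NO"

Each rc script consults the variables relevant to it and acts accordingly.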
              This solution is a useful and elegant one and lends itself to 
              modular organization and an increase in overall ease of use -- 
              especially when implemented as well as it is in the FreeBSD rc system. 
However, there are more discrete tasks (tasks that do not fit into
              some larger framework, such as system initialization) that can be 
              simplified by removing even the data files on which the script operates. 
              By incorporating the data into the script itself, you can 
              create powerful systems administration tools in the form of simple 
              shell scripts that consist merely of a single file.
              The basic organization of such a script is a set of one or more 
              data items that are commented out, as they are not actual commands, 
              but commented in such a way that they can be distinguished from 
              normal comments:
              
             
01: #!/bin/sh
02: #
03: # Here is the data set, and perhaps we will add some other comments here
04: #
05: ##DATA var1 var2 var3
06: ##DATA var1 var2 var3
07: ##DATA var1 var2 var3
 
As you can see, normal comments are set off with a single # character, while data items are marked with ##. Not only does this allow us to parse through the script and easily identify which lines are data lines (as opposed to normal comments), but it also allows us to quickly disable a data line that we temporarily do not want to use. Simply remove one of the # characters from that data line; it will not be parsed because it no longer begins with ##, but it still starts with #, and thus does not affect the script because it is still commented out.
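For example, in the following fragment (with hypothetical file names filled in), the second data line has been disabled by removing one of its leading # characters; the first and third lines will still be processed:

##DATA /etc/hosts /etc/hosts.bak /tmp/hosts.diff
#DATA /etc/fstab /etc/fstab.bak /tmp/fstab.diff
##DATA /etc/passwd /etc/passwd.bak /tmp/passwd.diff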
The body of the script consists of a variable defining the path of the script itself (obviously, the script needs to know the path to itself if it is to parse itself), and then a while loop that reads every line of the script but keeps (using grep) only the data lines, which begin with ##DATA:
              
             
08: myself=/usr/local/bin/script.sh
09:
10: while read line
11: do
12:
13:    if echo "$line" | grep "^##DATA" > /dev/null
14:    then
15:
16:      var1=`echo "$line" | awk '{print $2}'`
17:      var2=`echo "$line" | awk '{print $3}'`
18:      var3=`echo "$line" | awk '{print $4}'`
19:
20:      diff $var1 $var2 >> $var3
21:
22:    fi
23:
24: done < $myself
Here the script (in line 08) defines the path to itself and uses a "while read line" construct, the end of which (in line 24) takes the script itself as input. As the script reads each line of itself, it selects, through the use of grep, only the lines that begin with ##DATA (grep's output is sent to /dev/null; only its exit status matters to the if statement). As noted above, disabling a particular line of data can be done by simply removing one of the # characters, as the line will still be commented out but will no longer be selected as an input line by grep.
What the script above actually does is trivial -- as you can see, in lines 16, 17, and 18, we simply echo each line and pipe the output through awk to grab the second, third, or fourth word in that line (since the first word is ##DATA) and assign it to a variable. Then we perform a simple operation on the variables from that line.
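For instance, given the hypothetical data line:

##DATA /etc/hosts /etc/hosts.bak /tmp/hosts.diff

lines 16-18 assign var1=/etc/hosts, var2=/etc/hosts.bak, and var3=/tmp/hosts.diff, so line 20 runs:

diff /etc/hosts /etc/hosts.bak >> /tmp/hosts.diff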
              Because we have three total lines of data in this script (lines 
              05, 06, and 07), the diff action in line 20 will be run a total 
              of three times, each time with a different group of data. There 
              is no limit to the number of data lines we could put in the script. 
              It should be noted, however, that every line in the entire script 
              will be parsed and tested by the grep conditional in line 13, regardless 
              of whether it is a data line.
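If that per-line overhead ever matters, one variant (a departure from the script above, not something the original uses) is to let grep do the filtering once, up front, and feed only the data lines into the loop:

grep "^##DATA" $myself | while read line
do
     var1=`echo "$line" | awk '{print $2}'`
     var2=`echo "$line" | awk '{print $3}'`
     var3=`echo "$line" | awk '{print $4}'`
     diff $var1 $var2 >> $var3
done

Note that the while loop now runs in a subshell, which is harmless here because nothing after the loop depends on its variables.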
              What are some practical uses for self-parsing shell scripts such 
              as this? In June 2000, I wrote a simple, but powerful backup utility 
              that, three years later, I now use to back up every system in my 
              organization. The script, available here:
              
             
http://www.kozubik.com/software/kozubik_backup-1.0.tar.gz
 
            contains data lines like this:
             
             
##ENTRY /usr/local/etc www7.kozubik.com-usr-local-etc /mnt/backup1
##ENTRY /var/mail www7.kozubik.com-var-mail /mnt/backup4
 
The first piece of data is a local path in the filesystem. The second piece of data is a label for that backup, and the final piece of data is the destination where the backup will be placed.
             The body of this backup script is somewhat complicated, but can 
              be simplified as follows:
              
             
00: myself=/usr/local/etc/backup.sh
01: while read line
02: do
03:
04:   if echo "$line" | grep "^##ENTRY" > /dev/null
05:   then
06:
07:     directory=`echo "$line" | awk '{print $2}'`
08:     name=`echo "$line" | awk '{print $3}'`
09:     backupdir=`echo "$line" | awk '{print $4}'`
10:     date=`date '+%y-%m-%d'`
11:
12:     if test -d $backupdir
13:     then
14:       if test -d $directory
15:       then
16:         tar cvzf $backupdir/$date-$name.tar $directory
17:       else
18:         echo "Directory to back up does not exist"
19:       fi
20:     else
21:       echo "Target directory does not exist"
22:     fi
23:   fi
24: done < $myself
            The end result is, given the two data lines above, the creation of 
            two gzipped tar files -- one named 03-07-20-www7.kozubik.com-usr-local-etc.tar 
            that is placed in /mnt/backup1, and one named 03-07-20-www7.kozubik.com-var-mail.tar 
            that is placed in /mnt/backup4. Note that different source directories 
            for the backup can be placed in different target directories, as that 
            is simply a variable defined in each data line.
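Because the whole tool is one self-contained file, scheduling it is equally simple. A hypothetical crontab entry to run it nightly at 2:00 a.m. might look like:

0 2 * * * /usr/local/etc/backup.sh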
             Another, more complicated (but very useful) script can be found 
              here:
              
             
http://www.kozubik.com/software/kozubik_pulse-1.0.tar.gz
 
            This script contains data lines like this:
             
             
##QUERY www.example.com 80 "GET /index.html HTTP/1.0" "HTTP/1.1 \
  200 OK" "responds to HTTP requests" "DOES NOT respond to HTTP \
  requests" 2
##QUERY www.example.com 25 "QUIT" "221" "responds to SMTP \
  requests" "DOES NOT respond to SMTP requests" 2
 
The first piece of data is an address. The second is a port number. The third is a string to send to that socket. The fourth is a string we expect to be returned to us. The fifth and sixth are messages to report on success or failure, respectively. Finally, the seventh is the number of tries the script should make to query the network service before returning a negative report.
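Taken together, the first ##QUERY line above boils down to a single netcat exchange; conceptually (with the parameters taken from that data line), it is equivalent to:

printf "GET /index.html HTTP/1.0\r\n\r\n" | nc -v -w 3 www.example.com 80

with the output then checked for the string "HTTP/1.1 200 OK".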
             The body of the script is more complicated, as it calls netcat 
              (nc) to query the services in the data lines of the script, but 
              again, the basic functionality can be simplified as follows:
              
             
00: myself=/usr/local/etc/pulse.sh
01: while read line
02: do
03:
04:   if echo "$line" | grep "^##QUERY" > /dev/null
05:   then
06:
07:     # split the data line; quoted fields use " as the field separator
08:     address=`echo "$line" | awk '{print $2}'`
09:     port=`echo "$line" | awk '{print $3}'`
10:     send=`echo "$line" | awk -F\" '{print $2}'`
11:     receive=`echo "$line" | awk -F\" '{print $4}'`
12:     upmsg=`echo "$line" | awk -F\" '{print $6}'`
13:     downmsg=`echo "$line" | awk -F\" '{print $8}'`
14:     retries=`echo "$line" | awk -F\" '{print $9}'`
15:     count=0
16:     successes=0
17:
18:     while test $count -lt $retries
19:     do
20:
21:       output=`printf "$send\r\n\r\n" | nc -v -w 3 $address $port`
22:       count=`expr $count + 1`
23:       if echo "$output" | grep "$receive" > /dev/null
24:       then
25:         successes=`expr $successes + 1`
26:       fi
27:
28:     done
29:
30:     if test $successes -eq 0
31:     then
32:       echo "$address $downmsg on port $port" >> /tmp/kozubik.pulse.fail.$$
33:     else
34:       echo "$address $upmsg on port $port" >> /tmp/kozubik.pulse.succeed.$$
35:     fi
36:
37:   fi
38: done < $myself
            In this script, the end result is that each line in the data portion 
            of the script causes netcat to run with the corresponding destination 
            and port number, and is given a number of retries within which to 
            receive a positive response from the server daemon that is presumed 
            to be running at that remote location. Although we do not show it 
            here, the file that this script outputs (either failure or success) 
            can then be emailed to one or more email addresses defined earlier 
            in the script.
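Although the full script handles this, a minimal sketch of that final step (the address here is hypothetical) might look like:

if test -f /tmp/kozubik.pulse.fail.$$
then
  mail -s "pulse: failures detected" admin@example.com < /tmp/kozubik.pulse.fail.$$
fi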
Note one significant difference between the backup script shown above and this pulse.sh script. Because the backup script has data lines whose elements are always single words (such as /usr/local/etc), we can simply parse the lines using awk and grab words in the data line by their number. In the pulse.sh script, however, some of the data elements contain multiple words, such as "responds to HTTP requests", which means that when we parse the data lines and pull out individual fields with awk, we need to specify a field separator and offset, as shown in lines 10-14:
              
             
upmsg=`echo "$line" | awk -F\" '{print $6}'`
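Splitting on the " character places the quoted strings in the even-numbered awk fields (the odd-numbered fields are the stretches of text between the quoted strings). A quick illustration, using a shortened, hypothetical data line:

echo '##QUERY www.example.com 80 "GET / HTTP/1.0" "200"' | awk -F\" '{print $2}'
GET / HTTP/1.0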
The original basic template, shown as the first code example above, combined with the specific examples of a backup script and a remote monitoring script, shows how a self-parsing shell script can dramatically simplify common Unix systems administration tasks. Compare the simplicity of the pulse.sh script above, which monitors remote server daemons (and which I use to monitor more than 800 servers across the world), to the complexity of other remote monitoring applications that require complicated software installations and hundreds of program and data files. The ability to quickly alter one or more data lines, to disable lines (while leaving them in place) by removing one of the two leading # characters, and to carry each tool around as a single, self-contained file represents a significant gain in simplicity and ease of use.
             I encourage you to download and begin using the three fully functional 
              examples at http://www.kozubik.com, and to begin grouping 
              common tasks into new self-parsing shell scripts to extend this 
              simplicity and ease to other tasks I have yet to write examples 
              for.
              John Kozubik has been designing, administering, and programming 
              Unix-based systems for more than 10 years. John owns and operates 
              JohnCompanies, one of the largest Unix colocation providers in the 
              United States.