Do It Yourself with the Shell
Ed Schaefer and Michael Wang
Handy Unix tools such as Perl, awk, sed, and grep are frequently used to solve programming problems, while the shell is often overlooked as a problem-solving tool. Most of us garner Unix knowledge by learning Unix commands and, to the uninitiated, shell programming is just chaining a series of commands together. As a command interpreter, the shell does not distinguish its built-in commands from external commands; you do not call external commands through a function such as Perl's system(). Using the shell instead of an external tool is analogous to performing a task yourself versus hiring somebody else to do it. Provided you have the knowledge and experience, it's generally better to do it yourself.
              This article presents "do it yourself" techniques in 
              shell using real-world examples -- examples gathered from printed 
              material, USENET postings, and our own work. In this article, we 
              contrast standard Unix tool solutions with corresponding shell solutions 
              and compare the different approaches. We present 13 specific sample 
              problems with corresponding solutions within 5 broader technique 
              categories. The techniques considered are: parameter expansion and 
              positional parameters, pattern matching, while-loop, shell arithmetic, 
              and file and process operations.
              All of the shell solutions were tested using bash 2.x and ksh93. 
              Most of them work under ksh88 unaltered or with minor modification; 
              however, we did not purposefully twist the code to work under ksh88. 
              Major commercial Unixes (e.g., Solaris, AIX, HP-UX) ship with ksh93. 
              Recent versions of ksh93 and bash are available via open source 
              licenses. When we refer to portability, we mean portability across 
              different platforms, not across different shells.
              
            Technique: Parameter Expansion and Positional Parameters
              Problem 1: Find Year, Month, Date from String
              Consider the string i="2003-07-04" in the format of 
              YYYY-MM-DD. Find the year, month, and day. Here are two possible 
              awk solutions (Examples 1a and 1b):
              
             
set -- $(echo $i | awk '{gsub(/-/, " "); print}')
set -- $(echo $i | awk -F- '{print $1" "$2" "$3}')
# Your Unix variant might require nawk
            And, here are two sed solutions (Examples 1c and 1d):
             
             
set -- $(echo $i | sed "s/-/ /g")
set -- $(echo $i | sed "s/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\1 \2 \3/")
 
            Two possible Perl solutions (Examples 1e and 1f) are as follows:
             
             
set -- $(echo $i | perl -pe "s/-/ /g")
set -- $(echo $i | perl -ne "print(STDOUT join(' ', split(/-/)))")
            There are also many shell solutions, and we will show four separate 
            ones. The first one (Example 1g) looks like this:
             
             
set -- $(IFS=-; echo $i)
 
This solution is similar to Examples 1b and 1f. Inside the subshell "$(...)", the field separator variable IFS is assigned "-", and "$i" expands into three fields. The built-in set command assigns the fields to the positional parameters $1, $2, and $3, which are the year, month, and day.
The subshell "$(...)" localizes IFS to the subshell environment for the purpose of expanding $i; IFS is not affected in the parent shell environment.
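A quick demonstration (our own, not one of the numbered examples) shows both the splitting and the fact that the parent's IFS survives:

```shell
i="2003-07-04"
# IFS is set to "-" only inside the subshell; the parent is untouched
set -- $(IFS=-; echo $i)
echo "$1 $2 $3"            # prints: 2003 07 04
case $IFS in
  *-*) echo "IFS changed in the parent" ;;
  *)   echo "IFS unchanged in the parent" ;;
esac                       # prints: IFS unchanged in the parent
```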
echo works in our example because the string in $i is known. In general, it may contain a space or other shell special characters. The following shell command takes care of these (Example 1h):
              
             
eval set -- $(IFS=-; printf '"%s" ' $i)
 
            However, if this looks complicated, use the more straightforward approach:
             
             
oIFS=$IFS
IFS=-
set -- $i
IFS=$oIFS
 
            Here is the second shell solution (Example 1i):
             
             
set -- ${i//-/ }
This solution is similar to Examples 1a, 1c, and 1e, which replace every occurrence of "-" with a space; set then assigns the resulting fields to the positional parameters. This solution does not work if $i contains a space. In that case, use:
             
             
eval set -- \"${i//-/\" \"}\"
Most of the time, we know our data structure, so the above (complicated) command is not necessary. For clarity, we do not use this general form of the command in the rest of the article.
             The third shell solution (Example 1j) looks like this:
              
             
year=${i%%-*}
month=${i#*-}
month=${month%-*}
day=${i##*-}
            "%" and "%%" remove the smallest and largest pattern 
            matched from right side, and "#" and "##" remove 
            from the left side. The sed equivalents are as follows (Example 1k):
             
             
year=$(echo $i | sed "s:-.*$::")
month=$(echo $i | sed "s:^[^-]*-::")
month=$(echo $month | sed "s:-[^-]*$::")
day=$(echo $i | sed "s:^.*-::")
 
This method is also useful when a variable does not contain a separator -- for example, when $i is of the form YYYYMMDD. In this case, we just need to change the shell construct slightly (Example 1l):
             
             
year=${i%%????}
month=${i#????}
month=${month%??}
day=${i##??????}
            To be more precise, replace "?" with [0-9].
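Spelled out with [0-9], Example 1l becomes (our variant):

```shell
i=20030704
year=${i%%[0-9][0-9][0-9][0-9]}            # remove trailing MMDD
month=${i#[0-9][0-9][0-9][0-9]}            # remove leading YYYY
month=${month%[0-9][0-9]}                  # then remove trailing DD
day=${i##[0-9][0-9][0-9][0-9][0-9][0-9]}   # remove leading YYYYMM
echo "$year $month $day"                   # prints: 2003 07 04
```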
             Using the original format, YYYY-MM-DD, the fourth shell solution 
              (Example 1m) is probably the most straightforward solution, if the 
              exact position of the data is known:
              
             
year=${i:0:4}
month=${i:5:2}
day=${i:8}
            which corresponds to:
             
             
year=$(echo $i | cut -c1-4)
month=$(echo $i | cut -c6-7)
day=$(echo $i | cut -c9-)
 
            Problem 1: Discussion
As shown in these examples, the shell provides multiple ways to perform string manipulation. There is no reason to use sed, awk, or Perl for this task. The shell solution is preferred for independence and efficiency: we need not worry about PATH, tool availability, or tool behavior on different platforms.
              Performance can be an issue when using external commands -- 
              especially in a loop. Suppose you need to process a large file, 
              editing data in each line. The time saved using the shell over other 
              tools may be significant.
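For instance, a per-line edit such as replacing every ":" with "," can stay entirely inside the shell, so no process is spawned per line. This is a sketch, assuming a bash/ksh93 shell with the ${var//pattern/replacement} expansion; the sample file is made up for illustration:

```shell
# Build a small sample file (stand-in for the large file)
printf 'a:b:c\nd:e:f\n' > /tmp/sample.$$

# Edit each line using only shell built-ins -- no external
# command runs inside the loop
while IFS= read -r line; do
  printf '%s\n' "${line//:/,}"
done < /tmp/sample.$$        # prints: a,b,c then d,e,f

rm -f /tmp/sample.$$
```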
              Problem 2: File System Usage
              Parse df -Pk /usr output, and find the usage percentage. 
              The -P implies the POSIX output format. The POSIX-compliant 
              command under Solaris is /usr/xpg4/bin/df. Example 2a shows 
              an awk solution:
              
             
# may require nawk under Solaris 7 & SCO Open Server V
df -Pk /usr | awk '!/^Filesystem/ {sub(/%/, ""); print $5}'
            Here is a sed solution (Example 2b):
             
             
df -Pk /usr | sed -n '/^Filesystem/!s/.* \([0-9]\{1,\}\)%.*/\1/p'
            We can also solve this with Perl as follows (Example 2c):
             
             
df -Pk /usr | perl -pe "BEGIN { $/ = '' } s/.*\s([0-9]+)%\s.*/\$1\n/s"
            This solution combines grep, awk, and sed (Example 2d):
             
             
df -Pk /usr | grep -v "^Filesystem" | awk '{print $5}' | sed "s:%::"
            Note that the two shell techniques introduced in Problem 1 can be 
            applied here. Here is one possible shell solution (Example 2e):
             
             
set -- $(df -Pk /usr); echo ${12%\%}
This solution should work on all native Unix systems, but it fails under UWIN, a Unix environment running on Microsoft Windows, because the file system name there contains a space:
             
             
$ df -Pk
Filesystem             Kbytes     Used    Avail Capacity Mounted on
C:\Program Files\UWIN  4000153  3868891   131262     97% /
 
            where C:\Program Files\UWIN is considered as two fields.
             A second shell solution (Example 2f) is as follows:
              
             
(IFS=%; set -- $(df -Pk /usr); echo ${1##* })
This solution works on both native Unix systems and UWIN. It splits the "df" output on "%", sets the first part to "$1", removes everything up to the last space, and leaves the percentage number.
             Problem 2: Discussion
              Example 2f demonstrates simple techniques on this not-so-obvious 
              case, providing the shortest, quickest, and most elegant solution.
              
            Technique: Pattern Matching
              Problem 3: Find Whether $i Is an Integer
To solve this problem, we need to define an integer. If an integer is defined as an optional plus or minus sign followed by a series of digits, the phrase can be directly translated into a shell function (Example 3a):
              
             
function is_integer {
  [[ $1 = ?([+-])+([0-9]) ]]
}
Is this function complete? Yes. The function's return code determines whether the condition is true (0) or false (non-zero). We can use the function like this (Example 3b):
             
             
is_integer $i && echo "yes" || echo "no"
 
This function considers data such as "-1e2" and "0xff" not to be integers. This may be exactly what is required. For example, a script that asks the user to input a date in YYYYMMDD format can check whether the entered string is an integer before further processing.
             However, if decimal and scientific notation are accepted, Bolsky 
              and Korn provide a more generic solution in their book The New 
              KornShell Command and Programming Language (Example 3c):
              
             
isnum()
{
    case $1 in
    ( ?([-+])+([0-9])?(.)*([0-9])?([Ee]?([-+])+([0-9])) )
        return 0;;
    ( ?([-+])*([0-9])?(.)+([0-9])?([Ee]?([-+])+([0-9])) )
        return 0;;
    *)  return 1;;
    esac
}
            Examples 3a and 3c use extended pattern matching operators, which 
            are not enabled by default in bash. To enable the extended pattern 
            matching, use:
             
             
shopt -s extglob
 
In the rest of the article, we assume that extended pattern matching is enabled.
It happens that the external command expr, which evaluates its arguments as an expression, accepts an integer only as a series of digits. Here is a tool solution (Example 3d):
              
             
function is_integer {
  expr "$1" + 0 > /dev/null 2>&1
}
However, if expr ever becomes smarter and handles floating-point numbers, scientific notation, and hex numbers, this solution will not work as planned.
             We can use regular expressions used by grep -E (Example 
              3e):
              
             
function is_integer {
  printf "%s\n" $1 | grep -E "^[+-]?[0-9]+$"  >/dev/null
}
            And, we can use Perl's regular expressions as follows (Examples 
            3f and 3g):
             
             
function is_integer {
  perl -e "\$_ = \"$1\"; /^[+-]?\d+$/ || exit 1"
}
function is_integer {
  printf "%s\n" $1 | perl -e "<> =~ /^[+-]?\d+$/ || exit 1"
}
            Problem 3: Discussion
The shell provides a more direct pattern-matching solution than external tools, which require you either to feed the string to the tool's standard input or to embed it in the tool's command line (Example 3f). Using the shell eliminates the extra process.
Problem 4: Add a Directory to the PATH
In this problem, we need to add a directory to the PATH if it's not already there (path munge). The /etc/profile that comes with a certain Linux distribution contains this function (Example 4a):
              
             
pathmunge () {
        if ! echo $PATH | /bin/egrep -q "(^|:)$1($|:)" ; then
           if [ "$2" = "after" ] ; then
              PATH=$PATH:$1
           else
              PATH=$1:$PATH
           fi
        fi
}
This function checks whether a directory already exists in $PATH and, if not, adds it to the beginning or the end of the PATH variable, depending on $2. We rewrote this typical pattern-matching exercise as follows (Example 4b):
             
             
shopt -s extglob 2>/dev/null
pathmunge() {
  [[ $PATH = ?(*:)$1?(:*) ]] && return 0
  [[ "$2" = "after" ]] && PATH=$PATH:$1 || PATH=$1:$PATH
}
The significant change is replacing "/bin/egrep" with pattern matching. The other changes are a matter of style: "&&" and "||" versus if-then-else.
             When running each version of the function 10 times in a ksh93 
              environment, and using $SECONDS for timing (bash does not support 
              floating-point arithmetic), the results are:
              
             
pattern version: 0.016 seconds
egrep   version: 0.542 seconds
 
            The egrep version has other problems. The author of this function 
            must have worried about PATH, because the path of egrep is 
            hard-coded. However, egrep is defined in Red Hat 9 as:
             
             
#!/bin/sh
exec grep -E ${1+"$@"}
            and no path is given to grep. If we use the function to bootstrap 
            PATH:
             
             
PATH=
pathmunge /bin
 
            the egrep version fails.
             You may think that an easier solution is to use /bin/grep -E 
              directly instead of /bin/egrep. While this works on Linux, 
              it produces an error on Solaris:
              
             
/bin/grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
 
The POSIX version of grep on Solaris resides in /usr/xpg4/bin, not /bin. "getconf PATH" produces a POSIX-compliant PATH. Since getconf is not built in on all shells, the PATH used to locate getconf itself must list all the unique directories in which getconf may reside.
             In ksh-style functions that support localized variables, you can 
              do this:
              
             
function name {
  typeset PATH=$(PATH=/usr/bin:... getconf PATH)
  ...
}
            In POSIX-style functions, do this:
             
             
function_name() {
  OPATH=$PATH
  PATH=$(PATH=/usr/bin:... getconf PATH)
  ...
  PATH=$OPATH
}
            The extra lines for portability reduce readability and efficiency.
             Problem 4: Discussion
              A shell program is more portable because it does not depend on 
              external commands; therefore, we need not worry about the PATH definition. 
              Besides portability, there is a cost associated with spawning a 
              process, especially when the process executes often (as in a loop). 
              While it's not always possible to totally avoid external commands, 
              make an effort to use available shell resources.
              
            Technique: While-Loop
              Problem 5: Get infile
              An Oracle SQL*Loader control file looks like the following:
              
             
-- This is comment line: Oracle Sql*Loader reads records in infile
-- ANNOTATIONS.dat, separated by <er>, and loads them into table
-- ANNOTATIONS.
load data
infile 'ANNOTATIONS.dat' "str '<er>'"
into table ANNOTATIONS
...
 
We want to retrieve the file name "ANNOTATIONS.dat" as part of a larger task. Someone new to shell programming may jump to grep and other tools within easy reach, like this (Example 5a):
             
             
cat $file | grep -v -- "--" | grep "infile" | awk "{print \$2}" | \
  sed "s:'::g" | head -1
            This can also be done without the unnecessary use of cat:
             
             
grep -v -- "--" $file | grep "infile" | awk "{print \$2}" | sed \
  "s:'::g" | head -1
            But, this solution is inefficient. Six tools are used to solve this 
            problem. A rule of thumb is that when three or more pipes are used, 
            there's an opportunity for optimization. A more efficient one-tool 
            solution using awk might look like this (Example 5b):
             
             
awk -F\' "/^infile/ {print \$2; exit}" $control_file
            A sed solution looks like (Example 5c):
             
             
sed -n "1,/^infile/s/^infile[^']*'\([^']*\)'.*$/\1/p" $control_file
 
            And, here is a Perl solution (Example 5d):
             
             
perl -ne "s/^infile[^']*'([^']+)'.*/\$1/ && print && last" $control_file
 
Our first shell solution (Example 5e) uses a new trick -- read the file line by line, and use the previously covered techniques to process each line.
             
             
while read i; do
  [[ $i == infile* ]] && { (IFS="'"; set -- $i; echo $2); break; }
done < $control_file
            Here is a second shell solution (Example 5f):
             
             
while read i; do
  [[ $i == infile* ]] && { i=${i#*\'}; echo ${i%%\'*}; break; }
done < $control_file
            Problem 5: Discussion
This example represents a generic problem: you often must read a file line by line, retrieving some data and making changes to the current line (or the previous or next line). When many files need to be processed, we prefer a shell solution. We processed 450 such files using a for loop:
              
             
for ctl in *.ctl
do
  ...
done
 
On a Sun Fire box with 12 750-MHz CPUs and 12 GB of memory, the timings for the shell solution (Example 5e) were:
             
             
real        0.190
user        0.112
sys         0.073
 
            For the awk solution (Example 5b), they were:
             
             
real        6.649
user        0.290
sys         1.022
 
            Using the shell makes a real difference!
             Problem 6: Skip head and tail
In this problem, we want to print a file ($file) but skip the first "$first" lines and the last "$last" lines. In these examples, assume that the shell variables "file", "first", and "last" are predefined, such as file="/etc/passwd", first=2, and last=3.
              Many shell programmers will immediately think of the head 
              and tail commands. Yes indeed, head and tail can do the job. 
              Here is a three-tool solution (Example 6a):
              
             
total=$(wc -l < $file)
(( head = total - last ))
(( tail = head - first ))
(( tail > 0 )) && head -$head $file | tail -$tail
 
            The file is read three times. You can also use sed to replace head 
            and tail, and reduce the number of passes. Here is a two-tool 
            solution (Example 6b):
             
             
total=$(wc -l < $file)
(( start = first + 1 ))
(( stop = total - last ))
(( total > first + last )) && sed -n "${start},${stop}p" $file
This problem can also be solved by sed alone, with an advanced sed program (Example 6c):
             
             
sed -ne :a -e "1,$((first + last))!{P;N;D;}" -e "${first}d;N;ba" $file
            We can also solve it using the shell alone, without external tools. 
            If we translate the sed solution to shell, it would be (Example 6d):
             
             
(( a = 0 ))
set --
while IFS= read -r i; do
  (( a++ ))
  (( a <  first )) && set -- "$@" "$i"
  (( a == first )) && set --
  (( a >  first )) && set -- "$@" "$i"
  (( a >  first + last )) && { printf "%s\n" "$1"; shift; }
done < $file
This solution first reads in the lines before the "${first}-th" line. At the "${first}-th" line, it clears the positional parameters (the pattern space in sed). Then, it reads and holds the "$last" lines. Finally, it reads each new line onto the bottom, prints the line on top, and removes it.
             However, the shell has its own idioms, and it does not follow 
              sed's idiosyncrasies. Here is a simplified version (Example 
              6e):
              
             
(( a = 0 ))
set --
while IFS= read -r i; do
  (( a++ ))
  (( a >  first )) && set -- "$@" "$i"
  (( a >  first + last )) && { printf "%s\n" "$1"; shift; }
done < $file
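A quick sanity check of Example 6e (our own test harness, using a five-line sample file with first=1 and last=2, so lines 2 and 3 should print):

```shell
first=1
last=2
printf 'l1\nl2\nl3\nl4\nl5\n' > /tmp/lines.$$

(( a = 0 ))
set --
while IFS= read -r i; do
  (( a++ ))
  (( a >  first )) && set -- "$@" "$i"
  (( a >  first + last )) && { printf "%s\n" "$1"; shift; }
done < /tmp/lines.$$         # prints: l2 then l3

rm -f /tmp/lines.$$
```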
            We can also use a named array without using shift (Example 6f):
             
             
(( a = 0 ))
while IFS= read -r i; do     
  (( a++ ))
  (( b = a % last ))
  (( a > first + last )) && printf "%s\n" "${L[b]}"
  (( a > first )) && L[b]="$i"
done < $file
            Problem 6: Discussion
             The advantage of this example is its flexibility. The difference 
              between the tool solution and the shell solution is analogous to 
              the SQL language (the tool) and the PL/SQL environment (the shell). 
              With the tool solution, you generally tell the tool what you want 
              to do, but with the shell solution, you build a step-by-step procedure 
              for a solution. Because the work is done step-by-step, it is easy 
              to add new functionality. For example, you can skip certain lines 
              or calculate line lengths with ease.
              Example 6c is the fastest when processing a few larger files, 
              and the shell solutions are the fastest when processing many smaller 
              files. If maximum clarity is your goal, then Examples 6a and 6b 
              are easiest to read.
              
            Technique: Shell Arithmetic
              Problem 7: What is Next Month
              Given a year and month in YYYYMM format, we need to return the 
              next month in the same format. Here is a shell solution (Example 
              7a):
              
             
function next_month {
  typeset ym=$1 y m
  (( y = ym / 100 ))
  (( m = ym - y*100 ))
  (( y += m / 12 ))
  (( m = m % 12 + 1 ))
  printf "%.4d%.2d\n" $y $m
}
This example uses only shell arithmetic. Assume the "next_month" parameter "ym" is 200307. Since (( y = ym / 100 )) is an integer operation, y is the largest integer not greater than the quotient, which is the year, 2003 in this case. This is similar to the floor() function in SQL.
             (( m = 200307 - 2003*100 )) calculates the month, which 
              is 7. Since (( 7/12 )) is 0, the year is 2003 + 0, which 
              is still 2003. Only when the current month equals 12 is the year 
              incremented by 1.
              The modulo "%" operator delivers the remainder. This 
              is similar to the mod() function in SQL. (( 7 % 12 )) is 
              7, so next month is 7 + 1, which equals 8. When the current month 
              is 12, (( 12 % 12 )) is 0, so next month is 1. The printf 
              command delivers the results in the required format.
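The walkthrough can be verified by exercising Example 7a at the year boundary:

```shell
function next_month {
  typeset ym=$1 y m
  (( y = ym / 100 ))
  (( m = ym - y*100 ))
  (( y += m / 12 ))
  (( m = m % 12 + 1 ))
  printf "%.4d%.2d\n" $y $m
}

next_month 200307    # prints: 200308
next_month 200312    # prints: 200401
```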
If the math makes you queasy, use any combination of shell techniques for "next_month". Without using any external commands, the function can be rewritten as follows (Example 7b):
              
             
function next_month {
  typeset y=${1%??} m=${1#????}
  (( m += 1 ))
  (( m == 13 )) && { (( m = 1 )); (( y += 1 )); }
  printf "%.4d%.2d\n" $y $m
}
            Using expr (Example 7c):
             
             
function next_month {
  typeset ym=$1 y m
  y=$(expr $ym / 100)
  m=$(expr $ym - $y \* 100)
  y=$(expr $y + $m / 12)
  m=$(expr $m % 12 + 1)
  printf "%.4d%.2d\n" $y $m
}
            Using bc (Example 7d):
             
             
function next_month {
  typeset ym=$1 y m
  y=$(print "$ym / 100" | bc)
  m=$(print "$ym - $y*100" | bc)
  y=$(print "$y + $m / 12" | bc)
  m=$(print "$m % 12 + 1" | bc)
  printf "%.4d%.2d\n" $y $m
}
            Using GNU date (Example 7e):
             
             
GNU_DATE=/path/to/GNU/date
function next_month {
  ${GNU_DATE} -d "${1%??}-${1#????}-01 + 1 month" +%Y%m
}
            Problem 7: Discussion
Generally, in modern shells, there is no reason to use external commands such as expr or bc to perform arithmetic. For complicated date arithmetic, using GNU date or Perl is better than building the logic yourself in the shell. Do what you can, but don't hurt yourself.
              Problem 8: Sum across Rows
              Harry Potter, taking a shell class at Hogwarts School of Witchcraft 
              and Wizardry, posted this problem on comp.unix.shell. He has an 
              input file like the following:
              
             
Date,2003-05-01,2003-05-02,2003-05-03,2003-05-04,2003-05-05
item-a,1,1,1,0,3
item-b,2,2,0,0,2
...
 
            He writes, "How can I sum the numbers across each row and add 
            a "total" field to the end of each line?" Example 8a 
            shows an awk solution:
             
             
awk -F"," '/^Date/ {print $0 ",total"}
           /^item/ {sum=0; for(i=2;i<=NF;i++){sum+=$i};
                    print $0 "," sum}' input.txt
            Our shell translation is (Example 8b):
             
             
while IFS= read -r; do
  if [[ $REPLY = "Date"* ]]; then
    echo "$REPLY,total"
  elif [[ $REPLY = "item"* ]]; then
    (( sum = 0 ))
    IFS=","; set -- $REPLY; shift
    for i; do (( sum += i )); done
    echo "$REPLY,$sum"
  fi
done < input.txt
 
            We can take advantage of shell idioms, and shorten the code as follows 
            (Example 8c):
             
             
while IFS= read -r a; do
  [[ $a = "Date"* ]] && { echo "$a,total"; continue; }
  [[ $a = "item"* ]] && { 
    b=${a#*,}; eval echo "$a,$(( ${b//,/+} ))"
  }
done < input.txt
            The parameter operation ${b//,/+}, available in ksh93 and bash 
            2.x, is equivalent to $(echo "$b" | sed "s:,:+:g").
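For example (our illustration of combining the expansion with shell arithmetic):

```shell
b="1,1,1,0,3"
echo "${b//,/+}"        # prints: 1+1+1+0+3
echo $(( ${b//,/+} ))   # prints: 6
```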
             Problem 8: Discussion
We compared the performance of Example 8c with that of Example 8a on a 300-MHz Intel box running UWIN. We found that ksh93 performs better for smaller data sets, while awk performs better for larger data sets. At 200 records, ksh93 and awk break even (0.25 seconds). Even for 1000 records, the ksh93 program finishes within a second (0.75 seconds), while awk finishes in 0.35 seconds. Having a pure shell solution -- especially inside a shell program -- may outweigh the sub-second loss for moderate-sized files.
              For files with millions of records, however, neither shell nor 
              Unix tools are good solutions. You probably need to load the data 
              into a database, and process it with SQL. Use the right tools for 
              the job.
              
            Technique: File and Process Operations
              Problem 9: Move All Files Except "old" to "old" 
              Directory
              When applied to this problem, the simple command (Example 9a):
              
             
mv * old
 
            produces an error:
             
             
mv: old: cannot rename to old/old
 
            You could use grep -v (Example 9b):
             
             
ls -1 | grep -v "^old$" | xargs -I {} -t mv {} old
            Yes, this works, but a shell pattern can be used for path name expansion. 
            Additionally, pattern matching and substring expansion provide a much 
            simpler solution (Example 9c):
             
             
mv !(old) old
 
            This reads as "move everything that is not old to old". 
            On rare occasions, this error may happen:
             
             
-ksh: /usr/xpg4/bin/mv: cannot execute [Arg list too long]
 
This error occurs because ksh calls execve() to start the mv command, and execve() limits the size of its argument list to the kernel parameter ARG_MAX, which has a default value of 1 MB on Solaris 8 and 128 KB on Red Hat Linux 9. A shell for loop solves the problem (Example 9d):
             
             
for i in !(old); do mv $i old; done
 
            Problem 9: Discussion
             Shell patterns can be used in more sophisticated situations. For 
              example, to uncompress all *.gz files except the following:
              
             
FRAMEWORK.dat1.gz
VARHISTORY.dat1.gz
VARHISTORY.dat2.gz
SCENARIOS.dat1.gz
SCENARIOS.dat2.gz
SCENARIOS.dat3.gz
 
            you would use:
             
             
gzip -d !(FRAMEWORK.dat?|VARHISTORY.dat?|SCENARIOS.dat?).gz
 
Replace ! with @ if you want to uncompress only these files.
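For example, with @(...) in place of !(...), the pattern selects exactly the listed names. In this sketch of ours, echo stands in for gzip -d, and the files are created in a scratch directory just to show the expansion:

```shell
shopt -s extglob 2>/dev/null    # needed for bash; ksh93 has it built in

mkdir /tmp/gzdemo.$$ && cd /tmp/gzdemo.$$
touch FRAMEWORK.dat1.gz SCENARIOS.dat1.gz OTHER.dat1.gz

# @(...) matches only the listed patterns
echo @(FRAMEWORK.dat?|VARHISTORY.dat?|SCENARIOS.dat?).gz
# prints: FRAMEWORK.dat1.gz SCENARIOS.dat1.gz

cd / && rm -rf /tmp/gzdemo.$$
```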
             Problem 10: Find Files Created Today
              In this problem, we want to find all the files in the current 
              directory created today. (For simplicity, assume there are no subdirectories.) 
              For all solutions, we need to create a reference file (Example 10a):
              
             
touch -t $(date +%Y%m%d)0000.00 /tmp/timestamp$$
 
            The following solution uses find (Example 10b):
             
             
find * -type f -newer /tmp/timestamp$$ -print
 
            Here is a sed solution (Example 10c):
             
             
ls -1rt * /tmp/timestamp$$ | sed "1,/timestamp$$/d"
 
            And here is a shell solution (Example 10d):
             
             
for i in *; do [[ $i -nt /tmp/timestamp$$ ]] && echo $i; done
 
            Problem 10: Discussion
             We can use a combination of tool and shell solutions to solve 
              problems in more complicated situations. For example:
              
             
find . -type f | while read file; do ...; done
 
Since Solaris represents processes in the pseudo file system /proc, we can use this shell operator to compare the start times of two Unix processes:
             
             
[[ /proc/$PID1 -nt /proc/$PID2 ]] && { echo "$PID1 is started after $PID2"; }
            Problem 11: Test Whether a File Is Empty
             The following example is taken from a magazine published by a 
              commercial database vendor (Example 11a):
              
             
#!/bin/ksh
...
v_corruptions=`cat $TMP_FILE | wc -l`
if [ $v_corruptions -ne 0 ]; then
  echo "Data Block Corruptions Occurred."
fi
 
Since the shell already provides an operator to test for non-empty files, the above code can be rewritten as (Example 11b):
             
             
[[ -s $TMP_FILE ]] && { echo "Data Block Corruptions Occurred."; }
            Besides avoiding the launch of two external commands, the "-s" 
            operator calls the stat() function, retrieving information from 
            the stat structure without reading through the file contents, 
            and is therefore more efficient. The ls command does the same, 
            except the shell must execute an extra fork() call to run it.
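The operator's behavior is easy to verify (the file names below are arbitrary):

```shell
# -s is true only when the file exists and its size is greater than zero.
empty=/tmp/empty$$
nonempty=/tmp/nonempty$$
> "$empty"
echo "data" > "$nonempty"

[[ -s $empty ]]    || echo "$empty is empty"
[[ -s $nonempty ]] && echo "$nonempty is not empty"

rm -f "$empty" "$nonempty"
```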
             Problem 11: Discussion
              This technique may not make a big difference when the $TMP_FILE 
              file is small, but we think it is important to habitually use more 
              efficient solutions.
              Problem 12: Check the Existence of a Process
              Quite often, we see shell programs like this (Example 12a):
              
             
ps -ef | grep ora_smon_$SID | grep -v grep >/dev/null && {
  echo "Instance $SID is up."
}
            We can improve the efficiency (Example 12b) as follows:
             
             
ps -u$OWNER -f | grep "[o]ra_smon_$SID"
 
            The "-u$OWNER" limits the ps output. The double quotes tell 
            the shell to pass "[o]" unaltered to grep for interpretation. 
            The regular expression "[o]" matches "o", but 
            since the command in the process table is:
             
             
grep [o]ra_smon_$SID
 
            it won't match itself. This eliminates the unnecessary grep 
            -v.
             Use a stricter expression:
              
             
ps -u$OWNER -f | grep "[0-9] ora_smon_$SID"
 
            to make it even less likely to match itself or other unwanted processes.
             Problem 12: Discussion
              Some systems (Solaris, Red Hat Linux) provide a pgrep command 
              that "will never consider itself a potential match", according 
              to the man page. Use pgrep if portability is not a concern.
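With pgrep, Example 12b collapses to a single command (a sketch, assuming the -u and -f flags of the Solaris/Linux pgrep; the variable values are hypothetical):

```shell
OWNER=oracle    # example values; in the article these are set elsewhere
SID=PROD

# -u restricts the match to processes owned by $OWNER;
# -f matches the pattern against the full command line, not just the name.
pgrep -u "$OWNER" -f "ora_smon_$SID" >/dev/null && echo "Instance $SID is up."
```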
              Problem 13: Shut Down the Database
              We want to gracefully shut down the Oracle database (shutdown_immediate 
              function). If this does not shut down the database within, say, 
              five minutes, it kills the shutdown process and issues the forceful 
              shutdown command (shutdown_abort):
              
             
shutdown_immediate & (( BPID = $! ))
(( SECONDS = 0 ))
while (( SECONDS <= ${MAXTIME:-300} )); do
  [[ -d /proc/$BPID ]] || break
  sleep 1
done
[[ -d /proc/$BPID ]] && kill -TERM $BPID
shutdown_abort
            [[ -d /proc/$BPID ]] is more efficient than ps -p $BPID, 
            which potentially executes 300 times inside the while loop. A test 
            on a Sun Ultra 5 (Solaris 8, 360 MHz CPU, 320M memory) shows running 
            the [[ -d /proc/$BPID ]] 300 times takes 0.06 seconds, while 
            running ps -p $BPID 300 times takes 34 seconds. The bulk of 
            the time is consumed in fork() calls. The CPU cycles are better used 
            elsewhere.
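The timings can be reproduced with the shell's time keyword (absolute numbers will vary with the machine):

```shell
# 300 iterations of the built-in directory test vs. 300 forked ps commands.
BPID=$$
i=0; time while (( i++ < 300 )); do [[ -d /proc/$BPID ]]; done
i=0; time while (( i++ < 300 )); do ps -p $BPID >/dev/null; done
```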
             Additionally, [[ -d /proc/$BPID ]] && kill -TERM $BPID 
              is less likely to produce the error "kill: $BPID: no such process" 
              than ps -p $BPID && kill -TERM $BPID.
              Problem 13: Discussion
              In theory, this solution might suffer from a race condition when shutdown_immediate 
              exits, if its PID is reused by another process within a second. 
              In practice, the chance of this happening is nil; however, if it 
              is a concern, check the timestamp and inode number of /proc/$BPID.
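That inode check might be sketched as follows (using a background sleep as a hypothetical stand-in for shutdown_immediate; ls -di prints a directory's inode number):

```shell
# Hypothetical stand-in for shutdown_immediate.
sleep 2 & BPID=$!

# Record the inode of /proc/$BPID right after the fork.
set -- $(ls -di /proc/$BPID); BINODE=$1

# Later, before killing, confirm the PID still names the same process:
# a recycled PID gets a new /proc entry with a different inode.
if [[ -d /proc/$BPID ]]; then
    set -- $(ls -di /proc/$BPID)
    [[ $1 == $BINODE ]] && kill -TERM $BPID
fi
```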
              Conclusion
              Through the problems and solutions presented in this article, 
              we have shown that shell solutions are often more portable: using 
              only the shell, you don't need to worry about the availability 
              and behavior of various tools. Shell solutions are also more efficient -- 
              especially when used within a loop -- and can provide greater 
              flexibility in solving complex problems when no existing tool fits 
              the purpose.
              We do not advocate replacing Unix tools with the shell, but encourage 
              you to use tools wisely.
              We often see tools overused when more efficient shell solutions 
              exist. Although it is a path less traveled, the shell offers power 
              and advantages over Unix tools.
              Acknowledgements
              The comp.unix.shell code snippets from the following individuals 
              are used in our examples: Al Sharka and Dan Mercer (Example 2e), 
              Stephane Chazelas (Example 6c), Charles Demas (Example 8a), Chris 
              F.A. Johnson (Example 8c). We are responsible for any errors in 
              the interpretations and modifications.
              References
              Bolsky, Morris, and David Korn. The New KornShell Command and Programming 
              Language, 1995. Upper Saddle River, NJ: Prentice Hall PTR.
              Dougherty, Dale, and Arnold Robbins. sed & awk, March 1997. 
              Sebastopol, CA: O'Reilly & Associates, Inc.
              Wall, Larry, Tom Christiansen, and Jon Orwant. Programming 
              Perl, July 2000. Sebastopol, CA: O'Reilly & Associates, 
              Inc.
              Michael Wang earned master's degrees in Physics (Peking) and 
              Statistics (Columbia). Currently, he is applying his knowledge to 
              Unix systems administration, database management, and programming. 
              He can be reached at: [email protected].
              Ed Schaefer is a frequent contributor to Sys Admin. 
              He is a Software Developer and DBA for Intel's Factory Integrated 
              Information Systems, FIIS, in Aloha, OR. Ed also edits the UnixReview.com 
              monthly Shell Corner column. He can be reached at: [email protected].