A Community-Style Overnight Job Spooler
Leor Zolman
For a small business running a single multi-user UNIX
system, processes
typically fall into one of two categories: real-time,
interactive
programs or batch-style/background jobs. Interactive
programs such
as the system shell, editors/word processors, spreadsheets,
and data
entry systems all vie concurrently for slices of the
CPU pie. Such
programs spend most of their real time blocked waiting
for user input,
so they tend not to have much impact on system performance
(as long
as there is enough main memory to keep the jobs from
getting swapped
out to disk).
Batch-style jobs such as reports, backup scripts, or
any CPU- or disk-intensive
processes, on the other hand, have a relatively large
impact on system
performance. Such jobs demand as much of the available
CPU resources
as they can possibly get. It doesn't take many such
CPU- or disk-intensive
background jobs running simultaneously to slow user
terminal response
time down to a crawl.
In many cases, those "expensive" batch-style
jobs would be
less of a pain-in-the-CPU if they could be scheduled
so as not to
compete head-to-head with the interactive processes
for system resources.
In a business environment, the natural solution would
be to run those
jobs overnight whenever feasible, and reserve the business
hours for
interactive processes and high-priority batch jobs only.
Users should routinely be given the option of running
batch-style
jobs overnight. For jobs that must run immediately,
background execution
should also be an option. However, with a little bit
of encouragement
(and after enough instances of the molasses-syndrome
due to an overloaded
system), users will understand that overnight queueing
works out best
for everyone.
Pros and Crons
The basic UNIX System V configuration includes only
a rudimentary
set of job scheduling tools. The primary facilities,
cron and
at, allow the scheduling of jobs for execution at particular
dates and times, but have no provision for prioritizing
or sequencing
those jobs in order to maximize system performance.
Three users might,
unbeknownst to each other, all schedule jobs for execution
at 8:30
P.M. cron will dutifully start them all up at 8:30 P.M.,
resulting in some serious context-switching overhead
while the jobs
vie for system resources.
Now consider the case of automated daily backups. You
could just set
up the cron table to run the backup software every morning
at 5:00 A.M., but what happens if those three long batch
jobs
are still slugging it out at 5:00 A.M., and changing
critical
data in the process? cron doesn't care; it just runs
the backup;
if the backup utility cannot properly coordinate file-locking
issues
in the course of a backup, the result may be lost data.
A better solution would be for all jobs scheduled for
overnight processing
to be registered with a single overseeing system, and
for that system
to be responsible for running the jobs in an orderly,
non-interfering
manner. The simplest way to implement this "ordering"
is to
ensure that all jobs are scheduled sequentially, such
that each job
is run to completion with as little competition from
other jobs as
possible -- especially other resource-intensive jobs.
With the addition of a prioritizing scheme, critical
job-sequencing
issues can also be properly managed. Then, for example,
the daily
backup script can be configured at the lowest possible
priority, so
that it runs only after all other jobs have been completed.
In this article, I describe a set of Bourne Shell scripts
that
work together to provide a sequential overnight job-spooling
facility.
The package is geared towards a "community-style"
computing
environment -- that is, an environment that allows any
user to
invoke a particular overnight job and that prints out
or places the
output resulting from the job in a public destination
area on-line,
so that any other user may choose to view or print out
the results,
as required by the specific application.
Any stdout/stderr output not explicitly directed into
an output
file by an overnight job will be captured into a default
location,
generally accessible only by a system administrator.
This feature
may be used as a simple status- and error-logging mechanism.
Directory Structure
The onitesetup.sh script (Listing 1) may be used to
set up
the directory structure and appropriate permission settings
for the
basic onite system. I've chosen a master directory location
of /usr/spool/onite for the example implementation;
another
location may be more appropriate for your site. In those
scripts where
applicable, the SPOOLDIR configuration variable identifies
the master onite directory.
Several subdirectories exist immediately beneath the
master directory.
The subdirectory jobs itself contains another tier of
subdirectories
corresponding to the various job priority levels. The
system may be
configured for any number of priority levels; when there
are n
levels of priority, the subdirectories are named P1
through
Pn.
In scripts where applicable, the NPRIORITIES variable
defines
the number of priority levels implemented.
The subdirectory stdout receives the intermixed, non-directed
("bit bucket") output of both the stdout and
stderr
streams for the last NTOLEAVE jobs that have been run
through
the spooler. The value of NTOLEAVE is configured in
the master
driver script, onitego.sh.
The subdirectory jobsdone receives the "used"
job scripts
for the last NTOLEAVE completed jobs. The contents of
this
directory, along with the contents of stdout, as previously
noted, exist primarily to support post-mortem analysis
by the system
administrator.
The onitego.sh script emits a log of all overnight spooler
activity on its standard output and error streams. I've
arbitrarily
configured the log file to record this output in /usr/spool/onite/onite.log.
The log file is created with the proper permissions
by the installation
script, onitesetup.sh, but no other scripts explicitly
write
to the log file. With the following line in the "root"
cron
table,
0 20,23 * * * /usr/local/onitego.sh >>/usr/spool/onite/onite.log 2>&1
the output of the master driver script is appended onto
the end of the log file every time the master driver
script executes.
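The layout described in this section can be sketched roughly as follows. This is not Listing 1, only an illustration: the scratch location used for SPOOLDIR and the permission modes are my assumptions, and the value 7 for NPRIORITIES simply mirrors the seven levels used at our site.

```shell
#!/bin/sh
# Sketch only: build an onite-style directory tree in a scratch location.
# The article's example implementation uses /usr/spool/onite instead.
SPOOLDIR=${SPOOLDIR:-${TMPDIR:-/tmp}/onite-demo.$$}
NPRIORITIES=7                 # number of priority levels (P1 .. P7)

make_tree() {
    mkdir -p "$SPOOLDIR/jobs" "$SPOOLDIR/stdout" "$SPOOLDIR/jobsdone" || return 1
    i=1
    while [ "$i" -le "$NPRIORITIES" ]; do
        mkdir -p "$SPOOLDIR/jobs/P$i" || return 1
        chmod 777 "$SPOOLDIR/jobs/P$i"      # assumed mode: any user may queue a job
        i=`expr $i + 1`
    done
    # assumed mode: post-mortem material is for the administrator only
    chmod 700 "$SPOOLDIR/stdout" "$SPOOLDIR/jobsdone"
    : >> "$SPOOLDIR/onite.log"              # create the log file if it is missing
}

make_tree && ls "$SPOOLDIR/jobs"
```

Pointing SPOOLDIR at a throwaway directory, as above, is a convenient way to experiment without touching a live queue.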
A brief description of each individual script and auxiliary
tool in
the onite package follows.
The Configuration Script
onitesetup.sh (Listing 1) initializes the directory
structure
for your custom implementation of the onite system.
Configure
lines 15-18 for your system; line 14, the debug flag,
may be used
to create a "dummy" hierarchy in the current
directory for
testing purposes. To test the onite system using this
dummy
directory, copy all the scripts into your testing directory
and change
the initialization of debug to Y in all scripts where
debug appears. This is especially useful once the system
has
been officially installed and you wish to test some
new modifications
without corrupting the currently active code and job
queue directories.
The Master Driver Script
onitego.sh (Listing 2), invoked from the cron table,
as shown above, "wakes up" to execute all
spooled overnight
job scripts in sequence. It scans all the $SPOOLDIR/jobs/P*
directories in order, beginning with P1, looking for
job files
and submits each job file encountered to the shell for
processing.
The standard output and standard error from each job
is written to
a file in the $SPOOLDIR/stdout directory with the same
name
as the job file. All program output from the job script
should take
the form of explicit output files or physical output.
Any output emitted
through the stdout and stderr streams should be considered
for the system administrator's eyes only.
After the job has finished executing, the job file itself
is moved
to the $SPOOLDIR/jobsdone directory.
The standard output of the onitego.sh script provides
a running
log of job activity. If no jobs at all were queued for
overnight processing,
then a message to that effect is emitted. Otherwise,
the script
creates a lock file that exists for the duration of
all job processing,
and, for each job, writes a message announcing the name
of that job
and the time it begins its run.
When all jobs have been processed, the fleave.sh utility
script
is called to delete all files in the jobsdone and stdout
directories except for those corresponding to the most
recent $NTOLEAVE
jobs. This keeps those directories from filling up with
too much junk.
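The sequence just described can be sketched as a simple loop. This is a simplified illustration rather than the code in Listing 2: the lock file name, the three priority levels, and the message wording are assumptions, and the cleanup pass over old files is left out.

```shell
#!/bin/sh
# Sketch of a sequential driver loop; not the actual onitego.sh.
SPOOLDIR=${SPOOLDIR:-${TMPDIR:-/tmp}/onite-drv.$$}
NPRIORITIES=3
LOCKFILE=$SPOOLDIR/onite.lock        # hypothetical lock file name

run_queue() {
    if [ -f "$LOCKFILE" ]; then
        echo "onite: another driver instance is running; exiting"
        return 1
    fi
    : > "$LOCKFILE"
    ran=0
    p=1
    while [ "$p" -le "$NPRIORITIES" ]; do
        for job in "$SPOOLDIR/jobs/P$p"/*; do
            [ -f "$job" ] || continue       # empty directory: the glob did not match
            name=`basename "$job"`
            echo "onite: starting $name (priority $p) at `date`"
            sh "$job" > "$SPOOLDIR/stdout/$name" 2>&1
            mv "$job" "$SPOOLDIR/jobsdone/$name"
            ran=`expr $ran + 1`
        done
        p=`expr $p + 1`
    done
    rm -f "$LOCKFILE"
    [ "$ran" -gt 0 ] || echo "onite: no jobs were queued"
    return 0
}

# Demonstration in a scratch tree: one job at P2 and one at P1.
mkdir -p "$SPOOLDIR/jobs/P1" "$SPOOLDIR/jobs/P2" "$SPOOLDIR/jobs/P3" \
         "$SPOOLDIR/stdout" "$SPOOLDIR/jobsdone"
echo 'echo second' > "$SPOOLDIR/jobs/P2/report"
echo 'echo first'  > "$SPOOLDIR/jobs/P1/urgent"
run_queue
```

Because each job runs in the foreground of the loop, the next job cannot start until the current one exits, which is the whole point of the spooler.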
There are some basic limitations to the design of the
onite
system. The primary hazard is the case where a user
is permitted to
queue a job after the driver script has already begun
execution for
the evening. If the job is queued at a priority level
equal to or
greater than the priority level currently being processed,
then the
job may not be run until the next night. I've partially
addressed
this issue by scheduling the driver script for two runs
per night,
so that a job missed during the "first round"
is picked up
for execution in the "second round." This
approach, however,
assumes that all jobs from the first round are completed
before the
scheduled time for the second round comes up; if the
earlier instance
of the driver script is still running when the later
instance "wakes
up," the later instance will see the lock file,
immediately abort,
and go back to sleep. Also, a high-priority job that
ends up running
in the second round will effectively have been bumped
down to the
lowest possible priority, since all jobs from the first
round will
by then have already completed. In other words, if the
priorities
are really critical, then don't schedule the master
driver script
for more than one run per night.
The best way to prevent these kinds of conflicts is
to make sure no
jobs are queued past the time when the first instance
of onitego.sh
wakes up (see the discussion of spoolonite.sh below
for some
built-in protective measures).
"Run Driver NOW" Script
From time to time, you might discover that onitego.sh
has not
executed as normally scheduled. For instance, someone
may have inadvertently
broken the root cron table entry while doing administrative
maintenance, or perhaps the system had experienced a
crash before
spooler startup time and hadn't been brought back up
until after the
startup time, so cron never had a chance to start the
process.
onitenow.sh (Listing 3) is designed for one-shot invocation
by the system administrator in just such an event. The
script simply
starts up the master driver immediately as a background
task immune
to hang-up, and sends the output into the appropriate
log file.
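In outline, such a one-shot start might look like the sketch below. It is not Listing 3; the function name run_driver_now is hypothetical, and the driver and log paths are passed as arguments so the idea can be tried without touching the real installation.

```shell
#!/bin/sh
# Sketch: start a driver script at once, immune to hang-up, appending its
# output to a log file. On the article's system the arguments would be
# /usr/local/onitego.sh and /usr/spool/onite/onite.log.
run_driver_now() {
    driver=$1
    logfile=$2
    nohup sh "$driver" >> "$logfile" 2>&1 &
    echo "driver started in background, pid $!"
}
```

Since both output streams are redirected before nohup runs, nohup has no reason to create its usual nohup.out file.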
The Job Queuing Script
The last of the major scripts in this package, spoolonite.sh
(Listing 4), schedules an overnight job for execution.
spoolonite.sh
is typically run from within a shell script, accepting
the text
of the job to be spooled on its standard input stream.
There is only
one mandatory command line parameter, the job name,
and one optional
parameter, the job priority level. If no priority level
is specified,
then the job is assigned a priority of $DEFAULT_PRIORITY
as
defined in the script.
The two variables USE_CUTOFF and CUTOFF_TIME may be
configured to reject job submissions past a particular
time of day.
If USE_CUTOFF is Y, then any attempt to queue a job
after the clock time specified by CUTOFF_TIME will be
rejected
(lines 40-47).
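A cutoff test of this kind might be sketched as follows. This is not the code from Listing 4: the HHMM format for CUTOFF_TIME, the sample value 1930, and the function name past_cutoff are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch of a time-of-day cutoff test; not the code from Listing 4.
USE_CUTOFF=Y
CUTOFF_TIME=1930          # assumed format: HHMM on a 24-hour clock

# past_cutoff NOW: succeed if the cutoff is enabled and NOW (HHMM) is later
past_cutoff() {
    [ "$USE_CUTOFF" = Y ] && [ "$1" -gt "$CUTOFF_TIME" ]
}

now=`date +%H%M`
if past_cutoff "$now"; then
    echo "Sorry, overnight jobs cannot be queued after $CUTOFF_TIME."
else
    echo "OK to queue."
fi
```

A plain numeric comparison like this one treats the hours after midnight as early in the day, so it would accept a job submitted at 1:00 A.M.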
The variable CHECK_LOCK may be configured to reject
job submissions
once the nightly queue has begun executing; this, in
conjunction with
the USE_CUTOFF mechanism, effectively eliminates the
possibility
of "orphaned" jobs in the queue after the
master driver script
has completed its run (lines 49-57).
Since the contents of the stdout and jobsdone directories
are not broken down by priority level, only one instance
of any specific
job name is allowed per night (lines 55-65). It is left
up to the
system administrator, using the tools provided in this
package (such
as oname.sh), to construct unique names for all job
scripts.
Environmental Issues
Since the master driver script is invoked from root's
cron
table, all jobs are actually run under root's user-ID
and environment,
not under the user-ID and environment of the invoking
user. Thus,
spoolonite.sh must see to it that the original user's
environment
is replicated as faithfully as possible at the time
his/her overnight
job script is run.
Line 79 begins to construct the job file by dumping
the entire contents
of the user's environment settings into it. Line 78
prevents a nasty
problem in the case where the user's PS1 (primary prompt
string) variable
was exported and happens to contain a multiline string.
If PS1 were
not redefined in this case to isolate the embedded newline
within
a set of quote marks, then the shell would become confused
by the
multiline string when the time came to interpret the
job script. If
there are any other variables in your users' environments
that could
conceivably be set to multiline string values and then
exported, those
variables must be redefined in a similar manner before
line 79 executes.
If any programs invoked from a user's job script need
access to any
variables in the user's environment, then those environment
variables
must be exported by the job script. The design of this
package assumes
that "unsophisticated" users will not be creating
their own
custom environment variables and spooling jobs for overnight
execution
that depend on those variables. Sophisticated users
can include the
commands to define and export such variables, if necessary,
on their
own when preparing their scripts.
When the list of common critical environment variables
is known, however,
then that list may be specified as the value of toexport
(line
29). For our installation, this list includes the PATH,
two variables
relating to database configuration, and two that affect
printer output
routing. I know these variables are defined in every
user's startup
profile, because I maintain those profiles.
In line 83, spoolonite.sh generates a cd statement that
sets the current directory for job execution to the
user's actual
current directory. Finally, the explicit job script
text is copied
from the standard input onto the end of the job file.
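The three parts of a job file (environment settings, a cd command, and the job text) can be illustrated with the sketch below. It deliberately takes a simpler route than Listing 4: instead of dumping the whole environment, it writes only the variables named in toexport, and it quotes each value so that embedded newlines or quote marks cannot confuse the shell later. The helper names quote and build_job and the sample list "PATH HOME" are my assumptions.

```shell
#!/bin/sh
# Sketch of job-file construction; not the code from Listing 4.
toexport="PATH HOME"    # assumed list of variables every job needs

# quote VALUE: print VALUE inside single quotes so sh can re-read it,
# even when it contains spaces, quote marks, or newlines
quote() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# build_job JOBFILE: write variable settings, a cd command, and the job
# text read from standard input into JOBFILE
build_job() {
    jobfile=$1
    here=`pwd`
    {
        for var in $toexport; do
            eval "val=\$$var"
            q=`quote "$val"`
            printf '%s\n' "$var=$q; export $var"
        done
        q=`quote "$here"`
        printf '%s\n' "cd $q"
        cat                       # the job text itself
    } > "$jobfile"
}
```

With this in place, something like `echo 'myreport > myreport.out' | build_job /tmp/myjob` would produce a file that restores PATH and HOME, changes to the directory the user was in, and then runs the report (the command and file names here are purely illustrative).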
Displaying the List
showonite.sh (Listing 5) summarizes all jobs queued
for overnight
processing, showing the job name, name of the invoking
user, and priority
level. The contents of each priority directory are displayed
by piping
the output of the l command to awk for formatting.
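A listing along these lines could be produced with the sketch below. It is not Listing 5: it uses the more widely available `ls -l` rather than `l`, assumes that the owner of each job file is the invoking user, and assumes job names contain no spaces. The function name show_queue and the column layout are hypothetical.

```shell
#!/bin/sh
# Sketch of a queue listing; not the code from Listing 5.
SPOOLDIR=${SPOOLDIR:-/usr/spool/onite}

show_queue() {
    echo "JOB                  USER       PRIORITY"
    for dir in "$SPOOLDIR"/jobs/P*; do
        [ -d "$dir" ] || continue
        pri=`basename "$dir"`
        # In "ls -l" output the owner is field 3 and the file name is the
        # last field; the "total" line has fewer fields and is skipped.
        ls -l "$dir" | awk -v pri="$pri" \
            'NF >= 9 { printf "%-20s %-10s %s\n", $NF, $3, pri }'
    done
}

show_queue
```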
Cancelling a Job
A user may change his/her mind about an overnight job,
and need to
cancel it. killonite.sh (Listing 6) performs that duty.
It
may be configured to restrict users to killing only
their own jobs,
or to allow users to kill anyone's queued jobs, depending
upon the
value of the OwnOnly variable (line 9).
This script uses the utility script lpick.sh, described
below,
to let the user pick a job "by number".
Looking for a Particular Job
It may not make any sense for certain kinds of jobs
-- for example,
a process that checks a mailing list for illegal addresses
before
a monthly mailing -- to be run more than once per night.
If someone
requests such a job for the second time in a single
day, it can only
be because they didn't realize someone else had already
scheduled
it. isonite.sh (Listing 7) helps the system administrator
detect
such duplications. Given a job name as the command-line
parameter,
it returns a true status if a job by that name has already
been scheduled.
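The check itself amounts to looking for the job name in every priority directory, as in this sketch. It is not Listing 7; the function name is_queued and the sample job name addrcheck are hypothetical.

```shell
#!/bin/sh
# Sketch of a duplicate-job test; not the code from Listing 7.
SPOOLDIR=${SPOOLDIR:-/usr/spool/onite}

# is_queued NAME: succeed if a job file called NAME exists at any priority
is_queued() {
    for f in "$SPOOLDIR"/jobs/P*/"$1"; do
        [ -f "$f" ] && return 0
    done
    return 1
}

if is_queued addrcheck; then
    echo "addrcheck is already scheduled for tonight."
fi
```

Returning the answer as an exit status, rather than as text, lets the calling script use the test directly in an if statement.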
Generating a Unique Name
When it makes sense for a certain type of job to be
scheduled for
multiple runs in one evening, each instance of that
job must still
be given a unique job name. The oname.sh script (Listing 8)
is a simple inline tool for generation of unique file
names; it uses
the tmpname.c program described below to generate a
file name
in the system /tmp directory, then chops off the /tmp/
prefix to return just the base file name on the standard
output.
For example, to generate a unique job name for an instance
of a report
identified as ren, I might use:
jobname=`oname.sh ren`
General Utility Programs and Scripts
All the scripts described above were written specifically
for the
Overnight Spooler system. The short scripts and C programs
described
in this section are general-purpose tools used by many
of our shell
scripts, including the onite system.
checknum.c (Listing 9)
This C program examines its first command-line parameter,
converts
the leading portion of it into a number value, and returns
that ASCII
number alone on the standard output. If the parameter
contains no
leading numeric component, the string ERROR is returned
instead
and the program terminates with an error status of 1.
checknum
is used by spoolonite.sh and onitego.sh.
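For readers without a C compiler handy, the behavior described above can be approximated in the shell itself. The sketch below is not Listing 9: the function name leading_number is hypothetical, and it prints the leading digits exactly as typed rather than converting them to a number first.

```shell
#!/bin/sh
# Shell approximation of the checknum idea; not the C code in Listing 9.
# leading_number STRING: print the leading digits of STRING, or print
# ERROR and fail if STRING does not start with a digit
leading_number() {
    n=`printf '%s\n' "$1" | sed -n '1s/^\([0-9][0-9]*\).*/\1/p'`
    if [ -n "$n" ]; then
        echo "$n"
    else
        echo ERROR
        return 1
    fi
}

pri=`leading_number "3rd"`
echo "priority is $pri"       # prints: priority is 3
```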
tmpname.c (Listing 10)
tmpname.c simply extends the functionality of the tempnam()
C library function to create a tool available for use
directly in
a shell script. For example, the following command creates
a unique
file name in /tmp that begins with the characters "abc":
filename=`tmpname abc`
lpick.sh (Listing 11)
Given a text file containing a list of items to select
from and a
generic description of the flavor of item being chosen,
this script
describes, sequentially numbers, and displays the list,
then waits
for the user to select one of the items according to
the displayed
sequence numbers. The user may either enter a sequence
number to make
a selection, or press the return key alone to indicate
"none."
If the user makes a selection, lpick.sh returns the
text of
the selected item on the standard output; else, the
text ABORT
is returned. killonite.sh uses lpick.sh for prompting
the user to select a job to cancel.
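A pick-by-number routine of this sort can be sketched as follows. This is not Listing 11: the function name pick_line, the prompt wording, and the choice to write the menu to standard error (so that only the selection reaches standard output) are my assumptions.

```shell
#!/bin/sh
# Sketch of a pick-by-number routine; not the code from Listing 11.
# pick_line FILE DESCRIPTION: number the lines of FILE, read a choice
# from standard input, and print the chosen line (or ABORT) on stdout
pick_line() {
    file=$1
    what=$2
    echo "Select a $what:" >&2
    awk '{ printf "%3d) %s\n", NR, $0 }' "$file" >&2
    printf "Number (or Return for none): " >&2
    read choice || choice=''
    case $choice in
        ''|*[!0-9]*|0*) echo ABORT; return 1 ;;   # empty or not a positive number
    esac
    item=`sed -n "${choice}p" "$file"`
    if [ -n "$item" ]; then
        echo "$item"
    else
        echo ABORT                                # number past the end of the list
        return 1
    fi
}
```

A caller would capture the result with something like `job=`pick_line joblist job`` and compare it against ABORT before acting on it.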
fleave.sh (Listing 12)
onitego.sh calls this utility script to clean out old
files
in the jobsdone and stdout subdirectories.
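The cleanup can be sketched in a few lines. This is not Listing 12: the function name leave_newest is hypothetical, and the sketch assumes a count of at least 1 and file names free of newlines, backslashes, and leading or trailing blanks.

```shell
#!/bin/sh
# Sketch of a "keep the newest N files" cleanup; not the code from Listing 12.
# leave_newest DIR N: delete all but the N most recently modified files in DIR
leave_newest() {
    dir=$1
    keep=$2
    # ls -t lists newest first; sed drops the first N names from the list,
    # so only the older files reach the rm command
    ls -t "$dir" | sed "1,${keep}d" | while read name; do
        rm -f "$dir/$name"
    done
}
```

The driver would call it once per directory, for example `leave_newest "$SPOOLDIR/jobsdone" "$NTOLEAVE"`.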
ask.sh (Listing 13)
This little script prompts the user with a given text
string, insists
upon a y/n response, and returns Y or N accordingly
on the standard output.
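A prompt of this kind might look like the following sketch. It is not Listing 13; the function name ask, the "(y/n)?" suffix, and the decision to give up at end of input are assumptions.

```shell
#!/bin/sh
# Sketch of a yes/no prompt; not the code from Listing 13.
# ask PROMPT: repeat PROMPT until the reply starts with y or n, then
# print Y or N on standard output
ask() {
    while :; do
        printf "%s (y/n)? " "$1" >&2
        read reply || return 1          # give up at end of input
        case $reply in
            [yY]*) echo Y; return 0 ;;
            [nN]*) echo N; return 0 ;;
        esac
        echo "Please answer y or n." >&2
    done
}
```

The prompt goes to standard error so that a caller can capture the answer with command substitution, as in `answer=`ask "Run this job overnight"``.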
A Report Queuing Example
Listing 14 shows an example script that spools a user-requested
report
program as an overnight job. This script, invoked from
a menu system
in our case, prompts the user for a publication code
(using the getmag
shell tool) and proceeds to set up a job that runs a
set of mailing
address consistency checks for the specified publication.
Some other
internal shell tools, such as magname and nissue, appear
in the script, but their use is related to the specific
application
and not to the spooler system in general.
The job text is first written to a temporary file, then
the temporary
file is fed to spoolonite.sh in line 49. After return
from
spoolonite.sh, the temporary file is deleted.
A Periodic Job Spooling Example
Earlier I mentioned the problem of backup scheduling
conflicts. By
spooling the backup routine as the lowest-priority overnight
job,
all potential concurrency issues can be avoided, and
it is guaranteed
that the backup program doesn't run until after all
other processes
have completed their tasks.
Say you have a backup driver script named dump.sh that
performs
the physical backup operations, and you're currently
calling it directly
from the cron table at some fixed hour of the night.
To convert
this task into a spooled overnight job, create a special
driver to
spool the dump.sh script as an overnight job. Such a
driver,
named spooldumps.sh, is shown in Listing 15.
Then, in your cron table, simply change the line that
used
to call dump.sh to call spooldumps.sh instead, some
time before the nightly onitego.sh run is scheduled
to begin.
For example, here is the root cron table entry from
our system:
30 18 * * 1-5 /usr/local/spooldumps.sh
This causes the spooldumps.sh script to execute
every weekday evening at 6:30 P.M. (our onitego.sh is scheduled
to start up at 8:00 P.M.). spooldumps.sh schedules the
dump.sh process (which resides in the /u3/Backup directory)
at priority 7, the lowest priority. Thus, the dump.sh
script is the last program to execute every night.
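Reduced to its essentials, a driver of this kind only has to hand the name of the backup script to the spooler. The sketch below is not Listing 15: the spooler's install location, the job name dumps, and the argument order (job name first, priority second) are assumptions, and the function is defined but not called so that the sketch is harmless on a system without the spooler.

```shell
#!/bin/sh
# Sketch of a backup-spooling driver; not the actual Listing 15.
# ASSUMPTIONS: the spooler lives in /usr/local and takes the job name
# first and the priority second.
SPOOLER=${SPOOLER:-/usr/local/spoolonite.sh}

spool_dumps() {
    # The one-line job text is fed to the spooler on its standard input.
    echo "/u3/Backup/dump.sh" | "$SPOOLER" dumps 7
}

# spool_dumps      # uncomment on a system where the spooler is installed
```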
About the Author
Leor Zolman wrote BDS C, the first C compiler targeted
exclusively
for personal computers. He is currently a system administrator
and
software developer for R&D Publications, Inc., and
columnist for both
The C Users Journal and Windows/DOS Developer's Journal.
Leor's first book, Illustrated C, has just been published
by
R&D. He may be reached in care of R&D Publications,
Inc., or via net
E-mail as leor@rdpub.com ("...!uunet!bdsoft!rdpub!leor").