An older Option in C for find
Larry Reznick
Why a C Program?
UNIX comes with so many utilities that a lot of work
can be done by
shell scripts that use those utilities -- and shell
scripts can
usually be put together faster than C programs. Sometimes,
however,
when the utility that does exactly what you want simply
does not exist,
you just have to write a C program.
A client of mine is putting together a system that will
receive files
from customers transmitted by uucp. These files will
come in daily
from all over the country and will contain various transaction
records
that need further processing locally. At least once
a day, we want
cron to wake up on my client's machine and move the
files out
of the uucppublic directory into a directory where the
additional
processing can be done. However, new files can come
in literally at
any time, including the time that the cron job wants
to move
the files away. The file currently being transmitted
by uucp must
not be moved away while it is still being uploaded,
but all prior
files will qualify.
My first thought was to use the find utility, knowing
that
it has a bunch of interesting options for qualifying
files and then
emitting the names of those that qualify. However, none of its
timestamp comparisons (such as -atime) accepts an argument in
minutes, only in days.
find has a -newer option that allows a specific file's
timestamp to be the base time, so that all files newer
than that time
will qualify. What we really needed was an -older option,
since
we didn't want to take the file being uploaded currently,
but did
want all files prior to that one. Using ! -newer might
do the
trick if I could touch a file with the appropriate timestamp.
However, find has no -older option that would work with
a specific amount of time in minutes, so I decided to
write one.
I wanted this program to act as if I had given a find
command
that had this unsupported syntax:
find dirname -older num_minutes -print
where dirname was the name of the directory containing
files I cared about, and num_minutes was a number, not
a filename.
If any file in that directory was older than the specified
minutes,
the pathname for that file would be printed out. Since
most of the
customer files would take only a couple of minutes maximum
to transmit
at 2,400 bps, and a few might take as long as 5 to 10
minutes, files
older than 15 minutes would qualify. Anything more recent
than that
would be picked up next time around.
How older.c Works
older.c, shown in Listing 1, meets these requirements.
To make
it more generally useful, the program accepts a number
of minutes
and a set of directories on the command line. While
this particular
application would work with a 15-minute timeframe and
only with a
specific directory, I was sure we would find other uses
that would
have different requirements. If no time is given, the
program defaults
to 15 minutes, and if no directories are named, it defaults
to the
current directory.
Looking at Listing 1, start with the FEBDAYS() macro.
This
implements the rule that it is leap year in every year
divisible by
4 except the century years, which must be divisible
by 400. The mod
(%) operator gives the remainder after a division, so
if the result
of a mod operation is zero, the value is evenly divisible
by
the divisor. C implements short-circuiting for the Boolean
logic operators
and (&&) and or (||). This means that if the truth of the whole
expression can be determined by the first part of the operation,
the second part is not evaluated. If the first part of an &&
operation is FALSE, the whole thing is FALSE. If the first part
of an || operation is TRUE, the whole thing is TRUE. In the
opposite cases, the second part must be evaluated to be certain.
So, if the year is 2000, which is evenly divisible by
400, this routine
will set February to 29 days without checking further.
A year that is not a century year is recognized as a leap year
if it is evenly divisible by 4. Instead of using the % operator
for the division by 4, the macro uses a shortcut: whenever the
divisor is a power of 2, the bitwise AND operator (&) can be
used with a value one less than the divisor (3) to get the
remainder of a non-negative value. The quotient, if needed, can
be delivered by using a shift to the right (>>) instead. The
shift and bitwise operators have traditionally been faster than
the mod and division operators, although an optimizing compiler
will often make a similar substitution on its own.
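
The leap-year rule described above can be sketched as follows. This
is a reconstruction from the description, not a copy of Listing 1;
the wrapper function feb_days() is a made-up name, and the macro
assumes it is handed a full four-digit year (tm_year + 1900).

```c
/* Sketch of a FEBDAYS()-style macro (reconstructed, not the
 * original listing). The % 400 test comes first, so a year such
 * as 2000 is settled by short-circuiting. Otherwise the year must
 * be divisible by 4 (tested with & 3) and must not be a century
 * year. */
#define FEBDAYS(y) \
    ((((y) % 400 == 0) || ((((y) & 3) == 0) && ((y) % 100 != 0))) ? 29 : 28)

/* Hypothetical wrapper so the macro can be called like a function. */
int feb_days(int year)
{
    return FEBDAYS(year);
}
```

Calling feb_days(1900) exercises the century exception, while
feb_days(2000) exercises the 400-year rule.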
The variable progname is made global for error handling.
If
the program reports an error, the program's name will
be part of that
error message. Since any of the functions might deliver
an error message,
but only main() can know the program name from the command
line, using a global variable to hold the name eliminates
the need
to pass it around to all the functions as a parameter.
So, the first
thing that main() does is grab argv[0], the program's
name, and put it into that variable. No matter how many
other arguments
will be given on the command line, even the wrong number,
argv[0]
will be present.
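
A minimal sketch of the global-name idea follows. The helper names
set_progname(), format_error(), and report_error() are invented
for illustration and are not taken from Listing 1.

```c
#include <stdio.h>

/* Global so that any function can name the program in a message. */
const char *progname = "older";

/* Hypothetical: called first thing in main() to capture argv[0]. */
void set_progname(char *argv[])
{
    progname = argv[0];
}

/* Hypothetical: builds "progname: message" in buf and returns the
 * number of characters the full message needs. */
int format_error(char *buf, size_t size, const char *msg)
{
    return snprintf(buf, size, "%s: %s", progname, msg);
}

/* Hypothetical: writes the same message to stderr. */
void report_error(const char *msg)
{
    fprintf(stderr, "%s: %s\n", progname, msg);
}
```

No function other than main() ever sees argv, yet every message
carries the name the program was invoked by.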
The next step in main() is to check the command line
argument
list. The total number of arguments on the command line
is in argc,
the arguments themselves being in argv[]. For this program,
no arguments need be given, and there are no hyphenated options.
So, if a hyphen is the first character of the first argument,
the program takes it as a request for help and outputs a Usage
message.
If no arguments other than the program's name are given,
argc
will be 1 and the default time of 15 minutes will be
taken off the
current system time. Otherwise, the command line has
the number of
minutes to be taken off. The program will presume that
only an int
capacity will be needed. (On 32-bit UNIX systems, an int is
typically the same size as a long, so this can be quite a number
of minutes. If the range of a long or the compactness of a short
is strictly required, that type should be used, not int. While
the int is theoretically the most efficient type for the system
[this is a religious issue and different compiler writers will
disagree for the same system], it is also the least portable
type since the ANSI C Standard allows it to be the same size as
short or long, or somewhere in between.) If an int is 16 bits,
the maximum value
of a signed
int is 32,767 minutes, which amounts to over 22 days.
If it
is 32 bits, the maximum signed value is 2,147,483,647
minutes, which
amounts to almost 4,083 years! Either way is sufficient
for this program's
needs.
If a minutes argument is given on the command line,
directory names
may also be given (directory names may not be given
without a minutes
argument). If no directories are named, the current
directory is used.
If directories are named, a loop runs through them one
at a time.
If an error occurs, the program quits the loop.
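
The command-line rules just described might be sketched like this.
The function parse_args(), its return convention, and the use of
atoi() are assumptions made for illustration; Listing 1 may
organize the same checks differently.

```c
#include <stdlib.h>

#define DEFAULT_MINUTES 15

/* Hypothetical helper. Returns -1 if a Usage message is wanted,
 * otherwise 0 with *minutes set and *first_dir holding the index
 * of the first directory argument. When *first_dir equals argc,
 * no directories were named and the caller should use ".". */
int parse_args(int argc, char *argv[], int *minutes, int *first_dir)
{
    *minutes = DEFAULT_MINUTES;
    *first_dir = argc;

    if (argc < 2)                 /* program name only: all defaults */
        return 0;
    if (argv[1][0] == '-')        /* no options exist, so ask for help */
        return -1;

    *minutes = atoi(argv[1]);     /* first argument is the minutes */
    *first_dir = 2;               /* anything after it is a directory */
    return 0;
}
```

The caller would then loop from *first_dir up to argc, or process
"." once if the two are equal.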
The reduce_time() function takes the passed number of
minutes
off the current time. While the ANSI C Standard gave
us a lot of flexibility
in working with calendar and clock times, it did not
provide date
arithmetic functions. The closest it came to that was
the difftime()
function, which takes two time_t values and subtracts
them,
giving the difference as a type double.
It is extremely important to avoid the trap of assuming anything
about how time_t values are encoded. The ANSI C Standard requires
only that time_t be an arithmetic type capable of representing
times; it does not say whether that type is an integer or a
floating type, or what unit it counts in. A specific calendar
date and time can be converted into a time_t by the mktime()
function, but a specific amount of time cannot portably be added
to or subtracted from a time_t value, since the Standard gives
no guarantee that the time_t is a number of seconds. (POSIX
systems do define it as seconds since the Epoch, so the shortcut
works there.) Moreover, even where compilers do deliver a time_t
as a number of seconds elapsed from an epoch, you cannot assume
that all will use the same epoch. difftime() allows you to handle
these differences.
While it would be easy to multiply minutes by 60 and
take the resulting
seconds off the current time represented as a time_t
value
to get the starting time for the timestamp comparisons,
to do so would
risk making the code nonportable. The only truly portable
solution
is to go through the struct tm data type and muck around
with
the various parts of the calendar and clock.
Therefore, I take the current time() and plug that into
the
localtime() function, which translates the time_t value
into calendar and clock information for the local timezone.
The minutes
and hours can be adjusted easily, and the days will
be whatever is
left over if enough minutes were given. I take the leftover
minutes (0 to 59) once the whole hours have been split off and
subtract those from the current time's minutes. A negative
result means that the time crossed backward into the previous
hour, so I add 60 back into the minutes and subtract one from
the hour. I do the same thing with the hours, except this time
a negative result means a cross back into the previous day.
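
The borrowing described above can be sketched as a small function.
The name reduce_clock() and the way the minutes are split up are
assumptions for illustration, and this sketch handles only a
non-negative number of minutes.

```c
#include <time.h>

/* Hypothetical helper: takes `minutes` (assumed >= 0) off the
 * clock fields of *t and returns how many whole days must still
 * be taken off the calendar date. */
int reduce_clock(struct tm *t, int minutes)
{
    int days  = minutes / (24 * 60);   /* whole days */
    int hours = (minutes / 60) % 24;   /* leftover hours, 0 to 23 */
    int mins  = minutes % 60;          /* leftover minutes, 0 to 59 */

    t->tm_min -= mins;
    if (t->tm_min < 0) {               /* crossed into previous hour */
        t->tm_min += 60;
        t->tm_hour--;
    }

    t->tm_hour -= hours;
    if (t->tm_hour < 0) {              /* crossed into previous day */
        t->tm_hour += 24;
        days++;
    }
    return days;
}
```

Taking 15 minutes off 00:05 leaves 23:50 and returns 1, the one
day that the date logic still has to remove.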
These steps may seem to be a lot of trouble, but if
the current time
is just a few minutes after midnight, the subtraction
will have to
deal with a day on the calendar. The real problem is
in the lack of
standard functions for doing date arithmetic. Maybe
the ANSI committee
will do something about this the next time around. While
accounting
requirements of, say, 30-, 60-, and 90-day aging or
more are usually
met by adding 1, 2, 3, or more to the month number rather
than using
a strict number of days, other applications might need
to be more
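precise. Date arithmetic gets much easier when mktime() takes
unusual numbers in its struct tm argument, and the ANSI C
Standard does specify that the fields other than tm_wday and
tm_yday are not restricted to their normal ranges. A conforming
mktime() given more seconds, minutes, hours, days, or months
than is reasonable -- or even a negative value -- converts the
number to the correct calendar amount and hands back the
adjusted time_t value with leap years, timezones, and so forth
accounted for. The code in Listing 1 nevertheless does the
adjustment by hand, which keeps every field in range before
mktime() is called.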
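
For comparison with the hand-written arithmetic, here is a minimal
sketch of letting mktime() do the normalizing. The ANSI C Standard
says the tm_min field need not be in range on entry, so on a
conforming library a negative value simply borrows from the hours,
days, and so on. The function name back_up_minutes() is made up
for illustration and is not part of Listing 1.

```c
#include <time.h>

/* Hypothetical helper: moves the local calendar time in *t back
 * by `minutes` and lets mktime() bring every field into range.
 * Returns 1 on success, 0 if mktime() cannot represent the time. */
int back_up_minutes(struct tm *t, int minutes)
{
    t->tm_min -= minutes;   /* may go negative; mktime() normalizes */
    t->tm_isdst = -1;       /* let the library decide about DST */
    return mktime(t) != (time_t)-1;
}
```

Starting from March 1, 1993 at 00:05 and backing up 15 minutes,
the structure comes back holding February 28 at 23:50.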
Once the problem has been reduced to a specific number
of days by
which the calendar should be adjusted, a loop is needed
to work within
the days of each month. Leap day fluctuation is accounted
for by adjusting
February's days (day[1]). If the number of days to be removed
from the date is at least as large as the day of the month, I
reduce the number of days to be removed by the day of the month;
since this brings the calendar date back to the previous month,
I reduce the month number. If reducing the month number requires
it, I reduce the year also and recalculate February's days.
Regardless, I take the number of days in this new month as the
new day of the month and repeat the operation until the number
of days to be taken off becomes less than the value of the day
of the month.
At this point,
I take that number of days off, and the correct date
of the adjusted
month (in the adjusted year if needed) is delivered.
By building this
directly into the struct tm, the result can be passed
directly
to mktime(), which returns the resulting time_t value
from
the reduce_time() function.
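
The month-by-month loop might look like the sketch below. It is
written from the description, so the function name reduce_days()
and the exact loop condition are assumptions, and it expects a
non-negative number of days.

```c
#include <time.h>

/* Leap-year rule as described earlier (reconstructed). */
#define FEBDAYS(y) \
    ((((y) % 400 == 0) || ((((y) & 3) == 0) && ((y) % 100 != 0))) ? 29 : 28)

/* Days in each month; day[1] is adjusted for the year in use. */
static int day[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

/* Hypothetical helper: takes `days` (assumed >= 0) off the date
 * fields of *t, stepping back one month at a time as needed. */
void reduce_days(struct tm *t, int days)
{
    day[1] = FEBDAYS(t->tm_year + 1900);

    while (days >= t->tm_mday) {
        days -= t->tm_mday;            /* use up the rest of this month */
        if (--t->tm_mon < 0) {         /* backed out of January */
            t->tm_mon = 11;
            t->tm_year--;
            day[1] = FEBDAYS(t->tm_year + 1900);
        }
        t->tm_mday = day[t->tm_mon];   /* last day of the earlier month */
    }
    t->tm_mday -= days;                /* remainder fits in this month */
}
```

Removing one day from March 1 lands on February 28 or 29 depending
on the year, and removing one day from January 1 lands on
December 31 of the year before.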
Two interesting side-effects result from this. First,
since subtracting
a negative number is equivalent to adding, a negative
number of minutes
will add minutes to the current time. While not useful
for this particular
application, since it works with file timestamps, this
capability
could be handy for other programs using this function.
Second, since mktime() takes the timezone and Daylight Saving
Time into consideration, the elapsed time will be off by plus or
minus an hour whenever the new time has crossed over one of the
DST boundary dates. This program does its arithmetic on the
clock and calendar fields without regard to the adjustment of
clocks made at DST boundary dates, so the hour lost or
subsequently recovered will show up in the difftime() between
the starting time and the new reduced time. Nevertheless, read
on the local clock, the result is the correct number of minutes
prior to the current time.
One remaining issue about the reduce_time() function
would
be to eliminate its association with the current time.
Instead of
calculating the now variable from the time() function,
you could pass it into reduce_time() as a parameter,
also named
now. With that, a specific number of minutes can be
removed
from (or added to by using a negative number of minutes)
any time.
Finding the Target Files
The show_files() function takes a directory name and
a starting
time. It comes up with every filename in the specified
directory and
checks each file's timestamp against the starting time.
Reading the
filenames from a directory is no more complicated than
reading records
from a sequential data file. The directory is opened
with the opendir()
function, the filenames are delivered with the readdir()
function
in a structure, and the directory is closed with the
closedir()
function.
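
A minimal sketch of that open, read, close sequence follows. The
function list_entries() is a made-up name; it prints every name
readdir() delivers and returns the count.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints each name in dirname, typically
 * including "." and "..", and returns how many were delivered.
 * Returns -1 if the directory cannot be opened. */
int list_entries(const char *dirname)
{
    DIR *dp = opendir(dirname);
    struct dirent *entry;
    int count = 0;

    if (dp == NULL)
        return -1;

    while ((entry = readdir(dp)) != NULL) {   /* NULL marks the end */
        puts(entry->d_name);
        count++;
    }

    closedir(dp);
    return count;
}
```

The loop has the same shape as reading records until end-of-file,
with readdir() returning NULL in place of EOF.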
The function takes the given directory name, opens the
directory,
and copies the name into the pathname[] variable. A trailing
slash is then written over the null terminator, and a new
terminator is placed after it to make it a regular string again.
Notice that strlen()
is used
to figure the subscript of the terminator. That information
gets translated
into a direct placing of the / character on top of the
terminator
without having to use strcat(), which would make yet
another
pass through the string to find that terminator. (The
dirlen variable
needs to be increased to represent the adding of that
/ character.)
Since the length of the pathname part containing the
directory's name
is known, every filename within that directory can be
appended to
the same path, once the name is discovered. All you
have to do is
keep track of where the pathname part ends -- and that
is what
dirlen is for.
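
The dirlen technique can be sketched as two small functions. The
names start_path() and append_name() and the PATHSIZE constant are
invented for illustration, and the length checks are an addition
that the description above does not mention.

```c
#include <string.h>

#define PATHSIZE 256

/* Hypothetical helper: copies dirname into pathname, writes '/'
 * over the terminator, re-terminates the string, and returns the
 * length of the directory part including the slash. Returns 0 if
 * the name would not fit. */
size_t start_path(char *pathname, const char *dirname)
{
    size_t dirlen = strlen(dirname);   /* subscript of the terminator */

    if (dirlen + 2 > PATHSIZE)
        return 0;
    strcpy(pathname, dirname);
    pathname[dirlen++] = '/';          /* slash replaces the terminator */
    pathname[dirlen] = '\0';           /* new terminator after the slash */
    return dirlen;
}

/* Hypothetical helper: overwrites whatever follows the directory
 * part with filename. Returns -1 if the result would not fit. */
int append_name(char *pathname, size_t dirlen, const char *filename)
{
    if (dirlen + strlen(filename) + 1 > PATHSIZE)
        return -1;
    strcpy(pathname + dirlen, filename);
    return 0;
}
```

Because each filename is copied to the same offset, the previous
filename is simply overwritten and the directory part is never
rebuilt.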
The pathname[] variable is set to 256 characters, allowing the
full pathname to be no more than 255 characters. Since BSD and
SVR4 allow 255-character filenames, a long filename with a
directory path added in front of it would overflow this buffer,
so this is not a particularly safe strategy.
Still, this
method should work with most pathnames, and serves to
keep the example
simple. A more robust solution would allocate the buffer
from the
heap and allow it to grow on demand. You might want
to challenge yourself
to rewrite it that way.
The readdir() loop checks the file's name to see if
the first
character is a dot (.). The readdir() function delivers
every
name, including the directory names . and .., and the
hidden files beginning with a dot. The program should
ignore such
files, so, if the name does not begin with a period,
the program appends
it to the pathname[] variable and passes the result
to the
stat() function. This handy function reads the inode
information
for that file, delivering all sorts of useful facts
about the file,
including the time of the last modification (st_mtime).
That
time is a time_t type, so it can be plugged directly
into the
difftime() function.
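
The two checks inside the loop can be sketched separately. The
helper names is_hidden() and get_mtime() are made up for this
example; only stat() and the st_mtime member come from the system.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: true for ".", "..", and any other name
 * that begins with a dot. */
int is_hidden(const char *name)
{
    return name[0] == '.';
}

/* Hypothetical helper: stores the file's last modification time
 * in *mtime. Returns 0 on success, -1 if stat() fails. */
int get_mtime(const char *path, time_t *mtime)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    *mtime = sb.st_mtime;
    return 0;
}
```

In the loop, a name that passes the is_hidden() test would be
appended to the pathname and the full path handed to get_mtime().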
The difftime() function delivers the difference between two
time_t values in seconds, represented as a double type.
Treating time as an increasing value, regardless of the form of
that value, difftime() subtracts the second argument from the
first. With start_time as the first argument and the file's
modification time as the second, a result greater than zero
means the file's modification timestamp is older than the
start_time, and so the file's name is printed.
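
The comparison itself is a one-liner. The sketch below assumes the
argument order that the description implies; the function name
is_older() is invented here.

```c
#include <time.h>

/* Hypothetical helper: true when the file was last modified
 * before start_time. difftime(a, b) returns a minus b in seconds,
 * so a positive result means file_mtime is the earlier time. */
int is_older(time_t file_mtime, time_t start_time)
{
    return difftime(start_time, file_mtime) > 0.0;
}
```

A file modified exactly at start_time gives a difference of zero
and so does not qualify.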
Conclusion
The older program emits full pathnames as the -print
option might do in the find program. Since we have started
using it in shell scripts, we have found additional
uses for it. I
hope you'll find it equally handy.
About the Author
Larry Reznick has been programming professionally since
1978.
He is currently working on systems programming in UNIX
and DOS. He
teaches C language courses at American River College
in Sacramento
and is the owner of Rezolution Technical Books. He can
be reached
via email at: rezbook!reznick@csusac.ecs.csus.edu.