Searching
in Unusual Ways and Places
Æleen Frisch
A few weeks ago, I was reading an article that cited some statistics
about how many times various actions were performed in the course
of a lifetime: how many hours a person sleeps, how many miles are
driven to work, how much food is consumed -- you get the idea.
I started to think about how many times I've done various things,
including how many times I'd run various UNIX commands. For
me, the top two most frequently used commands are ls and grep. In
the course of my career so far, I've run each of them more
than 100,000 times.
Clearly, grep is a command I can't live without. I constantly
use it on its own and in pipes with other commands. For example:
% ps -aux | egrep 'chavez|PID'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
chavez 14355 0.0 1.6 2556 1792 pts/2 S 10:23 0:00 -tcsh
chavez 18684 89.5 9.6 27680 5280 ? R N Sep25 85:26 /home/j03/l988
I use this command combination often enough with different usernames
that I've defined an alias for it.
There are times, however, when I want to perform grep-like search
operations but grep itself is cumbersome or impossible to use: finding
data within network traffic, looking for a software package, locating
a specific email message. In these contexts where grep can't
be applied easily, I have to turn to other tools (some are open
source, others are vendor provided). This article will look at some
of them.
Searching Network Packets
Searching network traffic for patterns in real time is a useful
technique for debugging a variety of network problems. It's
not easy to apply grep to this task. It is possible to run a packet-capturing
utility like tcpdump and then search the resulting output with grep,
but this can be awkward and ineffective. What you often want is
to examine the entire packet when some part of its data matches
a pattern. Unfortunately, using grep with a packet dump will return
only the lines containing the pattern. There are also times when
this approach is simply too slow.
Fortunately, there is an open source utility that does exactly
this job: ngrep. It is developed and maintained by Jordan Ritter,
and the project's home page is http://ngrep.sourceforge.net.
ngrep has the following general syntax:
ngrep [options] pattern [filter]
where pattern is a regular expression to search for in network
packets, and the optional filter is an expression indicating
the sorts of network packets to which to pay attention. This filter,
technically known as a Berkeley packet filter (BPF), consists of a
series of keywords specifying rules for selecting packets (BPF filters
are also used by packet-dumping utilities). These keywords specify
the source and/or destination host, network, protocol, and/or port.
See the manual page for information about constructing BPF expressions.
If the filter is omitted, then all packets are included.
Using a filter in combination with a search string usually makes
searching more efficient and ngrep's output more readable.
As an example, consider this ngrep command that looks for packets
containing output from the finger command among all network traffic:
$ ngrep "No Plan\."
The command searches for a recognizable string from the finger output.
If you run this on most systems and run a finger command within earshot
of that host, you'll see the relevant packet intermixed with
lots of number signs and dot series, and it will often be repeated
several times. A number sign is printed for each packet examined,
and dots indicate output interruptions. If you use the following command
instead, then you won't get the ugly output, and ngrep will do
less work:
# ngrep -q "No Plan\." port finger
Now, only packets to or from the finger port (79) are examined. The
-q option suppresses the number signs.
Here is a more complex ngrep command that locates some specific
FTP connection operations. It examines FTP-related packets sent
from host hamlet to host ophelia and searches their data for the
string USER:
# ngrep -q -t "USER" src host hamlet and dst host ophelia and tcp port 21
T 2002/09/24 14:11:15.413069 192.168.9.212:32813 -> 192.168.9.84:21 [AP]
USER chavez..
T 2002/09/24 14:32:07.776476 192.168.9.212:32814 -> 192.168.9.84:21 [AP]
USER amber..
For each packet, the output begins with a letter indicating the protocol
(TCP here), followed by a time stamp (requested with the -t
option), and then the source and destination host and port. The second
line displays the packet's data. A command like this is one way
of capturing all FTP connection attempts. As this example indicates,
complex filters can be created by joining clauses with "and".
You can also use the "or" and "not" logical operators
as well as parentheses for grouping (the latter will need to be escaped
to protect them from the shell).
ngrep can be very useful when testing and debugging network services.
For example, I found it very useful when I enabled the secure versions
of LDAP. Everything seemed to work fine after I was finished, but
I wanted to verify that the secure version of the protocol was being
used. If it was, then I would not be able to detect any clear text
passwords in LDAP traffic. An ngrep command like this one enables
me to test this functionality:
# ngrep 'badpassword' src host ldapserver and \( port 636 or port 389 \)
This command watches the ports corresponding to SSL and TLS-secured
LDAP. While this command was running, I ran three queries: one using
the normal ldap protocol, one using the ldaps protocol, and a third
using a GUI LDAP client in TLS mode. All three queries succeeded and
displayed the appropriate records. The ngrep command returned output
only for the first one, so it was clear that the password had been
encrypted in the latter two cases. ngrep was a better choice than
a general packet dumper for this job because it limited its scope
to exactly the packets in which I was interested.
This example illustrates that ngrep can be useful for not finding
things as well as finding them. In this case, the lack of output
was what I hoped for. Those of you who had qualms about the finger
example earlier can apply this principle as well: because running
a finger daemon is not a good idea in most environments, such a
ngrep command can function as a security trap. If finger traffic
appears on the network, ngrep will detect it and let you know there
is a problem.
ngrep has several other useful options:
-i -- Perform a case-insensitive search.
-A n -- Display the n packets following each matched
packet.
-d dev -- Use the specified network device.
-O file -- Save matching packets in file in addition
to displaying them.
-X -- Interpret the search pattern as hexadecimal.
ngrep is useful for a wide variety of tasks ranging from testing
network applications to monitoring network traffic. It is also quite
useful for debugging specific operations or programs on busy systems
because of its ability to extract very narrow ranges of packets
for examination.
Searching Mailboxes
At first thought, grep ought to be able to perform a task like
searching mailboxes for specific text. You can search mail files
for text, but using grep has at least two disadvantages. First,
you may want to retrieve the entire message(s) that matches the
pattern, and grep only returns matching lines by default. Second,
if any of the mailboxes contain lengthy MIME attachments, searching
with grep can produce voluminous output arising from an unlucky
false positive within the binary attachment.
A better tool for this job is grepmail, an open source utility
written by David Coppit (see http://grepmail.sourceforge.net
for more information). grepmail is designed specifically for searching
mail folders. Here is a simple example of its use:
% grepmail -R -i -l hilton ~/Mail
Mail/conf/acs_w01
I was looking for the phone number of a specific Hilton hotel, which
was in a mail message somewhere, but I couldn't remember where
I'd filed it. This command searches for the string "hilton"
(-I says to perform a case-insensitive search) in all mail
folders under the specified starting directory (-R means recursive),
and lists the names of files containing messages that match (-l
option). The advantage of this approach is that I can search for the
string I remember and find the telephone number even though the two
items may be lines apart in the actual message. This command yields
the phone number:
% grepmail -i hilton '!!' | grep -i telephone
Telephone: 619-231-4040
This grepmail command searches for the same string in the mail folder
returned by the previous command. This time, grepmail will return
the entire message as its output (since -l is omitted). The
result is then piped to grep to isolate the phone number.
Here is a somewhat more complicated command that uses grepmail
twice. Its goal is to find messages from user nadia that mention
something related to Naples, Italy:
% grepmail -R -h "^From: .*nadia" ~/Mail | grepmail -b -i "naples|napoli|neapolit"
The first command searches mail headers (-h) for "From"
lines including "nadia" somewhere in their text. The second
command searches only the body (-b) of the matching messages
for the specified strings.
grepmail has several other useful options:
-d date -- Limit search to messages on the specified
date or within the specified date range. The date format is very
flexible; see the manual page for details.
-v -- Display only non-matching messages.
-u -- Display only unique messages.
-M -- Don't search non-text MIME attachments.
-r -- Display a report listing each folder searched
and the total number of matching messages within it.
-m -- Add an X-Mailfolder header to displayed messages;
the header's text will be the path to the message's mail
folder.
-H -- Display only the headers of matching messages.
It is also very easy to forward a mail message located in this
manner. Here is a simple method:
% grepmail -m -u ... | mail -s subject someone@somewhere
Finally, some people prefer to view the search results from a mail
client. This is usually easy to accomplish via a simple script that
redirects grepmail's output to the mailer's default folder.
Several have been created for this purpose:
- pine: grepine by Cristin Pietsch -- http://www.dfki.de/~pietsch/software
- mutt: grepm by Moritz Barsnick -- http://www.barsnick.net/sw/grepm.html
- VM: vm-grepmail.el by Robert Fenk -- http://www.robf.de/Hacking/elisp/vm-grepmail.el
Search Operations for Software Packages
Software packages are another item whose contents are hard to
search with grep. More specifically, I often want to answer questions
like these:
- Is a specific package installed?
- What package does a specific file belong to?
- What packages are available on an individual CD (or other media)?
- What is included within a package (installed and not)?
On many systems, one or more of these questions can be answered
using the package management tools supplied with the operating system.
For example, the following commands can be used to list all currently
installed packages on various UNIX systems:
Linux: rpm -q -a
FreeBSD: pkg_info -a -I
Solaris: pkginfo
HP-UX: swlist
AIX: lslpp -l all
You can pipe any of these commands to grep to determine whether
a specific package is present to find its actual package name. For
example, the following command lists all packages related to LDAP
installed on a Linux system:
% rpm -q -a | grep -i ldap
nss_ldap-184-1
openldap-2.0.23-4
openldap-clients-2.0.23-4
openldap-servers-2.0.23-4
This system has the OpenLDAP servers and client utilities installed,
as well as the modules that interface LDAP to PAM and to the name
service switch file, /etc/nsswitch.
It's often useful to find out which package a particular
file is part of (e.g., when you delete it accidentally and need
to restore it). These command forms will indicate which package
installed the specified file:
Linux: rpm -q ---whatprovides path
Solaris: pkgchk -l -p path
AIX: lslpp -w path
Here is an example from a Solaris system:
% pkgchk -l -p /etc/init.d/sendmail
Pathname: /etc/init.d/sendmail
Type: editted file
Expected mode: 0744
Expected owner: root
Expected group: sys
Referenced by the following packages: SUNWsndmr
Current status: installed
When you want to know what is contained in an installed package, use
these commands:
Linux: rpm -q -l name
FreeBSD: pkg_info -L name
Solaris: pkgchk -l name | grep "^Pathname:"
HP-UX: swlist -l file
AIX: lslpp -f name
Here is an example from a FreeBSD system:
% pkg_info -L grub-0.91_1
Information for grub-0.91_1:
Files:
/usr/local/bin/mbchk
/usr/local/info/grub.info
/usr/local/info/multiboot.info
/usr/local/sbin/grub
...
In general, if you want to list the contents of an uninstalled package,
you can replace the package name with the path to the package file
in the preceding commands. On Linux systems, however, you must precede
the package name with the -p option.
Only HP-UX and AIX have easy-to-use commands for listing the packages
available on CDs or other media:
HP-UX: swlist -s path-or-device
AIX: installp -l -d device
On Linux, FreeBSD, and Solaris systems, you must rely on GUI package
management tools to handle this function. On Linux systems, you
can use gnorpm and similar packages (as well as yast2 on SuSE Linux
systems). Under FreeBSD systems, you can use the sysinstall utility
and select the Configure=>Packages menu path. On Solaris systems,
the Supplementary Software CD includes a GUI installation tool that
starts automatically when the CD is inserted, and it can be used
to view the contents of the CD as well. On all three systems, you
can also examine the directory containing the package files with
ls for a quick listing of what is available.
Searching Net-SNMP MIBs
The Simple Network Management Protocol (SNMP) can be used to monitor
and reconfigure a wide variety of computer systems and other network
devices. The items that can be queried or set are defined in Management
Information Bases (MIBs). A MIB is a collection of value and property
definitions, and the various items are organized as a tree structure.
This hierarchical organizational scheme serves to group related
data together. MIB definitions are stored in files and are implemented
in the software on the actual computers and devices. The MIB does
not hold any data -- it is a schema, not a database.
Here is an example MIB item:
iso.org.dod.internet.mgmt.mib-2.system.sysLocation = "Machine Room"
The long string on the left is the setting's name, and its value
is the string to the right of the equals sign. The name is separated
into components by periods, and each corresponds to successive levels
of the MIB tree. Thus, we can see that the sysLocation node is eight
levels from the top of the tree.
Although the MIB is organized as a tree, it is not uniformly populated.
The top four levels of the standardized MIB tree exist mainly for
historical reasons. Given this rather ad hoc structure, searching
the MIB tree for specific items is often essential. However, it
is not a job for grep.
Most SNMP implementations provide utilities for examining MIBs.
The open source SNMP implementation Net-SNMP is used on Linux and
FreeBSD systems (and other UNIX systems, if desired). The tool the
package provides to examine the MIB structure is snmptranslate.
This command provides information about the MIB structure and its
items. For example, you can use it to display a MIB subtree, as
in this example:
% snmptranslate -Tp .iso.org.dod.internet.mgmt.mib-2.system
+--system(1)
|
+-- -R-- String sysDescr(1)
| Textual Convention: DisplayString
| Size: 0..255
+-- -R-- ObjID sysObjectID(2)
+-- -R-- TimeTicks sysUpTime(3)
+-- -RW- String sysContact(4)
| Textual Convention: DisplayString
| Size: 0..255
...
I've truncated the output after four entries.
snmptranslate can also provide detailed information about a specific
MIB item, as in this example using the sysLocation leaf:
% snmptranslate -Td .iso.org.dod.internet.mgmt.mib-2. system.sysLocation
1.3.6.1.2.1.1.6
sysLocation OBJECT-TYPE
-- FROM SNMPv2-MIB, RFC1213-MIB
-- TEXTUAL CONVENTION DisplayString
SYNTAX OCTET STRING (0..255)
DISPLAY-HINT "255a"
MAX-ACCESS read-write
STATUS current
DESCRIPTION "The physical location of this node (e.g., 'telephone closet, 3rd
floor'). If the location is unknown, the value is the zero-length string."
::= { iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1) 6 }
However, the most important searching feature -- finding the location
within the tree of a specific leaf -- is not provided automatically
by snmptranslate. This command will provide that information for the
memTotalReal item:
% snmptranslate -Ts | grep memTotalReal\$
.iso.org.dod.internet.private.enterprises. ucdavis.memory.memTotalReal
This item, the total real memory present on a system, is located at
the specified point within the hierarchy. A slightly more complex
command can provide both the full location and a description for a
MIB leaf:
% snmptranslate -Td 'snmptranslate -Ts | grep memTotalReal\$'
I use it often enough that I've defined an alias for this command:
% alias snmpwhat 'snmptranslate -Td `snmptranslate -Ts | grep \!:1\$`'
Unusual Pattern Matching Requirements
I'll conclude this article with a quick look at two searching/pattern
matching topics that can be a bit tricky.
Filtering Foreign Language Email
Like many people, I use procmail to preprocess mail messages,
including attempting to remove spam. My current recipes work reasonably
well for mail messages in Western languages, but they fail for ones
in many other languages (e.g., Japanese, Chinese, Russian). Currently,
I get 15-20 such spam messages each day.
Some people deal with this situation by discarding all email from
the corresponding countries, but this approach does not work for
me as I get legitimate mail from these countries on a regular basis
(from non-predictable senders). What I needed was a procmail recipe
to identify the foreign characters, which are above the normal ASCII
range. The trick here is to get all of these characters into the
.procmailrc file. This is easiest to do by entering them on a system/application
that supports two-byte characters. The next step is to copy that
file in binary mode to the system where procmail is run where its
contents can be pasted into the initialization file.
A quick and dirty procmail recipe will look something like this
when viewed with most text editors:
:0BH:
* [\200\201\202...\377][\200\201\202...\377][\200\201\202...\377]
$MAILDIR/foreign_spam
For me, three such characters in a row was a good enough first attempt
at solving this problem. There are many more elegant solutions available
on the Web. One of the best is by Walter Dnes, and it is available
at:
http://www.waltdnes.org/email/chinese/index.html
It takes advantage of procmail's weighting capabilities to detect
messages containing more than 5% non-ASCII characters.
Less Well-Known Regular Expression Constructs
Most people are familiar with the asterisk, plus sign, and question
mark modifiers to regular expression items (match zero or more,
one or more, or exactly one of the item, respectively). However,
you can specify how many of each item should be matched even more
precisely using some extended regular expression constructs (use
egrep or grep -E):
Form Meaning
{n} Match exactly n of the preceding item.
{n,} Match n or more of the preceding item.
{n,m} Match at least n and no more than m of the preceding item.
Here are some simple examples:
% grep -E "t{2}" bio
She has written eight books, including Essential Cultural Studies
from Pitt. When she's not writing
% grep -E "[0-9]{3,}" bio
network of Unix and Windows NT/2000/XP systems. She
% grep -E "(the ){2,}|(and ){2,}" bio
and and creating murder mystery games. She
you'd like to receive the the free newsletter
The first command searches for double t's; the second command
looks for numbers of three or more digits; and the third command searches
for two consecutive instances of the words "the" and "and"
(it's a primitive copy editor). You might be tempted to formulate
the final item as:
(the |and ){2,}
However, this won't work, as it will match "and the,"
which is not generally an error.
Finally, be aware that the constuct {,m}, which might mean "match
m or fewer of the preceding item," is not defined.
Æleen Frisch is a systems administrator currently looking
after a pathologically heterogeneous collection of computers. She
is also the author of Essential System Administration, just
released in an expanded third edition, the new System Administration
Pocket Reference, as well as several other books. She can be
reached by email at: aefrisch@lorentzian.com.
|