Writing
an SNMP Agent
Damir Delija
A few years ago, during the summer
of 2000, we faced serious overheating problems. Our Sun ULTRA 250
machines were in trouble, and we had an immediate need for monitoring.
Our first try was with the prtdiag command and syslog report,
but that turned out to be unusable and unreliable. The prtdiag
output dump was too big and caused additional trouble to our log
monitor. It was obvious that we needed something better.
By coincidence, I read Sun Performance and Tuning: Java and
the Internet by A. Cockcroft and R. Pettit (Sun Microsystems)
and got the idea of implementing a small, easily parsable output
program to report temperature, fan, and disk status. In a few days
I had created a tool called envstat, which mimics the basic output
of prtdiag and also can send data by cron to syslog. However,
my new tool was still missing a few things. The log monitor was
sending reports by email, which was handled manually at the receiving
end. This solution was clumsy, and the worst part was that the tool
was not integrated with a network management or integrated monitoring
system. Here is an example of envstat -t output:
#TABLE: temperature
#ROWS: label value
CPU0 48
CPU1 48
MB0 36
MB1 31
PDB 31
SCSI 28
Actually it goes through kernel chain and prints the requested values.
It is in C, so it is faster than some awkward parse interface to prtdiag.
Listing 1 shows the main body of the envstat program. The whole archive
can be found on the Sys Admin Web site or at:
http://jagor.srce.hr/~ddelija/envstat/
This limited tool did succeed in averting the aforementioned crisis.
Summer vacations started, and nobody cared much anymore. Still, I
felt dissatisfied and considered it a challenge to come up with something
better. I got an idea while reading Understanding SNMP MIBs
by David Perkins and Evan McGinnis (Prentice Hall). I thought a small,
dedicated SNMP agent built around our temperature module, envstat,
could be a nice, workable solution. The first step was to define a
MIB and then implement it through the agent and manager. Luckily,
there were useful leftovers on hand from previous projects.
During the winter of 1995-1996, we tried to implement a system
to monitor the network, but the project ended without much success.
A CARNet MIB was written and a related agent was developed in Scotty.
During the summer of 2000, I finally rewrote the agent to make it
simpler, more robust, and more secure and created a MIB extension
based on envstat. This tool has evolved into a modular agent with
the capacity to add new modules. The system currently includes production
modules for commands such as:
io stat -ne -- To show error status on io devices.
df -lk and df -oi -- To show file system status.
en vstat -t -- To show temperature status.
ne tstat -k -- To show some unusual counters on Solaris.
The agent main body is a very short initialization loop, which
dynamically loads defined modules (see Listing 2). The rest of the
job is hidden and done by Scotty initialization. Other modules can
be found at the Sys Admin Web site or at:
http://jagor.srce.hr/~ddelija/agent/
We have subsequently integrated this small agent into three different
management systems and given it a command-line interface. At the moment,
we are hoping to create a system integrity-checking interface that
would provide a MIB through which the management station can observe
the status of tools like AIDE, Tripwire, or fcheck.
Development of the SNMP Agent
A full-scale SNMP agent is a huge, complex piece of software.
As a heavy-duty server it is related to some important functions
on a managed system and capable of doing some mischief, like stopping
or starting important services unintentionally. SNMP agents are
often misconfigured and can be used as a back door into systems
or as a method of a denial of service, even by mistake. (The recent
CERT Advisory CA-2002-03 addressed SNMP-related security problems.)
As in any server process, the SNMP agent primarily consists of
two parts: a Protocol API that handles SNMP, and another part that
handles the system interface. The basic idea behind SNMP management
is remote debugging. Each SNMP agent presents the system as a set
of ordered, well-defined variables. Each variable has its name,
type, value range, and order defined and stored in the related MIB.
MIB stands for managed information base, which often causes misinterpretation
because there is no real database, just declarations. Variables
exist only in agents as names of real entities. Action on those
variables can trigger real action on the managed system. Changing
a variable can trigger a command execution or even stop the system.
The agent associates the MIB-related variables with real variables
and actions on the managed system. On the other hand, the manager
process must agree with the agent on variables and their syntax.
The manager and the agent therefore share the MIB as a common definition.
SNMP agents must be economical and unobtrusive when it comes to
system performance. It doesn't make sense for a system to spend
all its resources servicing the agent instead of performing the
primary function the agent is supposed to monitor.
If you are building a full-scale agent you intend to sell, there
are some standard MIBs that you must incorporate, like MIB II. If
you are a Sunday agent developer, you can be less strict and choose
to implement just a small subset of variables that are important
to you. Similar choices exist for development tools. If you are
building a heavy-duty, full-system agent, you'll probably use some
C-oriented tool well integrated into the system.
On the other hand, if you write something experimental, you might
develop a prototype with some open source tool or even in scripting
language. There are many such tools. UCSD SNMP is probably the best
known. Various Perl implementations are available, and there is
the Tcl/Tk implementation named "Scotty". Some of these tools are
standalone, and some are part of bigger network management environments.
I chose Scotty because it is simple and handy and is based on Tcl.
Tcl is a good solution for beginners because it is easy to learn,
develop, and extend. However, the simplicity of Tcl can sometimes
lead to bad code if you write the program very quickly and without
proper analyses.
There are various types of SNMP agents, some with very complex
functionalities. For experimental purpose or just interfacing commands,
you can use a very simple agent called a "screen scraper". A screen
scraper actually wraps around existing commands or applications.
Screen scrapers were among the first SNMP agents developed, and
are still useful and easy to implement. A screen scraper can exist
completely in the user space, without kernel hooks. This approach
is especially well suited to an operating system, like Linux, where
the system can read most important data straight from the /proc
file system.
Screen scrapers typically provide a control variable, which triggers
execution of the external command and shows when data was last generated.
Values scraped from command output are presented as tables, usually
indexed by rows.
How It Looks
Solaris 2.6 (and higher versions) provides the command iostat
-ne, which shows errors on devices like disks, tapes, CD-ROMS,
metadevices, and auto volumes. This command is an ideal early warning
for IO subsystem health. Figure 1 shows the output of the iostat
-ne command. The idea is to get iostat -ne output into
a simple index table as shown in Figure 2.
The iostat -ne output is self-explanatory, but for an exact
interpretation: s/w means soft errors on device, h/w means hard
errors, tm means transient errors, tot is the total of all three
error columns, and the last column is the device to which the errors
are related. To present such results within SNMP, there must be
a MIB with defined variables:
IoDEvicesNUmber -- Number of devices on which iostat
-ne returns statistics. As for control, the index table goes
from 1 to this number.
ErrorsiostatLastCHange -- Indicates time when iostat -ne
was last executed and the table was last loaded.
IostatDeviceTable -- Table of iostat -ne entries.
IostatDeviceEentry -- A line in the table consists of Index,
IostatDevice, soft errors, hard errors, and transient error columns.
Index -- Index number of the line.
IostatDevice -- Device that is described in line.
SoftError -- Number of soft errors, s/w column in output.
HardError -- Number of hard errors, h/w column in output.
TransientError -- Number of transient errors, trans column
in output.
To store this data for the agent, you'll need a two-dimensional
associative array with a few additional variables for the number
or devices and the timestamp.
Listing 3 shows an example of the envstat module interface. It
calls envstat -t command to get temperature information,
removes the two first lines and stores the rest into Env global
(a two-dimensional array with values bound to MIB variables). Each
module has at least two functions. The first is to handle the external
command, Env_get, and the second is the initialization function,
Env_Init. The initialization function handles MIB tree installation
and initial population of the global arrays.
Scotty allows connection of MIB variables with real Tcl variables
and event scripts. As variables in MIBs are ordered into a so-called
MIB-tree, it is possible to connect action scripts to the variables,
which are then executed in a specific order depending on how variables
are accessed.
In this case, each time a IoDeviceNumber is requested,
the new iostat -ne results are loaded. The data in the table
remains the same until the next request for an IoDeviceNumber.
The only code required is a few, fast lines in Tcl, having very
low impact on the system performance. The agent will connect the
real Tcl variables to MIB names and wait for a request. The process
is completely event driven.
The basic ideas behind this code are simplicity and ease of use.
Typically, agents are more complex, including security mechanisms,
passwords, address-based access controls, locking and synchronization,
and many other things. Another basic idea that it's important to
remember is not to make the agent too heavy on the system. Often,
the execution of CPU-heavy commands can slow the system. This is
a common problem with simple agents, especially if they are written
in scripting language and written in a hurry. Associative arrays
are very convenient for organizing things, and it's easy to forget
how huge the output from a command can be and what affect memory
allocation can have on system performance. During testing, we noticed
that our agent increased memory usage by about 10-12 MB of RAM,
which is acceptable for machines with more than 256 MB of RAM.
To avoid too rapid command execution, there is a resting period
for each management station. Such a mechanism with simple host-based
access control gives sufficient protection for some environments.
You must decide whether this type of solution is consistent with
the security policies of your own network. Additional security can
be achieved through using non-privileged users and ports for the
agents and additional SNMPv1 or SNMPv2 security mechanisms. To be
precise, the SNMP session definition is completely independent of
the basic functions. Actually, you can have more then one agent
session in the agent program. Scotty is well designed to accomplish
that. Security is more policy than implementation, but good rule
of thumb is to be as restrictive as possible, especially in the
beginning while all dependencies are not clear.
I know that professional agent and MIB writers will not agree
with this approach. It is true that, in the MIB design, there should
be better managed system analyses (which is the heart of MIB definition),
better design of modules, less overhead, and better testing. Professional
descriptions on how to build an agent can be found in Understanding
SNMP MIBs.
How to Integrate an Agent into the System
The presence of an agent is not enough if it is not integrated
into the management system. To begin, you must determine which management
system is available. The manager must be capable of contacting the
agent and working with the collected data. The manager should also
include a trouble notification tool.
In our case, the agent started as a simple helping tool. Data
collection was added to the existing module, which monitored the
environmental values for Cisco routers. Originally, notification
occurred through email, and data went to a file. This management
system was at a remote location and not easily portable. We eventually
needed a less centralized system because of the possibility of lost
mail and link problems. We were actually following the normal evolution
of management systems and architectures.
There were several discussions on how to overcome the shortcomings
of a centralized management model. One of the UNIX admins said something
about "all this SNMP mumbo-jumbo and still you cannot tell the cause
of a problem unless you're right on the command line". This comment
led to a simple idea: a command-line tool that can indicate specific
data from the agent and show it in an admin-friendly way. (Don't
confuse such a tool with the dreaded agent browser or MIB browser.)
We quickly developed a set of simple tools for displaying agent
data. These tools are in the manner of RPC-based r-tools (for example,
ruptime). We named our tool for accessing environmental information
renvstat. As other functionalities were added to the agent, we added
additional tools, such as riostat for remote iostat and rdf for
remote df. The output presents the data in sysadmin-acceptable form.
Figure 3 shows the example of the renvstat output.
The r-commands are written in Scotty. The effective code is about
30 lines. renvstat checks through the environmental tables in the
agent and outputs the results. (For a test, we wrote one version
of renvstat in Perl.) We installed these r-tools on computers where
admins are often logged in. Listing 4 contains the code for the
renvstat command, a remote version of envstat implemented
through SNMP. It can be found with other similar commands at:
http://jagor.srce.hr/~ddelija/agent/monitor/
This is a walk-through agent that pretty prints its results and, actually,
more than one machine can be defined in one call. The code is a little
bit more complex because of necessary MIB loading and session handling,
but still it is very short. The heart is the envstat_test function,
which does the MIB walk and prints the result. All of the abovementioned
r-commands are designed in the same way.
Once the r-commands were working, we started thinking about outfitting
a monitoring station close to the monitored devices. We tried Big
Sister and Big Brother, as well as other approved tools.
Big Sister proved to be the better choice, since it is completely
in Perl and has direct SNMP support. It turned out, however, that
my Perl package was broken so Big Sister was not able to run. So,
we tried Big Brother and got a Big Brother monitoring station operational
in a reasonably short time. The fact that Big Brother offers no
direct SNMP support did not present a problem. We succeeded in integrating
our r-tools almost without modification. Big Brother has its own
bb-df tool to collect and process file system data. Turning this
script bb-df into bb-rdf was a matter of changing the command "df"
into "rdf hostname". It was the same with other modules. Only the
envstat module needed its own new bb-env script developed from a
skeleton example.
Listing 5 shows the bb-rdf script (it is now outdated, because
Big Brother is getting new versions). My actual change was adding
rdf command into PATH and changing the value of the command name
variable DF into RDF to be compliant with Big Brother coding practice.
I had to add NODE name reference in the eval line, but the rest
is the same as before. Figure 4 shows Big Brother's global status
display.
Obviously, this is not the best solution since Big Brother is
a multitude of scripts doing data polling. The demand on the management
station running Big Brother is heavy, depending on the number of
events and polling schedule. A more sensible solution would be a
proxy or a mid-level manager stationed adjacent to Big Brother,
internally doing data poll from remote agents. Such an approach
lends itself to a sophisticated, professional system. We have plans
for building this type of a system when the demand impact on the
monitoring stations becomes too heavy.
A very important issue, closely related with the integration,
is the method of distribution, installation, and upgrade. The traditional
method for distributing software is to compile it, tarball it, copy
it, and install it on the target machines. This takes time and is
not error proof. A better method is to use packaging tools. We use
a Debian package manager in our system. This means all software
needed for our agents is organized into a set of Debian packages.
The more difficult question is not which package manager to use,
but how to separate the files into handy modules. For easy maintenance,
modules that are often changed should be kept in separate packages.
In our case, the agent code is constantly upgraded, so we isolated
it in a separate module. The steadier Tcl/Tk and Scotty code are
also in separate packages. This approach is helpful in emergency
situations and simplifies any automated upgrades.
The end result is a small, extensive open source system capable
of doing very specific tasks. However, the system is not easily
accessible with standard tools and utilities.
Conclusion
CERT Advisory "CA-2002-03 Multiple Vulnerabilities in Many Implementations
of the Simple Network Management Protocol (SNMP)," released on February
12th, 2002, compiles many well-known SNMP-related problems. At last
maybe some of the old problems will be solved. Like other protocols,
SNMP often suffers from unreliable old code and misconfigured daemons.
I encourage you to read through CERT Advisory CA-2002-03 before
embarking on an SNMP project.
Building the agent was definitely a useful experience. In reality,
for busy sys admins with sufficient financial resources, it would
be wiser to buy the necessary tools or contract with a professional
developer for a custom SNMP agent. However, if you don't have the
budget for a high-end solution, I hope you find this article helpful.
References and Links
A. Cockcroft and R. Petit, Sun Performance Tunning, Java &
Internet. ISBN 0-13-095249-4.
David Perkins and Evan McGinnis, Understanding SNMP MIBs.
ISBN 0-13-437708-7.
Dave Zeltserman and Gerard Puoplo, Building Network Management
Tools with Tcl/Tk. ISBN 0-13-080727-3.
SCOTTY home page -- http://wwwhome.cs.utwente.nl/~schoenw/scotty/
CARNet MIB page -- http://jagor.srce.hr/~ddelija/agent/
Agent home page -- http://jagor.srce.hr/~ddelija/envstat/
BigBrother home page -- http://www.bb4.com/
BigSister home page -- http://bigsister.graeff.com/
Unix Sysadmin Resource Center on Stokeley Consulting -- http://www.stokely.com/unix.sysadm.resources/
SNMP Research -- http://www.snmp.com
CERT Advisory CA-2002-03: "Multiple Vulnerabilities in Many Implementations
of the Simple Network Management Protocol (SNMP)", http://www.cert.org.
Damir Delija has been a UNIX system engineer since 1991. He
received a Ph.D. in Electrical Engineering in 1998. His primary
job is systems administration, education, and other system-related
activities.
|