Advanced Configuration of Nagios
Syed Ali
In last month's Sys Admin, I described the open source,
SNMP-based monitoring tool Nagios (http://www.nagios.org).
(See Part 1 at: http://www.samag.com/documents/s=8892/sam0310c/.)
Nagios 1.1, released on June 2, 2003, can be downloaded from http://www.nagios.org.
Nagios remotely monitors services and hosts, and can monitor network
services without the need for a software-based monitoring agent
on the computer you are monitoring. The primary purpose of Nagios
is to check the state of hosts or services on the network. However,
you can also use Nagios to obtain performance data on the hosts
and services it monitors. You can customize a Nagios plugin to look
for performance data. Nagios then lets you do either of the following:
- Route the performance data to an external command
- Write the performance data to a log file
Like other elements of the Nagios environment, the performance
data feature is not intended for any specific application but is,
instead, an architecture designed to support many kinds of custom
solutions. The details of obtaining the performance metrics are
left to the plugin creator. Similarly, the external command used
to process the performance information is at the discretion of the
Nagios user. The task of writing or adapting a plugin (which is
essentially a script, such as a Perl script) is a much bigger topic
than I can cover in this article. (See http://sourceforge.net/projects/nagiosplug/
for more on writing Nagios plugins.) This article focuses on how
to configure Nagios to receive incoming performance data from the
plugin and route the data to a file or an external command.
Getting Performance Data with Nagios
As illustrated last month, Nagios lets the user specify commands,
which Nagios will run to monitor hosts and services. Some plugins
do not provide any performance data. For instance, the check_ssh
plugin, which can be run as follows:
$ /usr/local/nagios/libexec/check_ssh -H my_ssh_server
returns the following output:
SSH OK - OpenSSH_2.9p2 (protocol 1.99)
This output does not provide any indication of how long the SSH server
took to respond.
Other plugins provide performance data that is not necessarily
in a format that Nagios can use with the special performance-data
processing feature. For instance, the standard output line for a
Nagios plugin that monitors network connectivity using ping
is:
PING OK - Packet loss = 0%, RTA = 1.36 ms
This plugin does indeed provide some performance information, but
that information is passed to the standard $OUTPUT$ macro, thus bypassing
the performance data-processing features described in this article.
We need to modify the plugin source code to display the performance
data in a manner that makes it easy for Nagios to parse. According
to Nagios documentation, "If a plugin wishes to pass performance
data back to Nagios, it does so by sending the normal text string
that it usually would, followed by a pipe character (|), and then
a string containing one or more performance data metrics."
In other words, the data must be in the form:
Normal_output | performance_output
If the plugin output is in this form, Nagios routes the normal output
to the $OUTPUT$ macro and sends the portion of the output after the
| symbol to the $PERFDATA$ macro.
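The split described above can be pictured with a short Python sketch. This is only an illustration of the rule, not the code Nagios itself uses; the function name split_plugin_output is hypothetical, and I am assuming the split happens at the first | character in the line.

```python
def split_plugin_output(line):
    """Split one line of plugin output into (normal_output, perfdata).

    Illustration only: assumes the split happens at the first pipe.
    """
    # Text before the pipe corresponds to $OUTPUT$, text after it to $PERFDATA$.
    normal, _separator, perfdata = line.partition("|")
    return normal.strip(), perfdata.strip()


if __name__ == "__main__":
    sample = "PING OK - Packet loss = 0%, RTA = 1.36 ms | RTA = 1.36 ms"
    normal, perfdata = split_plugin_output(sample)
    print("OUTPUT:  ", normal)
    print("PERFDATA:", perfdata)
```

A plugin that prints no pipe character, such as check_ssh above, simply yields an empty performance-data string in this sketch.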
In the check_ping.c file found in the source code of the Nagios plugins,
the output is printed by the following statement:
printf ("PING %s - %sPacket loss = %d%%, RTA = %2.2f ms", \
state_text (result), warn_text,pl, rta);
The entry has to be modified to print the performance data in a Nagios-friendly
manner as follows:
printf ("PING %s - %sPacket loss = %d%%, RTA = %2.2f ms | RTA = \
%2.2f ms ", state_text (result), warn_text,pl, rta, rta);
After you recompile check_ping.c, running the check_ping
command gives an output similar to the following:
PING OK - Packet loss = 0%, RTA = 1.36 ms | RTA = 1.36 ms
The data after the | symbol is then placed by Nagios in the $PERFDATA$
macro. You can use the data in the $PERFDATA$ macro to build a performance
profile of the server response time over daily and weekly cycles.
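As a rough idea of what building such a profile could look like, here is a minimal Python sketch that pulls the RTA value out of performance data in the format shown above and averages it per hour of the day. The names parse_rta and average_by_hour are hypothetical, and the sketch assumes you have already collected (hour, perfdata) pairs by one of the methods described later in this article.

```python
import re

# Matches strings such as "RTA = 1.36 ms" (the format produced by the
# modified check_ping shown above).
RTA_PATTERN = re.compile(r"RTA\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def parse_rta(perfdata):
    """Return the round-trip average in milliseconds, or None if absent."""
    match = RTA_PATTERN.search(perfdata)
    return float(match.group(1)) if match else None


def average_by_hour(samples):
    """Average RTA per hour from an iterable of (hour, perfdata) pairs."""
    totals = {}
    for hour, perfdata in samples:
        rta = parse_rta(perfdata)
        if rta is None:
            continue  # skip samples that carry no RTA value
        total, count = totals.get(hour, (0.0, 0))
        totals[hour] = (total + rta, count + 1)
    return {hour: total / count for hour, (total, count) in totals.items()}


if __name__ == "__main__":
    samples = [(9, "RTA = 1.00 ms"), (9, "RTA = 3.00 ms"), (14, "RTA = 2.50 ms")]
    print(average_by_hour(samples))
```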
Compiling Nagios for Performance Data
As I mentioned in an earlier section, Nagios offers two methods
for processing performance data:
- External command method -- An option that lets you specify
an external command that Nagios will execute to process the performance
data. This option is very flexible, but it consumes
more system resources than the alternative file-based method because
it forks a process each time a host or service check is performed.
- File-based method -- The data is output to a file in a
template-specified format. This is much less resource-intensive
than the external command method.
To get Nagios to recognize performance data, you must enable processing
of performance data in the main config file nagios.cfg, as
follows:
process_performance_data=1
After you enable performance monitoring in the config file, you must
recompile Nagios and specify either the default external command method
or the file-based method. To use the default external command method,
rerun the configure script with the option:
# /usr/local/src/nagios/configure --prefix=/usr/local/nagios \
--with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios \
--with-nagios-user=nagios --with-nagios-grp=nagios \
--with-default-perfdata
To use the file-based method, rerun the configuration script with
the option:
# /usr/local/src/nagios/configure --prefix=/usr/local/nagios \
--with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios \
--with-nagios-user=nagios --with-nagios-grp=nagios \
--with-file-perfdata
then rerun make.
The External Command Method
As I previously described, Nagios uses the $PERFDATA$ macro to
provide the performance data collected from a monitoring plugin
such as check_ping. Once you have configured your Nagios implementation
for performance monitoring, you must specify the command that Nagios
will execute using the data it collected from the $PERFDATA$ macro.
The Nagios configuration file nagios.cfg lets you specify the name
of the command that will run against the $PERFDATA$ as follows:
service_perfdata_command=myservice-processor
myservice-processor is a name I have chosen for the command.
I must then associate this command name with a command definition
through the Nagios object configuration file commands.cfg. In this
case, commands.cfg contains a definition such as:
define command {
command_name myservice-processor
command_line /bin/echo "$HOSTNAME$ has a ping response time of $PERFDATA$" >> /usr/local/nagios/var/ping-response-time
}
To process host performance data in addition to service performance
data, provide the following configuration definition in the main Nagios
config file nagios.cfg:
host_perfdata_command=myhost-processor
You will also need a definition of myhost-processor in the
object configuration file commands.cfg, similar to the definition
defined for the service:
define command {
command_name myhost-processor
command_line /bin/echo "$HOSTNAME$ has a response time of $PERFDATA$" >> /usr/local/nagios/var/host-response-time
}
This example uses the command /bin/echo to write the data in
a reader-friendly format. I could have, however, used any other external
command to process the data.
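To give a sense of what "any other external command" might look like, here is a minimal Python sketch of a small logger that could stand in for /bin/echo. It is only an illustration under stated assumptions: the script name perf_logger.py, its argument order (log file, host name, performance data), and the tab-separated record format are all my own choices, not anything Nagios requires.

```python
import sys
import time


def format_record(hostname, perfdata, timestamp):
    """Build one tab-separated log line: timestamp, host, perfdata."""
    return "%d\t%s\t%s\n" % (timestamp, hostname, perfdata.strip())


def append_record(path, hostname, perfdata, timestamp=None):
    """Append one record to the log file, stamping it with the current time."""
    if timestamp is None:
        timestamp = int(time.time())
    with open(path, "a") as log:
        log.write(format_record(hostname, perfdata, timestamp))


if __name__ == "__main__":
    # Hypothetical invocation: perf_logger.py <logfile> <hostname> <perfdata>
    if len(sys.argv) == 4:
        append_record(sys.argv[1], sys.argv[2], sys.argv[3])
```

In a command definition, the command_line would pass the log file name followed by the quoted $HOSTNAME$ and $PERFDATA$ macros as arguments. Keep in mind that, like /bin/echo, such a script is forked for every check, so it should do as little work as possible.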
File-based Method
The file-based method is less flexible than the default external
command method because it does not allow you to specify an external
command to be used with the $PERFDATA$ macro. All you can do is
specify a file into which Nagios will log the $PERFDATA$.
Remember that $PERFDATA$ is simply the performance output of the
plugin command (i.e., the data that a plugin returns after the |
symbol). We must tell Nagios where to log this data.
To use the file-based method, edit the main configuration file
nagios.cfg for host and service-based $PERFDATA$ collection as follows:
xpdfile_host_perfdata_file=<hostfile>
xpdfile_service_perfdata_file=<servicefile>
<hostfile> can be /usr/local/nagios/var/myhostdata or
any other file you want to log to, and <servicefile>
can be /usr/local/nagios/var/myservicedata or any other log file.
You must tell Nagios in what format you want the logging done
by using the following two directives:
xpdfile_host_perfdata_template=<template>
xpdfile_service_perfdata_template=<template>
<template> can be a definition such as:
$HOSTNAME$\t$TIME$\t$OUTPUT$\t$PERFDATA$
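Once the data is in a file, any script can read it back. The following Python sketch parses lines written with a four-field template like the one above. It assumes that Nagios writes \t as a real tab character and that each record occupies one line; the function name parse_perfdata_line and the sample values are hypothetical.

```python
def parse_perfdata_line(line):
    """Parse one log line written with a four-field, tab-separated template.

    Returns a dict with the four fields, or None if the line is malformed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    hostname, timestamp, output, perfdata = fields
    return {
        "hostname": hostname,
        "time": timestamp,  # kept as text; the exact $TIME$ format is not assumed
        "output": output,
        "perfdata": perfdata,
    }


if __name__ == "__main__":
    sample = "web1\t10:15:02\tPING OK - Packet loss = 0%, RTA = 1.36 ms\tRTA = 1.36 ms\n"
    print(parse_perfdata_line(sample))
```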
Conclusion
By gathering performance data, you can execute a command against
the data or simply log the data to a file. In this article, I've
illustrated how to enable performance data collection, how to configure
Nagios to support separate processing of performance data, and how
to either run external commands against that data or simply log
that data to a file. Nagios offers a lot of flexibility when it
comes to monitoring the performance of services and hosts on the
network.
Syed Ali has a Master's in Computer Science from Stevens
Institute of Technology and is an MCP, MCSE, MCT, CCNA, CCAI, RHCE,
and SCSA. He currently works for a research laboratory in Princeton,
New Jersey, as a supervisor for systems administration. Syed can
be contacted at: alii@paul.rutgers.edu.