Advanced Configuration of Nagios

Syed Ali

In last month's Sys Admin, I described the open source, SNMP-based monitoring tool Nagios (http://www.nagios.org). (See Part 1 at: http://www.samag.com/documents/s=8892/sam0310c/.) Nagios 1.1, released on June 2, 2003, can be downloaded from the same site. Nagios remotely monitors hosts and services, and it can monitor network services without a software-based monitoring agent on the monitored computer. The primary purpose of Nagios is to check the state of hosts or services on the network. However, you can also use Nagios to obtain performance data on the hosts and services it monitors. To do so, you customize a Nagios plugin to report performance data. Nagios then lets you do either of the following:

Route the performance data to an external command
Write the performance data to a log file

Like other elements of the Nagios environment, the performance data feature is not intended for any specific application but is, instead, an architecture designed to support many kinds of custom solutions. The details of obtaining the performance metrics are left to the plugin creator. Similarly, the external command used to process the performance information is at the discretion of the Nagios user. The task of writing or adapting a plugin (which is essentially a script, such as a Perl script) is a much bigger topic than I can cover in this article. (See http://sourceforge.net/projects/nagiosplug/ for more on writing Nagios plugins.) This article focuses on how to configure Nagios to receive incoming performance data from the plugin and route the data to a file or an external command.

Getting Performance Data with Nagios

As illustrated last month, Nagios lets the user specify commands, which Nagios will run to monitor hosts and services. Some plugins do not provide any performance data. For instance, the check_ssh plugin, which can be run as follows:

$ /usr/local/nagios/libexec/check_ssh -H my_ssh_server

returns the following output:

SSH OK - OpenSSH_2.9p2 (protocol 1.99)

This output does not provide any indication of how long the SSH server took to respond.

Other plugins provide performance data that is not necessarily in a format that Nagios can use with the special performance-data processing feature. For instance, the standard output line for a Nagios plugin that monitors network connectivity using ping is:

PING OK - Packet loss = 0%, RTA = 1.36 ms

This plugin does indeed provide some performance information, but that information is passed to the standard $OUTPUT$ macro, thus bypassing the performance data-processing features described in this article.

We need to modify the plugin source code to display the performance data in a manner that makes it easy for Nagios to parse. According to Nagios documentation, "If a plugin wishes to pass performance data back to Nagios, it does so by sending the normal text string that it usually would, followed by a pipe character (|), and then a string containing one or more performance data metrics." In other words, the data must be in the form:

Normal_output | performance_output

If the plugin output is in this form, Nagios routes the normal output to the $OUTPUT$ macro and sends the portion of the output after the | symbol to the $PERFDATA$ macro.
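The split described above can be sketched in a few lines of shell. This is only an illustration of the format, not the code Nagios itself uses; the function name split_plugin_output is hypothetical, and trimming the spaces around the | symbol is a choice made for this sketch.

```shell
# Split one line of plugin output at the first | character.
# split_plugin_output is a hypothetical helper, not part of Nagios.
split_plugin_output() {
    line=$1
    case $line in
        *'|'*)
            normal=${line%%'|'*}     # everything before the first |
            perfdata=${line#*'|'}    # everything after the first |
            ;;
        *)
            normal=$line             # no | present: no performance data
            perfdata=
            ;;
    esac
    # Trim trailing spaces from the normal part and leading spaces
    # from the performance part (an assumption of this sketch).
    normal=${normal%"${normal##*[! ]}"}
    perfdata=${perfdata#"${perfdata%%[! ]*}"}
    printf 'OUTPUT=%s\n' "$normal"
    printf 'PERFDATA=%s\n' "$perfdata"
}

# Example call with a line in the "normal | performance" form:
split_plugin_output 'PING OK - Packet loss = 0%, RTA = 1.36 ms | RTA = 1.36 ms'
```

For a plugin such as check_ssh that prints no | symbol, the same function leaves the PERFDATA part empty.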

In the check_ping.c file in the Nagios plugins source code, the output is printed by the following statement:

printf ("PING %s - %sPacket loss = %d%%, RTA = %2.2f ms",
  state_text (result), warn_text, pl, rta);

This statement must be modified to print the performance data in a Nagios-friendly manner as follows:

printf ("PING %s - %sPacket loss = %d%%, RTA = %2.2f ms"
  " | RTA = %2.2f ms",
  state_text (result), warn_text, pl, rta, rta);

After recompiling check_ping, running the command produces output similar to the following:

PING OK - Packet loss = 0%, RTA = 1.36 ms | RTA = 1.36 ms

The data after the | symbol is then placed by Nagios in the $PERFDATA$ macro. You can use the data in the $PERFDATA$ macro to build a performance profile of the server response time over daily and weekly cycles.
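If patching and recompiling the C source is not convenient, a wrapper script is one possible alternative. The sketch below is not the method this article uses; it is a substitute technique that takes the unmodified check_ping output line and appends the RTA value after a | symbol. The function name add_rta_perfdata is hypothetical, and the sketch assumes the output contains the text "RTA = <number> ms" exactly as shown earlier.

```shell
# Append "| RTA = <value> ms" to a check_ping output line.
# add_rta_perfdata is a hypothetical helper, not part of Nagios.
add_rta_perfdata() {
    line=$1
    # Pull out the number between "RTA = " and " ms", if present.
    rta=$(printf '%s\n' "$line" |
        sed -n 's/.*RTA = \([0-9.][0-9.]*\) ms.*/\1/p')
    if [ -n "$rta" ]; then
        printf '%s | RTA = %s ms\n' "$line" "$rta"
    else
        # No RTA value found: pass the line through unchanged.
        printf '%s\n' "$line"
    fi
}

# In a real wrapper, the plugin's exit status must be preserved,
# because Nagios uses it to decide the state of the check:
#   out=$(/usr/local/nagios/libexec/check_ping "$@"); status=$?
#   add_rta_perfdata "$out"; exit "$status"

add_rta_perfdata 'PING OK - Packet loss = 0%, RTA = 1.36 ms'
```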

Compiling Nagios for Performance Data

As I mentioned in an earlier section, Nagios offers two methods for processing performance data:

External command method -- Lets you specify an external command that Nagios will execute to process the performance data. This option is very flexible, but it consumes more system resources than the file-based method because it forks a process each time a host or service check is performed.
File-based method -- The data is output to a file in a template-specified format. This is much less resource-intensive than the external command method.

To get Nagios to recognize performance data, you must enable processing of performance data in the main config file nagios.cfg, as follows:

process_performance_data=1
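Before recompiling, it can be worth confirming that the directive is really active and not commented out. The following is a minimal sketch that inspects the config file with grep; the function name perfdata_enabled is hypothetical, and the pattern deliberately tolerates optional spaces around the = sign.

```shell
# Succeed if the given nagios.cfg enables performance data processing.
# perfdata_enabled is a hypothetical helper, not part of Nagios.
perfdata_enabled() {
    grep -Eq '^[[:space:]]*process_performance_data[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Example use (the path matches the --prefix used in this article):
#   if perfdata_enabled /usr/local/nagios/etc/nagios.cfg; then
#       echo "performance data processing is enabled"
#   fi
```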

After you enable performance data processing in the config file, you must recompile Nagios with either the default external command method or the file-based method selected. To use the default external command method, rerun the configure script with the --with-default-perfdata option:

# /usr/local/src/nagios/configure --prefix=/usr/local/nagios \
  --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios \
  --with-nagios-user=nagios --with-nagios-grp=nagios \
  --with-default-perfdata

To use the file-based method, rerun the configure script with the --with-file-perfdata option:

# /usr/local/src/nagios/configure --prefix=/usr/local/nagios \
  --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios \
  --with-nagios-user=nagios --with-nagios-grp=nagios \
  --with-file-perfdata

In either case, rerun make afterward.

The External Command Method

As I previously described, Nagios uses the $PERFDATA$ macro to provide the performance data collected from a monitoring plugin such as check_ping. Once you have configured your Nagios implementation for performance monitoring, you must specify the command that Nagios will execute using the data it collected from the $PERFDATA$ macro. The Nagios configuration file nagios.cfg lets you specify the name of the command that will run against the $PERFDATA$ as follows:

service_perfdata_command=myservice-processor

myservice-processor is a name I have chosen for the command. I must then associate this command name with a command definition through the Nagios object configuration file commands.cfg. In this case, commands.cfg contains a definition such as:

define command {
command_name myservice-processor
command_line /bin/echo "$HOSTNAME$ has a ping response time of $PERFDATA$" >> /usr/local/nagios/var/ping-response-time
}

To process host performance data in addition to service performance data, provide the following configuration definition in the main Nagios config file nagios.cfg:

host_perfdata_command=myhost-processor

You will also need a definition of myhost-processor in the object configuration file commands.cfg, similar to the definition defined for the service:

define command {
command_name myhost-processor
command_line /bin/echo "$HOSTNAME$ has a response time of $PERFDATA$" >> /usr/local/nagios/var/host-response-time
}

This example uses the command /bin/echo to write the data in a reader-friendly format. I could have, however, used any other external command to process the data.
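As one illustration of "any other external command", the sketch below replaces /bin/echo with a small shell function that prepends a timestamp and writes tab-separated fields, which are easier to post-process later. The function name log_perfdata, the script path in the comment, and the log file name are all hypothetical.

```shell
# Append one timestamped, tab-separated record to a log file.
# log_perfdata is a hypothetical helper, not part of Nagios.
#   $1 = host name, $2 = performance data, $3 = log file
log_perfdata() {
    host=$1
    perfdata=$2
    logfile=$3
    # Skip checks whose plugin returned no performance data.
    [ -n "$perfdata" ] || return 0
    printf '%s\t%s\t%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$host" "$perfdata" >> "$logfile"
}

# If saved as a script (hypothetical path) that ends with
#   log_perfdata "$1" "$2" /usr/local/nagios/var/perfdata.log
# the command definition could call it like this:
#   command_line /usr/local/nagios/libexec/log_perfdata.sh "$HOSTNAME$" "$PERFDATA$"
```

Keep in mind that Nagios forks this command for every check, so the script should stay short and fast.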

The File-Based Method

The file-based method is less flexible than the default external command method because it does not allow you to specify an external command to be used with the $PERFDATA$ macro. All you can do is specify a file into which Nagios will log the $PERFDATA$.

Remember that $PERFDATA$ is simply the performance output of the plugin command (i.e., the data that a plugin returns after the | symbol). We must tell Nagios where to log this data.

To use the file-based method, edit the main configuration file nagios.cfg for host and service-based $PERFDATA$ collection as follows:

xpdfile_host_perfdata_file=<hostfile>
xpdfile_service_perfdata_file=<servicefile>

<hostfile> can be /usr/local/nagios/var/myhostdata or any other file you want to log to, and <servicefile> can be /usr/local/nagios/var/myservicedata or any other log file.

You must tell Nagios in what format you want the logging done by using the following two directives:

xpdfile_host_perfdata_template=<template>
xpdfile_service_perfdata_template=<template>

<template> can be a definition such as:

$HOSTNAME$\t$TIME$\t$OUTPUT$\t$PERFDATA$
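A tab-separated log in this layout is easy to summarize with standard tools. The sketch below computes the average RTA per host with awk. It assumes the four-field template shown above and performance data in the form "RTA = 1.36 ms" as produced by the modified check_ping; the function name average_rta is hypothetical.

```shell
# Print "host<TAB>average RTA" for each host in a perfdata log.
# average_rta is a hypothetical helper, not part of Nagios.
#   $1 = log file written with the four-field template
average_rta() {
    awk -F '\t' '
        # Field 4 holds the performance data, e.g. "RTA = 1.36 ms".
        $4 ~ /RTA = [0-9.]+ ms/ {
            split($4, parts, " ")    # parts: "RTA" "=" "1.36" "ms"
            sum[$1] += parts[3]
            count[$1]++
        }
        END {
            for (host in sum)
                printf "%s\t%.2f\n", host, sum[host] / count[host]
        }
    ' "$1" | sort
}

# Example use with the service log file named in this article:
#   average_rta /usr/local/nagios/var/myservicedata
```

Lines whose fourth field does not contain an RTA value are ignored, so the same log can hold records from other plugins.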

Conclusion

Once Nagios gathers performance data, you can execute a command against the data or simply log the data to a file. In this article, I've illustrated how to enable performance data collection, how to configure Nagios to support separate processing of performance data, and how to either run external commands against that data or simply log that data to a file. Nagios offers a lot of flexibility when it comes to monitoring the performance of services and hosts on the network.

Syed Ali has a Master's in Computer Science from Stevens Institute of Technology and is an MCP, MCSE, MCT, CCNA, CCAI, RHCE, and SCSA. He currently works for a research laboratory in Princeton, New Jersey, as a supervisor for systems administration. Syed can be contacted at: [email protected].