Network Monitoring with Nagios
Syed Ali
System and network monitoring is essential for systems administrators.
There are many SNMP-based management tools -- both commercial and
free -- that can be used to manage and monitor nodes on a network.
There are also non-SNMP based tools, which do a good job of monitoring
network nodes. In this article, I will explain how to install and
use Nagios, a GPL tool written by Ethan Galstad that you can use
for host and service monitoring.
Nagios is designed primarily for Linux, but will run under most Unix
variants. Nagios has many useful features that make it easier to keep
track of a large environment. For example,
if you have a large number of Web servers running in your environment,
you can monitor the HTTP service on all of them using Nagios. The
best part is that you do not have to install any program on your
Web servers to do the monitoring because Nagios can monitor the
HTTP service remotely. Similarly, you can monitor almost any TCP-
or UDP-based service that is running on any platform without having
to install Nagios on any of the monitored hosts. I use Nagios to
monitor services such as HTTP, DHCP, DNS, FTP, SMTP, and LDAP.
Nagios can also monitor host resources such as disk space, CPU
load, log file size, running processes, and memory usage. Unlike
service monitoring, host resource monitoring requires installation
of the Nagios monitoring agent on the host that is being monitored.
Nagios supports notification of problems and resolutions via email
or pager. With the help of a plugin, Nagios can also be used to
monitor SNMP-based events. Information about monitored hosts and
services can be displayed in a 3-D VRML map.
Installation
You can download Nagios at: http://www.nagios.org. CVS-enlightened
users can download Nagios source code by typing:
cvs -d:pserver:anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios login
with a blank password (that is, press the Enter key when prompted
for a password) and then typing:
cvs -z3 -d:pserver:anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios co nagios
You should also check:
http://www.nagios.org/cvs.php
before attempting to download via CVS in case of any updates.
Nagios consists of a core program and some plugins. The core program
calls the plugins to do the monitoring. check_smtp, for example,
is a plugin that checks to see whether SMTP is running on the specified
host. The core program is written in C; the plugins are written in
C and Perl. As of this writing, the core program is at v1.0
and can be downloaded as nagios-1.0.tar.gz.
For the purposes of this article, I will show compile and install
information on a Red Hat Linux v7.2 box with kernel v2.4.7-10smp.
Nagios does not require a fast machine to run, although I happen
to have a dual processor 2.4-GHz Pentium IV with 4 GB of RAM on
which I am running Nagios. Once you have downloaded Nagios, download
the basic set of plugins from: http://www.nagios.org/download/.
You can start with the base set, "nagios-plugins-1.3.0.tar.gz".
A list of the plugins is available at: http://sourceforge.net/project/showfiles.php?group_id=29880.
I performed the installation of Nagios as root, and the install
process set the appropriate permissions and ownership
on files. Nagios is not required to run as root, nor would I recommend
it. The sample Nagios configuration files use a default user and group,
both called "nagios".
It's best if you create this user and group as follows. The useradd
command will automatically create a user and a group called "nagios",
and will set the primary group of user "nagios" as group "nagios".
The home directory will also automatically be set as /home/nagios,
but you can modify that using the -d option:
root:/usr/local/src:#useradd nagios -d /usr/local/nagios
Next, unpack the Nagios source:
root:/usr/local/src:#id
uid=0(root) gid=0(root)
groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),2(daemon)
root:/usr/local/src:#pwd
/usr/local/src
root:/usr/local/src:#ls
nagios-1.0.tar.gz nagios-plugins-1.3.0.tar.gz
root:/usr/local/src:#tar xvfz nagios-1.0.tar.gz
cd to the nagios-1.0 directory and run the configure command:
root:/usr/local/src/nagios-1.0:#./configure
The configure command assumes the following defaults, which you can
change as you see fit. I recommend you leave the default installation
directory as is:
./configure --prefix=/usr/local/nagios \
    --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios \
    --with-nagios-user=nagios --with-nagios-grp=nagios
Compile Nagios:
root:/usr/local/src/nagios-1.0:#make all
To install the binaries, run:
root:/usr/local/src/nagios-1.0:#make install
Three sample configuration files are automatically created by Nagios
-- nagios.cfg-sample, cgi.cfg-sample, and resource.cfg-sample. To
install these files in the appropriate directory (/usr/local/nagios/etc,
assuming you chose the default installation path of "/usr/local/nagios"),
run the following command:
root:/usr/local/src/nagios-1.0:#make install-config
To have Nagios start automatically at boot, install its init script by running:
root:/usr/local/src/nagios-1.0:#make install-init
which installs the Nagios startup script as "/etc/init.d/nagios". I recommend
creating start and stop links for the appropriate run level in /etc/rc3.d
as follows:
root:/etc/rc3.d:#ln -s ../init.d/nagios S100nagios
root:/etc/rc3.d:#ln -s ../init.d/nagios K01nagios
Installing and Configuring Plugins
Your next step is to install the plugins Nagios uses to monitor
hosts and services. Downloading "nagios-plugins-1.3.0.tar.gz" (http://sourceforge.net/project/showfiles.php?group_id=29880)
will get the latest set of plugins. You do not have to configure
the plugins in the same source directory as Nagios. For example,
I downloaded the plugins tarball to /usr/local/src and then
uncompressed and untarred it as follows:
root:/usr/local/src:ls -l nagios-plugins-1.3.0.tar.gz
-rw-r--r-- 1 root root 491510 Mar 2 00:09 nagios-plugins-1.3.0.tar.gz
root:/usr/local/src:gzip -d nagios-plugins-1.3.0.tar.gz
root:/usr/local/src:tar xvf nagios-plugins-1.3.0.tar
The above procedure should create a nagios-plugins-1.3.0 directory
where it is untarred. Once you cd to the nagios-plugins-1.3.0
directory, you must configure the plugins and install them, which
can be done as follows:
root:/usr/local/src/nagios-plugins-1.3.0:./configure
root:/usr/local/src/nagios-plugins-1.3.0:make all
root:/usr/local/src/nagios-plugins-1.3.0:make install
Since we chose the default install path for Nagios (i.e., /usr/local/nagios),
we do not have to specify the path for Nagios when configuring the
plugins. However, if you chose another path, use the following syntax
when running the configure script for the plugins:
root:/usr/local/src/nagios-plugins-1.3.0:./configure \
    --prefix=BASEDIRECTORY --with-nagios-user=SOMEUSER \
    --with-nagios-group=SOMEGROUP --with-cgiurl=SOMEURL
Configuration
The next step is to configure Nagios, which is the most involved
part of the project. Nagios has a number of configuration files
that can be seen by running ls in /usr/local/nagios/etc:
root:/usr/local/nagios/etc:ls
cgi.cfg-sample            hostgroups.cfg-sample
checkcommands.cfg-sample  hosts.cfg-sample
contactgroups.cfg-sample  misccommands.cfg-sample
contacts.cfg-sample       nagios.cfg-sample
dependencies.cfg-sample   resource.cfg-sample
escalations.cfg-sample    services.cfg-sample
hostextinfo.cfg-sample    timeperiods.cfg-sample
Notice that all of these have -sample appended because they
are sample configuration files. Before using any of these files, copy or
rename it to remove the "-sample" suffix.
We can effectively break the configuration files of Nagios into
the following types:
1. Main configuration file (nagios.cfg)
2. Resource file(s) (resource.cfg)
3. Object configuration files
4. CGI configuration file (cgi.cfg)
5. Extended information configuration files
Nagios is highly customizable, so I will cover only some of the configuration
directives in this article. Since nagios.cfg is the main configuration
file for Nagios, start by configuring nagios.cfg as follows:
root:/usr/local/nagios/etc:su - nagios
nagios:/usr/local/nagios:cd etc
nagios:/usr/local/nagios/etc:cp nagios.cfg-sample nagios.cfg
nagios:/usr/local/nagios/etc:vi nagios.cfg
There are more than 65 configuration directives in nagios.cfg. Because
we chose the default configuration, we do not have to modify any of
the configuration directives in this file to get Nagios up and running.
A few of the directives are listed below.
Specify the location of the nagios log file:
log_file=/usr/local/nagios/var/nagios.log
Specify which user nagios should run as:
nagios_user=nagios
Specify which group nagios should run as:
nagios_group=nagios
If you want Nagios to use syslog, leave the option below at 1:
use_syslog=1
Specify how often you want Nagios to rotate its log file (d
stands for daily):
log_rotation_method=d
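The directives above are plain key=value lines. As a rough illustration of that format (this is not part of Nagios), the following Python sketch reads such lines into a dictionary. The function name parse_main_config and the sample text are assumptions made for this example, and the sketch only understands blank lines and "#" comments:

```python
def parse_main_config(text):
    """Parse nagios.cfg-style 'key=value' lines into a dict.

    Illustrative helper only; it is not part of Nagios.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        directives[key.strip()] = value.strip()
    return directives


# Sample text built from the directives listed in the article.
SAMPLE = """\
# main configuration excerpt
log_file=/usr/local/nagios/var/nagios.log
nagios_user=nagios
nagios_group=nagios
use_syslog=1
log_rotation_method=d
"""

config = parse_main_config(SAMPLE)
print(config["log_file"])
print(config["log_rotation_method"])
```

A dictionary keeps only the last value for each key, so a directive that is allowed to appear more than once would need a list instead.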
Object Configuration
Next, modify the object configuration files. According to Ethan
Galstad (the main developer of Nagios), an object is "simply a generic
term I use to describe various data definitions you need in order
to monitor anything." The files that contain object definitions
include the following:
1. checkcommands.cfg
2. contactgroups.cfg
3. contacts.cfg
4. dependencies.cfg
5. escalations.cfg
6. hostgroups.cfg
7. hosts.cfg
8. misccommands.cfg
9. services.cfg
10. timeperiods.cfg
Contacts Definitions
The contacts.cfg file contains information on contacts. A contact
is a systems administrator or other person who will be notified
in the event of an emergency. An example entry is as follows:
define contact{
contact_name joe
alias System Admins
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email joe@mycompany.com
}
define contact{
contact_name mary
alias System Admins
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email mary@mycompany.com
}
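The object files all share the "define type{ ... }" layout shown in the contact entries above. To make that layout concrete, here is a minimal Python sketch that splits such text into object types and directive/value pairs. The parse_objects function is a hypothetical helper written for this article, not Nagios code, and it ignores anything it does not recognize:

```python
def parse_objects(text):
    """Return a list of (object_type, directives) tuples parsed from
    template-style 'define <type>{ ... }' blocks.

    Illustrative only; not taken from Nagios.
    """
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("define ") and line.endswith("{"):
            obj_type = line[len("define "):-1].strip()
            current = (obj_type, {})
        elif line == "}":
            # A closing brace ends the current block, if one is open.
            if current is not None:
                objects.append(current)
            current = None
        elif current is not None:
            # The first token is the directive name; the rest of the
            # line (which may contain spaces) is its value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


# Shortened version of the "joe" contact from the article.
SAMPLE = """\
define contact{
    contact_name                    joe
    alias                           System Admins
    service_notification_options    w,u,c,r
    email                           joe@mycompany.com
}
"""

objects = parse_objects(SAMPLE)
for obj_type, directives in objects:
    print(obj_type, directives["contact_name"], directives["email"])
```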
Next, edit the contactgroups.cfg file and make an entry as follows:
define contactgroup{
contactgroup_name admin
alias System Admins
members joe,mary
}
Contact groups are very useful because you can create groups such
as HttpAdmins or SmtpAdmins and use these groups for notification
of respective services. (See sidebar "contact.cfg Parameters".) In
the above example, I have created a group called "admin"
that contains users "joe" and "mary". I use this group for notification
of all hosts and services because we do not have separate HTTP or
SMTP administrators.
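The relationship between a contact group and its contacts can be pictured as a simple lookup. The Python sketch below is only an illustration of that relationship, not of how Nagios stores its data: the dictionaries are hand-built from the examples in this article, the group name "admin" is the one referenced by the contact_groups directives in the hostgroup and service definitions, and emails_for_group is a hypothetical helper:

```python
# Hand-built data mirroring the contact examples in the article.
contacts = {
    "joe": {"email": "joe@mycompany.com"},
    "mary": {"email": "mary@mycompany.com"},
}
contact_groups = {
    "admin": {"alias": "System Admins", "members": "joe,mary"},
}


def emails_for_group(group_name, groups, known_contacts):
    """Return the email addresses of every member of a contact group.

    Raises KeyError if the group, or any of its members, is undefined.
    """
    members = [m.strip() for m in groups[group_name]["members"].split(",")]
    missing = [m for m in members if m not in known_contacts]
    if missing:
        raise KeyError("undefined contacts: " + ", ".join(missing))
    return [known_contacts[m]["email"] for m in members]


print(emails_for_group("admin", contact_groups, contacts))
```

A check like this is a quick way to catch a group that lists a contact you never defined, or a group name that is misspelled somewhere else in the configuration.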
Host Definitions
To define the hosts that Nagios will monitor for us, we need a
host definition file. The Nagios host definition file is called hosts.cfg
and, in a default installation, resides at /usr/local/nagios/etc/hosts.cfg.
Defining a host in the host definition file means providing a name
for the host, an IP address, and other parameters that
Nagios will use to monitor the host.
Configuration of object files can be done either with the "old"
method or the template-based "new" default method. The "old" method
was file-based, but did not use templates. Using a template for
host and service definitions simplifies adding new hosts and services
for monitoring. For example, consider adding a host monitor by editing
the hosts.cfg file. A host monitor is an instance of a host that defines
attributes of the host (such as its name, its IP address, and how many checks
Nagios should run against it), as shown in the following
example:
define host{
name generic-host
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
check_command check-host-alive
max_check_attempts 10
notification_interval 0
notification_period 24x7
notification_options d,u,r
register 0
}
define host{
use generic-host
host_name www1
alias www1
address 10.10.10.1
parents gw-10
}
define host{
use generic-host
host_name www2
alias www2
address 10.10.10.2
parents gw-10
}
This shows a template called "generic-host" and two hosts, "www1"
and "www2", which use the generic template. The generic-host template
defines the configuration directives that any host using this
template will inherit. You can override an inherited value by repeating
the directive, with a new value, in the host definition itself.
For example, if you want notifications disabled for host www2 while
you perform maintenance on it, you would modify the host entry as
follows:
define host{
use generic-host
host_name www2
alias www2
address 10.10.10.2
notifications_enabled 0
parents gw-10
}
In the above case, www2 will inherit all properties of generic-host,
except that the notifications_enabled property of the generic-host template
will be overridden with the new value of 0 (i.e., notifications disabled
for host www2).
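The inheritance rule described above (start from the template, then let the host's own directives win) can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not the Nagios implementation: it handles a single level of "use", treats "name" and "register" as belonging to the template only, and the resolve function and dictionaries are hypothetical:

```python
# Directives that describe the template itself and are not passed on
# to hosts in this simplified model.
NOT_INHERITED = ("name", "register")


def resolve(definition, templates):
    """Merge a host definition with the template named by 'use'.

    Values set in the definition override values from the template.
    """
    template = templates.get(definition.get("use"), {})
    resolved = {k: v for k, v in template.items() if k not in NOT_INHERITED}
    resolved.update((k, v) for k, v in definition.items() if k != "use")
    return resolved


# A subset of the generic-host template from the article.
templates = {
    "generic-host": {
        "name": "generic-host",
        "notifications_enabled": "1",
        "check_command": "check-host-alive",
        "max_check_attempts": "10",
        "register": "0",
    }
}

# The www2 host with notifications switched off for maintenance.
www2 = {
    "use": "generic-host",
    "host_name": "www2",
    "alias": "www2",
    "address": "10.10.10.2",
    "notifications_enabled": "0",
    "parents": "gw-10",
}

resolved = resolve(www2, templates)
print(resolved["notifications_enabled"])  # value from the host definition
print(resolved["check_command"])          # value inherited from the template
```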
The required configuration directives are shown in the sidebar
(of the same name). For the optional configuration directives, visit:
http://nagios.sourceforge.net/docs/1_0/xodtemplate.html#host
Hostgroups
The hostgroups.cfg file lets you create a logical group of hosts.
You could modify the hostgroups.cfg file to create a group for HTTP
servers as follows:
define hostgroup{
hostgroup_name http-servers
alias HTTP Servers
contact_groups admin
members www1, www2
}
Logically grouping servers based on the service to monitor is one
possible methodology. A host should belong to at least one group and
may belong to multiple groups. For example, if one of your HTTP servers
is also your email server, you can have another hostgroup such as:
define hostgroup{
hostgroup_name smtp-servers
alias SMTP Servers
contact_groups admin
members www2
}
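Because every host is expected to appear in at least one hostgroup, it can be handy to cross-check the two files before starting Nagios. The following Python sketch shows one way to do that on hand-built data. The function names and the extra host "db1" are made up for this example:

```python
def hosts_without_group(host_names, hostgroups):
    """Return the hosts that are not listed as a member of any hostgroup."""
    grouped = set()
    for group in hostgroups.values():
        grouped.update(m.strip() for m in group["members"].split(","))
    return sorted(set(host_names) - grouped)


def undefined_members(host_names, hostgroups):
    """Return hostgroup members that have no matching host definition."""
    known = set(host_names)
    unknown = set()
    for group in hostgroups.values():
        for member in group["members"].split(","):
            if member.strip() not in known:
                unknown.add(member.strip())
    return sorted(unknown)


# The two hostgroups defined in the article.
hostgroups = {
    "http-servers": {"alias": "HTTP Servers", "members": "www1, www2"},
    "smtp-servers": {"alias": "SMTP Servers", "members": "www2"},
}

# "db1" is a hypothetical host that was never added to a group.
print(hosts_without_group(["www1", "www2", "db1"], hostgroups))
print(undefined_members(["www1"], hostgroups))
```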
Service Definitions
We now need to define a service to monitor on our previously defined
host. You can use templates in service definitions just as you can
with host definition templates. I first define a generic service
(called "generic-service") that has a large number of configuration
options:
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 3
retry_check_interval 1
contact_groups admin
notification_interval 0
notification_period 24x7
notification_options w,u,c,r
register 0
}
I can then create a service definition for an http-servers group that
inherits the generic-service properties and adds a few of its own:
define service{
use generic-service
hostgroup_name http-servers
service_description HTTP
contact_groups admin
check_command check_http
}
(See the sidebar for Nagios required service parameters.) For additional
directives, visit: http://nagios.sourceforge.net/docs/1_0/xodtemplate.html#service.
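Conceptually, a service defined against a hostgroup stands for one service check per member host. The Python sketch below illustrates that expansion on hand-built data. It is a model of the idea rather than Nagios code, and expand_service is a hypothetical helper:

```python
def expand_service(service, hostgroups):
    """Return one per-host copy of a service defined with hostgroup_name."""
    members = hostgroups[service["hostgroup_name"]]["members"]
    expanded = []
    for host in (m.strip() for m in members.split(",")):
        # Copy every directive except hostgroup_name, then pin the host.
        per_host = {k: v for k, v in service.items() if k != "hostgroup_name"}
        per_host["host_name"] = host
        expanded.append(per_host)
    return expanded


hostgroups = {"http-servers": {"members": "www1, www2"}}

http_service = {
    "use": "generic-service",
    "hostgroup_name": "http-servers",
    "service_description": "HTTP",
    "contact_groups": "admin",
    "check_command": "check_http",
}

for svc in expand_service(http_service, hostgroups):
    print(svc["host_name"], svc["service_description"], svc["check_command"])
```

Defining the service once per hostgroup, rather than once per host, means that adding a new Web server only requires a host entry and a change to the hostgroup's members line.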
More on Plugins
Plugins do the actual monitoring for Nagios. The core Nagios engine
calls the plugins to check on hosts and services. Nagios provides
a number of plugins, and you can write your own plugins
to monitor almost any host or service. The plugins that come with
Nagios display help when you execute them with the
-h option.
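To give a feel for what writing your own plugin involves, here is a minimal Python sketch of a disk space check. It assumes the common plugin convention of printing one line of status text and reporting the result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN); check the plugin documentation for your Nagios version before relying on those values. The function names and thresholds are made up for this example:

```python
import shutil

# Exit codes assumed from the usual plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_free_percent(free_percent, warn_below, crit_below):
    """Map a free-space percentage to a plugin state."""
    if free_percent < crit_below:
        return CRITICAL
    if free_percent < warn_below:
        return WARNING
    return OK


def check_disk(path, warn_below, crit_below):
    """Return (state, message) for the filesystem holding 'path'."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - %s reports zero size" % path
    free_percent = 100.0 * usage.free / usage.total
    state = classify_free_percent(free_percent, warn_below, crit_below)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    return state, "DISK %s - %.1f%% free on %s" % (label, free_percent, path)


# Warn below 20% free, go critical below 10% free (arbitrary thresholds).
state, message = check_disk("/", 20.0, 10.0)
print(message)
# A real plugin would finish by exiting with the state: sys.exit(state)
```

Because a check like this looks at local resources, it would have to run on the monitored host itself rather than on the Nagios server.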
When you specify the check_command option in services.cfg for
a particular service, Nagios looks up the check_command in the file
checkcommands.cfg and then runs the command based on the specified
options. Look at one of the entries in checkcommands.cfg:
define command{
command_name check_http
command_line $USER1$/check_http -H $HOSTADDRESS$
}
$HOSTADDRESS$ is a macro that Nagios expands to the IP address
of the host as defined in hosts.cfg. Nagios also provides 32 user-defined
macros, $USER1$ through $USER32$, which are defined in the
resource.cfg file. In the sample resource.cfg file, $USER1$ has already been
defined as "$USER1$=/usr/local/nagios/libexec". Therefore, Nagios
will execute:
/usr/local/nagios/libexec/check_http -H 10.10.10.1
for the following entry in services.cfg file:
define service{
use generic-service
host_name www1
service_description HTTP
contact_groups admin
check_command check_http
}
Remember from the previous example that www1 has IP address 10.10.10.1.
When the service check executes, Nagios will look up the command check_http
in checkcommands.cfg and will run the command with the options specified
in checkcommands.cfg.
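The macro substitution described above is easy to model. The Python sketch below replaces each $NAME$ token in a command_line with a value from a dictionary. It is only an illustration of the idea, not how Nagios implements it, and expand_macros is a hypothetical helper:

```python
import re


def expand_macros(command_line, macros):
    """Replace $NAME$ tokens with values from 'macros'.

    Unknown macros are left untouched so the problem stays visible.
    """
    def replace(match):
        return macros.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", replace, command_line)


# Values taken from the article: $USER1$ from resource.cfg and the
# address of host www1 from hosts.cfg.
macros = {
    "USER1": "/usr/local/nagios/libexec",
    "HOSTADDRESS": "10.10.10.1",
}

print(expand_macros("$USER1$/check_http -H $HOSTADDRESS$", macros))
# prints: /usr/local/nagios/libexec/check_http -H 10.10.10.1
```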
Web-based GUI
Nagios provides an excellent, optional Web-based user interface.
If you want to use the Nagios Web interface, you must edit cgi.cfg
and also your Web server configuration file. In my case, I am using
Apache and have modified the httpd.conf as follows:
ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
<Directory "/usr/local/nagios/sbin/">
AllowOverride AuthConfig
Options ExecCGI
Order allow,deny
Allow from all
</Directory>
Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share/">
AllowOverride AuthConfig
Options None
Order allow,deny
Allow from all
</Directory>
The above configuration will let me access the Nagios GUI by typing:
http://nagioshost/nagios/
The ExecCGI option allows execution of CGI scripts, and AllowOverride
AuthConfig allows me to use the directives below in my .htaccess file:
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
require valid-user
Setting up authentication to access the Nagios Web interface is a
good idea. To set up authentication, create a .htaccess file (as shown
above) and set up a new user as follows:
htpasswd -c /usr/local/nagios/etc/htpasswd.users admin
For the Nagios cgi configuration file cgi.cfg, I have modified the
following entries:
refresh_rate=90
xedtemplate_config_file=/usr/local/nagios/etc/hostextinfo.cfg
default_statusmap_layout=3
host_unreachable_sound=hostdown.wav
host_down_sound=hostdown.wav
service_critical_sound=critical.wav
service_warning_sound=warning.wav
service_unknown_sound=warning.wav
normal_sound=noproblem.wav
The refresh_rate is the rate at which Nagios will refresh the Web
page for the status, statusmap, and extinfo CGIs. The hostextinfo.cfg
file is explained in the "Beautification" section later in this article.
The Web interface lets you view a status "map" of the hosts being
monitored. The map can be drawn using one of the following layout methods,
selected with default_statusmap_layout:
0 = User-defined coordinates
1 = Depth layers
2 = Collapsed tree
3 = Balanced tree
4 = Circular
5 = Circular (Marked Up)
The user-defined coordinates option lets you choose where
hosts are displayed on the status map. The coordinates of a host
on the status map should be defined in the extended information
file, which is hostextinfo.cfg by default. 2-D coordinates (2d_coords)
are given as positive integers in x,y order and specify the upper left-hand
corner of the host icon that is drawn; 3-D coordinates (3d_coords) are
given in x,y,z order for the 3-D VRML map. For example, in the hostextinfo.cfg
file:
define hostextinfo{
host_name www1
icon_image web.png
vrml_image web.png
statusmap_image web.gd2
2d_coords 100,250
3d_coords 100.0,50.0,75.0
}
The depth layers option displays the parent nodes and not the child
nodes. Child nodes are visible by clicking on a parent node. Depth
layer is useful for a large network where there are many parent/child
relationships.
The collapsed tree option displays all hosts in a layered tree-like
manner, giving you the option of clicking on a host and zooming
in to see its child nodes, if any are defined. Child nodes are not
displayed by default.
The balanced tree option displays all the hosts that are being
monitored as nodes of a tree, with the root of the tree being the
Nagios server process. All nodes are considered equal distance from
the root.
The circular option shows all the hosts around the central Nagios
server, arranged in a circular manner. This gives a cluttered view
if you have many hosts being monitored.
The sound parameters define sounds that are played on the client
Web browser. I think the sounds are useful because I do not have
to constantly watch the Nagios GUI or check my email to be notified
of a system-down status.
Beautification
If you want to use image icons to represent the hosts and services
you are monitoring, Nagios comes with a decent number of icons in
jpg, gif, gd2, and png format. You can download six additional logo
packs from:
http://www.nagios.org/download/
Each logo pack contains a number of logos that you can use to represent
your hosts and services.
Conclusion
Nagios can be a useful tool in a network for monitoring hosts
and services. The ability of Nagios to remotely monitor services
without the installation of software on the monitored hosts is a
great plus. Nagios can also be used without SNMP or along with SNMP
to effectively monitor a network. The professional GUI that Nagios
offers rivals those of many commercial tools. If you are low on
budget or favor free software over commercial products, then give Nagios
a try. You will not be disappointed.
Resources
Nagios html and pdf documentation -- http://www.nagios.org/docs/
Nagios FAQ -- http://www.nagios.org/faqs/
Nagios email list -- http://www.nagios.org/mailinglist.php
There are seven mailing lists: nagios-[users,announce,devel,checkins]
and nagiosplug-[help,devel,checkins]. The one most helpful to new
users is probably nagios-announce (for new announcements). If you
want extra help after reading the documentation and the FAQ, then
try nagios-users.
Syed Ali has a Master's in Computer Science from Stevens Institute
of Technology and is an MCP, MCSE, MCT, CCNA, CCAI, RHCE, and SCSA.
He currently works for a research laboratory in Princeton, New Jersey,
as a supervisor for systems administration. Syed can be contacted
at: alii@paul.rutgers.edu.