Monitoring Latency with Smokeping
Bill Kramp
Smokeping is a great tool that complements the traditional ping
utility. Ping will quickly show whether a server is up and how fast
it is responding, but it does not provide a way to study the round
trip time (RTT), that is, the latency of the connection to the server,
over a time interval. Smokeping, however, writes latency information
into a database. The smokeping program sends many requests during
each polling period to the network device and records the response
times. Smokeping can use pings, http requests, or SMTP requests
to the server or network service to collect the latency information.
This data can then be viewed with a browser to show the latency
for the network service. The graphs can also show the latency distribution
for each sample period.
Smokeping consists of two main parts:
- Smokeping -- The program that tests network devices and
records latency information in the database.
- smokeping.cgi -- A script that displays latency information.
Smokeping usually runs with a polling period of 5 minutes to measure
the latency of the network services. It reads a configuration file
that details the network servers and services to test. The default
setting is to send 20 requests during each polling period to the
devices defined in the configuration file.
All of the data is collected and stored using the Round Robin
Database (RRD). RRD will probably be familiar to people who use
MRTG, Cricket, or the many other front ends used for collecting
SNMP data from network switches and routers. In this article, I
will cover how to configure and use smokeping to measure latency
on the network. I will provide a sample configuration script and
some example graphs of collected data. I will also explain how I
used Smokeping to document and resolve problems on the network.
Software Installation
Smokeping is a free software package with a GPL license. You can
download smokeping from the download section at the Web site:
http://people.ee.ethz.ch/~oetiker/webtools/smokeping/
Several other utilities are required to make smokeping work, such
as Perl, Apache, fping, echoping, SpeedyCGI, and RRD. You will also
need Socket6 if you need to resolve IPv6 addresses. Links to these
utilities can be found in the resources section at the end of this
article and at the smokeping Web site. All the utilities must be installed
before smokeping will work. Download, configure, and install the software
using the instructions included with each utility. Use gunzip and
tar to extract each of the utilities. Most of the software can then
be configured, compiled, and installed with the following commands:
# ./configure
# make
# make install
Some of the utilities may need to be configured with other options
to meet your needs. Make sure to read the README and INSTALL instructions
with each utility before installing.
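As a rough illustration of the extract step, the sketch below wraps gunzip
and tar in a small shell function. The function name "unpack" is
hypothetical, chosen only for this example, and the archive and destination
names are whatever you pass in.

```shell
# unpack ARCHIVE [DESTDIR]
# Decompress a .tar.gz archive with gunzip and extract it with tar.
# "unpack" is a hypothetical helper name, not part of any package.
unpack() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    # gunzip -c writes the decompressed tar stream to stdout and
    # leaves the original archive untouched.
    gunzip -c "$archive" | (cd "$dest" && tar xf -)
}
```

After unpacking, change into the new source directory and run the
configure, make, and make install commands shown above.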
Smokeping
The smokeping software should be installed under a user with limited
rights like "smokeping". After creating a user named smokeping,
install the smokeping software in this smokeping home directory,
using the smokeping account. For this article, the home directory
for smokeping will be /home/smokeping. Within the smokeping directory
"/home/smokeping", make the following directories: var,
public_html, and public_html/.simg. In the directories bin and etc,
remove ".dist" from the end of the files that end with
".dist". This can be done by using the mv command
to move them, or by making a copy with the cp command. Copy
the "/home/smokeping/htdocs/smokeping.cgi.dist" script
to the new public_html directory. Examples:
cp /home/smokeping/etc/config.dist /home/smokeping/etc/config
cp /home/smokeping/htdocs/smokeping.cgi.dist /home/smokeping/public_html/smokeping.cgi
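The directory and file preparation described above can be sketched as a
pair of shell functions. This is a minimal sketch, assuming the layout used
in this article; the function names are hypothetical, and the base directory
is passed as an argument so the steps can be tried in a scratch directory
first.

```shell
# Hypothetical helpers; the names are illustrative only.

# make_smokeping_dirs BASE
# Create var, public_html, and public_html/.simg under BASE.
make_smokeping_dirs() {
    mkdir -p "$1/var" "$1/public_html/.simg"
}

# copy_dist_files DIR...
# For each file ending in ".dist" in the given directories, make a copy
# without the suffix. Existing files are left alone so that an edited
# config is never overwritten.
copy_dist_files() {
    for dir in "$@"; do
        for f in "$dir"/*.dist; do
            [ -e "$f" ] || continue      # no .dist files in this directory
            target=${f%.dist}
            [ -e "$target" ] || cp -p "$f" "$target"
        done
    done
}
```

Run as the smokeping user, "make_smokeping_dirs /home/smokeping" followed
by "copy_dist_files /home/smokeping/bin /home/smokeping/etc" would cover
the bin and etc steps; the smokeping.cgi copy into public_html still needs
the explicit cp shown above.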
The smokeping and smokeping.cgi scripts must be configured with the
correct paths to the utilities and configuration file. Edit /home/smokeping/bin/smokeping
to make sure the path for Perl is correct. Also check that the paths
for the smokeping and RRD library directories, as well as the path
for the configuration file, are defined correctly. This may require
the use of the find command, if you are not sure of their locations.
The paths for the lib and config file for smokeping could look like
this:
#!/usr/bin/perl -w
use lib qw(/local/rrdtool/lib/perl);
use lib qw(/home/smokeping/lib);
Smokeping::main("/home/smokeping/etc/config");
The next file to edit is the /home/smokeping/public_html/smokeping.cgi
script, to make sure the path to speedy, the SpeedyCGI tool, is correct.
As in the smokeping utility, the configuration file and the library
directories for smokeping and RRD need to be defined. The paths for
the config and lib directories will be the same as in the smokeping utility:
#!/usr/bin/speedy -w
use lib qw(/local/rrdtool/lib/perl);
use lib qw(/home/smokeping/lib);
Smokeping::cgi("/home/smokeping/etc/config");
Apache
Apache only needs a few changes in its httpd.conf file to allow
access to the smokeping.cgi script in the smokeping directory. The
AddHandler option for CGI scripts must be uncommented by removing
the leading pound sign (#). Access to the public_html directory
must be added to the httpd.conf file as well. The changes for httpd.conf,
shown in Listing 1, should allow smokeping to work, but should not
be considered a secure configuration. Steps should be taken to secure
the configuration of Apache and to limit access to the smokeping.cgi
script in a manner that will meet your security policies. (All listings
are on the Sys Admin Web site: http://www.sysadminmag.com.)
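One quick way to confirm the first change is to look for an uncommented
AddHandler line before restarting Apache. The sketch below assumes the
directive has the usual form "AddHandler cgi-script .cgi"; the function
name is hypothetical, and the location of httpd.conf differs between
installations.

```shell
# cgi_handler_enabled CONF
# Succeed (exit status 0) if CONF contains an AddHandler cgi-script line
# that is not commented out. Hypothetical helper for illustration.
cgi_handler_enabled() {
    grep -Eqi '^[[:space:]]*AddHandler[[:space:]]+cgi-script([[:space:]]|$)' "$1"
}
```

For example, "cgi_handler_enabled httpd.conf && echo enabled" prints
"enabled" only when the handler line is active. It does not check the
access settings for the public_html directory.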
Utility Testing
It is important to test each utility on which smokeping depends,
because smokeping will only work if they are working correctly.
Understanding how each utility works will also make configuration and
troubleshooting much easier. These utilities can also be useful for
testing and troubleshooting network services in other applications.
The following sections describe three important utilities that support
smokeping. For more on configuring smokeping's support utilities, refer
to the utility documentation.
Fping
Fping can send out pings to a single address or multiple IP addresses
at the same time. After sending out the many concurrent pings, it
will keep track of the responses from each server. By comparison,
the ping utility can only query one server at a time. If an IP address
doesn't respond to fping, it will try several more times. You
should only use this tool to ping hosts or sets of IP addresses
that you are authorized to probe. Using this tool to ping hosts
or scan networks that you do not control could be viewed as an illegal
activity. A simple query to test fping is:
Command->
# fping -e -c 3 192.168.1.1
Response->
192.168.1.1 : [0], 84 bytes, 0.34 ms (0.34 avg, 0% loss)
192.168.1.1 : [1], 84 bytes, 0.32 ms (0.33 avg, 0% loss)
192.168.1.1 : [2], 84 bytes, 0.32 ms (0.32 avg, 0% loss)
192.168.1.1 : xmt/rcv/%loss = 3/3/0%, min/avg/max = 0.32/0.33/0.34
The response to the fping command showed three pings and their
latency. It also showed the minimum, average, and maximum latency
values for the three pings. To make fping executable by the smokeping
account, fping must be made setuid root (owned by root, with the setuid
bit set). This will allow users other than root, like smokeping, to run
the fping command. Under Linux, the command to make fping setuid root is:
# chmod 4755 fping
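When testing many hosts, it can help to reduce the fping summary line to
just the numbers. The following sketch reads fping output in the format
shown above and prints the target, the loss percentage, and the average
latency. The function name is hypothetical, and the pattern assumes the
summary line looks exactly like the sample in this article.

```shell
# fping_summary
# Read fping -c output on stdin and print "target loss% avg_ms" for each
# summary line. Hypothetical helper; it assumes the line format shown in
# the sample response above and ignores every other line.
fping_summary() {
    sed -n 's|^\([^ ]*\) *: xmt/rcv/%loss = [0-9]*/[0-9]*/\([0-9.]*\)%, min/avg/max = [0-9.]*/\([0-9.]*\)/[0-9.]*.*|\1 \2 \3|p'
}
```

fping typically writes the summary to standard error, so a pipeline such
as "fping -c 3 192.168.1.1 2>&1 | fping_summary" is needed to capture it.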
Echoping
Echoping can be used to test several types of network services,
such as http, https, smtp, chargen, etc. The three main types used
by Smokeping are http, https, and smtp. To test an http service,
use the -h option, followed by the path of the page to request, to
tell echoping to send an http request. The server and port for the
Web server are then specified, with a colon used to separate the
server address and port number. The -n option, followed by a number,
tells echoping how many times to repeat the test (three times here):
Command->
# echoping -n 3 -h / www.domain-name:80
Response->
Elapsed time: 0.002667 seconds
Elapsed time: 0.002565 seconds
Elapsed time: 0.002613 seconds
---
Minimum time: 0.002565 seconds (99805 bytes per sec.)
Maximum time: 0.002667 seconds (95988 bytes per sec.)
Average time: 0.002615 seconds (97897 bytes per sec.)
Median time: 0.002613 seconds (97972 bytes per sec.)
Echoping should respond with a message showing the elapsed time
for each of the http requests. It will also display the minimum, maximum,
average, and median latency values for the Web server. Problems are
easier to troubleshoot by testing each Web site manually than from
within smokeping.
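To compare several servers quickly, the per-request lines can be reduced
to a single median value. This sketch assumes the "Elapsed time:" line
format shown in the sample response; the function name is hypothetical.

```shell
# echoping_median
# Read echoping output on stdin, collect the "Elapsed time:" values, and
# print their median in seconds. Hypothetical helper for illustration.
echoping_median() {
    awk '/^Elapsed time:/ { print $3 }' |
    LC_ALL=C sort -n |
    awk '{ v[NR] = $1 }
         END {
             if (NR == 0) exit 1                 # no samples found
             if (NR % 2) m = v[(NR + 1) / 2]     # odd count: middle value
             else m = (v[NR / 2] + v[NR / 2 + 1]) / 2
             printf "%.6f\n", m
         }'
}
```

For example, "echoping -n 3 -h / www.domain-name:80 | echoping_median"
prints one number per server tested, which is easy to collect in a loop.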
If encrypted http connections are going to be used, the OpenSSL
library will need to be installed first. Run the echoping configure
script with the "--with-ssl" option. Then issue the commands
make and make install.
It is also very easy to test the response times of email servers.
Issuing the following command can test the latency of an SMTP connection:
# echoping -n 3 -S 192.168.1.1
Curl
Curl is another useful tool for testing http and https connections.
Echoping will inform you of the response time or a failure, but
it is sometimes useful to know exactly what the http or https response
looks like. Curl will display the html response in text form, and
not interpret the html commands. With command-line options that
are available, it can use either GET or POST to send information
to a Web server. It can also send user authentication information
to a server to access protected resources. Here is a sample use
of curl to display an https Web page:
# curl --insecure https://web.domain-name
The "--insecure" option tells curl to skip verification of the server's
certificate, which is useful when testing a server that uses a
self-signed certificate. The "--cacert <CA certificate>" option
would be used to name the CA certificate file used to verify the server.
Smokeping Configuration
The smokeping configuration script consists of five basic sections:
general, database, presentation, probes, and targets (see sidebar).
The format is a hierarchical structure for use with the parseconfig
module that comes with smokeping. The sample config script in Listing
2 shows a basic configuration file for smokeping.
Running Smokeping
It is always a good idea to run things manually before turning
them loose. Running smokeping with the debug option will cause it
to display detailed information about what it is doing. Smokeping will
not run as a daemon while in debug mode; it will only run once:
Command->
/home/smokeping/bin/smokeping --debug
Response (condensed)->
### fping seems to report in 1 milliseconds
Launched successfully
FPing: probing 4 targets
EchoPingHttp: probing 3 targets
EchoPingHttp: forks 5, rounds 1, timeout 300
EchoPingHttp: executing cmd /local/bin/echoping -h / -n 20 mars:80
EchoPingHttp: mars: got 0.001975 0.002002 0.002008 0.002016
0.002018 0.002022 0.002022 0.002024 0.002028 0.002030 0.002031
0.002040 0.002059 0.002060 0.002068 0.002072 0.002106 0.002251
0.002315 0.004432
The response showed that four targets were defined for latency testing
with fping. Three http targets were also defined, of which only the
debug data for the "mars" Web server is displayed. The "mars"
http response shows the results of the 20 latency tests.
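A quick sanity check on the debug output is to count how many samples each
target returned; a count below the configured number of requests (20 by
default) would suggest that some requests went unanswered. The sketch below
is a hypothetical helper. It assumes that each result appears on a single
line in the form "Probe: target: got v1 v2 ...", which is inferred from the
condensed response shown in this article.

```shell
# count_samples
# Read smokeping --debug output on stdin and print "target count" for
# every "<probe>: <target>: got <values...>" line. Hypothetical helper;
# the line format is assumed from the condensed sample in this article.
count_samples() {
    awk '$3 == "got" {
             target = $2
             sub(/:$/, "", target)    # drop the trailing colon from "mars:"
             print target, NF - 3     # fields 4..NF are the samples
         }'
}
```

It could be used as "/home/smokeping/bin/smokeping --debug 2>&1 |
count_samples", where the redirection simply captures both output streams.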
To start Smokeping as a daemon, execute the smokeping utility
without any options. Smokeping will start a daemon process and return
to the shell prompt. To make the daemon reread the configuration
file, run smokeping with the "--restart" option.
Web Pages
Smokeping uses a CGI script to query the RRD database and create
the graphs and Web pages. To save a couple of keystrokes in specifying
the URL to the CGI script, create an index.html file (Listing 3)
in the smokeping public_html directory to automatically redirect
to the smokeping.cgi script.
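For readers without the listing at hand, a redirect page of this kind is
usually a single meta refresh tag. The sketch below writes such a page from
the shell. The function name and the exact HTML are assumptions made for
illustration and may differ from Listing 3.

```shell
# write_redirect DIR
# Create DIR/index.html containing a meta refresh that sends the browser
# to smokeping.cgi in the same directory. Hypothetical helper; the HTML is
# an assumed minimal redirect page, not a copy of Listing 3.
write_redirect() {
    cat > "$1/index.html" <<'EOF'
<html>
<head>
<meta http-equiv="refresh" content="0; url=smokeping.cgi">
<title>Smokeping</title>
</head>
<body>
<a href="smokeping.cgi">Smokeping latency graphs</a>
</body>
</html>
EOF
}
```

Running "write_redirect /home/smokeping/public_html" as the smokeping user
would create the page; the plain link in the body is a fallback for
browsers that ignore the refresh.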
Graphs
Smokeping uses two different types of graphs to display the latency
information: an overview graph and a detailed graph. The overview
graph shows the RTT latency value for the past 10 hours (Figure
1). The graph displays the median RTT value for that time period,
as well as the last retrieved value. The average packet loss is
also calculated and displayed on the graph. Clicking on the graph
will retrieve the detailed graph for that server or service being
monitored.
The detailed smokeping graph (Figure 2) uses color codes to indicate
any packet loss. The color-coded bar indicates the median RTT
of the packets sent during each polling period. Smokeping will use
green if there was no packet loss. Light blue is used for the loss of
one packet, with increasingly darker shades as the loss increases up
to four. Purple and red are used when the packet loss reaches 10 or 19
per polling period, respectively. If all packets during a polling period
are lost, nothing will be displayed for that time period on the graph.
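The color scheme can be read as a small lookup from packets lost to bar
color. The sketch below only illustrates the description above. It assumes
20 requests per polling period and assumes that each listed loss count acts
as an upper bound for its color; the function name and the single "darker
blue" label for losses of two to four packets are hypothetical, and
smokeping's actual thresholds may differ.

```shell
# loss_color LOST
# Print the bar color described in this article for LOST packets out of 20.
# Hypothetical helper; the handling of loss counts between the listed
# values (5-9 and 11-18) is an assumption.
loss_color() {
    lost=$1
    if   [ "$lost" -le 0 ];  then echo "green"
    elif [ "$lost" -le 1 ];  then echo "light blue"
    elif [ "$lost" -le 4 ];  then echo "darker blue"
    elif [ "$lost" -le 10 ]; then echo "purple"
    elif [ "$lost" -le 19 ]; then echo "red"
    else                          echo "no bar"    # every packet was lost
    fi
}
```

For example, "loss_color 3" falls in the darker blue range, while
"loss_color 20" corresponds to the gap left when every packet is lost.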
Smokeping also uses grey bars to indicate the distribution of
the response times. A dark grey indicates a tight packing of many
polls with the same RTT. A tall bar of light grey indicates a large
variance of the latency in the packets sent.
When viewing the smokeping graphs for latency, be aware that the
data collected using pings may show packet loss. Many routers will
drop ICMP ping packets when they are busy handling other traffic,
because ICMP is given a much lower priority than UDP and TCP
traffic. It is often better to test for latency using http, smtp,
or the other supported services under smokeping.
Resolving Network Problems with Smokeping
The following sections describe some examples of how smokeping
can help troubleshoot network problems.
Random Latency Problem
Last year, some users noticed a problem with their terminal application
experiencing a response delay of 5 to 6 seconds to a remote site.
The problem did not happen all the time, and of course, never occurred
when I tried it. The remote site could not detect any problem with
the network equipment. I configured smokeping to monitor the latency
to the remote server and the routers in between. After several days,
the data showed that the problem was occurring at the remote site's
server (Figure 3). The graph showed that the server was experiencing
latencies of up to 7 seconds at random intervals. The router graph
(Figure 4) does show some packet loss, but not at the same time
as the server. In some cases, the router has zero packet loss with
no change in latency, while the server shows latency problems.
Copies of the graphs showing smokeping's findings were emailed
to the remote site. The remote site determined that the router was
very busy handling our inbound connection, and lots of outbound
connections at those times. The remote site then used a packet shaper
to give priority to our application traffic, which resolved the
problem.
Broadband Problem
In another case, a remote site using a broadband provider for
Internet access began to see slower response times in accessing
our site. Another site with the same broadband provider was not
experiencing this problem. The Cricket/RRD data showed that utilization
of the bandwidth was not the cause. Looking at the smokeping data
showed the latency had been slowly getting worse over a few days.
I contacted the provider and explained the problem along with the
data I was seeing with the smokeping graphs. They sent a technician
who diagnosed a problem with the external broadband equipment.
Monitoring Server Performance
Smokeping can also be used to monitor the latency of a Web server.
We have used it to monitor the response time of a Web server with
a backend database. Smokeping makes one request that only the Web
server handles, and others that make use of the backend
database. Smokeping provides a history of how the two servers are
performing. When a user complains of performance problems, I can
look at the graphs to see what has changed and when. This helps
reduce troubleshooting time if we can see a correlation to a server
patch or modification.
Conclusion
Smokeping has proven valuable to me in performing my job responsibilities.
Learning how to use it, and the other utilities like curl and echoping,
has been very useful with other network projects. Identifying a
problem, and backing that up with a graph, goes a long way toward
resolving network problems quickly. I think Smokeping just reinforces
the old saying that a picture is worth a thousand words.
Resources
Smokeping -- http://people.ee.ethz.ch/~oetiker/webtools/smokeping/
RRD -- http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/pub/
Apache -- http://www.apache.org/
Perl -- http://www.perl.com/
Speedy CGI -- http://daemoninc.com/speedycgi/
Fping -- http://people.ee.ethz.ch/~oetiker/webtools/smokeping/pub/fping-2.4b2_to.tar.gz
Echoping -- http://echoping.sourceforge.net/
Curl -- http://curl.haxx.se/download.html
Bill Kramp is the Network Administrator for the Finger Lakes
Community College, located in Upstate New York. He's been employed
at the college since 1989, where he designs and manages the networking
for the main campus and the two Extension Centers. He uses many
open source tools for monitoring the network: Cricket/RRD, Netsaint
(Nagios), smokeping, Snort, etc. Bill can be reached at: krampwd@flcc.edu.