Stress Testing Jabber with the Jabber Test Suite

Dustin Puryear

Jabber is an instant messaging service whose time has come. The Jabber software relies on an open standard, which in turn depends on XML. This allows developers and service providers to use Jabber for any number of reasons. A common use of Jabber (http://www.jabber.org) is as an Instant Messaging (IM) application that connects fellow Jabber users with each other as well as with users of other IM services. Yet, Jabber can do so much more when serving as a gateway between custom services. (For an introduction to Jabber, see "Instant Messaging with Jabber," by Chris Josephes, in January 2002 Sys Admin.)

As the use of Jabber increases, the need for a testing harness has increased as well. When deploying Jabber, administrators and developers alike need to know how well the Jabber service scales as traffic and message size increase. I developed the Jabber Test Suite (JabberTest) for just this reason and have since expanded the software to evaluate a number of criteria, including delivery rates, out-of-order delivery, and the total number of connections a server can handle concurrently.

In this article, I will show how to install and run JabberTest. I will also work through several test scenarios where JabberTest is used to determine how well your Jabber service is working under a given load.

Installing the Suite

JabberTest requires a few things. To begin, you need a working C compiler installed on your UNIX system along with expat (http://expat.sf.net), an XML parser library written in C. You may want to download and install the expat library using your vendor's customized packages (e.g., an RPM). Doing so is fine, but for completeness I will install expat from source. (I will be using a Red Hat 7.0 workstation for the examples in this article.)

To download expat, go to the project's SourceForge page located at http://www.sf.net/projects/expat and download the latest expat package. I will use expat 1.95.6, but later versions should work fine with JabberTest. Once you have downloaded the source code, untar the tarball, move into the source directory, run configure, and then install the library:

# tar xvfz expat-1.95.6.tar.gz
# cd expat-1.95.6
# ./configure
# make install

We will be using the shared expat libraries. By default, expat will install to /usr/local, so check that /usr/local/lib is listed in /etc/ld.so.conf and then run ldconfig:

# ldconfig

Next, download and compile JabberTest. Like expat, JabberTest is hosted on SourceForge. Download the source code at http://jabbertest.sf.net. I will be using version 1.0.2, but later versions will work with the examples provided here:

# tar xvfz jabber-testsuite-1.0.2.tgz
# cd testsuite
# cp Makefile.linux Makefile
# make

Notice that you need to choose the makefile for your given UNIX flavor. I tested JabberTest under Linux, FreeBSD, SunOS, Solaris, and even Cygwin. You can probably compile JabberTest under other systems as well, but you may need to tweak the makefile to do it.

Once these steps are completed, JabberTest should be compiled and ready to go. Note that there is no make install step. The binaries will reside in the testsuite/ directory, but you can move them to another location.

Registering Your Test Accounts

The next step is to learn how each of the tools within the suite works. The first tool I will examine is userreg, which registers a group of test accounts with the Jabber service. When using userreg, and indeed all of the JabberTest tools, you must specify the hostname used by the Jabber service. In my example, I will be using jabber.example.com, but this choice will vary depending on where you have the server installed:

[testsuite]$ ./userreg -h jabber.example.com -u 4
0  1047251364.889784 1047251365.252948 0.363164
1  1047251365.258430 1047251365.595498 0.337068
2  1047251365.598393 1047251365.900254 0.301861
3  1047251365.908584 1047251369.180991 3.272407

In this example, userreg will connect to jabber.example.com and then register four test accounts named test_0, test_1, test_2, and test_3. Notice the output of userreg, which consists of four columns: the iteration, the registration start time, the registration stop time (both in the form of seconds.microseconds), and the difference between the start and stop times.

When using userreg, focus on the last column, the difference. When a Jabber server is heavily loaded this number will increase, and registrations that take more than a few seconds are generally too long. If this is the case, then you will need to research the cause of the slow registration process.

Common causes of slow registrations include an overloaded server or a server using file-based user accounts instead of a DBMS back-end. The problem with using a file-based user account scheme with Jabber is that directory searches tend to slow down as the number of files in a directory increases. So if you are serving more than 10,000 or so accounts, you could experience delays.
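If you capture the userreg output to a file, the check for slow registrations is easy to automate. The following is a minimal Python sketch and is not part of JabberTest. It assumes the four-column layout shown above, and the function name and the three-second default threshold are my own illustrative choices for "more than a few seconds."

```python
# Sketch: flag slow registrations in captured userreg output.
# Assumes four whitespace-separated columns per result row:
# iteration, start time, stop time, difference (in seconds).

def slow_registrations(output, threshold=3.0):
    """Return (iteration, seconds) pairs whose difference exceeds threshold."""
    slow = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # not a result row (blank line, "...", prompts)
        try:
            iteration = int(fields[0])
            elapsed = float(fields[3])  # the difference column
        except ValueError:
            continue  # four fields, but not numeric
        if elapsed > threshold:
            slow.append((iteration, elapsed))
    return slow

sample = """\
0  1047251364.889784 1047251365.252948 0.363164
1  1047251365.258430 1047251365.595498 0.337068
2  1047251365.598393 1047251365.900254 0.301861
3  1047251365.908584 1047251369.180991 3.272407
"""
print(slow_registrations(sample))
```

With the sample run above, only the fourth registration (iteration 3) is reported. Adjust the threshold to match whatever delay you consider acceptable for your service.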

Testing for Connection Limits

Another useful tool in the suite is pasvlogin, which logs in a given number of users with Jabber. Connections to Jabber are silent in that they do not produce any messages destined for a recipient. Rather, pasvlogin will connect using a given user account and then disconnect. You can specify the number of connections to make, and whether you wish pasvlogin to maintain the connections until the program is killed.

pasvlogin has two main uses. First, pasvlogin can help determine the theoretical maximum number of connections that your Jabber server can maintain. For example, if you want to see whether your Linux-based Jabber server can maintain more than 1K connections (an old problem for administrators running Jabber under Linux), then run pasvlogin with the following parameters:

[testsuite]$ ./pasvlogin -h jabber.example.com -u 1500
0  1047251491.814272 1047251492.699332 0.885060
1  1047251492.708603 1047251492.988734 0.280131
2  1047251492.998453 1047251493.301929 0.303476
3  1047251493.308586 1047251493.633857 0.325271
...

Of course, if you are running pasvlogin on a Linux machine, then your testing client may hit the 1K-connection limit, so consider running pasvlogin concurrently on multiple machines. To do this properly, you must ensure that each copy of pasvlogin uses a unique set of accounts. This is done using the -n and -u options, which specify the first user account to log in and the total number of accounts to log in, respectively. For example, to specify that accounts test_5, test_6, and test_7 are used, use the following parameters:

[testsuite]$ ./pasvlogin -h jabber.example.com -n 5 -u 3
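When spreading a large test across several client machines, it helps to work out the -n and -u values ahead of time so that no two clients share an account. Here is a minimal Python sketch of that arithmetic. The function name is hypothetical, and the script only prints the command lines; it does not run pasvlogin.

```python
# Sketch: split a block of test accounts into non-overlapping ranges,
# one per client machine, and print the matching pasvlogin commands.

def split_accounts(total, clients, first=0):
    """Return one (first_account, count) pair per client machine."""
    base, extra = divmod(total, clients)
    ranges = []
    start = first
    for i in range(clients):
        # Spread any remainder over the first few clients.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges

for n, u in split_accounts(1500, 2):
    print(f"./pasvlogin -h jabber.example.com -n {n} -u {u}")
```

For 1,500 accounts and two clients, this gives each machine 750 accounts, starting at test_0 and test_750, respectively. The accounts must already exist on the server, so register them first with userreg.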

When using pasvlogin, you may also want to vary the rate at which it attempts to establish sessions with the Jabber service. This is especially important if you have karma enabled (as is the default with the Jabber.org server). To specify the rate at which connections are made, use the -x and -y parameters, which specify a throttle rate in seconds and microseconds, respectively. For example, to make one connection per second and to maintain those connections until cancelled by the user, try the following:

[testsuite]$ ./pasvlogin -h jabber.example.com -u 2 -w -x 1
0  1047251610.753983 1047251611.044539 0.290556
1  1047251612.058311 1047251612.345279 0.286968
Enter quit to close sessions and quit.

Using the -x and -y parameters allows you to determine the best values for your karma based on the typical usage patterns of your clients. In addition, by varying these values you can determine the connection rate at which your Jabber server becomes swamped.

Finding Jabber Server Trouble Spots

The real power tools of the Jabber Test Suite are msgloadsnd and msgloadrec. These tools can help send a specified number of messages per second through your Jabber service.

The msgloadsnd tool sends the messages found in the file messages.txt (this can be overridden using the -z parameter). The messages file allows you to customize what messages to send so that you can test customized Jabber components that may act on the contents of the message. Currently, msgloadsnd will randomly select which messages to send from the file, but in future versions you will be able to go through the list of messages in the file linearly.

msgloadrec receives messages sent by msgloadsnd, and will use key values implanted by msgloadsnd in the message body to print the number of messages received, the order in which each message is received, and the time it took to receive the message. This information can help determine whether you are losing messages, whether messages are delivered out-of-order, and how long Jabber is taking to actually deliver the message to the recipient.

Let's run a very simple load test against our Jabber server. The msgloadrec and msgloadsnd pair uses a pair of users, one to send messages and one to receive them. Begin by loading msgloadrec so that you have a Jabber user logged in and ready to receive messages. Do this by specifying the user account that will receive messages (known as the to-user) using the -t parameter:

[testsuite]$ ./msgloadrec -h jabber.example.com -t test_1 -d

Also notice the use of the -d parameter, which specifies that msgloadrec not only receive messages but also parse special timing data in the message created by msgloadsnd so that delivery order and times can be printed.

Next, launch msgloadsnd from another shell or another machine entirely. Specify both the sending (user-from) and receiving (user-to) user to msgloadsnd, using the -f and -t parameters, respectively. In addition, specify the total number of messages to send using -m, as shown below:

[testsuite]$ ./msgloadsnd -h jabber.example.com -f test_0 \
-t test_1 -m 256

In this example, msgloadsnd will send a total of 256 messages through Jabber from the account "test_0@jabber.example.com" to "test_1@jabber.example.com". Unless you have karma disabled, you may get kicked off the service by Jabber because you are sending messages through the service as fast as you can. (Using this kind of test with multiple msgloadrec and msgloadsnd clients will definitely help you tweak the server operating system parameters and Jabber so that high-load situations are well handled.)

While the test is running, you will see output from msgloadrec similar to the following:

0     0     1046405663.848253 1046405664.094517 0.246264
1     1     1046405663.857878 1046405664.094517 0.236639
2     2     1046405663.867901 1046405664.098372 0.230471
3     3     1046405663.877830 1046405664.098372 0.220542

The output here is very similar to the output of pasvlogin. The five columns are as follows:

Delivery count, which starts at zero when msgloadrec is first loaded
Message ID
When the message was sent (in the form of seconds.microseconds)
When the message was received (in the same form)
The difference between the send time and receive time

Consider the uses of this data in real-world situations. First, consider the delivery count column, which is the first column in the output. msgloadrec maintains an internal counter of the number of messages received since the process was launched. Second, imagine that you have sent four messages through a heavily loaded Jabber service, but the output of msgloadrec shows only:

0     0     1046405663.848253 1046405664.094517 0.246264
1     1     1046405663.857878 1046405664.094517 0.236639
2     2     1046405663.867901 1046405664.098372 0.230471

In this case, you know that Jabber has lost a message somewhere. If you need to guarantee message delivery at all times, then this is an excellent test to perform while the server is under a high load. This is a good way to catch bugs in custom transports or other customizations of the Jabber server that behave incorrectly during high loads or low memory conditions.
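With a few hundred messages in flight, spotting a gap by eye is not practical. The following Python sketch (again, not part of JabberTest) lists the message IDs that never show up in captured msgloadrec output. It assumes, as the samples suggest, that message IDs start at zero and increase by one for each message sent; the function name is hypothetical.

```python
# Sketch: find message IDs missing from captured msgloadrec output.
# Assumes five columns per row, with the message ID in the second
# column, and IDs numbered 0 .. sent-1 by the sender.

def missing_ids(output, sent):
    """Return the IDs in range(sent) that do not appear in the output."""
    seen = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[1].isdigit():
            seen.add(int(fields[1]))
    return [msg_id for msg_id in range(sent) if msg_id not in seen]

sample = """\
0     0     1046405663.848253 1046405664.094517 0.246264
1     1     1046405663.857878 1046405664.094517 0.236639
2     2     1046405663.867901 1046405664.098372 0.230471
"""
print(missing_ids(sample, sent=4))
```

For the three-line output above and four messages sent, the script reports message 3 as missing. Give the receiver time to drain its queue before concluding that a message is lost rather than merely late.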

Another possibility is that the output is actually as follows:

0     0     1046405663.848250 1046405664.094510 0.246260
1     2     1046405663.867900 1046405664.094510 0.226610
2     3     1046405663.877830 1046405664.098370 0.220540
3     1     1046405663.857870 1046405664.098370 0.240500

Notice that the message ID column is no longer in ascending numerical order. The message ID reflects the order in which the messages were sent. Because the numbers are out of order, we know that msgloadrec received the actual messages out of order. In this case, messages 0, 2, 3, and then 1 arrived in that order even though they were sent out in the order of 0, 1, 2, and 3. For some applications out-of-order delivery may be acceptable, but for others that may not be the case.
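The same kind of check can be scripted. This Python sketch, which is my own illustration rather than a JabberTest tool, walks the captured msgloadrec output and reports every message that arrived after a message with a higher ID. It assumes the five-column layout described earlier.

```python
# Sketch: report messages delivered out of order.
# A row counts as "late" if its message ID is lower than the highest
# message ID already seen earlier in the output.

def out_of_order(output):
    """Return (delivery_count, message_id) pairs for late arrivals."""
    late = []
    highest = -1
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # not a result row
        count, msg_id = int(fields[0]), int(fields[1])
        if msg_id < highest:
            late.append((count, msg_id))
        else:
            highest = msg_id
    return late

sample = """\
0     0     1046405663.848250 1046405664.094510 0.246260
1     2     1046405663.867900 1046405664.094510 0.226610
2     3     1046405663.877830 1046405664.098370 0.220540
3     1     1046405663.857870 1046405664.098370 0.240500
"""
print(out_of_order(sample))
```

For the output above, the script reports that message 1 was the fourth delivery, which is to say it arrived after messages 2 and 3.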

Finally, the last three columns are used when you need to know message delivery times. The last field is a simple difference between the message-sent and message-received fields. These values can be used to determine the minimum, maximum, and average delivery times for your Jabber server. You may want to load many msgloadrec and msgloadsnd pairs on multiple machines to hammer against a Jabber service to see how well the service scales against the load. This is an opportune time to monitor performance of the server using tools such as sar, vmstat, and iostat. Delivery time is an excellent indicator of the success of tuning hardware to run large-scale Jabber services.
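Computing the minimum, maximum, and average delivery times from a captured run takes only a few lines. The Python sketch below is an illustration under the assumption that the delivery time is the fifth column of each msgloadrec row; the function name is hypothetical.

```python
# Sketch: minimum, maximum, and average delivery time from captured
# msgloadrec output, using the difference (fifth) column.

def delivery_stats(output):
    """Return (minimum, maximum, average) in seconds, or None if no rows."""
    times = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # not a result row
        try:
            times.append(float(fields[4]))
        except ValueError:
            continue  # five fields, but the last one is not numeric
    if not times:
        return None
    return min(times), max(times), sum(times) / len(times)

sample = """\
0     0     1046405663.848253 1046405664.094517 0.246264
1     1     1046405663.857878 1046405664.094517 0.236639
2     2     1046405663.867901 1046405664.098372 0.230471
3     3     1046405663.877830 1046405664.098372 0.220542
"""
stats = delivery_stats(sample)
if stats is not None:
    print("min %.6f  max %.6f  avg %.6f" % stats)
```

Running this against each test while you change one server or operating system setting at a time gives you a simple before-and-after comparison. Note that the delivery time compares the sender's clock with the receiver's, so when the two tools run on different machines, the clocks must be synchronized for the figures to mean anything.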

When actually tuning your Jabber server, another useful parameter is -l, which tells msgloadsnd to send a steady stream of messages through Jabber until msgloadsnd is killed (typically with Control-C). When used in conjunction with -x or -y, which set the interval between messages in seconds or microseconds, respectively, this lets you fine-tune how many messages are sent through Jabber over any given period:

[testsuite]$ ./msgloadsnd -h jabber.org -f test_0 -t \
test_1 -l -y 100000

Here we are sending a constant flow of messages to user "test_1@jabber.org" at the rate of 1 message every 1/10 of a second. Of course, the rate at which you specify msgloadsnd to send a message may not actually be the rate at which messages are sent. Your process may be preempted often enough that it can't even send messages that frequently. Thus, the only way to actually check how often messages are being sent is to review the message send time column in the msgloadrec output (indeed, that is one of the main reasons why those columns exist).
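To compare the requested rate with the rate actually achieved, you can average the gaps between send times in the msgloadrec output. The Python sketch below is one way to do that and is not part of JabberTest. It assumes the send time is the third column and that message IDs increase by one per message, so that a lost message does not skew the result. It uses Decimal rather than floating point because timestamps of this size leave a float with too few digits for microsecond arithmetic.

```python
# Sketch: average interval between sends, taken from the message ID
# and send time columns of captured msgloadrec output.
from decimal import Decimal, InvalidOperation

def average_send_interval(output):
    """Return the mean seconds between consecutive sends, or None."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[1].isdigit():
            continue  # not a result row
        try:
            rows.append((int(fields[1]), Decimal(fields[2])))
        except InvalidOperation:
            continue  # send time is not numeric
    if len(rows) < 2:
        return None
    rows.sort()  # order by message ID, i.e. by send order
    (first_id, first_sent), (last_id, last_sent) = rows[0], rows[-1]
    if last_id == first_id:
        return None
    # Divide by the ID span so that lost messages do not inflate the mean.
    return (last_sent - first_sent) / (last_id - first_id)

sample = """\
0     0     1046405663.848253 1046405664.094517 0.246264
1     1     1046405663.857878 1046405664.094517 0.236639
2     2     1046405663.867901 1046405664.098372 0.230471
3     3     1046405663.877830 1046405664.098372 0.220542
"""
print(average_send_interval(sample))
```

The sample here comes from the earlier unthrottled run, so the interval is roughly a hundredth of a second. For a run started with -y 100000, you would hope to see a value close to 0.1; a noticeably larger figure suggests the sending client itself cannot keep up.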

Learning More About Your Jabber Service

In this article, I have described the Jabber Test Suite and shown how the suite can be used to learn more about your own Jabber service. The goal of the suite is to allow administrators access to the kind of information required to tune their servers, and to ensure any client requirements are maintained (e.g., delivery order). There are other parameters available for use with the tools that are worth exploring. Additionally, you can use the tools in various combinations to create new ways to test your service, and you can further fine-tune the messages to be sent by modifying the messages file used by msgloadsnd. Feel free to email me with any questions, comments, bug reports, or feature suggestions.

Dustin Puryear is author of Integrate Linux Solutions into Your Windows Network (Premier Press), has written numerous technical articles, is a conference speaker, and continues to push the envelope in UNIX and Windows integration as principal consultant of Puryear Information Technology. He can be contacted at: [email protected].