Getting Status Information about an HACMP Cluster
Arni Snorri Eggertsson
During the time that I have managed High Availability Cluster
Multiprocessing for AIX (HACMP) clusters, I have found that status
information for these clusters is not very accessible. You can
monitor them with standard commands, each of which answers one
question, or with the CLVIEW program, which improves with each
release. But it is difficult to watch for status changes, and
especially difficult to feed custom monitoring tools such as
Big Brother. And since HACMP is intended for running highly
available applications, state monitoring is one of the essential
parts for it to meet its goals.
Over time I created small scriptlets using SNMP to give me information
about specific things inside HACMP. These include checking if all
interfaces are up, and if each application is up and running and
on its intended node. As my collection of scriptlets grew, I decided
to combine them into one script that would create a human-readable
status screen (see Listing 1). My experience is that condensed information
that can fit on one screen is very useful for systems administrators
to quickly and easily debug a problem.
But how can you get this information and work with it? The
easiest way is to query the HACMP server daemons using SNMP
(the command is snmpinfo on AIX and snmpget on Linux). Not all
of the information is available through SNMP, however, so I had to
use at least one command that belongs to the HACMP command set.
Detailed discussion about SNMP in general is beyond the scope
of this article, but I assume readers at least know what SNMP is.
Actually, I think anyone with basic shell scripting skills who reads
through the script given in Listing 2 will understand it without
knowing anything about SNMP.
Information is actually quite easy to retrieve using SNMP since
IBM is kind enough to publish the MIB (Management Information Base),
which lets SNMP tools translate numeric object identifiers into
human-readable names. Here is an example without the MIB in place:
root@server1:/>snmpinfo -m dump -c public -h localhost \
1.3.6.1.4.1.2.3.1.2.1.5.1
1.3.6.1.4.1.2.3.1.2.1.5.1.1.0 = 1
1.3.6.1.4.1.2.3.1.2.1.5.1.2.0 = 41:43:4D:45
1.3.6.1.4.1.2.3.1.2.1.5.1.3.0 =
1.3.6.1.4.1.2.3.1.2.1.5.1.4.0 = 2
1.3.6.1.4.1.2.3.1.2.1.5.1.5.0 = 1
1.3.6.1.4.1.2.3.1.2.1.5.1.6.0 = 1043707312
1.3.6.1.4.1.2.3.1.2.1.5.1.7.0 = 0
1.3.6.1.4.1.2.3.1.2.1.5.1.8.0 = 32
1.3.6.1.4.1.2.3.1.2.1.5.1.9.0 = 73:65:72:76:65:72:31
1.3.6.1.4.1.2.3.1.2.1.5.1.10.0 = 73:65:72:76:65:72:31
1.3.6.1.4.1.2.3.1.2.1.5.1.11.0 = 2
1.3.6.1.4.1.2.3.1.2.1.5.1.12.0 = 1
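The colon-separated values in the output above are hexadecimal character
codes. As a rough illustration of what they contain, the following sketch
decodes them in plain shell. The helper name hex_to_ascii is hypothetical;
it is not part of HACMP or snmpinfo.

```shell
#!/bin/sh
# Decode a colon-separated hex string, as printed by snmpinfo above,
# into ASCII text. hex_to_ascii is a hypothetical helper name.
hex_to_ascii() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split the argument on colons into $1, $2, ...
    IFS=$old_ifs
    for byte in "$@"; do
        # Convert each hex byte to octal, then let printf emit that character.
        printf "\\$(printf '%03o' "0x$byte")"
    done
    printf '\n'
}

hex_to_ascii 41:43:4D:45             # value of ...5.1.2.0
hex_to_ascii 73:65:72:76:65:72:31    # value of ...5.1.9.0
```

Run against the two values shown, this prints ACME and server1.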
However, if we have the MIB in place and repeat the query, we get
the following. Note that I also added the "-v" flag to translate the
results from hexadecimal into ASCII:
root@server1:/>snmpinfo -m dump -v -c public -h localhost -o \
/usr/sbin/cluster/hacmp.defs 1.3.6.1.4.1.2.3.1.2.1.5.1
clusterId.0 = 1
clusterName.0 = "ACME"
clusterConfiguration.0 = ""
clusterState.0 = 2
clusterPrimary.0 = 1
clusterLastChange.0 = 1043707312
clusterGmtOffset.0 = 0
clusterSubState.0 = 32
clusterNodeName.0 = "server1"
clusterPrimaryNodeName.0 = "server1"
clusterNumNodes.0 = 2
clusterNodeId.0 = 1
And to make it even easier, you can replace the query string "1.3.6.1.4.1.2.3.1.2.1.5.1"
with the word "cluster" and get the same results. The MIB does all of
the translation between these numbers and names. The MIB files deployed
with HACMP/CS and HACMP/ES are named hacmp.defs and hacmp.my. These are
plain text files that describe what you can get from HACMP with SNMP.
You may have to open these files to fully understand the result set.
The constants I set up at the top of the script are taken directly
from the MIB.
If you are using HACMP/ES, the MIB files are located under /usr/es/sbin/cluster;
if you are using HACMP/CS, they can be found under
/usr/sbin/cluster.
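A script that has to run on both variants could probe the two directories
rather than hard-coding one. The following is a minimal sketch of that idea;
find_hacmp_mib is a hypothetical helper name, and the optional arguments exist
only so the function can be tried on a machine without HACMP installed.

```shell
#!/bin/sh
# Print the path of the first hacmp.defs found. With no arguments, the two
# locations named in the article are searched, HACMP/ES first.
# find_hacmp_mib is a hypothetical helper name.
find_hacmp_mib() {
    [ $# -gt 0 ] || set -- /usr/es/sbin/cluster /usr/sbin/cluster
    for dir in "$@"; do
        if [ -f "$dir/hacmp.defs" ]; then
            printf '%s\n' "$dir/hacmp.defs"
            return 0
        fi
    done
    return 1
}

if MIB=$(find_hacmp_mib); then
    echo "Using MIB definitions in $MIB"
else
    echo "hacmp.defs not found in the expected locations" >&2
fi
```

The resulting path can then be passed to the -o flag of snmpinfo, as in the
query shown earlier.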
The scripts can easily be modified to run on a machine that is
not one of the cluster nodes. To make them run on a different AIX
machine, just comment out the part where I use the clfindres
command and update the variables that decide which host to query.
Please note that different operating systems have different commands
for using SNMP, so some adjustments may be required to port the script
to another platform.
To understand how some of the information is retrieved, you can
view Table 1, which describes the SNMP keys and what the retrieved
information means. Some SNMP keys describe themselves.
The entire script can be found in Listing 2. The script is basically
built from about five function calls, each of which does a bit of
work. When combining my scriptlets into this single script, I tried
to be as modular as possible, but that is not always possible when
reusing code. I also realize that this can all be done with Perl
in a simpler manner, but I prefer shell scripts.
The script works like this: gather_information is called first. It
reads the variable SNMPKEYS, which contains all the SNMP keys
listed in Table 1. These are the keys from which I get most of
the information. The gather_information function calls get_info,
which translates all the SNMP information into variables using the
snmp_cmd function. Once we have most of the information loaded as
variables, it's time to print the status screen using the print_report
function. Since shell scripts do not offer multidimensional arrays,
I had to improvise when printing the network information, and that's
why network information is retrieved by a separate function.
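The flow described above can be sketched as follows. The function names and
SNMPKEYS come from the description; the function bodies, the shortened key
list, and the canned snmp_cmd output (copied from the sample query earlier
in this article) are assumptions made so that the sketch runs without a
cluster. It does not reproduce Listing 2.

```shell
#!/bin/sh
# Sketch only: names follow the article, bodies are assumptions.

SNMPKEYS="clusterName clusterState clusterSubState clusterNumNodes"

# Stand-in for the real query. On a cluster node this would wrap the
# snmpinfo command shown earlier instead of echoing canned lines.
snmp_cmd() {
    case $1 in
        clusterName)     echo 'clusterName.0 = "ACME"' ;;
        clusterState)    echo 'clusterState.0 = 2' ;;
        clusterSubState) echo 'clusterSubState.0 = 32' ;;
        clusterNumNodes) echo 'clusterNumNodes.0 = 2' ;;
    esac
}

# Turn each "name.N = value" line for one key into a variable name_N.
get_info() {
    # A here-document keeps the loop in the current shell, so the
    # variables it sets are still visible after the loop ends.
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        name=${line%% = *}
        value=${line#* = }
        value=${value#\"}; value=${value%\"}    # strip surrounding quotes
        var=$(printf '%s' "$name" | tr '.' '_')
        eval "$var=\$value"
    done <<EOF
$(snmp_cmd "$1")
EOF
}

gather_information() {
    for key in $SNMPKEYS; do
        get_info "$key"
    done
}

print_report() {
    printf 'Cluster: %s  State: %s  Substate: %s  Nodes: %s\n' \
        "$clusterName_0" "$clusterState_0" "$clusterSubState_0" \
        "$clusterNumNodes_0"
}

gather_information
print_report
```

Folding the SNMP index into the variable name (clusterName_0) is one possible
way to work around the lack of multidimensional arrays in the shell; the
actual script may handle this differently.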
This is just one way to use the information gathered. This
script can easily be extended to do much more and even report to
open source monitoring tools such as Big Brother or Big Sister. I
encourage everyone to have a go at modifying the status screen and
let their imagination go wild.
Arni Snorri Eggertsson is an RHCE and a CATE for AIX 5L and
AIX 4. He has been working professionally in the *nix world for
about four years now, although he has been active as an amateur
for much longer. His focus is large HACMP installations and AIX support.
Arni can be contacted at arnie@gormur.com.