Cool
Commands
Peter Baer Galvin
There are so many commands in Solaris that it is difficult to
separate the cool ones from the mundane. For example, there are
commands to report how much time a program spends in each system
call, and commands to dynamically show system activities, and most
of these commands are included with Solaris 8 as well as Solaris
9. This month, I'm highlighting some of the commands that you
might find particularly useful.
Systems administrators are tool users. Through experience, we
have learned that the more tools we have, the better able we are
to diagnose problems and implement solutions. The commands included
in this column are gleaned from experience, friends, acquaintances,
and from attendance at the SunNetwork 2002 conference in September.
"The /procodile Hunter" talk by Solaris kernel developers
Brian Cantrill and Mike Shapiro was especially enlightening and
frightening because Cantrill wrote code to illustrate a point faster
than Shapiro could explain the point they were trying to illustrate!
Useful Solaris Commands
truss -c (Solaris >= 8): This astounding option to truss
provides a profile summary of the command being trussed:
$ truss -c grep asdf work.doc
syscall seconds calls errors
_exit .00 1
read .01 24
open .00 8 4
close .00 5
brk .00 15
stat .00 1
fstat .00 4
execve .00 1
mmap .00 10
munmap .01 3
memcntl .00 2
llseek .00 1
open64 .00 1
---- --- ---
sys totals: .02 76 4
usr time: .00
elapsed: .05
It can also show profile data on a running process. In this case,
the data shows what the process did between when truss was
started and when truss execution was terminated with a control-c.
It's ideal for determining why a process is hung without having
to wade through the pages of truss output.
truss -d and truss -D (Solaris >= 8): These truss
options show the time associated with each system call being shown
by truss and is excellent for finding performance problems in custom
or commercial code. For example:
$ truss -d who
Base time stamp: 1035385727.3460 [ Wed Oct 23 11:08:47 EDT 2002 ]
0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64) argc = 1
0.0032 stat("/usr/bin/who", 0xFFBEFA98) = 0
0.0037 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY) Err#2 ENOENT
0.0047 open("/usr/lib/libc.so.1", O_RDONLY) = 3
0.0051 fstat(3, 0xFFBEF42C) = 0
. . .
truss -D is even more useful, showing the time delta between system
calls:
Dilbert> truss -D who
0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64) argc = 1
0.0028 stat("/usr/bin/who", 0xFFBEFA98) = 0
0.0005 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
0.0006 open("/usr/local/lib/libc.so.1", O_RDONLY) Err#2 ENOENT
0.0005 open("/usr/lib/libc.so.1", O_RDONLY) = 3
0.0004 fstat(3, 0xFFBEF42C) = 0
In this example, the stat system call took a lot longer than
the others.
truss -T: This is a great debugging help. It will stop
a process at the execution of a specified system call. ("-U"
does the same, but with user-level function calls.) A core could
then be taken for further analysis, or any of the /proc tools could
be used to determine many aspects of the status of the process.
truss -l (improved in Solaris 9): Shows the thread number
of each call in a multi-threaded processes. Solaris 9 truss -l
finally makes it possible to watch the execution of a multi-threaded
application.
Truss is truly a powerful tool. It can be used on core files to
analyze what caused the problem, for example. It can also show details
on user-level library calls (either system libraries or programmer
libraries) via the "-u" option.
pkg-get: This is a nice tool (http://www.bolthole.com/solaris)
for automatically getting freeware packages. It is configured via
/etc/pkg-get.conf. Once it's up and running, execute
pkg-get -a to get a list of available packages, and pkg-get
-i to get and install a given package.
plimit (Solaris >= 8): This command displays and sets
the per-process limits on a running process. This is handy if a
long-running process is running up against a limit (for example,
number of open files). Rather than using limit and restarting
the command, plimit can modify the running process.
coreadm (Solaris >= 8): In the "old" days
(before coreadm), core dumps were placed in the process's
working directory. Core files would also overwrite each other. All
this and more has been addressed by coreadm, a tool to manage
core file creation. With it, you can specify whether to save cores,
where cores should be stored, how many versions should be retained,
and more. Settings can be retained between reboots by coreadm
modifying /etc/coreadm.conf.
pgrep (Solaris >= 8): pgrep searches through
/proc for processes matching the given criteria, and returns their
process-ids. A great option is "-n", which returns the
newest process that matches.
preap (Solaris >= 9): Reaps zombie processes. Any processes
stuck in the "z" state (as shown by ps), can be
removed from the system with this command.
pargs (Solaris >= 9): Shows the arguments and environment
variables of a process.
nohup -p (Solaris >= 9): The nohup command can
be used to start a process, so that if the shell that started the
process closes (i.e., the process gets a "SIGHUP" signal),
the process will keep running. This is useful for backgrounding
a task that should continue running no matter what happens around
it. But what happens if you start a process and later want to HUP-proof
it? With Solaris 9, nohup -p takes a process-id and causes
SIGHUP to be ignored.
prstat (Solaris >= 8): prstat is top and
a lot more. Both commands provide a screen's worth of process
and other information and update it frequently, for a nice window
on system performance. prstat has much better accuracy than
top. It also has some nice options. "-a" shows
process and user information concurrently (sorted by CPU hog, by
default). "-c" causes it to act like vmstat (new
reports printed below old ones). "-C" shows processes
in a processor set. "-j" shows processes in a "project".
"-L" shows per-thread information as well as per-process.
"-m" and "-v" show quite a bit of per-process
performance detail (including pages, traps, lock wait, and CPU wait).
The output data can also be sorted by resident-set (real memory)
size, virtual memory size, execute time, and so on. prstat
is very useful on systems without top, and should probably
be used instead of top because of its accuracy (and some
sites care that it is a supported program).
trapstat (Solaris >= 9): trapstat joins lockstat
and kstat as the most inscrutable commands on Solaris. Each
shows gory details about the innards of the running operating system.
Each is indispensable in solving strange happenings on a Solaris
system. Best of all, their output is good to send along with bug
reports, but further study can reveal useful information for general
use as well.
vmstat -p (Solaris >= 8): Until this option became available,
it was almost impossible (see the "se toolkit") to determine
what kind of memory demand was causing a system to page. vmstat
-p is key because it not only shows whether your system is under
memory stress (via the "sr" column), it also shows whether
that stress is from application code, application data, or I/O.
"-p" can really help pinpoint the cause of any mysterious
memory issues on Solaris.
pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9):
If the process with memory problems is known, and more details on
its memory use are needed, check out pmap -x. The target
process-id has its memory map fully explained, as in:
# pmap -x 1779
1779: -ksh
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 192 192 - - r-x-- ksh
00040000 8 8 8 - rwx-- ksh
00042000 32 32 8 - rwx-- [ heap ]
FF180000 680 664 - - r-x-- libc.so.1
FF23A000 24 24 - - rwx-- libc.so.1
FF240000 8 8 - - rwx-- libc.so.1
FF280000 568 472 - - r-x-- libnsl.so.1
FF31E000 32 32 - - rwx-- libnsl.so.1
FF326000 32 24 - - rwx-- libnsl.so.1
FF340000 16 16 - - r-x-- libc_psr.so.1
FF350000 16 16 - - r-x-- libmp.so.2
FF364000 8 8 - - rwx-- libmp.so.2
FF380000 40 40 - - r-x-- libsocket.so.1
FF39A000 8 8 - - rwx-- libsocket.so.1
FF3A0000 8 8 - - r-x-- libdl.so.1
FF3B0000 8 8 8 - rwx-- [ anon ]
FF3C0000 152 152 - - r-x-- ld.so.1
FF3F6000 8 8 8 - rwx-- ld.so.1
FFBFE000 8 8 8 - rw--- [ stack ]
-------- ------- ------- ------- -------
total Kb 1848 1728 40 -
Here we see each chunk of memory, what it is being used for, how much
space it is taking (virtual and real), and mode information.
df -h (Solaris >= 9): This command is popular on Linux,
and just made its way into Solaris. df -h displays summary
information about file systems in human-readable form:
$ df -h
Filesystem size used avail capacity Mounted on
/dev/dsk/c0t0d0s0 4.8G 1.7G 3.0G 37% /
/proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
fd 0K 0K 0K 0% /dev/fd
swap 848M 40K 848M 1% /var/run
swap 849M 1.0M 848M 1% /tmp
/dev/dsk/c0t0d0s7 13G 78K 13G 1% /export/home
Conclusion
Each administrator has a set of tools used daily, and another
set of tools to help in a pinch. This column included a wide variety
of commands and options that are lesser known, but can be very useful.
Do you have favorite tools that have saved you in a bind? If so,
please send them to me so I can expand my tool set as well. Alternately,
send along any tools that you hate or that you feel are dangerous,
which could also turn into a useful column!
Peter Baer Galvin (http://www.petergalvin.org) is the
Chief Technologist for Corporate Technologies (www.cptech.com),
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and
previously wrote Pete's Wicked World, the security column,
and Pete's Super Systems, the systems management column for
Unix Insider (http://www.unixinsider.com). Peter is
coauthor of the Operating Systems Concepts and Applied
Operating Systems Concepts textbooks. As a consultant and trainer,
Peter has taught tutorials and given talks on security and systems
administration worldwide.
|