u386mon is a performance monitor that has been in the
public
domain for a number of years, through several revisions
(the last
one I have seen being u386mon2.20). This article tells
you
how to use u386mon to get and interpret system performance
data.
u386mon is primarily for 386/486-based systems, but
the author
reports in the supplied README file that it has been
ported to other
environments. The distribution provides makefiles for
several of the
popular versions of Intel-based UNIX, including SCO
and Interactive.
It is not for XENIX-based systems as the XENIX kernels
do not provide
support for the nlist(3C) function, but use the xlist(S)
function instead.
u386mon outputs detailed information about what is happening
on the system in several different user displays. The
display area
supported is 25 or 43 lines, with or without color,
and is split into
three primary areas:
Variable, Boot Information, Tunable Parameters, and
Process Information
The CPU section reports on instant, 5-second and 10-second
CPU utilization.
The information includes total, user, kernel, and break
percentages
and is displayed in both numerical and graphical form.
In the graphical
bar chart, user mode is represented by the letter "u"
and
kernel by a "k."
When the CPU shows user mode, the kernel is executing
application
calls. Kernel mode indicates that a system call has
been requested.
Since a process in this state cannot be interrupted
or killed, such
a process will show a higher degree of kernel-mode CPU
utilization.
Most performance monitoring tools -- such as vmstat
--
include CPU wait time with CPU utilization. There is
a difference,
however, and keeping track of the two separately helps
you measure
the true effectiveness of your CPU. If the CPU is spending
a lot of
time waiting for device interrupts or program alarms,
then it is idle,
and this can be significant if the CPU doesn't have
anything else
to do. When the WAIT time is merged with the actual
KERNEL utilization
time, the values reported are not realistic.
u386mon separates the actual KERNEL and WAIT times so
that
you can see the amount of time the CPU is actually spending
on processing
instructions versus the time spent waiting for other
activities.
In the System/Memory Information (Sysinfo/Minfo) display
area, shown
in Figure 1, the majority of the kernel routines are
displayed. This
display can also be replaced with a process listing.
u386mon Commands
The following are the commands available once u386mon
is running.
ESCape -- quits u386mon
e -- selects the extra display. If the
display is in 43-line mode, this information will now
be included
in it.
m -- selects the main Sysinfo/Minfo display.
p -- shows a ps listing (does not
include zombie, sleeping, shell, or getty processes).
s -- shows serial I/O information for the
standard serial devices which make up COM1 and COM2.
+/- -- increments or decrements the
update delay by one second. The initial startup value
is 2 seconds,
with the valid range being from 1 to 4 seconds.
l -- Lock the u386mon process into
RAM if you are running as the superuser (which you pretty
well have
to be in order to be sure that you can read the /dev/kmem
device).
When u386mon is locked into RAM, the worked PLOCK appears
on the screen.
Deciphering the Display
Color is also used to communicate information. In the
display shown
in Figure 1, the CPU utilization values display in green
if below
70 percent, in yellow if between 70 percent and 89 percent,
and in
red if over 90 percent.
Similarly, in the CPU WAIT display, total wait time
displays in green
if it is less than 30 percent, in yellow if utilization
is between
30 percent and 49 percent, and in red if it is 50 percent
or above.
If the word "INEXACT" appears on the screen,
it indicates
that u386mon was not scheduled quickly enough to collect
accurate
1-second interval values. If this is the case, the 5-
and 10-second
intervals are also likely to be inaccurate.
If the word "INVALID" appears, then u386mon
was scheduled
3 or more seconds late, and all percentage values are
now suspect.
Sysinfo/Minfo Information
The following information is reported on the Sysinfo/Minfo
screen,
and I thank Paul Hite, of PRC Realty Systems in Virginia,
and Greg
Oetting, of the Santa Cruz Operation's Support Department,
for helping
shed some additional light on these:
bread -- Measures the physical block reads
from the block devices, such as disks.
bwrite -- Measures the physical block writes
to the block devices, such as disks.
lread -- Measures the logical disk reads
from the buffer cache.
lwrite -- Measures the logical disk writes
to the buffer cache. (In the case of lread and lwrite,
if the kernel wants a chunk of data, it will look in
the buffer cache
first, which will increment the value of lread. If the
data
is not found, then a physical block read will be done,
thus incrementing
bread.)
phread -- Measures the blocks read with
the kernel physio routine.
phwrite -- Measures the blocks written
with the kernel physio routine.
swapin -- Counter which is incremented
with each request to swap in a page from the swap device
(a page is
typically 4Kb in size).
swapout -- Counter which is incremented
with each request to swap out a page to the swap device.
bswapin -- Measures the number of blocks
read in from the swap device (a block is typically 512
bytes).
bswapout -- Measures the number of blocks
written out to the swap device. If the four swap values
are high,
(the term high being relative), this could indicate
a potential RAM
bottleneck, as the machine appears to be doing more
swapping than
would be desirable.
iget -- Measures the number of calls made
to get the inode information when given the inode number
and the device
number.
namei -- Measures the number of calls made
to convert a pathname into the associated inode information.
dirblk -- Measures the number of blocks
read for directory reads.
readch -- Measures the number of characters
read using the read() system call.
writch -- Measures the number of characters
written using the write() system call.
rawch -- Counts raw tty characters
read.
canch -- Counts raw tty characters
put onto canonical queue.
outch -- Represents the number of characters
that have been written out.
msg -- Counts message operations (msgsnd()).
sema -- Counts semaphore operations (semop()).
maxmem -- Displays the amount of physical
RAM available -- typically the total system RAM minus
the RAM consumed
by the kernel itself.
freemem -- Displays the current amount
of free RAM.
mem used -- Shows the percentage of maxmem
which has been allocated to processes. The higher this
number, the
more likely that pages will have been placed on swap.
nswap -- Shows the total amount of swap
area available.
frswap -- Like freemem, represents
the amount of unallocated swap space.
swp used -- Like mem used, shows
a percentage of the nswap which has been allocated.
If this value
and mem used are both high, then it is an indication
that your machine
may be spending CPU cycles thrashing between RAM and
the swap area.
pswitch -- Displays the number of process
context switches. A context switch occurs when one process
gives up
the CPU and another process takes over. Higher values
in this field
indicate a sign of I/O intensive processes, because
the kernel will
switch to another process rather than wait for the I/O
operation to
be completed. Note that this is not switching from kernel
to user
mode, as this is a change in MODE, not in context.
syscall -- Represents the number of system
calls.
sysread -- Represents the number of read()
calls.
syswrit -- Represents the number of write()
calls.
sysfork -- Represents the number of fork()
calls.
sysexec -- Represents the number of exec[,l,le,etc.]()
calls.
runque -- Shows the size of the run queue.
runocc -- Shows the percentage of time
that there is a job ready to run in the run queue.
swpque -- Shows the size of the swap queue.
swpocc -- Shows the percentage of time
that there is a job in the swap queue.
vfault -- Counts the number of page faults,
which occur when a requested page is not in memory.
When a page fault
occurs, the system looks in all of the available RAM
for that page
(specifically from the free page pool); if the page
is not found,
then it is loaded in from the filesystem. Page faults
occur only with
program text, not data.
demand -- Represents the number of pages
created by calls to the malloc() system call. The pages
are
usually filled with byte value zero (0), hence the term
demand zero
pages.
pfault -- Represents the number of page
faults due to "copy on write." This type of
error occurs when
creating a new process via the fork() system call. Rather
than
copying all of the data pages for the forked program,
we mark the
data pages from the parent as "copy on write."
Since many
fork calls are followed with an exec system call,
this increases the performance of the system.
steal -- Represents the number of pages
which have been stolen by paging (see sidebar).
pnpfault -- Not used in SCO UNIX. Will
always be zero.
wrtfault -- Not used in SCO UNIX. Will
always be zero.
A number of other values are also shown on the main
display. The majority
of these values are not reported under SCO UNIX, and
therefore will
be zero.
Extra Display Information
The extra display information shown in Figure 2 is part
of the main
screen if you are running in 43-line mode. Be aware
that owing to
differences in porting, the bootinfo data provided differs
between
Interactive Systems and SCO. This information will not
be available
for systems which do not support it.
The kernel parameters included in the Extra Display
are defined as:
v_autoup -- Specifies the age that a delayed
buffer must be in seconds before bdflush will write
it out.
v_buf -- Reports the number of I/O buffers.
v_clist -- Reports the number (NCLIST)
of configured character list buffers.
v_file -- Represents the configured maximum
number (NFILE) of open files for the entire system.
v_hbuf -- Reports the number of hash buffers
allocated.
v_inode -- Reports the size of the incore
inode table, which is the number of active inodes system-wide.
v_maxpmem -- Specifies the maximum amount
of memory that can be consumed by a process; if the
value is zero,
then use all of the available memory.
v_maxup -- Defines the maximum number (MAXUP)
of processes that can be run by a non-root user.
v_mount -- Defines the maximum number (NMOUNT)
of mountable filesystems by specifying the size of the
kernel mount
table.
v_pbuf -- Reports the number of allocated
physical I/O buffers.
v_proc -- Reports the size of the process
table (NPROC), and, therefore, the maximum number of
processes system-wide.
v_region -- Specifies the size of the region
table (NREGION). A region is a contiguous area of the
virtual address
space of a process that can be treated as shared or
protected. Each
process has a data, text, and stack region.
BootInfo Data
Boot information differs from vendor to vendor, but
the display includes
the following:
basemem -- Reports the amount of base memory
in the machine.
extmem -- Reports the amount of extended memory
in the machine.
bootflags -- Reports some identifying information
regarding the machine on which the system booted. On
SCO systems the
values are
0x00000001 AT or AT386
0x00000002 Microchannel
0x00000004 Extended ISA
0x00000008 Intel 80486
The memory used and memory available fields
are self-explanatory.
A look at the u386mon source code reveals that much
of
the information available in this display is tuned for
Interactive
UNIX and not for SCO. As a result, the usefulness of
the data is reduced
in some circumstances.
For ISC systems, for example, u386mon can in many cases
identify
the model of the machine on which it is running and
display the video
adapter in use. ISC configurations that it recognizes
includes the
following:
CPU Types |
Monitor Types |
Compaq |
unknown to sys |
PS/2 |
EGA |
Generic 386 |
CGA |
AT&T 6386 |
MONO |
Olivetti M380 |
Compaq MONO |
Dell 386 |
Zenith Z449 |
Dell 325 |
Toshiba T5100 |
Adv Logic Res |
Compaq VGA |
Zenith Data |
VGA |
|
Paradise VGA1 |
|
Video 7 VGA |
Tunable Parameters Data
This section displays a number of the tunable parameters
-- that
is, parameters that can be adjusted in order to optimize
the kernel
resources.
t_ageinterval -- The frequency at which
processes are aged.
t_bdflushr -- The rate at which bdflush
is run.
t_gpgshi -- The high-water mark for page
stealing; the page stealing process will continue until
the value
of freemem is greater than this value.
t_gpgslo -- A control value for freemem;
if freemem falls below this value, stealing of pages
from processes
is triggered.
t_gpgsmsk -- Mask used by the getpages
routine in order to determine what pages can be stolen
t_maxfc -- The maximum number of pages
that will be saved up and freed at once.
t_maxsc -- The maximum number of pages
to be swapped out in a single operation.
t_maxumem -- The maximum size of a user's
virtual address space in pages.
t_minarmem -- The minimum available resident,
or non-swappable, memory that must be maintained in
order to prevent
deadlock.
t_minasmem -- The minimum available swappable
memory that must be maintained in order to prvent deadlock.
Process Information
The process section provides information on the state
of current processes
on the system. The display shows the count of current
processes for
each state.
sleep -- Refers to a process currently
waiting for an event to complete, such as disk I/O.
run -- Denotes a process currently running.
zombie -- Denotes a process which has terminated,
but whose parent did not wait for it to complete. Such
processes typically
show in the process table as <defunct>; they are
not a problem unless
there are many of them.
stop -- Denotes a process stopped by a
debugger.
idle -- Refers an intermediate state in
process creation.
onproc -- Denotes a process being run on
a processor.
xbrk -- Refers to a process that is being
swapped.
Serial I/O Information
Figure 3 shows a sample of the output produced when
the s option
is specified. Unfortunately, non-standard serial device
drivers are
not included here. The current code in u386mon allows
for a
maximum of only 16 ports.
The fields in the Serial I/O display are defined as:
tty -- The name of the serial port.
raw -- The number of characters which are
to be processed on the raw input queue.
can -- The number of characters to be processed
on the raw canonical input queue.
out -- The number of characters waiting
on the output queue.
speed -- The baud rate of the port.
state -- The internal state of the port.
iflag -- Input modes as defined by stty(C).
oflag -- Output modes as defined by stty(C).
cflag -- Control modes as defined by stty(C).
lflag - Line discipline modes as defined
by stty(C).
pgrp -- The process group controlling the
port.
The state field consists of a series of characters
representing the various modes the line can be in. The
characters,
and the modes they represent, are as follows:
B -- Output is in progress on the port.
C -- The software-copy of carrier detect
is present.
D -- A delay timeout is in progress on
the port.
O -- The device is open.
S -- Output is stopped with Control-S.
W -- The process is waiting for the open
to complete.
For more information on the input, output, and control
modes, see
stty(C) in the User's Reference Manual.
Process Status Information
When the p option is specified, u386mon retrieves a
process list and displays it in place of the Sysinfo/Minfo
data. Figure 4
shows a sample of this information, which encludes
the following
fields:
S -- A two-character process status indicator.
The first character defines the process status, the
second defines
the process swap status. The values for the first character
in process
status are:
s sleeping
R ready to run (might be running if u386mon
were not)
z zombie
d stopped by debugger
i idle
p running on processor
x XBREAK -- is growing or shrinking.
An S following this character indicates that
the process is swapped. Otherwise, the process is currently
in memory.
USER -- The username running the process;
if the process is running setuid, then a # appears next
to the user name.
PID The process id.
CPU -- CPU usage, used for scheduling purposes.
UCPU -- User time for this process.
SCPU -- System time for this process.
SIZE -- Size of the swappable image in
pages.
TTY -- Terminal to which the process is
currently attached.
CMD -- The name of the command being executed.
If there isn't enough space on the screen to display
all of the processes,
then some fields will be dropped on the basis of the
following selective
elimination scheme:
1) getty, uugetty, sh, csh, ksh;
2) swapped or zombie processes;
3) sleeping processes.
If there still isn't enough space, a message indicating
this displays, and processing continues.
Kernel Tuning Considerations
Using u386mon to get the information is half the fun;
interpreting
it and using it is more often the challenge. u386mon
can highlight
potential configuration problems which may result in
hardware adjustments
or kernel tuning.
While adjusting kernel parameters may seem the obvious
solution to
certain performance problems, there are tradeoffs involved,
and, in
fact, it is possible to degrade system performance by
reconfiguring
the kernel. This could occur, for example, if you set
tunable parameters
to a level that increases the amount of RAM needed for
the kernel,
thereby, decreasing the amount of RAM available for
user processes.
A decrease in the RAM vailable for user processes may
mean the kernel
will be forced to perform more page swapping, which,
in turn, will
decrease the amount of realtime available for user processing.
Certain kernel tunables, however, may be considered
strong candidates
for adjustment, and there are mechanisms for evaluating
whether or
not adjustment is warranted.
Candidate parameters -- along with methods for evaluating
their
appropriateness -- are listed here. The parameter names
used are
from the SCO UNIX System V/386 environment, and may
not be the same
on your system. For that reason, the definition of the
parameter is
also included.
NFILE This parameter defines the maximum number of files
which may be opened system-wide. When this table is
full, an error
similar to "file table overflow" will be displayed
on the
console device.
To determine the appropriate value for NFILE, you must
first find
the maximum number of files that will be opened by any
single application.
To find that value, log in on two terminals and run
pstat(C)
or crash(ADM) (see Figure 5) on the first terminal.
The value
displayed on the first terminal when you log in on the
second will
be your BASE value. Now run each of your applications
in turn, subtracting
the BASE value from the open file value for the application.
When
you've found the largest number, use the formula shown
in Figure 5
to calculate the value.
NINODE This parameter defines the maximum number of inodes
which may be active system-wide. When this table is
full, an error
similar to "inode table overflow" will display
on the console
device.
Use the procedure described for NFILE to find the largest
number of
active inodes required by any single application, then
use the formula
shown in Figure 6 to calculate the required value. See
Figure 6 also
for the format required for pstat and crash.
NPROC This parameter defines the maximum number of processes
system-wide. When this table is full, an error message
similar to
"cannot fork: too many processes" displays
on the console
(this is not the same as the error that displays "too
many processes"
at the user's terminal -- see MAXUP below).
Again, use the procedure described earlier to determine
what the size
of this table should be. Appropriate commands and the
applicable formula
are shown in Figure 7.
NREGION There are typically three regions for each process
running on the system. The specified value for NREGION
should
be three or four times the value of NPROC. To determine
the
current number of defined regions, use the subcommand
region
to the crash command.
MAXUP This parameter defines the maximum number of processes
that each non-root user can create. If an error message
similar to
"too many processes" displays at the user's
terminal, then
it is MAXUP which has been reached.
Other Methods of Reporting on the Kernel Configuration
Many UNIX systems provide the sysdef command which will
read
some kernel and system configuration files, and generate
a report
on the status of the system. Output from this command
differs from
system to systems (see Figure 8 for output from a Motorola
68020-based
system and Figure 9 for a sample from an SCO UNIX 3.2.4-based
system).
Tricks, Tips, and Other Suggestions
The process of tuning the kernel isn't meant to require
a degree in
the black arts -- or even guru-hood. Simply reading
and adjusting
parameters as required -- and interacting with the software
vendors
-- will do the trick in most situations. You should
be aware that
few operating systems are configured in such a way as
to be suitable
for all purposes "out of the box." Some tuning
will be required
for most systems.
A wealth of books target this topic in general, but
if you want more
information on system tuning specifically, I suggest
that you look
at the sar (System Activity Reporter) which is distributed
as part of UNIX, or get a copy of System Performance
Tuning,
published by O'Reilly and Associates.
About the Author
Chris Hare is the Operations Manager for i*internet
Inc., a
Canadian Internet Service provider. He has worked in
the UNIX environment
since 1986, and in 1988 became the first SCO Authorized
Instructor
in Canada. He is a co-auther of the book Inside UNIX,
and he
is currently focused on networking, security, and perl.