SolarisTM
Resource Management
Peter Baer Galvin
With SolarisTM 9, Sun is bundling the previously unbundled
Solaris Resource Manager. How does it work, and how well does it
work? This month, the Solaris Companion takes it for a spin.
Before Solaris 9, some rudimentary resource management was included
in Solaris. For example, the psrset command controls processor
sets. This command has been available since Solaris 2.6. With this
command, you can create and delete processor sets (which are identified
groups of processors), assign CPUs to those sets, assign threads
to those sets, and display information about the system's processor
sets. In this manner, certain tasks can be run on certain CPUs,
and those tasks can be limited to not using other CPUs. This base
functionality can be useful in a small number of situations, but
for fine-grained management of threads on a system, memory, I/O,
disk use, and more flexible CPU control are all required. Enter
Solaris Resource Manager.
The Solaris Resource Manager (SRM) was an unbundled product before
Solaris 9, and is now included at no cost in the Solaris 9 release
from Sun. This review covers that free version (as implemented in
the 12/02 release). It is included as part of the full operating
system installation, so no extra effort is needed to make it available
on an S9 system.
Concepts
SRM consists of two rather disparate functions -- resource limitations
and fair share scheduling. Think of the first as an extension to
the standard "limits" that are settable within Solaris. The second
is a new scheduler that manages CPU scheduling based on allocated
shares, rather than the usual use-the-most-CPU-cycles kind of scheduling.
The new scheduler will be described in next month's Solaris Companion.
To clarify, SRM is in no way a replacement for domaining or other
"pure" resource use limiters. That is, a crash of the operating
system will take down all processes on that system (or within that
domain), including all SRM jobs. So SRM can help optimize use of
a system and it can allow programs that might usually be mutually
exclusive to live in harmony on a system.
So how would you choose between multiple domains and dynamic reconfiguration
(DR), and Solaris Resource Manager? Domaining provides absolute
operating system separation, so a task within one domain cannot
affect other domains. DR allows resources to move between domains,
but testing and planning must occur, and issues like memory allocation
must be resolved (as an application suddenly has more memory available
to it). SRM is more flexible but does not provide that wall between
applications. It should be used when fine-grained resource control
is required, when resource use changes might be frequent, and on
systems without domaining available. Of course, it could be used
in conjunction with domains for the most complete set of solutions.
Resource Limitations Theory
There are several concepts to understand before making use of
limit management within SRM, which include processes, tasks, and
projects. Processes, tasks, and projects are units of resource allocation.
A task consists of one or more processes, and a project is one or
more tasks. For example, a process, task, or project may be limited
in how much CPU time it can use. If a project is limited, then all
tasks in that process inherit that limit. Likewise, a task limit
is applied to the resource use of all processes in that task.
With SRM, processes are assigned to tasks or projects at login
or through newtask, at, batch, or cron
commands. Once these logical collections are made, you can use commands
such as prctl and newtask to manage resource use by
those groups, and commands like ps, id, prstat,
and the accounting subsystem to view system activities based on
those groups.
New resources that can be managed in this way include use of CPU
cycles, number of threads, amount of CPU time, and maximum address
space (virtual memory). This list expands the previous limitable
resources of number of open files, maximum file size, core dump
size, and data and stack virtual memory size. One key resource not
yet included is physical memory. Network use is manageable by the
separate IPQoS facility (not discussed here, but possibly a topic
for a future column).
These resources can be set to have threshold values, and when
a threshold is reached a local or global action can be triggered.
For example, the process could be killed, or the event could simply
be logged. These thresholds have three privilege levels, as UNIX
administrators might expect. "Basic" means that the owner of the
calling process can modify it; "privileged" means that only the
superuser can modify it; and "system" is fixed at boot time by the
kernel. System thresholds are set to the maximum of the resource
that the kernel is capable of providing.
Resource Limitations Fact
The Sun documentation about SRM is very good, with quite a few
examples. It is weird that Sun mixes network management and resource
management into one document, though. The manual is available at
docs.sun.com: "System Administration Guide: Resource Management
and Network Services".
The definition and management of projects is done via configuration
files and command-line functions. (It can also be done via the Solaris
Management Console.) /etc/project is much like /etc/passwd
in its format and function. It provides project information that
coincides with processes on the system. As a simple example, the
file can be edited with vi, or the projadd, projmod
and projdel commands can be used:
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::
projects lists for a user what projects are available:
$ projects
default testproject
All but the last two lines of the configuration file were there from
the system installation. Thus, by default, root processes run in project
"system", and most others in "default". This can be seen in the abridged
ps output:
$ ps -eo user,project,comm
USER PROJECT COMMAND
root system sched
root system /etc/init
root system pageout
root default /usr/dt/bin/dtlogin
root system /usr/openwin/bin/fbconsole
pbg default dtaction
pbg default /usr/openwin/bin/speckeysd
pbg default /bin/ksh
For this example, "testproject" is used. If a user is listed as a
valid member of a project, he or she may execute tasks within that
project. Only the superuser can execute tasks within a project without
being a project member.
A project can be further refined via this configuration file or
commands. The configuration file approach has the benefit of being
resilient to reboots. The file is read at boot time, or when SRM
commands are executed. However, changes made to the configuration
file do not affect processes already running. This example shows
commands to manage the project space.
First, let's create a task within the "testproject" project with
the newtask command:
$ id -p
uid=101(pbg) gid=14(sysadmin) projid=3(default)
$ newtask -p testproject csh
% id -p
uid=101(pbg) gid=14(sysadmin) projid=11(testproject)
Also, any new child processes of a project member are also members
of that project. Note the membership enforcement:
$ newtask -p dontuse
newtask: user "pbg" is not a member of project "dontuse"
The most important resource management command is prctl. It
cannot create a project, but once processes are running within a project,
it can manage their resources.
For example, let's limit the number of threads within a task (assuming
a process is running in project "testproject"). The first command
sets the "basic" limit at five threads, and the second line sets
the privileged limit at eight (that command must be run as root,
although the first one needn't). The third command confirms those
operations:
# prctl -n task.max-lwps -v 5 -e deny -i project testproject
# prctl -n task.max-lwps -t privileged -v 8 -e deny -i project testproject
# prctl -n task.max-lwps -i project testproject
2642: sh
task.max-lwps
5 basic deny
8 privileged deny
2147483647 system deny [ max ]
#
Next we spawn some threads in that project, within a task, to see
the results:
$ newtask -p testproject
$ csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
Vfork failed
Notice that the basic rule was not enforced, but that the privileged
one was. It is unclear what the basic resource limit priority is for,
but privileged obviously works. Of course another task could have
been spawned, and it to would be allowed eight threads in this example.
What if monitoring was desired, but not enforced limits? Enter
the rctladm command. But first, the action of our limit needs
to change from deny to allow (i.e., "none"):
# prctl -n task.max-lwps -t privileged -v 8 -d all -i project testproject
# prctl -n task.max-lwps -i project testproject
2847: sh
task.max-lwps
8 privileged none
2147483647 system deny [ max ]
# rctladm -e syslog task.max-lwps
# rctladm
process.max-address-space syslog=off [ lowerable deny no-local-action ]
process.max-file-descriptor syslog=off [ lowerable deny ]
process.max-core-size syslog=off [ lowerable deny no-local-action ]
process.max-stack-size syslog=off [ lowerable deny no-local-action ]
process.max-data-size syslog=off [ lowerable deny no-local-action ]
process.max-file-size syslog=off [ lowerable deny file-size ]
process.max-cpu-time syslog=off [ lowerable no-deny cpu-time inf ]
task.max-cpu-time syslog=off [ no-deny cpu-time no-obs inf ]
task.max-lwps syslog=notice
project.cpu-shares syslog=off [ no-basic no-local-action ]
The rctladm command tells the system to use syslog whenever
the max-lwps resource limit is reached. Note that for longer-term
settings, /etc/rctladm.conf is used. Now when the thread limit
is exceeded, the offending command is allowed but a syslog entry is
made:
# tail -1/var/adm/messages
Feb 9 20:55:07 sunny genunix: [ID 883052 kern.notice] privileged rctl task.max-
lwps (value 8) exceeded by task 25
So on the whole the facility works nicely, although its rather limited
in, well, what can be limited.
Some other useful project-enabled commands include:
- prstat -J or -T -- Dynamically updated process list,
including project or task summaries
- pgrep -J or -T -- Display the process IDs of processes
in the specified project or task
- pkill -J or -T -- Kill only processes in the specified
project or task
Summary
Solaris Resource Manager is a welcome addition to the core operating
system. It allows control over processes and resources that was
previously available only via commercial tools. This kind of functionality
continues the trend of Solaris moving from a technical computing
operating system to one that can accommodate both technical and
business uses, even within the same operating system instance.
This column described the concepts and showed some basic uses,
but there is quite a lot to this new Solaris facility. There are
plenty of details that must be considered as resource management
is configured, initialized, and used. Many were discussed here,
but some that were not touched on include resource control prioritization,
and using global naming services such as LDAP and NIS+ for resource
management information. Also, extended accounting can be used to
monitor resource use on a project or task basis.
Overall, SRM is worth learning to allow systems managers and administrators
to gain more control over who is doing what on the computers they
manage. Next month, the Solaris Companion will look at the second
half of SRM -- the Fair Share Scheduler.
Peter Baer Galvin (http://www.petergalvin.org) is the
Chief Technologist for Corporate Technologies (www.cptech.com),
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and
previously wrote Pete's Wicked World, the security column, and Pete's
Super Systems, the systems management column for Unix Insider
(http://www.unixinsider.com). Peter is coauthor of the Operating
Systems Concepts and Applied Operating Systems Concepts
textbooks. As a consultant and trainer, Peter has taught tutorials
and given talks on security and systems administration worldwide.
|