In several articles in Sys Admin, I've discussed various
applications 
that monitor the system and aid in the performance-tuning
process. 
In this article I discuss something that few people
understand but 
that UNIX users deal with constantly: the process management
system. 
I then explain why and how I developed a command I call
mon, 
which is a ps lookalike.
I use a System V environment (SCO UNIX) as the basis
for my discussion, 
and I assume some level of C knowledge, because I will be
discussing C 
system header files and code.
The UNIX process management system is based on a time-sharing
kernel. 
The kernel manages the placement of each process on the
CPU for execution. 
This process management facility offers kernel routines
that create 
processes; kernel routines that handle interrupts; and
a process scheduling 
mechanism.
A process is a program in motion. The executable code
stored on the 
disk is not a process but a program.
For every executing process, a process context describes
the system 
resources the process is using. UNIX shares the system resources
by switching the process 
contexts. A process context consists of (1) text, data,
and stack segments 
contained wholly or partially in memory; (2) kernel
process data 
structures; and (3) a task state segment, which defines the
registers identifying where the instructions are located in memory.
The Task State Segment
The Task State Segment (TSS) is a hardware construct.
The TSS contains 
a copy of all of the registers needed to locate the
instructions and 
data used by the process. These are:
-- the general registers
-- the segment registers
-- the flags register
-- the instruction pointer register
-- the selectors for the process's Local Descriptor
Table, and the kernel's Global Descriptor Table
-- the Page Directory Base Register, and
-- the read-only stack pointers for the privileged 
execution levels.
The TSS is highly hardware-dependent. The TSS described
here would 
be found in a segmented architecture machine, such as
an Intel 386 
or 486 machine. Non-segmented architecture machines,
such as the Motorola 
680x0 series, will have a very different TSS structure.
The TSS structure 
is listed in /usr/include/sys/tss.h.
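
The header file is not reproduced here. As a rough illustration only, the
following C sketch lays out a 32-bit TSS the way the Intel 386 hardware
defines it (104 bytes). The structure name, the field names, and the helper
function are hypothetical; they are assumptions for this example, not the
declarations found in sys/tss.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of a 32-bit Intel 386 TSS.
 * All names are hypothetical; the real declaration is in sys/tss.h. */
struct tss386_sketch {
    uint16_t back_link, pad0;   /* selector of the previously running task */
    uint32_t esp0;              /* stack pointer for privilege level 0 */
    uint16_t ss0, pad1;
    uint32_t esp1;              /* stack pointer for privilege level 1 */
    uint16_t ss1, pad2;
    uint32_t esp2;              /* stack pointer for privilege level 2 */
    uint16_t ss2, pad3;
    uint32_t cr3;               /* page directory base */
    uint32_t eip;               /* instruction pointer */
    uint32_t eflags;            /* flags register */
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi; /* general registers */
    uint16_t es, pad4;          /* segment registers, each padded to 32 bits */
    uint16_t cs, pad5;
    uint16_t ss, pad6;
    uint16_t ds, pad7;
    uint16_t fs, pad8;
    uint16_t gs, pad9;
    uint16_t ldt, pad10;        /* selector for the Local Descriptor Table */
    uint16_t trap;              /* bit 0: debug trap on task switch */
    uint16_t iomap_base;        /* offset of the I/O permission bitmap */
};

/* Returns the size of the sketch after checking that the compiler
 * inserted no padding in front of the cr3 field. */
size_t tss386_sketch_size(void)
{
    assert(offsetof(struct tss386_sketch, cr3) == 28);
    return sizeof(struct tss386_sketch);
}
```

Every field is naturally aligned, so a typical compiler should produce the
same 104-byte layout the hardware expects without any packing directives.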
The Binary Executable File Structure
The structure of an executable or binary file is related
to the TSS 
because of the hardware dependence of the CPU architectures.
There 
are several types, the two most important being the
Object Module 
Format, which is generally used by Microsoft and XENIX,
and the Common 
Object File Format, which is typical on SCO and AT&T
UNIX systems, 
as well as others. Regardless of the binary type, each
has several 
distinct components:
-- a text segment, which contains the actual
machine instructions to be executed
-- a data segment, which contains the variables
and structures used by the program
-- other tables and structures, which contain
other useful information about the program, including
a symbol table and a comment section.
At the beginning of every binary executable is a header
describing 
the contents of the file. This header identifies the
type of binary 
(80286 versus 80386, for example), the size and offset of
the text and data 
segments, the size and offset of the symbol table, and
the entry point 
of the program.
Binary programs on SCO UNIX may be of the Intel Object
Module Format 
(OMF) or the AT&T Common Object File Format (COFF).
The layout of 
a binary under SCO UNIX is shown in Figure 1.
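
To make the idea of a file header concrete, here is a minimal C sketch that
decodes the 20-byte header at the start of a COFF file. The field names follow
the traditional COFF filehdr layout as I recall it; the structure name, the
helper functions, and the constant names are hypothetical. The sketch covers
only the leading file header and assumes little-endian (Intel) byte order; in
COFF the entry point and segment sizes are carried in the headers that follow
it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_I386_MAGIC 0x14c  /* 0514 octal: Intel 386 COFF */
#define COFF_FILHSZ     20     /* size of the file header in bytes */

/* Hypothetical in-memory copy of the COFF file header. */
struct coff_filehdr_sketch {
    uint16_t f_magic;   /* identifies the type of binary */
    uint16_t f_nscns;   /* number of sections */
    int32_t  f_timdat;  /* creation time stamp */
    int32_t  f_symptr;  /* file offset of the symbol table */
    int32_t  f_nsyms;   /* number of symbol table entries */
    uint16_t f_opthdr;  /* size of the optional header that follows */
    uint16_t f_flags;   /* flag bits */
};

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header from buf. Returns 0 on success, or -1 if the
 * buffer is too short to hold a complete file header. */
int coff_parse_filehdr(const unsigned char *buf, size_t len,
                       struct coff_filehdr_sketch *out)
{
    assert(buf != NULL && out != NULL);
    if (len < COFF_FILHSZ)
        return -1;
    out->f_magic  = rd16(buf);
    out->f_nscns  = rd16(buf + 2);
    out->f_timdat = (int32_t)rd32(buf + 4);
    out->f_symptr = (int32_t)rd32(buf + 8);
    out->f_nsyms  = (int32_t)rd32(buf + 12);
    out->f_opthdr = rd16(buf + 16);
    out->f_flags  = rd16(buf + 18);
    return 0;
}
```

A caller would read the first 20 bytes of the file into a buffer, call
coff_parse_filehdr(), and compare f_magic against COFF_I386_MAGIC before
trusting the remaining fields.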
Creation of a Process
When a user issues a command to the command interpreter,
the fork(S) 
system call is executed. fork() creates a new entry
in a 
kernel table known as the process table. The process
table is fixed 
in size. I will discuss it in more detail later in this
article.
A new process can be created only by an existing process,
through fork(). The fork() call copies the
original 
process to a new process known as the child. The child
then begins executing 
and may use an exec(S) system call
to load 
another program's text for the desired process.
fork() returns the Process ID (PID)
number 
of the child to the parent process, and returns a value of
zero to the child. By this 
mechanism, programmers can develop processes that behave
differently 
depending on whether the processes perceive themselves
as parent or 
child. The sample PERL code in Figure 3 illustrates
a small program 
using fork and exec. The fork causes the child 
program to be started, which prints the date. The parent
continues 
execution and sleeps for 10 seconds, prints the message
"the parent 
is dead", and exits.
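
The same pattern can be sketched in C. This is a minimal illustration, not a
translation of Figure 3: the function name spawn_and_wait is hypothetical, and
the parent waits for the child instead of sleeping so that the example
finishes quickly.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that execs the named program (searched for on PATH)
 * while the parent waits for it. Returns the child's exit status,
 * or -1 if the fork or the wait failed. */
int spawn_and_wait(const char *prog)
{
    pid_t pid;
    int status;

    assert(prog != NULL);

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: no child was created */

    if (pid == 0) {
        /* Child: fork() returned zero. Load the new program text. */
        execlp(prog, prog, (char *)NULL);
        _exit(127);             /* reached only if the exec failed */
    }

    /* Parent: fork() returned the PID of the child. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait("date") makes the child print the date, much like the
child in Figure 3, and the parent receives the exit status of date.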
fork() is implemented by two kernel routines known
as newproc 
and procdup. Collectively, these two routines allocate
a new 
PID number and create an entry in the process table
for the process, 
then perform the steps described above.
Process Execution
Process execution begins with an exec() system call. 
exec() creates and initializes the context for the new
process. 
If there isn't already a copy of the program running
in the system, 
then a process region is assigned for each of its text segment
(executable 
code), data segment, and stack.
A process region is a data structure that describes
the segment in 
memory. For example, it includes the type of segment
(text, data, stack), 
how many memory pages are in the region, the number
of processes sharing 
the region, and more. On SCO UNIX systems, the region
table and associated 
structures are defined in /usr/include/sys/region.h.
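
As a simplified picture of what a region entry records, consider the C sketch
below. The structure, its field names, and the two helper functions are all
hypothetical and are far smaller than the real declarations in sys/region.h.
The sketch only illustrates the three items named above and how a count of
sharing processes lets one text region serve several copies of a program.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of segment a region can describe. */
enum seg_type { SEG_TEXT, SEG_DATA, SEG_STACK };

/* Hypothetical, much-reduced region entry. */
struct region_sketch {
    enum seg_type type;     /* text, data, or stack */
    unsigned      npages;   /* number of memory pages in the region */
    unsigned      refcnt;   /* number of processes sharing the region */
};

/* A process attaches to the region. Returns the new sharing count. */
unsigned region_attach(struct region_sketch *r)
{
    assert(r != NULL);
    return ++r->refcnt;
}

/* A process detaches from the region. Returns 1 when the last user
 * has gone and the region could be freed, otherwise 0. */
int region_detach(struct region_sketch *r)
{
    assert(r != NULL && r->refcnt > 0);
    return --r->refcnt == 0;
}
```

In this model, a second copy of a running program would call region_attach()
on the existing text region instead of allocating a new one.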
UNIX creates four processes on system startup that exist
for the lifetime 
of the system. These processes are the memory scheduler,
the paging 
daemon, the buffer flushing daemon, and the init process.
It is important 
to note that all four of these processes are running
in kernel mode, 
not user mode.
Kernel Mode
Processes executing in kernel mode are not preemptible.
In kernel mode, 
the process holds the CPU until it gives
up the CPU 
voluntarily or its time slice expires. While a
process executes 
in kernel mode, signals are saved until the process
exits kernel mode, 
whereupon the signals are processed. This is illustrated
in Figure 2. 
[Editor's note: Zombie processes and processes that
hang the system 
are frequently those trapped in kernel mode. They can't
be interrupted 
in kernel mode. Even signal 9 may not get through.]
In some situations 
a process is in a state in which it does not respond
to signals; 
zombie processes are one example. These processes (and others)
do not respond 
because they are typically stuck in kernel mode. With
the appropriate 
options, the ps command (ps -el on SCO and AT&T
systems) lists the process 
priorities.
The range for process priorities is 0 to 127, with 0
being highest 
priority, and 127 being the lowest. Priorities 0 to
39 indicate kernel 
mode, and 40 to 127 indicate user mode. It is important
to note that 
processes whose priority values are 26 or greater can respond
to signals, 
and processes whose priority values are less than 26 will
not respond to 
signals. The more common signals and their values are
listed in Figure 4. 
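
The priority ranges just described can be captured in a few lines of C. This
is only a sketch of the rules as stated in this article; the macro and
function names are hypothetical and do not come from any system header.

```c
#include <assert.h>

#define PRI_LOWEST_SKETCH   127  /* numerically largest, lowest priority */
#define PRI_USER_BASE        40  /* 40 to 127 indicate user mode */
#define PRI_SIGNAL_BASE      26  /* 26 and above can respond to signals */

/* Returns 1 if the priority value indicates kernel mode (0 to 39). */
int pri_is_kernel_mode(int pri)
{
    assert(pri >= 0 && pri <= PRI_LOWEST_SKETCH);
    return pri < PRI_USER_BASE;
}

/* Returns 1 if a process at this priority can respond to signals. */
int pri_accepts_signals(int pri)
{
    assert(pri >= 0 && pri <= PRI_LOWEST_SKETCH);
    return pri >= PRI_SIGNAL_BASE;
}
```

Note that priorities 26 through 39 fall in both categories: they indicate
kernel mode, yet the process can still respond to signals.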
User Mode
User mode covers all other states of execution. A user process
can only 
execute instructions from its own text segment, reference
its own 
data segment, and use its own stack.
Some instructions are privileged and require kernel
mode to execute. 
User processes get access to kernel mode by using system
calls, predefined 
kernel routines such as open(), read(), and write(),
or loadable device driver routines. Once in kernel mode,
the process 
can execute instructions from the kernel's text segment,
access the 
kernel's data structures, and use a system stack in
the kernel's u-area.
Switching from user to kernel mode is not a context
switch, but a 
mode switch. The running process continues to execute
after a mode 
switch. With a context switch, a new TSS is loaded and
a new process 
begins execution.
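
A short C sketch shows how ordinary a mode switch is from the program's point
of view. The function name pipe_round_trip is hypothetical. Each of the
pipe(), write(), read(), and close() calls below enters the kernel and returns
to the same process; whether the scheduler also performs a context switch in
the meantime is outside the program's control.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Send a short message through a pipe and read it back.
 * Returns the number of bytes read into out, or -1 on error.
 * The message is assumed to be small enough to fit in the pipe. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t outsz)
{
    int fd[2];
    size_t len;
    ssize_t n;

    assert(msg != NULL && out != NULL);

    len = strlen(msg);
    if (len > outsz)
        len = outsz;            /* never read back more than out holds */

    if (pipe(fd) < 0)           /* system call: create the pipe */
        return -1;

    n = write(fd[1], msg, len); /* system call: user to kernel and back */
    if (n > 0)
        n = read(fd[0], out, (size_t)n); /* system call: fetch the data */

    close(fd[0]);               /* system calls: release both ends */
    close(fd[1]);
    return n;
}
```

Between the calls, the process runs in user mode on its own text, data, and
stack; during each call it runs kernel text on the kernel's behalf.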
The System Processes
The memory scheduler -- sched, swapper, or PID 0 --
is responsible 
for swapping processes in and out of RAM according to
their priority 
and the available memory on the system. Most UNIX systems
today perform 
demand paging rather than swapping, as older UNIX systems
did. (See 
the sidebar, "Paging and Swapping under SCO UNIX,"
for discussion 
of demand paging and swapping.)
The paging daemon, typically vhand, or PID 2, steals
pages of memory that have not been recently referenced
for use by 
the system or other processes. If the page contains
data or stack segments, then it is saved to the swap
device for later 
retrieval. If the page contains program text, the page
is simply reused, because the text can be read again from the
executable file.
The buffer flushing daemon, usually bdflush, or PID
3, writes to disk the 
"dirty" buffers that have been in the cache
for too long.
Finally, the init process is the first true user process
that is executed. When entering multiuser mode, init
creates 
all of the gettys used to permit login to the system.
Why does /unix not show up in a ps listing? The kernel,
/unix on SCO systems, consists of four distinct parts
that 
execute asynchronously, not as a single entity visible
by name in 
a ps listing. These parts are: