aix9809
35
September 1998
3 Thread concepts and programming
10 CPU usage monitor
21 Running commands and scripts on
remote hosts
30 CC-NUMA
39 Using signals to kill a process
44 AIX news
© Xephon plc 1998
AIX Update
Published by
Xephon
27-35 London Road
Newbury
Berkshire RG14 1JL
England
Telephone: 01635 550955
From USA: 01144 1635 33823
E-mail: [email protected]
Editor
Harold Lewis
Disclaimer
Readers are cautioned that, although the information in this journal is presented in good
faith, neither Xephon nor the organizations or individuals that supplied information in this
journal give any warranty or make any representations as to the accuracy of the material it
contains. Neither Xephon nor the contributing organizations or individuals accept any
liability of any kind howsoever arising out of the use of such material. Readers should
satisfy themselves as to the correctness and relevance to their circumstances of all advice,
information, code, JCL, scripts, and other contents of this journal before making any
use of it.

North American office
Xephon/QNA
1301 West Highway 407, Suite 201-405
Lewisville, TX 75067
USA
Telephone: 940 455 7050

Contributions
If you have anything original to say about AIX, or any interesting experience to
recount, why not spend an hour or two putting it on paper? The article need not be very
long – two or three paragraphs could be sufficient. Not only will you actively be helping
the free exchange of information, which benefits all AIX users, but you will also gain
professional recognition for your expertise and that of your colleagues, as well as being
paid a publication fee – Xephon pays at the rate of £170 ($250) per 1000 words for
original material published in AIX Update.
To find out more about contributing an article, see Notes for contributors on
Xephon’s Web site, where you can download Notes for contributors in either text form or
as an Adobe Acrobat file.

Subscriptions and back-issues
A year’s subscription to AIX Update, comprising twelve monthly issues, costs £175.00
in the UK; $265.00 in the USA and Canada; £181.00 in Europe; £187.00 in Australasia
and Japan; and £185.50 elsewhere. In all cases the price includes postage. Individual
issues, starting with the November 1995 issue, are available separately to subscribers
for £15.00 ($22.50) each including postage.

AIX Update on-line
Code from AIX Update is available from Xephon’s Web page at www.xephon.com
(you’ll need the user-id shown on your address label to access it).
© Xephon plc 1998. All rights reserved. None of the text in this publication may be
reproduced, stored in a retrieval system, or transmitted in any form or by any means, without
the prior permission of the copyright owner. Subscribers are free to copy any code reproduced
in this publication for use in their own installations, but may not sell such code or incorporate
it in any commercial product. No part of this publication may be used for any form of
advertising, sales promotion, or publicity without the written permission of the publisher.
Copying permits are available from Xephon in the form of pressure-sensitive labels, for
application to individual copies. A pack of 240 labels costs $36 (£24), giving a cost per copy
of 15 cents (10 pence). To order, contact Xephon at any of the addresses above.
Printed in England.
Thread concepts and programming
THE DEFINITION OF A THREAD
All modern Unix operating systems, and other ‘high-end’ operating
systems, such as Windows NT, implement one of two methods of
multiprocessing. The older method is to use multiple concurrent
processes, while the new one is to use threads. A thread is defined as
an independent flow of control that operates within the same address
space as other independent flows of control within a process.
Traditionally, the creation of a new control flow in a Unix program
requires the execution of the fork() system call. This system call is
implemented by the kernel and results in the creation of a complete
copy of the calling process, including all its private and system data,
which demands substantial processor time and uses a lot of memory.
The newly created process is completely separate from its parent and
therefore has to use complicated inter-process communication
primitives in order to coordinate activity and transfer data. By
contrast, the creation of a new thread requires only two system calls
and replication of the thread’s private data (about 64 KB), which
therefore causes much less overhead.
Thread implementations are defined by the following POSIX standards:

• POSIX 1003.4a Draft 4 (implemented in AIX V3 by the DCE Threads Library)
• POSIX 1003.1c Draft 7 (implemented in all variations of AIX V4)
• POSIX 1003.1c Draft 10 (implemented in AIX V4.3 in both 32-bit and 64-bit modes).
THREAD TYPES AND MODELS
There are three different types of thread: user, kernel, and kernel-only
threads. User threads are created and manipulated by the user through
functions defined in the libpthreads.a library. Their mapping to kernel
© 1998. Reproduction prohibited. Please inform Xephon of any infringement.
threads is implementation-dependent. Kernel threads are operating
system-managed entities that are handled by the system scheduler.
They run in user mode when executing user functions and library
calls, but switch to kernel mode when executing system calls invoked
by the user. Kernel-only threads perform system-related tasks on
behalf of kernel mode programs.
User threads are mapped to kernel threads by the threads library. The
way in which this mapping is carried out is known as the ‘thread
model’. There are three possible thread models available, which
correspond to three different ways to map user threads to kernel
threads: the ‘M:1’, ‘1:1’, and ‘M:N’ models.
The mapping of user threads to kernel threads is done using virtual
processors. A virtual processor (VP) is an implicit thread library entity
that represents a CPU able to run a thread. For a kernel thread, a
VP represents a real CPU, while for a user thread it represents a
kernel thread or a structure bound to a kernel thread.
The M:1 model
In the M:1 (m-to-one) model, all user threads are mapped to one kernel
thread (or process), and all user threads run on one VP. The mapping
and all thread programming features are handled by the thread library.
This model can be used on any computer system including the
traditional single-threaded systems. DCE threads were implemented
in AIX V3 this way.
The 1:1 model
In the 1:1 (one-to-one) model, each user thread is mapped to one
kernel thread and each thread runs on one VP. Most of the user thread’s
programming is performed by the kernel thread. This model is
implemented in AIX V4.1, AIX V4.2, and AIX V4.3.
The M:N model
In the M:N (m-to-n) model, user threads are mapped to a pool of kernel
threads. A user thread can be bound to a specific VP or share a number
of unbounded VPs. This is the most efficient and complex thread
model; the user thread’s programming tasks are performed by both the
thread library and kernel threads. This model is implemented in AIX
V4.3.1.

© 1998. Xephon UK telephone 01635 33848, fax 01635 38345. USA telephone (940) 455 7050, fax (940) 455 2492.
THREAD SCHEDULING
Each thread object has a set of scheduling parameters. These parameters
are associated with the thread when the thread is created by passing
an appropriately initialized thread attribute to the pthread_create()
function.
The scheduling priority of a thread is an integer value between 1 and
127, with 127 being the highest priority. The scheduling policy has three
possible values: SCHED_OTHER, SCHED_FIFO, and SCHED_RR.
SCHED_OTHER is the default scheduling policy. It implements the
standard AIX scheduling algorithm, which decreases the priority of
threads that are CPU intensive. SCHED_FIFO is a strict ‘first-in-first-out’
algorithm that results in all threads with the same priority running
uninterrupted to completion. This policy should be used with extreme
care as it can effectively destroy the performance benefits of
multithreading. Another thread attribute that influences thread
scheduling is the thread’s contention scope.
THREAD CONTENTION SCOPE
The contention scope attribute can take one of two values:
PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM.
‘Process’ or ‘local’ contention scope specifies that the thread is to be
scheduled in competition with all other local contention scope threads
in the process. System or global contention scope specifies that the
thread is to be scheduled in competition with all other threads in the
system. The contention scope is only meaningful in an M:N library
implementation. An attempt to set the contention scope attribute to
PTHREAD_SCOPE_PROCESS on an operating system prior to AIX
V4.3.1 will fail, producing an appropriate error message.
THREAD SMP SCHEDULING STRATEGY
When a thread is scheduled to run on a computer equipped with more
than one CPU (SMP), a processor must be chosen to run the thread.
There are two ways to perform this selection: selecting the same
processor all or most of the time or selecting any available processor.
The first method, known as ‘processor binding’, can reduce the need
to access memory, as the processor’s cache may still contain data and
instructions from the previous execution of the thread. The second
method, called ‘opportunistic affinity’, tends to result in more efficient
use of a computer’s processors. AIX V4 implements a strategy of
opportunistic affinity, though it also allows threads to be bound to a
specific processor by calling the bindprocessor() subroutine.
THREAD-ENABLED COMMANDS
With the introduction of kernel threads in AIX V4, a number of system
monitoring commands were extended to report thread-related data.
ps – display process information
A new flag, -m, has been added to this command. When issued from
the command line with this flag, ps displays information about threads
(one line per thread). This flag should be combined with process
selection flags (-e, -a, -k, etc) and format flags (-l, -f, -o, etc) to home in
on the information about threads that the user requires. For instance,
the command ps -elm produces a long format display (-l) of all
processes other than kernel processes (-e) and the threads associated
with these processes (-m).
A new format option, THREAD, has also been added. Specifying it
with the -o format flag, as in ps -mo THREAD, causes additional
thread information to be displayed.
netpmon – network I/O and network-related CPU statistics
This command has a new flag, -t, which prints CPU statistics on a
per-thread basis. When this flag is used, each report line describing
process statistics is followed by lines describing the CPU usage of
each of the process’s threads.
Below is an example of part of the output of netpmon -t.
Process CPU Usage Statistics:
-----------------------------
                                                     Network
Process (top 20)       PID     CPU Time    CPU %     CPU %
----------------------------------------------------------
netscape_aix4        11692      23.4825   67.442     0.000
   Thread id:        32077      23.4825   67.442     0.000
netscape_aix4        16732       6.3278   18.173     0.004
   Thread id:        34671       6.3278   18.173     0.004
X                     2358       2.1443    6.158     0.000
   Thread id:         3169       2.1443    6.158     0.000
netpmon              16598       0.5460    1.568     0.000
   Thread id:        41877       0.5460    1.568     0.000
gil                   1032       0.2049    0.589     0.589
   Thread id:         2065       0.0534    0.153     0.153
   Thread id:         1807       0.0515    0.148     0.148
   Thread id:         1549       0.0477    0.137     0.137
   Thread id:         1291       0.0523    0.150     0.150
   Thread id:         1033       0.0000    0.000     0.000
tprof – reports CPU usage for the system and individual programs
The -t flag has been added to this command to constrain the report to
a specific process_id and its children, adding a new Thread
Identification (TD) column.
sar – reports system activity counters
While the format of the report hasn’t changed, the meaning of the
following flags is now subtly altered:
• -q refers to the average number of threads, not processes
• -w reports the number of thread, not process, switches per second.
vmstat – report statistics on kernel threads and other system activity
The r, b, and cs column headings respectively refer to:

• The number of kernel threads placed on the run queue per second
• The number of kernel threads placed on the wait queue per second
• The number of thread context switches per second.
BASIC THREAD PROGRAMMING EXAMPLE
The following program displays the basic features of any thread-based
program, including:

• Setting thread attributes to required values
• Creating threads
• Passing an argument to a thread
• Binding a thread to a processor
• Passing return data from the thread
• Terminating the thread.
Please note that this example (and any other programs that implement
threads) should be compiled using either xlc_r or xlC_r. These
commands ensure compilation with the correct parameters and
linkage with thread-safe libraries.
THREAD PROGRAM
#include <pthread.h>        /* For pthread-related macros and functions */
#include <stdio.h>          /* For formatted I/O */
#include <stdlib.h>         /* For exit() */
#include <unistd.h>
#include <sys/processor.h>  /* For bindprocessor() (AIX) */

void *Thread(void *arg)     /* Thread function */
{
    int rc;                 /* Return code */

    /* Bind thread to processor #0 */
    rc = bindprocessor(BINDTHREAD, thread_self(), 0);

    /* Report argument passed by the calling program */
    printf("String passed from the main program is: %s\n", (char *)arg);

    /* Terminate thread and pass argument back to main program */
    pthread_exit("Hear you main!");
}
int main(int argc, char **argv)
{
    pthread_t thread;               /* Thread variable */
    pthread_attr_t attr;            /* Thread attributes variable */
    char targ[] = "Hello thread!";  /* Argument to be passed to the thread */
    int rc;                         /* Return code */
    struct sched_param sched;       /* Scheduling attributes */
    char *thread_rc;                /* Thread's return value */

    /* Initialize thread attributes */
    pthread_attr_init(&attr);

    /* Set detachstate attribute to CREATE_UNDETACHED to allow
       storage and successful retrieval of the thread's return status */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_UNDETACHED);

    /* Set scheduling_policy and scheduling_priority attributes */
    pthread_attr_setschedpolicy(&attr, SCHED_RR);
    sched.sched_priority = 80;
    pthread_attr_setschedparam(&attr, &sched);

    /* Create the thread and pass it the argument */
    rc = pthread_create(&thread, &attr, Thread, (void *)targ);
    if (rc) {
        perror("Thread invocation failed!");
        exit(1);
    }

    /* Wait for the thread's termination and receive its return value */
    pthread_join(thread, (void *)&thread_rc);
    printf("Thread returned: %s\n", thread_rc);

    exit(0);
}
REFERENCES
1 Introduction to Multithreaded Programming by Chary Tamirisa, AIXpert, November 1994.
2 Multithreaded Implementations and Comparisons, A White Paper, Sun Microsystems, Part No: 96168-001 (1996).
3 PThreads Primer by Bill Lewis and Daniel Berg, SunSoft Press (1996).
4 Programming with Threads by Devang Shah, Steve Kleiman, and Bart Smaalders, SunSoft Press (1996).
5 Thread Time by Scott Norton and Mark Dipasquale, Prentice Hall (1996).
6 Programming with POSIX Threads by Dave Butenhof, Addison-Wesley (1997).
A Polak
Systems Engineer
APS (Israel)
© Xephon 1998
CPU usage monitor
CPU usage monitor is a shell script that plots a graph of CPU usage
(as a percentage) on the y axis against sample number, or ‘cycle’, on
the x axis, obtaining each sample by issuing the command sar 1 1.
The script runs under VT100 terminal emulation software for AIX.
The actual CPU usage is worked out from the output of the sar 1 1
command, which takes one sample over one second. The graph shows
CPU usage in steps of 5% and should serve as a broad indicator of
CPU usage.
LISTING
#####################################################################
#
# Name        : cum (CPU usage monitor)
#
# Description : The script plots a graph of CPU usage against
#               number of one-second-interval cycles.
#
#               It includes the following functions:
#
#               o DefineVariables
#               o DisplayMessage
#               o MoveCursor
#               o HandleInterrupt
#               o GetRequiredNumberOfSamples
#               o DrawCpuUsageGraph
#               o PopulateCpuUsageGraph
#
# Notes       : 1 The command used to monitor CPU usage is sar 1 1,
#                 which takes one sample per second for all
#                 processors combined.
#
#               2 The script repeats the sampling process and plots
#                 a graph showing the number of cycles input by the
#                 user.
#
#####################################################################
#####################################################################
#
# Name        : DefineVariables
#
# Description : Defines all variables.
#
#####################################################################
DefineVariables ( )
{
  #
  # define cursor home
  #
  X_HOME=1
  Y_HOME=1
  #
  # define escape sequences
  #
  ESC="\0033["
  RVON=_[7m            # reverse video on
  RVOFF=_[27m          # reverse video off
  BOLDON=_[1m          # bold on
  BOLDOFF=_[22m        # bold off
  BON=_[5m             # blinking on
  BOFF=_[25m           # blinking off
  FEC=1                # failure exit code
  SEC=0                # success exit code
  SLEEP_DURATION=3     # number of seconds for sleep command
  ERROR="${RVON}${BON}cum.sh:ERROR:${BOFF}"
  INFO="${RVON}${BON}cum.sh:INFO:${BOFF}"
  #
  # define signals
  #
  SIGNEXIT=0 ; export SIGNEXIT   # normal exit
  SIGHUP=1   ; export SIGHUP     # session disconnected
  SIGINT=2   ; export SIGINT     # ctrl-c
  SIGTERM=15 ; export SIGTERM    # kill command
  #
  # define messages
  #
  INVALID_SAMPLE="\${NO_SAMPLES}, is out of range${RVOFF}"
  INVALID_NUMBER="\${NO_SAMPLES}, is a bad number${RVOFF}"
  INTERRUPT="Program Interrupted! Quitting early${RVOFF}"
}
#####################################################################
#
# Name        : MoveCursor
#
# Description : The function moves the cursor to a given point.
#
# Input       : y coordinate value
#               x coordinate value
#
#####################################################################
MoveCursor ( )
{
  YCOR=$1
  XCOR=$2
  print -n "${ESC}${YCOR};${XCOR}H"
}
#####################################################################
#
# Name        : DisplayMessage
#
# Description : The function displays a given message.
#
# Input       : Message Type (E = Error, I = Information)
#               Message Code
#
#####################################################################
DisplayMessage ( )
{
  MESSAGE_TYPE=$1
  MESSAGE_TEXT=`eval echo $2`
  MoveCursor 24 1
  if [ "${MESSAGE_TYPE}" = "E" ]
  then
    echo "`eval echo ${ERROR}`${MESSAGE_TEXT}\c"
  else
    echo "`eval echo ${INFO}`${MESSAGE_TEXT}\c"
  fi
  sleep ${SLEEP_DURATION}
}
#####################################################################
#
# Name        : HandleInterrupt
#
# Overview    : The function displays an appropriate message and
#               exits returning a failure code.
#
#####################################################################
HandleInterrupt ( )
{
  DisplayMessage I "${INTERRUPT}"
  echo "${RVOFF}"
  clear
  # MoveCursor ${Y_HOME} ${X_HOME}
  exit $FEC
}
#####################################################################
#
# Name        : DrawCpuUsageGraph
#
# Description : The function:
#
#               o draws the x and y coordinates
#               o labels the coordinates
#               o marks the x coordinate with the time interval
#               o marks the y coordinate with percent used
#
# Notes       : 1 Cursor position (1, 1) is the top left hand
#                 corner of the screen.
#
#               2 The y coordinate is marked from 0 to 100
#                 percent with an interval of 10 percent.
#
#               3 The x coordinate is marked from 0 to 50 with
#                 an interval of 5 cycles.
#
#####################################################################
DrawCpuUsageGraph ( )
{
  trap "HandleInterrupt" $SIGINT $SIGTERM $SIGHUP
  clear
  #
  # fix the coordinates of the reference point (not necessarily the
  # origin of the graph)
  #
  X_REF=10
  Y_REF=3
  #
  # fix the coordinates of the first label (CPU usage) on the y
  # coordinate
  #
  X_LABEL1=6
  Y_LABEL1=1
  #
  # fix the coordinates of the second label ('^')
  #
  X_LABEL2=10
  Y_LABEL2=2
  #
  # move the cursor to the right position for the first label
  #
  MoveCursor ${Y_LABEL1} ${X_LABEL1}
  #
  # display the first label
  #
  echo "${BON}CPU Usage(%)${BOFF}"
  #
  # move the cursor to the right position for the second label
  #
  MoveCursor ${Y_LABEL2} ${X_LABEL2}
  #
  # display the second label
  #
  print -n "^"
  #
  # move the cursor to the reference point
  #
  MoveCursor ${Y_REF} ${X_REF}
  #
  # draw the y coordinate using a temporary variable Y_COR
  # that is incremented while the line is drawn downwards
  # from the reference point
  #
  Y_COR=${Y_REF}
  while true
  do
    if [ ${Y_COR} -eq 23 ]
    then
      #
      # reached status line
      #
      break
    fi
    MoveCursor ${Y_COR} ${X_REF}
    echo "|"
    Y_COR=`expr ${Y_COR} + 1`
  done
  #
  # save the current location as the origin of the graph
  #
  X_ORIGIN=${X_REF}
  Y_ORIGIN=${Y_COR}
  #
  # draw the x coordinate from the origin
  #
  MoveCursor ${Y_ORIGIN} ${X_ORIGIN}
  echo "-----5----10---15---20---25---30---35---40---45---50-> ${BON}Cycle(s)${BOFF}"
  #
  # draw percentage indicators on the y coordinate starting at the
  # origin and decrementing Y_COR (currently 23) by two, marking
  # the axis with labels showing multiples of 10
  #
  MoveCursor ${Y_ORIGIN} ${X_ORIGIN}
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "10"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "20"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "30"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "40"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "50"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "60"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "70"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "80"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "90"
  Y_COR=`expr ${Y_COR} - 2`
  MoveCursor ${Y_COR} ${X_ORIGIN}
  echo "100"
}
#####################################################################
#
# Name        : PopulateCpuUsageGraph
#
# Description : This function populates the CPU usage graph with
#               CPU usage statistics sampled by the command sar 1 1.
#
# Notes       : 1 The sampling is repeated over the number of
#                 cycles required, which is captured as a command
#                 line argument.
#
#               2 CPU usage, the y coordinate, is rounded to the
#                 nearest five percent or multiple of five percent.
#
#               3 Each point on the x coordinate is one cycle.
#
#####################################################################
PopulateCpuUsageGraph ( )
{
  trap "HandleInterrupt" $SIGINT $SIGTERM $SIGHUP
  #
  # turn reverse video on
  #
  echo "${RVON}"
  NO_OF_REQ_CYCLES=$NO_SAMPLES
  NO_CUR_CYCLES=0
  #
  # move the cursor to origin
  #
  # MoveCursor ${Y_ORIGIN} ${X_ORIGIN}
  #
  # re-define the origin to facilitate drawing the graph
  #
  X_ORIGIN=`expr ${X_ORIGIN} + 1`
  Y_ORIGIN=`expr ${Y_ORIGIN} - 1`
  #
  # initialize a variable for drawing the x coordinate value
  # that is incremented from data from sar sampling
  #
  X_START=${X_ORIGIN}
  #
  # sample cpu usage and draw the graph
  #
  while true
  do
    #
    # re-initialize a variable for drawing the y coordinate
    #
    Y_START=${Y_ORIGIN}
    #
    # compare cycles
    #
    if [ ${NO_CUR_CYCLES} -ge ${NO_OF_REQ_CYCLES} ]
    then
      break
    fi
    #
    # run the sar command and process the last line, which
    # contains the percent idle figure
    #
    PERCENT_IDLE=`sar 1 1 | tail -1 | awk '{print $5}'`
    PERCENT_USED=`expr 100 - ${PERCENT_IDLE}`
    #
    # work out percent used as a multiple of 5
    #
    YVAL_FOR_PERCENT_USED=`expr ${PERCENT_USED} / 5`
    #
    # adjust it for PERCENT_USED being less than 5
    #
    if [ ${YVAL_FOR_PERCENT_USED} -eq 0 ]
    then
      YVAL_FOR_PERCENT_USED=1
    fi
    #
    # initialize a counter for drawing cpu usage on the y coordinate
    #
    YVAL_CTR=1
    #
    # start drawing the cpu usage on the y coordinate
    #
    while true
    do
      MoveCursor ${Y_START} ${X_START}
      echo " "
      #
      # decrement Y_START
      #
      Y_START=`expr ${Y_START} - 1`
      YVAL_CTR=`expr ${YVAL_CTR} + 1`
      if [ ${YVAL_CTR} -gt ${YVAL_FOR_PERCENT_USED} ]
      then
        break
      fi
    done
    #
    # increment x coordinate value
    #
    X_START=`expr ${X_START} + 1`
    #
    # increment number of cycles
    #
    NO_CUR_CYCLES=`expr ${NO_CUR_CYCLES} + 1`
  done
  echo "${RVOFF}"
}
#####################################################################
#
# Name        : GetRequiredNumberOfSamples
#
# Description : The function asks the user for the number of cycles
#               required.
#
# Notes       : 1 The function restricts the number of cycles to
#                 fifty or fewer.
#
#####################################################################
GetRequiredNumberOfSamples ( )
{
  trap "HandleInterrupt" $SIGINT $SIGTERM $SIGHUP
  while true
  do
    clear
    echo "Cycle is the number of times the sampling is to repeat"
    echo "Enter number of cycles (up to fifty) required:\c"
    read NO_SAMPLES
    case $NO_SAMPLES in
      "" ) : ;;
      *  ) #
           # check for digits only
           #
           if [ `expr $NO_SAMPLES + 1 2> /dev/null` ]
           then
             #
             # check for less than or equal to 50
             #
             if [ ${NO_SAMPLES} -le 50 ]
             then
               break ;
             else
               DisplayMessage E "${INVALID_SAMPLE}"
             fi ;
           else
             DisplayMessage E "${INVALID_NUMBER}"
           fi ;;
    esac
  done
}
#####################################################################
#
# Name        : main
#
# Description : The function invokes all other functions.
#
#####################################################################
main ( )
{
  DefineVariables
  GetRequiredNumberOfSamples
  DrawCpuUsageGraph
  PopulateCpuUsageGraph
  clear
  exit $SEC
}
#
# execute main
#
main
Figure 1: Output chart
SAMPLE OUTPUT
The graph (Figure 1) gives a broad indication of CPU usage and can
be used to observe the overall CPU usage of strongly CPU-bound
programs. In these circumstances, the graph should depict peaks
corresponding to high CPU usage over cycles in which the program
runs.
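The rounding described above is easy to exercise in isolation. In this sketch the live sar 1 1 output is replaced by a canned sample line (an assumption for illustration only), and the listing’s own awk/expr arithmetic is applied to it:

```shell
# Canned stand-in for the last line of 'sar 1 1' output
# (columns: time, %usr, %sys, %wio, %idle -- hypothetical figures)
SAR_LAST_LINE="12:00:01       54      14       2      30"

# Same extraction as the listing: field 5 is the percent-idle figure
PERCENT_IDLE=`echo "$SAR_LAST_LINE" | awk '{print $5}'`
PERCENT_USED=`expr 100 - $PERCENT_IDLE`

# One graph row per 5% of usage, with a minimum of one row
YVAL_FOR_PERCENT_USED=`expr $PERCENT_USED / 5`
if [ $YVAL_FOR_PERCENT_USED -eq 0 ]
then
  YVAL_FOR_PERCENT_USED=1
fi

echo "used=${PERCENT_USED}% rows=${YVAL_FOR_PERCENT_USED}"
```

With the figures above, 30% idle becomes 70% used, which plots as a bar fourteen rows high.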
Arif Zaman
Analyst/Programmer
High-Tech Software Ltd (UK)
© Xephon 1998
Running commands and scripts on remote hosts
INTRODUCTION
When a system comprises several machines, it is not unusual for the
system administrator to have to run the same system command or
script on each machine that makes up the system. For example, to
ensure that the system’s date is correctly synchronized it’s necessary
to run the date command on every server in the system and check the
results.
One quick way of running remote commands is using the rsh facility,
which allows the execution of commands on remote RS/6000s. For
example, to run the date command from a local host called cervino on
a remote host named lyskamm use the command rsh lyskamm date.
This displays the output of date run on lyskamm on the local system.
While this is a fairly straightforward example, much more useful tasks
can be accomplished with this facility. For instance, in an educational
environment, where users have login access to multiple servers, rsh
can be used to synchronize user properties on different machines and
to uncover discrepancies in user settings (eg home directory, password
expiry time, initial login shell, etc). For instance, to control the user
smith on a remote machine you simply type (as root) rsh hostname
lsuser -f smith, where hostname can be lyskamm or any other host in
the network.
It’s not only root users that benefit from the ability to execute
commands remotely – this facility is also useful when looking for
users on other nodes, obtaining the load average on different hosts,
etc.
BACKGROUND
The mechanism for remotely executing commands is very simple.
First of all the user that requires this facility (smith) must be defined
on every host that allows remote execution – when a request for the
execution of a command arrives from the local host (cervino) on the
remote host (lyskamm), the remote host validates the user and checks
that he has permission to run commands remotely by checking the
configuration file /etc/hosts.equiv – entries in this file have the
following format:
# Example of /etc/hosts.equiv file
# To be put on "lyskamm"
# name of the host      user who can execute rsh
cervino                 +smith
This entry ensures that user smith on cervino is allowed to execute
commands (or perform a remote login) on the local system without
supplying a password or other form of validation. User smith is
therefore considered ‘trusted’.
If the entry below is present in the /etc/hosts.equiv file of host
lyskamm, it means that every user on cervino (if he/she exists) can
login to the local system without supplying a password.
# Example of /etc/hosts.equiv file
# "name of the host"
cervino
Obviously this is a point of weakness for security in your network,
especially if users are able to login to a remote system as root. For
security reasons, root authority must never be granted using the
/etc/hosts.equiv file.
Notice that, even if you make sure you’ve limited access via /etc/
hosts.equiv, there is another mechanism that can be used to logon
remotely and even obtain root authority. If a user wants to gain access
from cervino to lyskamm (and vice versa), all he needs to do is create
a file named .rhosts in his home directory. The format of this file is the
same as that of /etc/hosts.equiv. In this example, user smith creates the
following file on lyskamm:
#example of .rhosts file
#to be put under ~smith directory
cervino
and an equivalent file on cervino:
#example of .rhosts file
#to be put under ~smith directory
lyskamm
This allows the user to be trusted between cervino and lyskamm.
This individual method of authenticating remote users is also valid
for the root user. Remember that, when the operating system receives
a request for a remote command, it first analyses the /etc/hosts.equiv
file, then, if trusting is unsuccessful, it analyses individual .rhosts
files. The use of hosts.equiv is quite involved in itself; if you need
more information about this file, consult the relevant man page.
A REAL EXAMPLE
We’ll now discuss an example where all the concepts discussed so far
are applied.
It is common for AIX systems to work in an environment where hosts
are ‘clustered’ to form larger, more complex domains. When there are
lots of users (analysts, programmers, scientific users, etc) that utilize
system resources heavily, clustering allows the workload to be divided among
different hosts. This set-up can be achieved by allowing every
user to login to any host.

Figure 1: The network set-up – cervino exports /home via NFS to
lyskamm, marguareis, and tenibres

Obviously this configuration is good as long
as users have access to the same environment at every system they use,
which means that wherever the user logs in, he or she has access to the
same home directory and files.
Figure 1 above illustrates this configuration: the host cervino acts as
an NFS server, holding users’ home directories on one or more
filesystems. Other hosts (lyskamm, tenibres, and marguareis), which
act as NFS clients, mount these filesystems remotely, allowing users
to work as if the files were local. If default NFS settings are used,
only the root user of the NFS server has permission to modify and/or
delete files – generic users must be defined on both client and
server systems.
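As an illustration of this set-up, the NFS side might look like the fragment below. These entries are hypothetical (export options vary by AIX level and site); only the host names come from the article.

```
# On cervino, /etc/exports: export /home to the three client hosts
/home -access=lyskamm:tenibres:marguareis

# On each client, mount the exported filesystem as the local /home
mount cervino:/home /home
```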
Different implementations of this type of system can be far more
complex than this, though the concepts remain the same. For instance,
© 1998. Xephon UK telephone 01635 33848, fax 01635 38345. USA telephone (940) 455 7050, fax (940) 455 2492.
it is possible to create groups of servers by area of interest, or to grant
access to faster machines only to appropriate users and so on.
Whether the implementation is simple or complex, two main aims
have to be achieved:
• Provide users with the same environment on each host
• Allow the system manager to monitor the system efficiently.
To achieve these two results it is necessary to implement the correct
trust policy among hosts. To trust generic users it is necessary to edit
/etc/hosts.equiv and add the appropriate entry.
# /etc/hosts.equiv file to be put on each machine.
cervino
tenibres
marguareis
lyskamm
In this way users can, for instance, execute the command uptime on
each host, looking for the one with lowest workload. A good way to
perform this action is by using the script rcom (discussed later).
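The ‘find the least-loaded host’ step can itself be scripted. The helper below is a sketch of one way to do it, not one of the article’s scripts; in a real cluster the host/load pairs on stdin would come from rsh HOST uptime output.

```shell
# least_loaded: read "host load-average" pairs on stdin and print the
# host with the lowest load. The pairs might be produced with, e.g.:
#   for HOST in $HOSTLIST; do
#     echo "$HOST $(rsh $HOST uptime | sed 's/.*load average: //;s/,.*//')"
#   done
least_loaded() {
    sort -n -k2 | awk 'NR == 1 { print $1 }'
}

# With the article's four hosts and some made-up load averages:
printf "cervino 1.52\nlyskamm 0.07\ntenibres 0.89\nmarguareis 0.45\n" |
    least_loaded    # prints: lyskamm
```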
The trust relationship for the root user is a little bit more complicated:
firstly we configure the host cervino as the ‘preferential’ machine for
system operators. The local root user is then able to execute commands
on other hosts. By contrast, the root users of lyskamm, tenibres, and
marguareis are not able to perform any functions – either rlogin or
remote execution – on cervino, the NFS server. In this way you can
provide root access (with caution of course!) for administrative use on
client hosts, and not have to worry about the integrity of users’ data
stored in the server.
To complete this task you have to create the following .rhosts file in
each machine under the root user’s $HOME directory:
# .rhosts file to be put on each machine under ~root directory
cervino root
That’s all!
MAKING ADMINISTRATORS’ WORK EASIER
To enhance the ability to administer such a system we’ll use two shell
scripts. The first one, rcom, allows users to execute rsh on a number
of hosts without having to repeat the command on each one. For
example, issuing the command rcom df shows the status of mounted
filesystems on each node forming the cluster. To customize the script,
change the HOSTLIST variable to suit your environment. Typing only
rcom results in a remote login at every host in the cluster.
If you have to execute complex commands involving pipes or other
special characters, remember to enclose your command in double
quotation marks (‘"’). For example:
rcom "who | grep john"
shows the host(s) on which user john is logged in.
As before, you can use this script as a user from any host in the cluster
or as root from the NFS server.
RCOM
#!/usr/bin/ksh
# To be changed according to your environment
HOSTLIST="cervino lyskamm marguareis tenibres"
tput bold
# this is for setting the "bold" style...
echo "$*"
tput sgr0
# returning in normal mode
echo "will be executed on $HOSTLIST\n"
sleep 3
for HOST in $HOSTLIST
do
echo $HOST
rsh $HOST $*
echo
done
Another shell script, named rscript, is useful for executing not just
one command or a stream of commands on multiple hosts, but a
complex, though non-interactive, script or executable file. For example,
you could create a script that performs automatic tasks, such as
checking filesystem quota, disk usage per user, erasing dummy files,
etc, and run it on multiple hosts using this facility (it’s my intention to
submit scripts along these lines in future). As before, it’s convenient
to use a single workstation to execute the administrative script on
other systems and collect results. To do so, simply type rscript
name_of_script_to_be_executed. The script distributes the executable
file among the hosts contained in HOSTLIST, then executes the file on
each host, collecting the results and merging them in a single file
named /tmp/rscript.out on the local machine. When rscript has finished,
it also displays results on the screen.
Note that rscript creates temporary files in /tmp, so avoid running
more than one instance of this program at the same time and restrict
the use of this tool to the root user. In any case, the script uses the trap
command to avoid leaving temporary files in /tmp should the script
terminate abnormally after receiving an interrupt signal (Ctrl+C).
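The trap idiom rscript relies on can be seen in isolation in the following sketch (file and function names here are illustrative):

```shell
# Create a temporary file and guarantee its removal, even if the
# script is interrupted with Ctrl+C (signal 2), as rscript does
# with 'trap i_stop 2'.
TMPFILE=/tmp/trapdemo.$$
cleanup() {
    rm -f "$TMPFILE"
}
trap 'cleanup; exit 1' 2
: > "$TMPFILE"      # create the temporary file
# ... interruptible work using $TMPFILE ...
cleanup             # normal-path cleanup
```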
RSCRIPT
#!/bin/ksh
# Modify HOSTLIST according to your environment
HOSTLIST="tenibres cervino lyskamm marguareis"
NAME="$(basename $0)"
PREFIX="/tmp/$NAME"
OUT="$PREFIX.out"
trap i_stop 2
i_stop ()
{
echo "\nCleaning temporary files ..."
set -f
i_do_exec rm -f /tmp/torun*
set +f
echo "Done."
exit
}
i_do_exec ()
{
for HOST in $HOSTLIST
do
echo "$HOST \c"
rsh $HOST $*
done
echo "... Done."
}
i_do_copy_1 ()
{
echo "Distributing $1 on ..."
for HOST in $HOSTLIST
do
echo "$HOST \c"
rcp $1 $HOST:/tmp/torun
done
echo "... Done."
}
i_do_copy_2 ()
{
LOCAL=$(hostname)
echo "\nCopying output from remote hosts:"
for HOST in $HOSTLIST
do
echo "$HOST \c"
rcp $HOST:/tmp/torun.$HOST $PREFIX.$HOST
done
echo "... Done."
}
######################################################################
# Main
#
######################################################################
if [ "$1" = "" -o ! -x "$1" -o "$1" = "help" -o "$1" = "-help" -o "$1" = "-h" ]
then
echo "Usage:\t$(basename $0) <file> [parameter]\n\twhere the file is an executable or a script"
exit
fi
# Remove any temporary files left over from a previous run
rm -f $PREFIX* /tmp/torun* 2>/dev/null
i_do_copy_1 $1
echo "\nMaking each /tmp/torun executable"
i_do_exec chmod +x /tmp/torun
echo "\nExecuting command on remote hosts ... (Please Wait)"
i_do_exec "/tmp/torun > /tmp/torun.\$(hostname)"
i_do_copy_2
echo "\nPrinting the list of files created"
find /tmp -name "$NAME.*" -exec ls -l {} \;
echo "\nResults will be stored in $OUT."
sleep 1
> $OUT
for FILE in $(find /tmp -name "$NAME.*" -print)
do
if [ "$FILE" != "$OUT" ]
then
echo "$FILE" >> $OUT
cat $FILE >> $OUT
echo "\n---------------------------------------------------------\n" >> $OUT
fi
done
ls -la $OUT
echo "Done!"
echo "\nDo you want to see the command's result $OUT? ([Y]/n)"
read ans
if [ "$ans" = "Y" -o "$ans" = "y" -o ! "$ans" ]
then
more $OUT
fi
i_stop
Aiello Maurizio, Cleis Technology (Italy)
Marquez Fabio, Elsag Bailey (Italy)
© Xephon 1998
CC-NUMA
The acronym CC-NUMA has been bandied about in the computer
trade press over the past year or so, often with little explanation. The
first part of this article describes the CC-NUMA architecture and
discusses some of its characteristics, advantages, and disadvantages.
The second part then goes on to discuss some of the particular points
to consider when evaluating or comparing CC-NUMA systems.
CC-NUMA: THE DEFINITION
CC-NUMA stands for Cache Coherent, Non-Uniform Memory Access.
To explain what this means, let’s first of all consider a standard
Symmetrical Multi Processor (SMP) system (Figure 1).
[Figure 1: A ‘standard’ SMP system. Four CPUs share a system bus with memory and PCI-attached I/O]
In this architecture, all CPUs can access all memory and all I/O. The
access time from a CPU to memory, and from a CPU to I/O, is the same
whichever CPU makes the access. There is a single operating system
image which runs on all
CPUs.
It is now possible to link several independent SMP systems (I’ll refer
to such systems as ‘modules’) via a high-bandwidth, low-latency
interconnect, as shown in Figure 2.
[Figure 2: CC-NUMA. Modules 0 to n, each containing CPUs, memory, and PCI-attached I/O, are joined by interconnect interfaces over a high-speed, low-latency module interconnect]
As with a ‘standard’ SMP system, there is only one instance of the
operating system, which runs on all CPUs. All system resources
(memory and I/O) are visible to all CPUs. There is a global memory
map, comprising the memory contained in all modules. For example,
the memory address range on module 0 may be from 0 to 16 GB, on
module 1 from 16 GB to 32 GB, and so on.
However, the access time from a CPU on module 0 to memory on the
same module is always going to be less with this architecture than the
time to access data on a different module. This is because an access to
memory local to the CPU goes only via the local system bus, while an
access to memory on a remote module has to:
1 Cross the local module’s bus to the interconnect interface
2 Cross the interconnect to the remote module’s interconnect interface
3 Cross the remote module’s bus to memory.
The data must then make the reverse trip to the requesting module.
It is this non-uniformity in memory access time that gives rise to the
NUMA part of the CC-NUMA acronym. The ratio below is the
‘NUMA factor’, which is a measure of the difference in the time taken
for local and remote access.
NUMA factor = remote memory access time / local memory access time
Now that ‘NUMA’ has been defined, let’s look at the ‘CC’ part of the
acronym. CC stands for cache coherent. This means that system
hardware is responsible for maintaining data coherency across modules.
For example, say CPU 0 on module 0 accesses variable A, which is
located at a memory address on module 2. CPU 0 now updates this
value, for instance by incrementing it by one. If CPU 4 on module 1
now tries to access the same variable, and reads the memory location
on module 2, the value stored is invalid or ‘stale’. The interconnect
hardware detects this situation, intercepts the second memory access,
and directs CPU 0 to provide the updated value. This is similar, though
not identical, to the technique for keeping level-1 and level-2 CPU
caches coherent on a standard SMP system.
Managing data coherency through hardware has the important
characteristic of maintaining the standard SMP programming model.
This means that applications written for SMP systems run in an
identical manner under CC-NUMA-based systems. In fact, CC-NUMA is just a means of implementing high-order SMP systems.
‘Standard’ SMP systems are known as UMA systems (for Uniform
Memory Access).
It must be pointed out, however, that if data is write-shared by several
processes and/or threads, then it’s the responsibility of the application
to ensure that concurrent accesses to the same data occur in a
controlled manner, for example through the use of locks. Cache
coherency only ensures that when a CPU reads a memory address it
receives the most up-to-date value. If two CPUs modify data and try
to write it back to memory without the use of locks or atomic
operations, then the probable result is incoherent data.
OPTIMIZATIONS
Now that the basic CC-NUMA architecture has been described, we can
look at a number of optimizations that are possible. All of
these optimizations try to place data close to the CPUs that access it
(that is, on the same module). There are two complementary
approaches:
1 Place or move the data close to the CPU
2 Schedule the executable thread on a CPU close to the data that it will access.
For the following discussion I will use the following terms:
• Local memory is memory that is on the same module as the CPU that accesses it.
• Remote memory is memory that is on a different module from the CPU that accesses it.
• The home module is the module where a given memory address is located.
• The owner module is the module that has an updated value of a given memory address.
The last two items deserve some clarification. Clearly a given address
in real memory must reside on a given module. For example address
A may be on module x. In this case x is said to be the home module.
When a CPU on a module, say module y, updates the data value held
at address A, module y becomes the owner module. Thus the
home module and owner module may be one and the same, but are not
necessarily so.
Optimizations are important for CC-NUMA and can have a major
impact on system performance. The NUMA factor is typically in the
range of three to twenty, depending on the nature of the interconnect
(hardware, communications, and coherency protocol). This means
that accessing remote memory can take between three and twenty
times longer than accessing local memory. Additionally, without
optimization, a two-module system on average accesses remote
memory 50% of the time, while an eight-module system accesses
remote memory 87.5% (7/8) of the time. The need for (and benefits of)
optimization increase with NUMA factor and number of modules.
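The quoted fractions follow from simple arithmetic: with n modules and accesses spread uniformly over all memory, (n - 1)/n of them land on remote memory. A quick check (a sketch; awk is used here just as a calculator):

```shell
# Unoptimized remote-access fraction for n modules: (n - 1)/n.
for n in 2 8
do
    awk -v n="$n" 'BEGIN { printf "%d modules: %.1f%% remote\n", n, (n - 1) / n * 100 }'
done
# prints:
#   2 modules: 50.0% remote
#   8 modules: 87.5% remote
```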
As an example of the complexity of the problem, consider the
following I/O scenario. As previously stated, all system resources are
visible to all processors. With a three-module system, as shown in
Figure 3, it is possible for a CPU on one module to initiate an I/O
operation (for example a disk read) on a device on a different module.
During such an operation, the disk controller performs a DMA (Direct
Memory Access) operation to a buffer somewhere in memory.
[Figure 3: Three-module I/O scenario. (1) The data is read from disk and DMAed into a buffer in memory on one module; (2) the data is then accessed by the requesting CPU on another module]
Suppose the buffer is physically located on a module other than the
one in which the CPU or the disk is located (the buffer could also be
split across two or more modules). Now the DMA from disk to
memory takes place across the interconnect to the module containing
the buffer. When the I/O operation is complete, the disk controller
signals the requesting CPU with an interrupt, which causes it to read
the buffer, requiring a second transmission of data across the
interconnect from the module containing the buffer to the one containing
the CPU. Optimizations can be made both at the system hardware
level and at the operating system level. These are discussed below.
HARDWARE OPTIMIZATIONS
L3 cache
In the same way that individual CPUs on an SMP system have their
own level-1 and level-2 cache subsystems, it is possible to implement
a level-3 cache for each module. This is shown schematically in
Figure 4. In this case the cache is shared by all processors on the module.
The cache holds only remote memory data. There are no local memory
addresses present in the L3 cache.
The use of L3 caching means that, while the initial remote memory
access incurs the full ‘NUMA Factor penalty’, subsequent access to
the same data by any processor on the module has the same latency as
local access, provided the data is still resident in cache.
[Figure 4: CC-NUMA with L3 cache. (1) local access; (2) cached remote access, served from the module’s L3 cache; (3) remote access across the interconnect controller]

All caches use a hashing algorithm to map physical memory addresses
to cache lines. As cache is always smaller than the area of memory it
caches, several memory addresses map to the same line. A collision
occurs when cached data that’s in use by one CPU is evicted by
another one requesting another memory location that maps to the
same cache line. To be efficient, L3 cache should be large and
(preferably) use an associative organization to reduce the number of
collisions on individual cache lines.
For example, it’s not unusual for individual modules to support at least
4 GB of physical memory. In an eight-module system each module has
28 GB of remote memory. With a 32 MB non-associative L3 cache,
a cache line collision occurs about once every 900 accesses, on
average. In addition, the working set (that is, the set of regularly
accessed memory addresses) of today’s operating systems is probably
in the order of 32 MB. With a 32 MB L3 cache, caching application
code and data (such as a database) is done at the expense of evicting
operating system code from the cache, thus reducing system
performance. For such a configuration running enterprise type
applications a cache size of at least 256 MB is required.
Cache coherence protocol
The efficiency of the cache coherence protocol has a large impact on
the NUMA Factor. On a traditional SMP system, cache coherency is
generally maintained by a ‘snoopy’ protocol, in which each CPU
examines addresses being transmitted on the address bus. If a CPU
detects a request for a memory location for which it has an updated
value, it provides the value to the requesting processor.
This technique works when CPUs are located physically close to each
other and when the number of CPUs is limited. To adopt the same
technique for intermodule cache coherence would be prohibitively
expensive in terms of bandwidth and latency. Because of this, most
CC-NUMA implementations use a directory-based cache coherence
protocol, whereby only modules that are affected by the coherence
operation take part in the transaction.
Intervention
When a module (or, more accurately, a CPU on a module) tries to read
remote data and the home module is not the owner module, what
normally happens in a directory-based cache coherence protocol is
that the home module sends a request to the owner module asking for
an updated version of the data. Once this is received, the home module
replies to the original requesting module. Using a technique known as
‘intervention’ it’s possible to eliminate one of these data transfers.
Using intervention, when the home module receives a request for
data, instead of asking the owner to send it the updated value, it asks
the owner to send the updated value directly to the requester. This
technique can significantly reduce latency and the NUMA factor.
SOFTWARE OPTIMIZATIONS
The impact of the NUMA Factor
As discussed above, the NUMA Factor typically varies between three
and twenty. With a NUMA Factor of five or less, reasonable scalability
can be achieved without compromising performance or requiring
complex operating system modifications. This means that the system
performance increases fairly linearly with a gradient of about one with
each additional module (up to a limit). However, with a NUMA Factor
of more than five, it’s necessary to modify the operating system so that
data and executable threads that operate on them are close together.
Two operating system components have a role to play in such an
arrangement – the Virtual Memory Manager (VMM) and scheduler.
VMM allocation strategies
When an application makes a request for memory, for example via the
malloc() library call, the VMM reserves space to satisfy this demand
in physical memory. In a non-NUMA system, the VMM usually uses a
least-recently-used (LRU) algorithm to decide where to place the
allocated memory. However, in NUMA systems, the objective is to
keep data close to the CPU that is going to use it, so one of the first
modifications to make is to ensure that, when an application requests
virtual memory, the VMM tries to allocate physical memory on the
same module as the CPU on which the application runs.
This simple modification can significantly reduce the number of
remote data accesses and increase performance accordingly. There
are occasions, however, when it fails. For example, most database
implementations have an initialization phase during which the various
database buffers are allocated. At the end of this phase, a number of
‘worker threads’ and/or processes are created to implement the
database function. These worker processes, which are independent of
the initialization process, are likely to be scheduled on any CPU on
any module, and consequently the database buffers they use may well
be on a different module to the threads.
This leads us to another optimization technique: page migration.
Page migration
As described above, it’s possible that the VMM’s allocation strategy
will not always be optimal for NUMA. In addition, threads and
processes may migrate from one module to another in order to
optimize CPU usage. Under these circumstances the number of
remote accesses may be very high. It’s possible for the operating
system to detect that one or more CPUs of a given module are
systematically accessing remote memory and to request that the
VMM physically moves the memory pages concerned to the module
making the accesses. However, care must be taken to avoid ‘ping-ponging’ pages between two or more modules.
Page replication
It is possible for several modules to access the same set of memory
addresses. In this case page migration doesn’t work. One solution to
this problem is for each module to make a copy of the data and keep
it locally. This technique is very effective for read-only data, such as
kernel or application executables, but generates a large amount of
coherence traffic in the event of write operations.
Scheduler affinity
In modern operating systems, the schedulable entity is usually
an executable thread. Each thread is given a time slice of CPU; once
its time is up, it’s evicted and another thread is scheduled on the
processor. In a NUMA system, executing threads bring data with them
into the L3 cache. It makes sense, therefore, to try to make the most
of this data, so the next time that the thread is eligible for execution,
the scheduler should try to schedule it on a CPU on the same module.
This is known as ‘soft affinity’.
A variation on this theme is ‘hard affinity’, whereby the application
or system administrator binds a thread or process either to an individual
CPU or a module. A bound thread can run only on that module or CPU,
even if no CPU there is available while CPUs on other modules sit idle.
This article concludes in next month’s issue of AIX Update.
Jean-Paul Weber (France)
© Xephon 1998
Using signals to kill a process
In Unix, a signal is sent to a process when an event exterior to the
process occurs to which the process must respond. The simplest
example of a signal is ‘hang up’ (SIGHUP). When a user is logged on
at a remote terminal, the data line can hang up for a number of reasons.
Line problems, modem problems, power loss at the remote terminal,
or deliberately or accidentally turning off the remote terminal all
result in a hang up signal. Unix keeps track of which processes are
being run by which terminal, and when a terminal hangs (drops its
connection) the operating system sends a SIGHUP to all the processes
that were launched from that terminal.
A process has three options when it receives a SIGHUP signal:
• The process can stop executing immediately, which is the default action.
• The process can catch (trap) the signal, ignore it, and continue executing.
• The process can catch the signal and carry out some other programmed reaction. For example, it could close all open files, display a warning message, and then exit.
THE KILL COMMAND
The kill command can be used to send signals to a running process. Its
syntax is:
kill -<signalNumber> <processID>
The hang up signal has signalNumber 1, so the command kill -1 1234
sends the hang up signal to process 1234, in the same way as if the
user had turned off the terminal while logged on.
But before you begin killing processes and possibly causing havoc,
it’s necessary to understand what signals do and how programs handle
them – there are clean and less than clean methods of killing a process!
TRAPPING SIGNALS IN SHELL SCRIPTS
Enter the following shell script and call it alive.sh.
ALIVE.SH
# alive.sh
while true
do
echo "I'm alive!"
sleep 5
done
Make it executable with:
$ chmod +x alive.sh
Run it in the background by adding an ampersand (&) after its name:
$ alive.sh &
[1] 5678
$
5678 is the process ID (PID) of the alive.sh command now
running in the background. Despite the fact that the command runs in
the background, the message ‘I'm alive!’ still appears on your terminal
every five seconds. Kill it by sending it the SIGHUP signal.
$ kill -1 5678
In the following listing, our alive.sh command has been modified to
trap the SIGHUP signal. The trap’s syntax is:
trap functionName signalNumber
In our example the function signal01 is called when our shell script
receives the SIGHUP signal.
MODIFIED SCRIPT
# alive.sh
function signal01 {
echo "Received signal 1 (SIGHUP)"
}
trap signal01 1
while true
do
echo "I'm alive!"
sleep 5
done
If we now execute the alive.sh shell script, appending an ampersand
to the command, and attempt to kill it with -1, we see the following
output at our terminal:
$ alive.sh &
[1] 7788
$ I'm alive!
I'm alive!
I'm alive!
kill -1 7788
$ Received signal 1 (SIGHUP)
I'm alive!
I'm alive!
The trap catches signal 1 and simply displays a message and continues.
You can stop the program by sending a different signal such as signal
2 (SIGINT) with the command kill -2 7788.
This technique is used in more complex shell scripts. If the script is in
the middle of an important or complex calculation or action, rather
than just ‘dropping dead’, the script finishes its current action and
carries out other housekeeping, such as ensuring that open files are
closed, before terminating.
SIGKILL: A SIGNAL THAT CANNOT BE IGNORED
Signal 9 (SIGKILL) is unlike other signals in that it cannot be trapped.
Sending signal 9 to a process means that the operating system must
immediately kill the process. The advantage of signal 9 is that the
program cannot trap it and ignore it; the disadvantage is that the
program cannot intercept it and perform an orderly shut down.
Using kill -9 on a database or similar process can be disastrous. It’s
important first to attempt to kill such a process with SIGHUP or
SIGINT before resorting to the deadly SIGKILL. Note that there are
instances when even kill -9 won’t kill a process. An example of this
is when external devices, such as tape drives, are involved.
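That SIGKILL cannot be caught is easy to demonstrate with a throwaway background process (a sketch; any long-running command would do):

```shell
# Start a disposable background process and kill it with signal 9.
sleep 60 &
pid=$!
kill -9 $pid
wait $pid 2>/dev/null       # collect the exit status
echo "exit status: $?"      # 128 + 9 = 137: died from SIGKILL, no cleanup ran
```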
TRAPPING SIGNALS IN A C PROGRAM
The following program (alive.c) performs a similar function to our
shell script alive.sh. Again the SIGHUP signal is trapped to display a
message.
SAMPLE C PROGRAM
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

/* Handler for SIGHUP: report the signal and return, letting the
   program continue */
void signal01(int signum)
{
    (void) signum;
    (void) printf("Received signal 1 (SIGHUP)\n");
}

int main(void)
{
    (void) signal(SIGHUP, signal01);
    while ( 1 ) {
        (void) printf("I'm alive\n");
        (void) sleep(5);
    }
}
SOME COMMON SIGNALS
• 1 SIGHUP (‘hang up’) is the result of a phone line or terminal connection being dropped.
• 2 SIGINT (‘interrupt’) is generated from the keyboard, for instance by pressing Ctrl-C.
• 3 SIGQUIT (‘quit’) is generated from the keyboard, usually by a Ctrl-\ or Ctrl-Y. To find out which, type stty -a and press Enter. In the listing you will find ‘quit=^\’, or ‘quit=^Y’, or something similar. A SIGQUIT often causes a core file to be created, containing a copy of your working memory.
• 15 SIGTERM (‘software terminate’) is often used to terminate a program. Using the kill command without a signal number causes it to send its default, signal 15, to the job. This is a good first step when trying to kill a process.
A CLEAN KILL
Time and effort is necessary to code a trap for a signal into a program.
This means that, when a trap has been coded in a program, it’s been
done for a good reason. If the program can simply die without
performing any cleanup, then why go to the trouble of including a
trap? That is why it’s a good idea to try signal 15, signal 1, and signal
2 before resorting to signal 9.
I use the following shell script to kill processes cleanly.
SCRIPT TO KILL PROCESSES
#!/bin/sh
#
# kill.sh -- kill "cleanly"
for pid in $*
do
kill $pid
kill -1 $pid
kill -2 $pid
kill -9 $pid
done
Note that killing a process that has already been killed results in a
harmless error message. Another good idea is to try to kill processes
in an orderly manner using ‘weaker’ signals before trying to massacre
them using SIGKILL.
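A gentler variant of kill.sh (a sketch, not from the article) pauses after each signal and stops escalating once the signal can no longer be delivered:

```shell
# kill_cleanly -- escalate through 'weaker' signals before SIGKILL,
# giving the process time to clean up after each attempt.
kill_cleanly() {
    for pid in "$@"
    do
        for sig in 15 1 2 9
        do
            kill -$sig "$pid" 2>/dev/null || break  # undeliverable: process gone
            sleep 1                                 # allow time for cleanup
        done
    done
}
```

For example, kill_cleanly 1234 5678 tries signals 15, 1, and 2 against each process before resorting to 9.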
AIX Specialist (Switzerland)
© Xephon 1998
AIX news
Sybase has announced Replication Server version 11.5, which features
better management facilities and support for more than 25 different
types of data source, including application packages. A new
replication management framework simplifies set-up and
synchronization of data between systems, including a new graphical
replication manager tool.

Replication Server can now be managed by systems management tools
from Tivoli, BMC, and Compuware, and offers improved warm-standby
features.

Out now for AIX, prices start at US$2,695 for two to eight concurrent
users.

For further information contact:
Sybase Inc, 6475 Christie Avenue, Emeryville, CA 94608, USA
Tel: +1 510 922 3500
Fax: +1 510 658 9441
Web: http://www.sybase.com
Sybase UK Ltd, Sybase Court, Crown Lane, Maidenhead, Berkshire SL6 8QZ, UK
Tel: +44 1628 597100
Fax: +44 1628 597000

***

Rational Software has announced new releases of its Unix development
tools, including Rational Rose 98 for visual modelling. The package
provides language-independent enterprise development capabilities,
and also integrates with ClearCase, Rational’s software configuration
management package. It also boasts additional support for Unified
Modelling Language v1.1. Rational Rose is available for AIX at
USD6,000.

For further information contact:
Rational Software, 18880 Homestead Road, Cupertino, CA 95014, USA
Tel: +1 408 862 9900
Web: http://www.rational.com
Rational Software, Olivier House, 18 Marine Parade, Brighton BN2 1TL, UK
Tel: +44 1273 624814
Fax: +44 1273 624364

***

IBM has announced a high-performance compiler for Java running on
AIX (and also OS/2, Windows 95, and Windows NT systems), which
compiles Java bytecode into optimized platform-specific native code.
This is significantly faster than bytecode executed in a JVM/JIT
environment. The degree of performance improvement depends on the
application. The current beta release supports a subset of the
JDK 1.1.1 APIs.

IBM has also let it slip that AIX, running on its RS/6000 server
line, is to become the first operating system to receive Virtual
Private Network certification from the International Computer
Security Association (ICSA). The certification is for AIX 4.3.1,
which is already the holder of Germany’s E3/F-C2 certification
(though not the DoD’s C2 certification).

For further details contact your local IBM representative.