RAXCO VAX/VMS
Performance Management Seminar
Section II -- CPU
-----Notes

The Components of VAX CPU Time

The speed of the VAX CPU, as experienced by applications, is
determined by two factors:
- The time it takes to process instructions in the CPU
- The time it takes to access memory to fetch or store the data
  needed by the instructions.

To some extent there is overlap:
- instruction "pre-fetch" (780/785/8200) or "pipelining"
  (85x0/86x0)
- cache write-through or write-back

Memory accesses may not be required if the desired data is in
the memory cache. When a memory access is required, the CPU
waits on the access if it is a fetch. These waits do not show as
idle or "null" time; the CPU appears busy. These waits may be
longer than the normal memory access times due to other activity
in the memory controller or on the CPU bus:
- memory controller completing a previously initiated write due
  to a cache write-through or write-back.
- instruction pre-fetch initiated read
- I/O, which always has priority.

In most VAX applications, the CPU speed is underutilized -- the
speed limitation is the memory controller.

Copyright RAXCO, Inc. 1986. Duplication in any manner prohibited.

Speed of Memory

The relevant speed is that of the memory controller plus bus
arbitration time. Each CPU access causes a fixed amount of data
to be delivered: 64 bits on the 780/785/8200, 128 bits on the
8500, 256 on the 8550. The approximate speed ratings are:
CPU      Access Time (Microsecs)      Max Xfer Rate (M Bytes/Sec)
[table entries not legible in the source]

To understand these apparently large numbers, they must be
considered in context. A VAX 785 can execute (if not slowed by
memory) roughly 1.2 million instructions per second. In typical
software, each instruction requires, on average, 11 bytes of data
(the instruction itself plus any data referenced by the
instruction). If the cache hit rate (defined as memory references
met by data previously placed in cache due to direct references)
is 50%, the demand on memory is 6.6 million bytes per second.
Since there are additional demands, notably from I/O, and there
is other activity on the SBI which occasionally makes it
unavailable for access, the memory controller is clearly
saturated.

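The arithmetic behind the 6.6 million bytes per second figure can be sketched as follows. The three constants are the figures quoted in the paragraph above; the function and variable names are illustrative only and come from no VMS tool.

```python
# Figures quoted in the text for a VAX 785 (not measurements).
PEAK_INSTRUCTIONS_PER_SEC = 1.2e6   # execution rate if not slowed by memory
BYTES_PER_INSTRUCTION = 11          # instruction plus the data it references
CACHE_HIT_RATE = 0.50               # fraction of references met by the cache


def memory_demand_bytes_per_sec(instr_per_sec, bytes_per_instr, hit_rate):
    """Bytes per second that miss the cache and must come from memory."""
    return instr_per_sec * bytes_per_instr * (1.0 - hit_rate)


demand = memory_demand_bytes_per_sec(
    PEAK_INSTRUCTIONS_PER_SEC, BYTES_PER_INSTRUCTION, CACHE_HIT_RATE
)
print(f"{demand / 1e6:.1f} million bytes per second")  # prints 6.6
```

Lowering the hit rate in this sketch raises the demand in direct proportion to the miss rate, which is why the hit rate matters so much.
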
In fact, because the demand on memory by the CPU is anything but
smooth, the machine will run considerably slower than 1.2 MIPS --
generally about .75 MIPS. (If the software in question takes
advantage of the power of the VAX instruction set, this is
roughly equivalent to 1.2 IBM 370 type MIPS.)

The most significant variable is the cache hit rate. The 50% rate
assumed above is probably optimistic for large VAX workloads.
Cache hit rates are influenced by:
- The number of active users, and therefore, the context switch
  rate.
- The volume of communications activity (VAX cluster, DECNET or
  other) which increases the context switch rate.
- The type of application. Highly compute bound scientific
  computational activity generally has a high rate; data
  processing, office automation and software development do
  poorly.
- The program design - its "locality" of data access and code
  usage.

Implications

Stand alone benchmarks are of little value in evaluating VAX
performance.
- Traditional standardized CPU speed comparison benchmarks are
  small programs which will stay totally in cache and do very
  few memory accesses. The 785 is NOT 1.5 times faster than the
  780 in most applications.
- Measuring resource utilization of an application by running it
  standalone fails to account for the effect of context switches
  on cache hit rates.

I/O transfers slow the CPU due to the additional demand placed on
memory. On a 780/785/8200, the running time of software will be
lengthened by 4% to 10% during times when a disk transfer is
active. Therefore, excess I/O transfers should be avoided.

Good program and data locality is essential in programs which
will consume large amounts of CPU time.

Minimize context switching:
- Do not use DZ's.
- Minimize cluster and DECNET activity.
- Keep QUANTUM high, if possible.
- If primarily a batch environment, keep just enough jobs active
  to keep the CPU busy.

The dual CPU VAX configurations (782, 8300, 8800) have the same
memory controller as their single CPU counterpart (780, 8200,
8550). As that memory controller is the major performance
limiting factor in the single CPU configuration, it becomes a
crippling performance factor in the dual configuration.

Short instruction formats yield much faster code than longer
formats because fewer memory accesses are required and less of
the cache is tied up with code.

85x0/86x0: "Pipelining" architecture doesn't pay off if programs
have many branches in code. There is usually little one can do
about this at the design level, except to write structured
top-down code.

A Major User of Compute Time - Terminal I/O

Terminal I/O is a major resource drain on the VAX, not because it
overloads any I/O channels, but because it requires large amounts
of CPU time. For example: if done naively, reading data from a
remote device at 9600 baud can consume 75 to 100% of the CPU time
on a VAX 780.

There are two sources of CPU time utilization by terminal I/O:
- Interrupt and protocol handling
- "Intelligent" features of terminal driver

Character Mode vs DMA vs Ethernet

DZ's operate one character at a time - each byte of data to be
transferred requires a separate I/O function by the CPU. This is
expensive -- a terminal operating at 9600 baud requires 960
separate I/O operations per second and generates 960 interrupts.
This will use 5 to 7% of the resources of a 780.

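A rough sketch of what those figures imply per character. The 9600 baud rate and the 5 to 7% range come from the text; the assumption of 10 bits on the line per character (1 start, 8 data, 1 stop) is ours, chosen because it reproduces the 960 operations per second quoted above.

```python
BITS_PER_CHARACTER = 10  # assumption: 1 start + 8 data + 1 stop bit


def characters_per_second(baud, bits_per_char=BITS_PER_CHARACTER):
    """Characters (and, on a DZ, interrupts) per second at a given baud rate."""
    return baud / bits_per_char


def cpu_microseconds_per_character(cpu_fraction, chars_per_sec):
    """CPU time implied per character if cpu_fraction of each second is used."""
    return cpu_fraction * 1_000_000 / chars_per_sec


rate = characters_per_second(9600)
low = cpu_microseconds_per_character(0.05, rate)
high = cpu_microseconds_per_character(0.07, rate)
print(f"{rate:.0f} characters (and interrupts) per second")
print(f"about {low:.0f} to {high:.0f} microseconds of CPU per character")
```

Under these assumptions each character costs on the order of 50 to 75 microseconds of 780 CPU time, so a dozen or so busy 9600 baud lines could, in principle, absorb most of the machine.
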
The CPU time spent responding to interrupts and issuing I/O's is
"hidden", as it is not charged to the user process. It is
responsible for most "overhead" on VAX systems.

DMF's, DHU's and DHV's operate in character mode when reading
data, and in character mode or "DMA" mode when writing data.
While in character mode they have a silo which speeds up output
operations somewhat. Generally, however, DMA mode is more
efficient except for short transfers, where the cost of setting
up the DMA transfer exceeds the cost of doing several, but
simpler, single byte transfers.

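The trade-off between DMA setup cost and per-byte character cost can be illustrated with a simple break-even model. All three cost constants below are hypothetical placeholders, not measured values for any device; only the shape of the comparison is taken from the text.

```python
import math

# Hypothetical costs in microseconds of CPU time; NOT measured values.
CHAR_MODE_COST_PER_BYTE = 60.0   # one I/O operation and interrupt per byte
DMA_SETUP_COST = 600.0           # fixed cost of setting up one DMA transfer
DMA_COST_PER_BYTE = 2.0          # incremental cost per byte once DMA runs


def char_mode_cost(n_bytes):
    """CPU cost of sending n_bytes one character at a time."""
    return n_bytes * CHAR_MODE_COST_PER_BYTE


def dma_cost(n_bytes):
    """CPU cost of sending n_bytes as a single DMA transfer."""
    return DMA_SETUP_COST + n_bytes * DMA_COST_PER_BYTE


def break_even_bytes():
    """Smallest transfer size for which DMA is strictly cheaper."""
    per_byte_saving = CHAR_MODE_COST_PER_BYTE - DMA_COST_PER_BYTE
    return math.floor(DMA_SETUP_COST / per_byte_saving) + 1


n = break_even_bytes()
print(n, char_mode_cost(n), dma_cost(n))
```

With real measurements substituted for the placeholders, the break-even size indicates where the changeover from character mode to DMA mode should sit for a given device and CPU.
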
The decision about whether to use DMA mode or not on a DMF type
device is controlled by the SYSGEN parameter TTY_DMASIZE, which
should be set to 30.

For Ethernet devices (the LAT server) all I/O is in DMA mode, and
all terminal data is multiplexed (operating on a timeout basis).
The savings over character I/O is largely offset by the cost of
protocol handling and multiplexing control.

Figure 2-1
Comparison of Maximum Terminal Output Rates
For Different Transfer Sizes and CPUs
[Bar chart. Y axis: bytes per second, 0 to 80000. X axis:
transfer size in bytes (32, 64, 128, 256, 512, 900, 1500).]

The Y axis represents the data transfer rate achieved at the
point of 100% CPU utilization. That is, the CPU processing
required just to perform the terminal I/O functions is using all
available CPU cycles.

The X axis is the size of the data transfer - the number of bytes
output by each call to QIO. For each transfer size case, there
were three tests. The leftmost bar in each case represents a DZ
on a 780, the center bar a DZ on a 750, and the rightmost a DMF
on a 730. TTY_DMASIZE was set to 64.

Figure 2-2
Comparison of Maximum Terminal Output Rates
DMF type Device vs. LAT Server
[Line plot marked "Hardcopy"; the date on the plot is not fully
legible. Y axis: % CPU, 20 to 100. X axis: Number of Processes,
0 to 16. Lines: Server and Attach, each at 10, 50 and 132 B/QIO.]

The term "Attach" in the legend for the lines refers to the Able
Attach terminal interface which is a DMF look-alike, and performs
identically to the DMF in all respects.

This plot displays the amount of CPU time unused when doing
terminal output at fixed rates with 3 different sizes of QIO. For
example, the plot shows that a DMF working at 10,000 bytes per
second (10 processes) with 150 character QIO's will use 55% of
the system. The LAT, at the same output rate, will use 95% of the
CPU.

Terminal Driver "Features"
(These are all on by default):

Input
- Checks for line editing characters, terminators, etc. Not
  needed for bulk transmission, block I/O, etc.
- Echoes (very expensive - single byte I/O). Needed only for
  interactive work.

Output
- Escape sequence validation, width checking, tab conversions,
  length checking, etc. These are needed only for debugging
  purposes. They should be disabled for any production software
  which has high terminal output volume.

These features are turned off completely by the IO$M_NOFORMAT
function modifier when making QIO calls for terminal I/O. High
level facilities (languages, RMS) can not turn them off.
Efficiency gains of 20% to 80% will follow.

High level screen management software (TDMS, FMS, the SMG
routines) offers possible savings in development and maintenance
time and flexibility as to device type. This is obtained at the
cost of substantial amounts of CPU time and not necessarily
optimal I/O strategies. Users must carefully consider the
operational costs versus any actual savings:
- Is device flexibility really necessary?
- Is the development saving really that large?
- Could less broadly functional in-house "library" routines
  suitable for your flexibility requirements be developed to
  yield the same application development savings as the DEC
  routines?

DECNET remote terminal connections (via a SET HOST command) use
substantially more resources than any other form of terminal
communication. This is due to the high resource demands of DECNET
in general, and the fact that DECNET is oriented toward
transmission of messages of 100 to 1000 characters as opposed to
single character messages.

The PSI X.25 software from DEC has been observed to consume up to
60% of the processing power of the VAX when used for a
substantial portion of the machine's terminal connections.
Hardware implemented PADs are considerably cheaper and totally
offload this activity from the VAX.

Implications

Avoid unnecessary terminal I/O:
- Lavish prompts, explanations, graphics, etc. are often
  unnecessary in software used intensively by users familiar
  with it. They can, in fact, be an annoyance. (Example of what
  not to do: compare V4 error messages with V3.)
- High volume printing should be done to devices connected with
  parallel interfaces.
- Avoid screen repainting if selective updates will do.
- Minimize cursor addressing. Use tabs instead of spaces.
- For graphics applications, use intelligent devices and employ
  their features. (Graphics generation is one of the few
  applications better performed in local intelligence.)

Keep the number of actual I/O requests to a minimum. Output
should be assembled in a buffer, then written 512 bytes at a
time, with embedded CR's, LF's and other control sequences. High
level I/O routines must be avoided.

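The buffering strategy described above can be sketched as follows. This is an illustration of the idea only: `write_fn` is a hypothetical stand-in for one terminal I/O request (on VMS that would be a QIO call, which this sketch does not attempt to make), and the class and method names are our own.

```python
import io

BLOCK_SIZE = 512  # bytes per write, as recommended in the text


class BlockedOutput:
    """Collect output in memory and hand it to write_fn in 512-byte blocks."""

    def __init__(self, write_fn, block_size=BLOCK_SIZE):
        self._write_fn = write_fn      # hypothetical: one call = one I/O request
        self._block_size = block_size
        self._buffer = bytearray()
        self.requests = 0              # number of I/O requests actually issued

    def put(self, data):
        """Append bytes; issue a request whenever a full block is ready."""
        self._buffer += data
        while len(self._buffer) >= self._block_size:
            self._issue(self._block_size)

    def flush(self):
        """Write out whatever partial block remains."""
        if self._buffer:
            self._issue(len(self._buffer))

    def _issue(self, count):
        self._write_fn(bytes(self._buffer[:count]))
        del self._buffer[:count]
        self.requests += 1


sink = io.BytesIO()
out = BlockedOutput(sink.write)
for i in range(100):
    out.put(b"line %03d of report\r\n" % i)   # CR and LF embedded in the data
out.flush()
print(out.requests, "requests for", len(sink.getvalue()), "bytes")
```

In this example 100 lines of 20 bytes each go out in 4 requests, where a line-at-a-time approach would issue 100.
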
Recognize the high cost of screen intensive software. EDT and
other screen based editors are costly software to use. Other
alternatives may be more productive, both from a machine and
user point of view.

Use DMF type devices as opposed to DZ's. Overall, performance
gains of 15% are typical. But DMF's won't help very much if I/O
lengths are short.

If printers must be driven off terminal lines, DMF's are
critical.

The use of the SET HOST facility must be prevented or strongly
discouraged.

Avoid data transfers (through DECNET, or with utilities such as
KERMIT) over async lines, or confine them to periods of low
system activity. If you must do them, use DMF's for output. For
input, this application and block mode terminals represent the
only areas where the performance of the LAT server will equal or
exceed the DMF.

Software for async data transfers must be carefully written. Turn
off all terminal features, use long fixed length transfers,
minimize handshaking and protocol. Use DECNET for convenience,
but avoid it for intensive activities.

Ethernet was designed for moving blocks of data (e.g., whole
documents, facsimiles, files, etc.) among office machines. It was
not designed for character by character terminals, even as a
multiplexing device. For networks of terminals there is equipment
specifically designed for the purpose which is more powerful,
reliable, flexible and efficient. If you are using LAT, make the
timeouts as long as possible without causing unacceptable
response.

Terminal baud rate has no significant effect on the amount of CPU
resources consumed to do output -- it simply alters the rate at
which those resources are consumed. The only exception is when
users of high baud rates are frequently having needed output
scroll off the screen before it can be paused, resulting in
commands being re-executed.

The Xyplex and Telematics terminal interfacing hardware can
offload some of the CPU load introduced by terminal I/O. However,
much of the same benefits can be achieved through the types of
software modifications indicated above. The net gain these
devices will produce after such modifications must be carefully
considered in light of their cost. In the case of the Xyplex, if
its switching capabilities are to be used, its lack of capability
in that regard as compared to the more traditional alternatives
should also be considered. (This applies to any local area
network based terminal switching. Terminal switches based on wide
area networking are generally as inexpensive and effective for
local use as those based on a local area network such as
Ethernet.)