Data Base Technology

W. C. McGee

The evolution of data base technology over the past twenty-five years is surveyed, and major IBM contributions to this technology are identified and briefly described.
Introduction
Around 1964 a new term appeared in the computer literature to denote a new concept. The term was "data base," and it was coined by workers in military information systems to denote collections of data shared by end users of time-sharing computer systems. The commercial data processing world at the time was in the throes of "integrated data processing," and quickly appropriated "data base" to denote the data collection which results from consolidating the data requirements of individual applications. Since that time, the term and the concept have become firmly entrenched in the computer world.
Today, computer applications in which many users at terminals concurrently access a (usually large) data base are called data base applications. A significant new kind of software, the data base management system, or DBMS, has evolved to facilitate the development of data base applications. The development of DBMS, in turn, has given rise to new languages, algorithms, and software techniques which together make up what might be called a data base technology.
Data base technology has been driven by, and to a large extent distinguished from other software technologies by, the following broad user requirements.
Data consolidation
Early data processing applications used master files to maintain continuity between program runs. Master files "belonged to" applications, and the master files within an enterprise were often designed and maintained independently of one another. As a result, common data items often appeared in different master files, and the values of such items often did not agree. There was thus a requirement to consolidate the various master files into a single data base which could be centrally maintained and shared among various applications. Data consolidation was also required for the development of certain types of "management information" applications that were not feasible with fragmented master files.
Data independence
Early applications were programmed in low-level languages, such as machine language and assembler language. Programmers were not highly productive with such languages, and their programs contained undesirable hardware dependencies. Further, the complexity of programming made data inaccessible to nonprogrammers. There was a requirement to raise the level of languages used to specify application procedures, and also to provide software for automatically transforming high-level specifications into equivalent low-level specifications. In the data base context, this property of languages has come to be known as data independence.
Data protection
The consolidation of master files into data bases had the undesirable side effect of increasing the potential for data loss and unauthorized data use. The requirement for data consolidation thus carried with it a requirement for tools and techniques to control the use of data bases and to protect against their loss.
This paper surveys the development of data base technology over the past twenty-five years and identifies the major IBM contributions to this development. For
this purpose we organize the technology into three areas, roughly paralleling the three broad user requirements just cited:
1. The development of data structuring methods for the
representation of consolidated data;
2. The development of high-level data languages for
defining and manipulating consolidated data; and
3. The development of generalized data protection facilities for protecting and controlling the use of consolidated data.
Because of space limitations, coverage is limited to
specific IBM activities that in the author’s opinion have
had the greatest impact on the technology. As a result,
much important work has, unfortunately, had to be
omitted. Also for space reasons, only brief descriptions
are given of the activities which are included.
Data structuring methods
A data base management system is characterized by its
data structure class, i.e., the class of data structures
which it makes available to users for the formulation of
applications. Most DBMS distinguish between structure
instances and structure types, the latter being abstractions of sets of structure instances.
A DBMS also provides an implementation of its data
structure class, which is conceptually a mapping of the
structures of the class into the structures of a lower-level
class. The structures of the former class are often referred
to as logical structures, while those of the latter are called
physical structures.
The data structure classes of early systems were derived from punched card technology, and thus tended to be quite simple. A typical class was composed of files of records of a single type, with the record type being defined by an ordered set of fixed-length fields. Because of their regularity, such files are now referred to as flat files. Records were typically used to represent the entities of interest to applications (e.g., students and courses), and fields were used to represent entity attributes (such as student name and course number). Files were typically implemented on sequential storage media, such as magnetic tape.
When data consolidation was first attempted, the limitations of early data structuring methods immediately became apparent. The main problem was the lack of an effective method for representing the entity associations that frequently appear when data are consolidated (e.g., the one-many associations between courses and course offerings, and the many-many associations between course offerings and students). The processing required to reflect such associations was not unlike punched card processing, involving many separate sorting and merging steps.
Early structuring methods had the additional problem
of being hardware-oriented. As a result, the languages
used to operate on structures were similarly oriented.
In response to these problems, data base technology
has produced a variety of improved data structuring
methods, many of which have been embodied in DBMS.
While many specific data structure classes have been produced (essentially one class per system), these classes have tended to cluster into a small number of "families," the most important of which are the hierarchic, the network, the relational, and the semantic families. These families have evolved more or less in the order indicated, and all are represented in the data structure classes of present-day DBMS.
Hierarchic structures
The hierarchic data structuring methods which began to appear in the early 1960s provided some relief for the entity association problem. These methods were developed primarily to accommodate the variability that frequently occurs in the records of a file. For example, in the popular two-level hierarchic method, a record was divided into a header segment and a variable number of trailer segments of one or more types. The header segment represented attributes common to all entities of a set, while the trailer segments were used for the variably occurring attributes. The method was also capable of representing one-many associations between two sets of entities, by representing one set as header segments and the other as trailers, and thus provided a primitive tool for data consolidation.
By the mid-1960s, the two-level hierarchic record had been generalized to n levels. For example, GIS [1, 2] provided up to fifteen levels, but with only a single segment type at each level. By the end of the 1960s, n-level hierarchies with multiple segment types at each level were found in such systems as TDMS [3], MARK IV [4], and IMS [5, 6]. Implementations of n-level hierarchic structures on sequential media tended to follow the segmented-record approach, with segments being recorded in "top down, left-right" sequence. These structures have also been implemented extensively on direct access storage devices, which afford numerous additional representation possibilities.
IMS was one of the first commercial systems to offer
hierarchic data structuring and is often cited to illustrate
the hierarchic structuring concept. The IMS equivalent of a file is the physical data base, which consists of a set of hierarchically structured records of a single type. A record type is composed according to the following rules:

The record type has a single type of root segment.
The root segment type may have any number of child segment types.
Each child of the root may also have any number of child segment types, and so on, up to a maximum of 15 segment types in any one hierarchical path and a maximum of 255 segment types in the complete data base record type.
Record occurrences are derived from the following rules:

A record contains a single root segment.
For one occurrence of any given segment type there may be any number of occurrences (possibly zero) of each of its children.
No child segment occurrence can exist without its parent. This point is essentially a restatement of the hierarchic philosophy. It means, for example, that if a given segment occurrence is deleted, so are all its children.
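To make these composition and occurrence rules concrete, here is a minimal sketch in Python (not the language or storage structures IMS actually uses; the segment type names COURSE and OFFERING and the field values are invented for illustration):

class Segment:
    """One segment occurrence in a hierarchic data base record."""
    def __init__(self, seg_type, fields):
        self.seg_type = seg_type        # e.g., 'COURSE', 'OFFERING'
        self.fields = fields            # field name -> value
        self.children = []              # child segment occurrences

    def insert_child(self, child):
        self.children.append(child)
        return child

    def delete_child(self, child):
        # No child can exist without its parent: removing a segment
        # implicitly removes the whole subtree beneath it.
        self.children.remove(child)

    def top_down_left_right(self):
        # The "top down, left-right" sequence used when a record
        # is stored on a sequential medium.
        yield self
        for child in self.children:
            yield from child.top_down_left_right()

# A record with one COURSE root segment and two OFFERING children.
root = Segment('COURSE', {'COURSE': 'M23'})
root.insert_child(Segment('OFFERING', {'DATE': '810901'}))
root.insert_child(Segment('OFFERING', {'DATE': '820105'}))
print([s.seg_type for s in root.top_down_left_right()])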
An unusual feature of IMS is the multiple implementations which have been provided for its data structure class. For any given physical data base, the user may select an implementation that best matches the use to be made of that data base. For example, the Hierarchic Indexed Sequential Access Method (HISAM) implementation uses physical contiguity to represent hierarchic record structure, and thus provides efficient sequential access to the segments of a record. The Hierarchic Indexed Direct Access Method (HIDAM) implementation, on the other hand, uses pointers to represent hierarchic structure, thus providing for efficient segment insertion and deletion.
Network structures
While hierarchic structures provided some relief for the entity association problem in the early 1960s, a more general solution had to await the introduction of the direct access storage device (DASD), which occurred on a large scale in the mid-1960s. DASD made possible a new family of data structuring methods, the network methods, and opened the door to the development of present-day DBMS.
The first network structuring method to be developed for commercial data processing had its origins in the bill-of-materials application, which requires the representation of many-many associations between a set of parts and itself; e.g., a given part may simultaneously act as an assembly of other parts and as a component of other parts. To simplify the development of such applications, IBM developed in the mid-1960s an access method called the Bill-Of-Materials Processor (BOMP), and in the late 1960s, an enhanced version of BOMP known as the Data Base Organization and Maintenance Processor (DBOMP) [7]. The BOMP (and DBOMP) data structure class provides two types of files, master files and chain files, each file type containing records of a single fixed-format type, and a construct called a chain, consisting of a single master file record and a variable number of records from one chain file. A given chain file record can reside in multiple chains of different types, thus associating the master file records at the head of these chains. For the bill-of-materials application, two chain types, a "component" chain and a "where used" chain, are sufficient to represent many-many part associations.
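As an illustration of the chain construct, the following Python sketch (BOMP itself was a DASD access method, not a Python library; the record classes and part numbers are invented) places one chain file record on both a "component" chain and a "where used" chain, thereby associating the master file records at the heads of those chains:

class MasterRecord:
    """Master file record: one part, heading one chain of each type."""
    def __init__(self, part_no):
        self.part_no = part_no
        self.chains = {'component': [], 'where_used': []}

class ChainRecord:
    """Chain file record: one assembly/component pair."""
    def __init__(self, assembly, component, quantity):
        self.assembly, self.component, self.quantity = assembly, component, quantity
        # The same record resides in two chains of different types.
        assembly.chains['component'].append(self)
        component.chains['where_used'].append(self)

bolt = MasterRecord('BOLT-10')
frame = MasterRecord('FRAME-7')
ChainRecord(frame, bolt, quantity=8)

# "Where is BOLT-10 used?" -- follow its where-used chain.
print([c.assembly.part_no for c in bolt.chains['where_used']])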
Although developed for bill-of-materials applications, the BOMP data structure class has been used extensively in a variety of other applications. Essentially the same data structure class is provided in the TOTAL DBMS of CINCOM, perhaps the most widely used DBMS in the world today [8]. In TOTAL, two kinds of files are provided: master files (or single-entry files), corresponding to the master files of BOMP, and variable-entry files, corresponding to BOMP chain files. Provision for creating chains in TOTAL is similar to that in BOMP, although many of the restrictions in BOMP have been removed in TOTAL (e.g., variable-entry files can have multiple record types). While the TOTAL system goes considerably beyond BOMP in terms of function provided, its BOMP heritage is still clearly discernible.
Another highly successful network structuring method is that developed by C. W. Bachman and associates at General Electric for the Integrated Data Store (IDS) System [9]. In IDS, a data base is composed of records and record chains. There is no concept of a file. The record chain is analogous to the BOMP chain, consisting of a single owner record and a variable number of member records. As in BOMP, a record can be a member of multiple chains of different types. Unlike BOMP, an owner record can, in turn, be a member of other chains. This generalization permits the construction of hierarchies of any depth, as well as networks of considerable complexity.
The IDS data structure class was used as the basis of a data base language developed by the Data Base Task Group (DBTG) of CODASYL in the late 1960s and early 1970s [10]. This language introduced some new terminology (e.g., chains became sets) and generalized some features of the IDS class (e.g., providing an ownerless set, yielding the equivalent of a file). The DBTG language has been incorporated into the COBOL Journal of Development [11] and has been implemented in a number of DBMS, including Cullinane's IDMS [12] and UNIVAC's DMS/1100 [13].

Figure 1  STUDENT table (columns NUMBER and NAME).
The IMS system provides a logical relationship facility, which yields many of the benefits of the DBTG data structure class. With this facility, a segment may be (in DBTG terms) a member of two sets: the set of physical child segments of a physical parent segment, all appearing in the same data base record, and the set of logical child segments of a logical parent segment, which may occur in different records in the same or different data bases. The logical relationship is thus a special case of the DBTG set construct, but is nevertheless capable of modeling most information situations of practical importance, such as many-many binary associations. The logical relationship is not, strictly speaking, a part of the IMS data structure class, since a mapping facility is used to shield the programmer from logical relationships and preserve his strictly hierarchical view of data. It is a significant contribution to the technology because it demonstrates that the entity association problem can be solved without exposing complex networks to the programmer.
Relational methods
In the mid-1960s, a number of investigators began to grow dissatisfied with the hardware orientation of then-extant data structuring methods, and in particular with the manner in which pointers and similar devices for implementing entity associations were being exposed to the users. These investigators sought a way of raising the perceived level of data structures, and at the same time bringing them closer to the way in which people look at information. Within IBM, Davies [14], Raver [15], Meltzer [16], and Engles [17] at different times and in different contexts described an entity set structuring method, wherein information is represented in a set of tables, with each table corresponding to a set of entities of a single type. (A similar construct was used in the MacAIMS system of MIT as a canonical form for representing associations among data items.) The rows of a table correspond to the entities in the set, and the columns correspond to the attributes which characterize the entity set type. The intersection of a row and a column contains the value of a particular attribute for a particular entity. For example, the STUDENT table in Fig. 1 describes a set of students having attributes NUMBER and NAME.
Tables can also be used to represent associations among entities. In this case, each row corresponds to an association, and the columns correspond to entity identifiers, i.e., entity attributes which can be used to uniquely identify entities. Additional columns may be used to record attributes of the association itself (as opposed to attributes of the associated entities). For example, the ENROLL table of Fig. 2 describes a set of associations between course offerings (identified by COURSE and DATE) and the students (identified by STUNUM) enrolled in those offerings.
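A minimal sketch of the entity set idea in Python (the row contents are illustrative and the list-of-rows representation is ours, not part of the method): the association is carried entirely by entity identifiers, so the two tables are matched on STUNUM and NUMBER whenever the association is needed.

# Each table is a list of rows; each row maps column names to values.
STUDENT = [
    {'NUMBER': 12345, 'NAME': 'BOSWELL'},
    {'NUMBER': 31416, 'NAME': 'CHICHESTER'},
]
ENROLL = [   # associations between course offerings and students
    {'COURSE': 'M23', 'DATE': '810901', 'STUNUM': 12345, 'GRADE': 'A'},
    {'COURSE': 'M23', 'DATE': '810901', 'STUNUM': 31416, 'GRADE': 'A'},
]

# Names of students enrolled in course M23: match on the entity
# identifier (STUNUM = NUMBER) rather than on a stored pointer.
names = [s['NAME'] for e in ENROLL if e['COURSE'] == 'M23'
         for s in STUDENT if s['NUMBER'] == e['STUNUM']]
print(names)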
The key new concepts in the entity set method were the simplicity of the structures it provided and the use of entity identifiers (rather than pointers or hardware-dictated structures) for representing entity associations. These concepts represented a major step forward in meeting the general goal of data independence.
In the late 1960s, E. F. Codd [18] noted that an entity set could be viewed as a mathematical relation on a set of domains D1, D2, ..., Dn, where each domain corresponds to a different property of the entity set. Associations among entities could be similarly represented, with the domains in this case corresponding to entity identifiers. Codd defined a (data) relation to be a time-varying subset of the Cartesian product D1 × D2 × ... × Dn, i.e., a set of n-tuples (or simply tuples) of the form (v1, v2, ..., vn), where vi is an element selected from domain Di. One or more domains whose values uniquely identify the tuples of a relation is called a candidate key.
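In operational terms, a candidate key can be checked by verifying that no two tuples of the relation agree on it, as in the following Python sketch (the function name and relation contents are invented):

def is_candidate_key(relation, key_columns):
    """True if the projection of each tuple onto key_columns is unique."""
    seen = set()
    for tup in relation:
        key_value = tuple(tup[c] for c in key_columns)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True

ENROLL = [
    {'COURSE': 'M23', 'DATE': '810901', 'STUNUM': 12345, 'GRADE': 'A'},
    {'COURSE': 'M23', 'DATE': '810901', 'STUNUM': 31416, 'GRADE': 'A'},
]
print(is_candidate_key(ENROLL, ['COURSE', 'DATE', 'STUNUM']))   # True
print(is_candidate_key(ENROLL, ['COURSE', 'DATE']))             # False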
Aside from the mathematical relation parallel, Codd's major contribution to data structures was the introduction of the notions of normalization and normal forms. Codd recognized that the domains on which a relation is constructed can in general be composed of elements of any kind; in particular, domains can be composed of other relations, thus leading to the "nesting" of relations to potentially any depth. Codd showed that there was no fundamental advantage to this nesting and that, in fact, it only tended to complicate the information modeling process. Instead, he proposed that relations be built exclusively on domains of elementary values: integers, character strings, etc. He called such relations normalized relations and the process of converting relations to normalized form, normalization. Virtually all work done since with relations has been with normalized relations.
Codd also perceived that the unconstrained construction of normalized relations could lead to semantic anomalies. For example, when a tuple represents an association between two or more entities and at the same time represents (parasitically) the attributes of the individual entities, values for the latter will in general be replicated throughout the relation, entailing duplicate updating. Similarly, when a tuple represents an entity, some of the attributes therein may be attributes of a second (masquerading) entity which is associated in some way with the first entity. When this occurs, entities of the second type cannot be represented (inserted, deleted, etc.) independently of entities of the first type.
Figure 2  ENROLL table (columns COURSE, DATE, STUNUM, and GRADE).
To better explain these effects, Codd postulated levels of normalization called normal forms. An unconstrained normalized relation is in first normal form (1NF). A relation in 1NF in which all non-key domains are functionally dependent on (i.e., have their values determined by) the entire key is in second normal form (2NF), which solves the problem of parasitic entity representation. A relation in 2NF in which all non-key domains are dependent only on the key is in third normal form (3NF), which solves the problem of masquerading entities.
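The functional dependencies underlying these normal forms can be stated operationally, as in this Python sketch (the helper name and relation contents are invented): X functionally determines Y if no two tuples agree on X while disagreeing on Y, and a non-key attribute that depends on only part of the key signals a 2NF violation.

def functionally_determines(relation, x_cols, y_cols):
    """True if the values of x_cols determine the values of y_cols."""
    mapping = {}
    for tup in relation:
        x = tuple(tup[c] for c in x_cols)
        y = tuple(tup[c] for c in y_cols)
        if mapping.setdefault(x, y) != y:
            return False
    return True

# STUNUM -> NAME holds, so a relation keyed on (COURSE, DATE, STUNUM)
# that also carries NAME is not in 2NF: NAME depends on part of the key.
ENROLL_WITH_NAME = [
    {'COURSE': 'M23', 'DATE': '810901', 'STUNUM': 12345, 'NAME': 'BOSWELL'},
    {'COURSE': 'W78', 'DATE': '820105', 'STUNUM': 12345, 'NAME': 'BOSWELL'},
]
print(functionally_determines(ENROLL_WITH_NAME, ['STUNUM'], ['NAME']))   # True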
Codd recognized the existence of many possible manipulation languages for relations and proposed that the relational calculus be used as the standard against which these languages could be measured for completeness. In [19] he defined relational completeness: "a language is relationally complete if, given any finite collection of relations R1, R2, ..., Rn in simple normal form, the expressions of the language permit definition of any relation definable from R1, R2, ..., Rn by expressions of the relational calculus."
To avoid update anomalies, Codd recommended that all information be represented in third normal form. While this conclusion may seem obvious today, it should be remembered that at the time the recommendation was made, the relationship between data structures and information was not well understood. Codd's work in effect paved the way for much of the work done on information modeling in the past ten years.
Codd characterized his methodology as a data model, and thereby provided a concise term for an important but previously unarticulated data base concept, namely, the combination of a class of data structures and the operations allowed on the structures of the class. (A similar concept, the abstract data type or data abstraction, has evolved elsewhere in software technology.) The term "model" has been applied retroactively to early data structuring methods, so that, for example, we now speak of "hierarchic models" and "network models," as well as the relational model. The term is now generally used to denote an abstract data structure class, although there is a growing realization that it should embrace operations as well as structures.
As part of the development of the relational method, Codd postulated a relational algebra, i.e., a set of operations on relations which was closed in the sense of a traditional algebra, and thereby provided an important formal vehicle for carrying out a variety of research in data structures and systems [19]. In addition to the conventional set operations, the relational algebra provides such operations as restriction, to delete selected tuples of a relation; projection, to delete selected domains of a relation; and join, to join two relations into one.
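These three operators can be sketched in a few lines of Python over relations represented as lists of rows (the representation and function names are ours, not Codd's):

def restrict(relation, predicate):
    """Restriction: keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, columns):
    """Projection: keep only the named domains, dropping duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple((c, t[c]) for c in columns)
        if row not in seen:
            seen.add(row)
            result.append(dict(row))
    return result

def join(r, s, r_col, s_col):
    """Join: combine tuples of r and s that agree on the named domains."""
    return [{**a, **b} for a in r for b in s if a[r_col] == b[s_col]]

STUDENT = [{'NUMBER': 12345, 'NAME': 'BOSWELL'}]
ENROLL = [{'STUNUM': 12345, 'COURSE': 'M23', 'GRADE': 'A'}]
print(project(join(restrict(ENROLL, lambda t: t['GRADE'] == 'A'),
                   STUDENT, 'STUNUM', 'NUMBER'),
              ['NAME']))                  # [{'NAME': 'BOSWELL'}]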
Codd also proposed a relational calculus [19], whose distinguishing feature is the method used to designate sets of tuples. The method is patterned after the predicate calculus and makes use of free and bound variables and
the universal and existential quantifiers. For example, the
set of names of students who received an ‘A’ in any
offering of course M23 would be expressed as
{x[NAME] ∈ STUDENT:
   (∃y ∈ ENROLL)(y[COURSE] = 'M23' &
                 y[GRADE] = 'A' &
                 y[STUNUM] = x[NUMBER])}
IBM investigators have made several refinements to Codd's original definitions of normal forms. Kent [20] simplified the definitions by removing references to prime attributes (an attribute in any candidate key). Boyce [21] noted that Codd's definition of 3NF still permitted undesirable functional dependencies among prime attributes and postulated a normal form which excluded these dependencies. Codd and Boyce later collaborated on the definition of the Boyce-Codd normal form (BCNF), a redefinition of 3NF which subsumed Boyce's normal form and made no reference to either keys or prime attributes [22].

Figure 3  Relational systems (system, developer): MacAIMS, MIT Project MAC; RDMS, General Motors; IS/1, IBM UK Scientific Centre; INGRES, U. California, Berkeley; ZETA, U. Toronto; System R, IBM Research, San Jose; QBE, IBM Research, Yorktown; ORACLE, Relational Software Inc.; SQL/DS, IBM.
Fagin [23] noted that relations in BCNF could still contain higher-order dependencies, which he called multivalued dependencies. He proposed a fourth normal form (4NF) to eliminate multivalued dependencies and provided algorithms for reducing relations to 4NF. In subsequent work [24], Fagin described the projection-join normal form (PJ/NF), the ultimate normal form when only the projection and join operators are allowed.
By providing a common context for the formulation of
data problems, the relational model has proved of great
value as a vehicle for research and for communication
among research workers. Areas in which the relational
model has been used include data base system architecture, data base machines, concurrency theory, language
completeness, view updating, query decomposition (especially in distributed systems), and data equivalence.
In addition, the relational model has been implemented in a number of DBMS. Two major implementations within IBM are System R [25, 26], an exploratory DBMS developed by the IBM Research Division in San Jose, and SQL/DS [27], a program product based on System R for use in the DOS/VSE operating system environment.
A partial list of relational systems appears in Fig. 3.
A question frequently asked about relational model implementations is: How efficiently do they represent the entity associations required for the consolidation of data into data bases? At the user level, a relation seems no different from a flat file, and if the latter was not adequate for data consolidation, how can we expect the former to be? The answer lies in the hardware improvements that have been made since flat file days (notably, DASD and faster CPUs with larger memories) and in a better understanding of the problems of implementing high-level data models. Thus, relational systems make extensive use of indexes and pointers in implementing relations and relational operations. Through the use of such devices, relational systems seem capable of achieving performance competitive with nonrelational systems, without compromising the simple view of data for which the model was conceived.
Semantic models
During the evolution of the hierarchic, network, and relational methods, it gradually became apparent that building a data base was in fact equivalent to building a model of an enterprise and that data bases could be developed more or less independently of applications simply by studying the enterprise. This notion has been articulated in the widely referenced ANSI/SPARC data base system architecture [35], which provides the notion of a conceptual schema for the application-independent modeling of an enterprise and various external schemata derivable from the conceptual schema for expressing data requirements of specific applications.
Application-independent modeling has produced a spate of semantic data models and debate over which of these is best for modeling "reality." One of the most successful semantic models is the entity-relationship model [36], which provides data constructs at two levels: the conceptual level, whose constructs include entities, relationships (n-ary associations among entities), value sets, and attributes; and the representation level, in which conceptual constructs are mapped into tables. The latter are similar to relations in the relational model, with the important difference that the entity-relationship model provides distinct table types for representing entity sets and relationship sets. Such semantic interpretations of relations have existed for some time, but it took Chen's paper to give them wide circulation and to create a surge of interest in the entity-relationship model.
The data structure class of the IBM DB/DC Data Dictionary program product is an embodiment of the entity-relationship model [37]. The Dictionary provides subjects, which may have attributes and which may participate in many-many binary relationships, which may also have attributes. In the initial release of the Dictionary, subject and relationship types were fixed in the product design and reflected the entity types typically found in a computer installation about which the user wanted to record information: data bases, records, fields, programs, etc. Subsequently, the Dictionary has provided an extensibility facility, which allows the user to define arbitrary subject and relationship types. With this extension, the Dictionary has the modeling power of a generalized DBMS, making it one of the first systems to implement the entity-relationship model.
Data model implementation
The success of a data model depends not only on the degree of its hardware independence, but also on the ability to translate operations on its constructs efficiently into equivalent operations on the underlying hardware. As one might expect, these goals often conflict with one another.
For performance reasons, most data model implementations make use of indexes, which are essentially sets of key value/data location pairs. Rather than develop indexing techniques from the ground up, many DBMS use existing indexed access methods as their implementation base. Two access methods which have been used extensively for this purpose are the IBM Indexed Sequential Access Method (ISAM) and the IBM Virtual Storage Access Method (VSAM). ISAM and VSAM are generalized indexed sequential access methods, meaning that they cater simultaneously to both random and sequential access to data.
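The essential idea of a generalized indexed sequential access method can be sketched as follows (a toy in-memory Python stand-in, not the actual ISAM or VSAM organization): a sorted index of key/location pairs supports both random lookup by key and sequential scanning from any key.

import bisect

class IndexedSequentialFile:
    """Toy index: sorted (key, location) pairs over a record store."""
    def __init__(self):
        self.keys, self.locations, self.records = [], [], []

    def insert(self, key, record):
        loc = len(self.records)
        self.records.append(record)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.locations.insert(i, loc)

    def get(self, key):                    # random access by key
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.records[self.locations[i]]

    def scan_from(self, key):              # sequential access in key order
        i = bisect.bisect_left(self.keys, key)
        for loc in self.locations[i:]:
            yield self.records[loc]

f = IndexedSequentialFile()
f.insert(31416, {'NAME': 'CHICHESTER'})
f.insert(12345, {'NAME': 'BOSWELL'})
print(f.get(12345), list(f.scan_from(20000)))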
ISAM was introduced in 1966 as a component of OS/360 and was the first indexed sequential access method to find widespread use in the data processing community. ISAM made practical the use of DASD for many users, especially those who could not devote the time and effort required to develop a viable indexed access method of their own. It has been widely referenced in the literature and in textbooks as the typical indexed sequential access method.
VSAM [38] was introduced in 1972. Its major contribution was the use of a record-splitting strategy to overcome the tendency in ISAM for long overflow chains to develop after many record insertions. In addition, VSAM has made innovative contributions in the areas of index compression and index replication. The VSAM index organization is known more generally as the B-tree organization, which was developed independently by Bayer and McCreight in the early 1970s [39].
Also for performance reasons, many data model implementations make use of hashing, i.e., the calculation of data locations from key values. W. W. Peterson [40] was one of the first to apply hashing to DASD, and his work has been extensively referenced. V. Lum and his associates at IBM's Research Division (e.g., [41, 42]) have conducted systematic investigations of hashing techniques and demonstrated the general utility of the division/remainder method, which is widely used today.
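A sketch of the division/remainder method (Python; the bucket count and keys are invented, and practical implementations fold keys and choose the divisor with much more care): the key, treated as an integer, is divided by the number of bucket addresses, and the remainder is taken as the bucket address. The divisor is commonly chosen to be a prime near the intended number of buckets.

def bucket_address(key, n_buckets=997):
    """Division/remainder hashing: map a key to a DASD bucket address."""
    if isinstance(key, str):               # fold character keys to an integer
        key = int.from_bytes(key.encode(), 'big')
    return key % n_buckets                 # the remainder is the bucket number

print(bucket_address(31416))               # numeric key
print(bucket_address('BOSWELL'))           # character key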
High-level data languages
The history of computer applications has been marked by a steady increase in the level of the languages used to implement applications. In data base technology, this trend is manifested in the development of high-level data definition languages and data manipulation languages.
A data definition language (DDL) provides the DBMS user with a way to declare the attributes of structure types within his data base, and thus enables the system to perform implicitly many operations (e.g., name resolution, data type checking) that would otherwise have to be invoked explicitly. A DDL typically provides for the definition of both logical and physical data attributes, as well as the definition of different views of the (logical) data. The latter are useful in limiting or tailoring the way in which specific programs or end users look at the data base.
A data manipulation language (DML) provides the user with a way to express operations on the data structure instances of a data base, using names previously established through data definition. Data manipulation facilities are of two general types: host-language and self-contained.
A host-language facility permits the manipulation of data bases through programs written in conventional procedural languages, such as COBOL or PL/I. It provides statements that the user may imbed in a program at the points where data base operations are to be performed. When such a statement is encountered, control is transferred to the data base system, which performs the operation and returns the results (data and return codes) to the program in pre-arranged main storage locations.
A self-contained facility permits the manipulation of the data base through a high-level, nonprocedural language, which is independent of any procedural language, i.e., whose language is "self-contained." An important type of self-contained facility is the query facility, which enables "casual" users to access a data base without the mediation of a professional programmer. Other types of self-contained facility are available for performing generalizable operations on data base data, such as sorting, report generation, and data translation.
Host-language facilities
Host-language facilities evolved from the need to standardize within an installation the way in which programmers code certain common data handling operations, such as buffering, error handling, and label processing. This need resulted in "I/O subroutine packages" which were invoked by all programs in the installation. Such packages, in turn, were generalized over computers of a given type into "I/O systems" and "access methods" applicable to many installations. The introduction of DASD greatly extended the set of operations which could be usefully generalized. Functions typically included in DASD access methods are space allocation, formatting, key-to-address transformation, and indexing.
With the introduction of data base management systems, the access method interface was replaced by the data base sublanguage. The data manipulation facilities of a data base sublanguage tend to be more powerful than those of access methods, permitting, for example, the updating or deleting of multiple records with a single statement. Additionally, a data base sublanguage may include statements unique to the data base environment, such as locking and transaction control statements.

Because of main storage limitations, the units of data on which data base sublanguages operate are normally relatively small, the record being the typical unit. To access larger collections of data, the programmer must "navigate" through the data base. To assist him in this, the DBMS may provide objects called cursors or current position indicators, which the programmer can set to point to a particular item of data, and later use to refer to that item or to a related item.

A high-level data language which is proving to be of considerable importance to data base technology is the SQL data base sublanguage of System R [43, 44]. SQL is a relational language which had its origins in several relational languages developed by IBM's Research Division in the early 1970s, including:

The ALPHA data base sublanguage [45], an adaptation of the relational calculus for use with conventional procedural languages. Continuing the example of the section "Relational methods," the ALPHA sequence

RANGE STUDENT X
RANGE ENROLL Y SOME
GET W X.NAME:
   ∃Y((Y.COURSE = 'M23') &
      (Y.GRADE = 'A') &
      (Y.STUNUM = X.NUMBER))

returns the names of 'A' students in M23 to the workspace relation W, where they can be operated on by statements of the host language.

The GAMMA-0 language [46], a low-level relational language intended for implementing relational algebras and query languages.

The SQUARE language [47, 48], a general purpose query language which attempted through graphic conventions to avoid some of the mathematical appearance of the relational calculus and at the same time remain relationally complete. The set of names of 'A' students in M23 would be expressed in SQUARE as:

NAME STUDENT NUMBER ∘ STUNUM ENROLL COURSE,GRADE ('M23', 'A')

The SEQUEL language [49], a general purpose query language based on SQUARE but providing a string-type syntax with English keywords. For basic queries, SEQUEL borrowed the SELECT-FROM-WHERE construction of existing query languages such as GIS and then elaborated this structure in a consistent manner to achieve the completeness of the relational calculus, but with much improved readability. An important characteristic of SEQUEL is the ability to "nest" SELECT clauses, permitting complex queries to be articulated into intellectually manageable chunks without losing the important nonprocedural nature of the language. For example, the previous query would be rendered in SEQUEL as follows:

SELECT NAME
FROM   STUDENT
WHERE  NUMBER IN
       (SELECT STUNUM
        FROM   ENROLL
        WHERE  COURSE = 'M23'
        AND    GRADE = 'A')

The SQL language of System R is an enhanced version of SEQUEL. In addition to SEQUEL's query facilities, SQL provides

Data manipulation facilities that permit the insertion, deletion, and updating of individual tuples or sets of tuples.
Data definition facilities for defining relations, views, and other data objects.
Data control facilities for defining access authorities and for defining transactions, i.e., units of recoverable processing.

The use of SQL from programs is facilitated by permitting language variables to appear in SQL statements and by providing a cursor facility for manipulating individual tuples. The statement

$LET cursor-name BE select-statement

associates the set of tuples designated by select-statement with the named cursor. A cursor contains a "current tuple" pointer, so that individual tuples can be designated simply through a cursor name. For example, the statement

$FETCH cursor-name

returns the tuple pointed to by the current tuple pointer of cursor-name and advances the pointer to the next tuple. To illustrate, the following pseudo-program provides processing of the names of 'A' students in M23:

initialize;
$LET P BE
   SELECT NAME INTO $STUNAME
   FROM STUDENT
   WHERE NUMBER IN
      (SELECT STUNUM
       FROM ENROLL
       WHERE COURSE = 'M23'
       AND GRADE = 'A');
$OPEN P;
do until the tuple set designated by P is exhausted;
   $FETCH P;
   process one name in variable STUNAME;
end;
$CLOSE P;
The use of SQL in generalized programs whose data requirements are not known until the program is invoked is facilitated by the PREPARE and EXECUTE statements. These statements may be used to construct string representations of SQL statements (e.g., including data names supplied by the invoker) and then cause these representations to be executed exactly as if they had appeared in the program to begin with.

Whereas most relational DBMS use an interpretive approach to the execution of data sublanguage statements, System R uses a compiler approach. Programs are first processed by a precompiler [50], which generates a tailored data access routine for each SQL statement in the program and which replaces the SQL statement with a CALL to the access routine. When the program is executed, all the access routines are loaded to provide targets for the translated CALLs. This approach has two advantages:

1. Much of the work of parsing, name binding, access path selection, and authorization checking can be done once by the precompiler and thus be removed from the process of running the program.
2. The access routine, because it is tailored to one specific program, is much smaller and runs much more efficiently than a generalized SQL interpreter would.

The tailoring of System R access routines is done by an optimizer component [51-53], which attempts to minimize the "cost" of carrying out SQL statements. Cost is a weighted combination of CPU and DASD I/O activity, with the weighting adjustable for different system configurations. In computing cost, the optimizer makes use of such "statistics" as relation sizes and number of distinct key values within a relation.

Like the relational model on which it is based, SQL has been widely adopted as a research and educational vehicle and has been implemented in a number of DBMS products such as SQL/DS.

Self-contained facilities

The nonprocedurality of data processing specifications that can be achieved with a host-language facility is effectively limited by the procedurality of the host language. This limitation was recognized as early as the mid-1950s, when another approach to application development was conceived. This approach took cognizance of the fact that most data processing logic can be articulated into executions of a small set of generalized routines, which can be particularized for specific applications with a fraction of the effort required to write an equivalent customized program. The processes which have been most frequently generalized for this purpose are report generation, file maintenance, and (more recently) data translation.

One of the earliest generalized file processing systems was developed at the Hanford Atomic Products Operation in the mid-1950s [54]. The work done there on generalized routines for sorting, report generation, and file maintenance was picked up by the SHARE organization around 1960 and distributed under the title "9PAC" [55]. This work, in turn, was extended in many directions over the next fifteen years, giving rise to numerous families of generalized systems [56].

The most pervasive application of the Hanford concept is found in the report program generator, a software package intended primarily for the production of reports from formatted files. Attributes of the source files and the desired reports are described by the user in a simple declarative language, and this description is then processed by a compiler to "generate" a program which, when run, produces the desired reports. A key concept of the report program generator is the use of a fixed structure for the generated program, consisting of input, calculation, and output phases. Such a structure limits the transformations that can be carried out with a single generated program, but has nevertheless proved remarkably versatile (report program generators are routinely used for file maintenance as well as report generation). Perhaps more importantly, the fixed structure of the generated program imposes a discipline on the user which enables him to produce a running program much more quickly than he could with conventional languages. Report program generators are especially popular in smaller installations where conventional programming talent is scarce, and in some installations it is the only "programming language" used.
The original report program generator was the IBM Report Program Generator introduced in the early 1960s for the IBM 1401 computer [57]. It was patterned after the SHARE 9PAC system and proved to be a valuable tool in helping users to migrate from punched card equipment to electronic data processing. A report program generator for the System/360 series was introduced in 1964. A much enhanced version, RPG II, was introduced in 1969 for the IBM System/3 [58]. RPG II has been implemented on System/370 and many other machines, and today it is one of the most widely used computer programming languages.
While RPG was being developed in IBM's business sector, a closely related family of products, the formatted file systems, was being developed jointly by IBM's Federal Systems Division and various military and intelligence agencies of the federal government. A formatted file system typically provides a set of generalized programs which are sufficient to implement the bulk of the application at hand. The programs are separately invokable and are so designed that the output of one can be used as input to the others. File structures have limited complexity, typically providing a two-level hierarchic record with multiple segment types at the second level. The formatted file systems have been used extensively in intelligence and command-control applications, where information requirements are exceptionally volatile, and the time available to respond to new requirements precludes the use of conventional programming.
IBM has been a major contributor to a number of the formatted file systems, including:

The Formatted File System for the Air Force Strategic Air Command, developed for the IBM 7090 around 1959 and used mainly for intelligence applications (this is believed to be the first formatted file system) [59];
The Information Processing System (IPS) for the Navy, developed in the early 1960s for the IBM 7090 and CDC 1604 [60];
The Formatted File System for the Naval Fleet Intelligence Center in Europe (FICEUR), developed for the IBM 1410 (believed to be the most widely used of the formatted file systems) [61];
The National Military Command System Information Processing System (NIPS), developed for the IBM 1401 and later converted to the IBM System/360 [62].
The report program generators and the formatted file systems were the precursors of the contemporary DBMS query facility. A query processor is in effect a generalized routine which is particularized to a specific application (i.e., the user's query) by the parameters (data names, Boolean predicates, etc.) appearing in the query. Query facilities are more advanced than most early generalized routines in that they provide online (as opposed to batch) access to data bases (as opposed to individual files). The basic concept is unchanged, however, and the lessons learned in implementing the generalized routines, and especially in reconciling ease of use with acceptable performance, have been directly applicable to query language processors.
Most query facilities use string-type languages, such as SQL. A significant departure from this practice is the Query-By-Example (QBE) language [63, 64], which is a graphical language intended for use from a display terminal. The QBE user is presented with an outline of the tables he wishes to query, and then he expresses his query by filling in the outline with the appropriate names and special characters. The basic idea is for the user to show the system an example of the information he wants to see and for the system to respond by showing the user all instances that conform to the example.
For example, to query the ENROLL table (Fig. 2), the system user would first call up the outline in Fig. 4(a). To see all students with an 'A' grade in any offering of course M23, the user would enter 'A' in the GRADE column and 'M23' in the COURSE column, and then in the STUNUM column enter an example of a student number, underlined to indicate that it is an example only, and annotated with a P to indicate that it is values of this column that are to be printed or displayed [Fig. 4(b)]. The system responds by displaying the numbers of all qualifying students as in Fig. 4(c).
Queries involving two or more tables are expressed by
using common values as examples of the attributes on
which the tables are to be matched. For example, the
names of all students in the previous query would be
retrieved with the query shown in Fig. 4(d). The system
responds by displaying the names, as in Fig. 4(e).
Through the use of various other graphic conventions, the QBE user is able to express quite sophisticated queries. Predicates may include Boolean expressions (e.g., grade = 'A' or grade = 'B'), comparison of two variables (e.g., grade better than a specific student's grade), and universal quantifiers (e.g., all grades = 'A'). Both predicates and retrieved values can include aggregate operators, such as SUM, COUNT, and AVERAGE. The main goal
of the language, however, is to make the expression of
simple queries very easy. Tests conducted by Thomas
and Gould [65] suggest that QBE has indeed achieved this
objective.
Figure 4  Query-by-example displays: (a) the empty ENROLL table outline; (b) the outline filled in to request student numbers for GRADE 'A' in COURSE M23; (c) the resulting student numbers; (d) a two-table query linking ENROLL and STUDENT through a common example value; (e) the resulting student names.
Data protection facilities
The consolidation of data accentuates the need to protect
the data from loss or unauthorized use. This protection is
in many cases secured (ironically) by re-introducing redundancy into the data, but in a controlled way.
This section surveys the facilities which data base technology has provided for the protection of data. For specific examples, we draw on IMS, which is widely regarded as the DBMS which pioneered data integrity technology, and on System R, which is believed to be the first relational DBMS to incorporate a full range of data protection facilities.
Concurrent access control
Most DBMS permit a data base to be accessed concurrently by a number of users. If this access is not controlled, the consistency of the data can be compromised (e.g., lost updates), or the logic of programs can be affected (e.g., nonrepeatable read operations).
Concurrent access control generally takes the form of data locking, i.e., giving a user exclusive access to some part of the data base for as long as necessary to avoid interference. Locking can, in general, lead to deadlock among users, necessitating some method of detecting and breaking deadlocks.
In early releases of IMS, concurrent access was controlled through program scheduling, i.e., a program intending to update certain segment types would not be started until all programs updating these segment types had completed. Under this regime, the granule of sharing was effectively the segment type. The segment types to be updated by a program were effectively locked when the program was started and unlocked when it completed. Deadlock did not occur, since all resources required by a program were obtained at one time.
Around 1974, a program isolation facility was added to IMS which permitted programs updating the same segment type to run concurrently and which prevented interference by locking individual data base records as required. With program isolation, records are locked for a program upon updating any item within the record and unlocked when the program reaches a synchpoint, i.e., a point at which the changes made by the program are committed to the data base. Deadlocks can occur and are resolved by selecting one of the deadlocked programs and restarting it at its most recent synchpoint (see next section).
In addition to the implicit protection provided by
program isolation, IMS permits programs to explicitly
lock and unlock segments and permits users to explicitly
request exclusive use of segment types and data bases
(for whatever reason) before a program is started.
A significant new capability in IMS is the ability for programs running under different invocations of the system (e.g., in different CPUs) to concurrently access a common set of data bases. Additional computer capacity may thus be applied to the processing of common data, and the systems sharing the data may be tailored to specific user needs while still retaining access to common data.
System R employs an implicit locking technique similar to program isolation and like IMS allows the user to explicitly lock data objects at several levels of granularity. A novel feature is the ability of the user to specify one of three consistency levels in reading data:

1. Read "dirty" data, i.e., data subject to backout in the event that another program updating the data ends abnormally (see next section).
2. Read "clean" but possibly unstable data, i.e., data not subject to backout, but subject to update by other users between successive reads by this user.
3. Read "clean," stable data, i.e., data as it would be seen by this user if running alone.
The lower levels of consistency require less locking and produce less lock contention, and may thus be used, when the application permits, to improve system performance.
Recovery from abnormal program termination
The data base updating performed by a program does not occur instantaneously (typically requiring several thousands of machine instruction executions); hence, there is a nonzero probability that the program will fail to complete normally and as a result leave the data base in an inconsistent state (e.g., crediting one bank account without a matching debit to another account). A program can fail to complete for a variety of reasons, including illegal instruction execution, termination by the system to break a loop or deadlock, and system failure.
IMS protects against data inconsistency due to abnormal program termination by recording all data base changes made by a program in a dynamic log. If the program reaches a synchpoint, its dynamic log entries are discarded, thereby committing its data changes. If the program ends abnormally before a synchpoint is reached, the system (after restart, if necessary) uses the dynamic log to back out all data base changes made by the program since its most recent synchpoint. If abnormal end is due to a program error, the system prevents the program from being rescheduled until an operator intervenes. Otherwise, the system automatically restarts the program.
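In outline, the dynamic log amounts to before-image logging with commit at synchpoints, as in this Python sketch (a simplification; IMS logs segment images, not dictionary entries, and restart handling is omitted):

class DynamicLog:
    """Before-image logging with synchpoint commit and backout."""
    def __init__(self, database):
        self.db = database       # segment key -> value
        self.entries = []        # (key, before-image) since last synchpoint

    def update(self, key, new_value):
        self.entries.append((key, self.db.get(key)))   # record before-image
        self.db[key] = new_value

    def synchpoint(self):
        self.entries.clear()     # changes up to this point are committed

    def backout(self):
        # Undo changes in reverse order back to the last synchpoint.
        for key, before in reversed(self.entries):
            if before is None:
                self.db.pop(key, None)
            else:
                self.db[key] = before
        self.entries.clear()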
IMS also protects against anomalous input and output behavior which can result from abnormal program termination. If a program ends abnormally, the system discards any output messages produced by the program since the most recent synchpoint and restores the program's input message to an input queue. The input message is discarded and the output messages are delivered to their destinations only when a synchpoint is reached.
In System R, recovery from system failure is facilitated through the use of a novel dual-copy recording technique and the use of "maps" or directories which point to physical records on DASD. Updated physical records, instead of being overwritten to their original locations, are written to available DASD space, and a "current map" is updated to point to them. At checkpoints, the current map becomes a "backup map," and a new current map is started. A log is also kept of all updates occurring between checkpoints. Following a system outage, the data base is restored to a consistent state by reinstating the backup map and using the log to re-do the updates of transactions which completed after the last checkpoint.
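A sketch of the dual-map technique (Python; the real mechanism operates on DASD pages, and unlike this simplification it re-does only the updates of transactions that completed after the last checkpoint):

class ShadowStore:
    """Toy dual-copy (shadow) recording with a redo log."""
    def __init__(self):
        self.space = {}           # physical location -> record contents
        self.current_map = {}     # record id -> location
        self.backup_map = {}      # map as of the last checkpoint
        self.redo_log = []        # (record id, contents) since checkpoint
        self.next_loc = 0

    def update(self, rec_id, contents):
        # Write to fresh space instead of overwriting in place.
        self.space[self.next_loc] = contents
        self.current_map[rec_id] = self.next_loc
        self.next_loc += 1
        self.redo_log.append((rec_id, contents))

    def checkpoint(self):
        self.backup_map = dict(self.current_map)
        self.redo_log.clear()

    def recover(self):
        # After an outage: reinstate the backup map, then re-do the
        # updates logged since the last checkpoint.
        replay, self.redo_log = self.redo_log, []
        self.current_map = dict(self.backup_map)
        for rec_id, contents in replay:
            self.update(rec_id, contents)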
The processing which a program does between synchpoints has turned out to be a fundamentally important concept in data base technology. This unit has come to be known generally as a transaction, in recognition of the fact that it is typically done on behalf of one input message or "transaction." A transaction has been defined by Eswaran et al. at IBM's Research Division [66] as a unit of processing which transforms a consistent data state into a new consistent state. A transaction thus behaves externally as if it were atomic, even though internally it may extend over an arbitrarily long time interval. Eswaran et al. also introduced the concept of a schedule of interleaved actions of a set of concurrent transactions and showed that a schedule is consistent (i.e., equivalent to the serial execution of the transactions) only if transactions can be divided into two phases: a growing phase, in which locks are acquired, and a shrinking phase, in which locks are released. Releasing locks at the end of a transaction thus proves to be a special case of a more general procedure for achieving schedule consistency.
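The two-phase rule can be sketched as a small locking discipline in Python (an invented interface, not the Eswaran et al. formulation itself): locks may be acquired only during the growing phase, and the first release puts the transaction into its shrinking phase, after which no further locks may be acquired.

class TwoPhaseTransaction:
    """Enforces the growing-phase / shrinking-phase locking discipline."""
    def __init__(self, name, lock_table):
        self.name = name
        self.shrinking = False
        self.lock_table = lock_table       # resource -> holder

    def lock(self, resource):
        if self.shrinking:
            raise RuntimeError('cannot acquire locks in the shrinking phase')
        holder = self.lock_table.get(resource)
        if holder not in (None, self.name):
            raise RuntimeError('lock conflict: would have to wait')
        self.lock_table[resource] = self.name

    def unlock(self, resource):
        self.shrinking = True              # first release ends the growing phase
        self.lock_table.pop(resource, None)

table = {}
t1 = TwoPhaseTransaction('T1', table)
t1.lock('record 17')
t1.unlock('record 17')    # T1 may release more locks, but acquire no more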
The transaction concept has been implemented in System R. Programs may use the BEGIN TRANSACTION and END TRANSACTION statements to bracket processing which is to be considered atomic. A COMMIT statement permits the program to create intermediate points of consistency, analogous to IMS synchpoints, and a RESTORE TRANSACTION statement may be used to back out all data base changes to the most recent point of consistency.
Data recovery
A data base may be damaged in a variety of ways, including write errors, physical damage to a volume, inadvertent erasing by an operator, and application program errors. The effect of such a loss on the user's installation can be mitigated through the use of data base recovery facilities.
The basic approach to data base recovery in IMS is to make periodic copies of the data sets that underlie the data base and to record data base changes on the system log. In the event of failure in a data set, the latest copy can be updated with changes logged since the copy was made, thus restoring the data set to its condition at the point of failure.
A data base change is recorded in the system log in the form of two segment images: the segment as it appeared before the change and the segment as it appeared after the change. Additional information recorded includes the identity of the program that made the change, the date and time of entry, and the identity of the data base, data set, and record being modified.
The copying of data bases is done with an image copy
utility program, which creates an image copy of the data
set on disk or tape. Data bases are normally copied just
after the data base has been initially loaded (to obviate
reloading in the case of failure) and immediately after
reorganization. (Copies made before a reorganization
cannot be used in recovery.) Copies may also be made at
intermediate points, as determined by the update activity
against the data base. Copying may be done “on-line,”
i . e . , while the data base is being used by other programs.
When data base damage is discovered, the affected data sets may be recovered by running a recovery utility program. For each data set to be recovered, the utility allocates space for a new version of the data set, loads the latest image copy into this space, and then reads the system log in the forward (time-ascending) sequence looking for changes that have been made to the data set after the image copy. For each such log entry, the "after" image is used to replace the corresponding data in the data set.
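This forward-recovery step amounts to replaying "after" images on top of the latest image copy. The sketch below assumes hypothetical log-record and image-copy formats; the actual IMS utilities and log formats are considerably richer.

```python
# Minimal sketch of forward recovery as described above. The log-record and
# image-copy formats here are hypothetical, not the actual IMS formats.

def recover_data_set(image_copy, log, data_set_name, copy_time):
    """Rebuild a data set from its latest image copy plus logged changes."""
    restored = dict(image_copy)            # load the image copy into new space
    for entry in log:                      # read the log in time-ascending order
        if entry["data_set"] != data_set_name:
            continue
        if entry["time"] <= copy_time:
            continue                       # change already reflected in the copy
        # Apply the "after" image; the "before" image is not needed here.
        restored[entry["record"]] = entry["after"]
    return restored


image_copy = {"seg1": "v1", "seg2": "v1"}
log = [
    {"data_set": "DS1", "record": "seg1", "time": 5,
     "before": "v0", "after": "v1"},       # predates the copy, skipped
    {"data_set": "DS1", "record": "seg2", "time": 20,
     "before": "v1", "after": "v2"},       # applied
]
assert recover_data_set(image_copy, log, "DS1", copy_time=10) == \
    {"seg1": "v1", "seg2": "v2"}
```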
System R uses a similar approach to data base recovery. Provision is made for copying data base and log
checkpoint information to tape, whence it may be recalled to reconstruct the data base in the event of damage
to DASD contents.
Access authorization
Consolidated data often constitute sensitive information which the user may not want divulged to anyone other than authorized people, for reasons of national security, competitive advantage, or personal privacy. DBMS therefore provide mechanisms for limiting data access to properly authorized persons.
The basic technique used by IMS to control access to
data is to control the use of programs which access data
and the use of transactions and commands which invoke
such programs. The system provides for the optional
definition of security tables which are used to enforce
control. These tables contain entries of the form (r,u),
where r is a class of resources, u is a class of users, and
the occurrence of an entry (r,u) signifies that user class u
is authorized to use resource class r. For example, the
entry (UPDATE, LTERM1) might signify that UPDATE transactions can be entered only through terminal LTERM1.
“Users” who may be controlled in this manner include
both terminals and programs.
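The effect of such a table can be modeled as a simple membership test. The sketch below uses hypothetical data and is not the form in which IMS security tables are actually defined or stored.

```python
# Toy model of the (r, u) security-table check described above
# (hypothetical data; not the IMS table format).

security_table = {
    ("UPDATE", "LTERM1"),   # UPDATE transactions may be entered from LTERM1
    ("INQUIRY", "LTERM1"),
    ("INQUIRY", "LTERM2"),
}

def is_authorized(resource_class, user_class):
    # Access is permitted only if the pair (r, u) appears in the table.
    return (resource_class, user_class) in security_table

assert is_authorized("UPDATE", "LTERM1")
assert not is_authorized("UPDATE", "LTERM2")   # LTERM2 may not enter UPDATE
```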
IMS also provides for the use of individual user passwords in order to further control the use of a terminal.
Through suitable definitions, passwords can be required
at sign-on to IMS and at the entry of individual transactions and commands.
In System R, access control is provided through two
mechanisms:
1. The view facility [67], which permits subsets of data to be defined through SQL SELECT statements, and thus restricts the user of the view to those subsets. SELECT statements may contain predicates of the form fieldname = USER, to restrict access to tuples containing the user's identification code.
2. The grant facility [68], which permits a system administrator to grant specific capabilities with respect to specific data objects to specific users. Grantable capabilities with respect to relations include the capability to read from the relation, to insert tuples, to delete tuples, to update specific fields, and to delete the relation. The holder of a capability may also be given authority to grant that capability to others, so that authorization tasks may be delegated to different individuals within an organization. (Both mechanisms are sketched below.)
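The following sketch models the behavior of the two mechanisms described in [67] and [68]. The relation, view predicate, and grant records shown are hypothetical, and System R expresses these facilities in SQL rather than in structures of this kind.

```python
# Illustrative model of the view and grant facilities described above
# (hypothetical names and data; not the System R catalog structures).

employees = [
    {"name": "adams", "owner": "adams", "salary": 100},
    {"name": "baker", "owner": "baker", "salary": 200},
]

def my_rows_view(user):
    # A view whose defining predicate is "owner = USER": each user sees
    # only the tuples carrying his own identification code.
    return [row for row in employees if row["owner"] == user]

# Grant records: (grantee, capability, object, may_grant_further)
grants = {("dba", "read", "employees", True)}

def grant(grantor, grantee, capability, obj, with_grant_option=False):
    # The grantor must hold the capability with authority to grant it on.
    if (grantor, capability, obj, True) not in grants:
        raise PermissionError(f"{grantor} cannot grant {capability} on {obj}")
    grants.add((grantee, capability, obj, with_grant_option))

grant("dba", "adams", "read", "employees", with_grant_option=True)
grant("adams", "baker", "read", "employees")   # delegation by adams
assert my_rows_view("adams") == [employees[0]]
```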
Conclusion
Data base technology has evolved in response to user needs to consolidate data in a secure, reliable way and to provide easy end-user access to these data. Although its accomplishments are impressive, it has yet to satisfactorily address a number of important requirements. Chief
among these is the need to distribute the data base over geographically separated computers. Such distribution is being motivated by a number of factors, such as the need to reduce response time for user access to the data base and the need to provide local or autonomous control over parts of the data base. Such distribution is, at the same time, being enabled by the continuing reduction in the cost of computer hardware, starting with mini-computers some ten years ago and continuing today with microprocessors, which promise to place substantial processing capability in the hands of individual users.
A number of challenging problems remain to be solved in distributing data bases. These include the maintenance of replicated data, which most distributed data schemes entail; the linking of different DBMS, having different data models and languages, into cooperative networks; and the provision of essentially continuous system availability, so that the end user comes to rely on his data base system in the same way that he relies on his telephone and utility services.
The solution of these problems promises to make the next twenty-five years of data base technology as eventful and stimulating as the past twenty-five years have been.
Acknowledgment
The author wishes to thank W. F. King for his help with
early versions of this paper and for information on
System R.
References
1. J. H. Bryant and P. Semple, Jr., "GIS and File Management," Proceedings of the ACM National Conference, Association for Computing Machinery, New York, 1966, pp. 97-107.
2. Generalized Information System/Virtual Storage (GIS/VS) General Information Manual, Order No. GH20-9035, available through IBM branch offices.
3. E. W. Franks, "A Data Management System for Time-Shared File Processing Using a Cross-Index File and Self-Defining Entries," Proc. Spring Joint Computer Conference (AFIPS), AFIPS Press, Montvale, NJ, 1966, pp. 79-86.
4. J. A. Postley, "The MARK IV System," Datamation 14, 28-30 (1968).
5. W. C. McGee, "The Information Management System IMS/VS," IBM Syst. J. 16, 84-168 (1977).
6. IMS/VS Version 1 General Information Manual, Order No. GH20-1260, available through IBM branch offices.
7. System/360 Data Base Organization and Maintenance Processor Application Development Manual, Order No. GH20-0771, available through IBM branch offices.
8. D. Kroenke, Database Processing, Science Research Associates, Inc., 1977, pp. 280-293.
9. C. W. Bachman and S. B. Williams, "A General Purpose Programming System for Random Access Memories," Proc. Fall Joint Computer Conference (AFIPS) 26, AFIPS Press, Montvale, NJ, 1964, pp. 411-422.
10. CODASYL Data Base Task Group, April 1971 Report,
Association for Computing Machinery, New York.
11. CODASYL Programming Language Committee, CODASYL COBOL Journal of Development, Department of Supply and Services, Government of Canada, Technical Services Branch, Ottawa, Ontario, Canada.
12. R. F. Schubert, “Basic Concepts in Data Base Management
Systems,” Datamation 18, 42-47 (1972).
13. E. J. Emerson, “DMS 1100 User Experience,” Database
Management Systems, D. A. Jardine, Ed., North-Holland
Publishing Company, Amsterdam, 1974, pp. 35-46.
14. C. T. Davies, “A Logical Concept for Control and Management of Data,” Technical Report AR-0803-00, IBM Laboratory, Poughkeepsie, New York, 1967.
15. N. Raver, “File Organization in Management Information
Control Systems,” Selected Papers from File 68: Occasional
Publication No. 3, Swets and Zeitlinger, Amsterdam, 1968.
16. H. S. Meltzer, “Data Base Concepts and Architecture for
Data Base Systems,” IBM Report to SHARE Information
Systems Research Project, August 20, 1969.
17. R. W. Engles, "A Tutorial on Data Base Organization," Annual Review in Automatic Programming 7, Pergamon Press, Inc., Elmsford, NY, 1972, pp. 1-64.
18. E. F. Codd, “A Relational Model of Data for Large Shared
Data Banks,” Commun. ACM 13, 377-387 (1970).
19. E. F. Codd, "Relational Completeness of Data Base Sublanguages," Courant Computer Science Symposia, Vol. 6: Data Base Systems, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1971.
20. W. Kent, “A Primer of Normal Forms,” Technical Report
TR.02.600, IBM Laboratory, San Jose, CA, December 1973.
21. R. F. Boyce, "Fourth Normal Form and its Associated Decomposition Algorithm," IBM Tech. Disclosure Bull. 16, 360-361 (1973).
22. E. F. Codd, "Recent Investigations in Relational Data Base Systems," Proceedings IFIP Congress 74, North-Holland Publishing Company, Amsterdam, 1974, pp. 1017-1021.
23. R. Fagin, “Multivalued Dependencies and a New Normal
Form for Relational Data Bases,” ACM Trans. Database
Syst. 2, 262-278 (1977).
24. R. Fagin, "Normal Forms and Relational Database Operators," Proceedings of the 1979 ACM SIGMOD International Conference on the Management of Data, Association for Computing Machinery, New York, 1979, pp. 153-160.
25. M. M. Astrahan, M. W. Blasgen, D. D. Chamberlin, K. P. Eswaran, J. N. Gray, P. Griffiths, W. F. King, R. A. Lorie, P. R. McJones, J. W. Mehl, G. R. Putzolu, I. L. Traiger, B. W. Wade, and V. Watson, "System R: Relational Approach to Database Management," ACM Trans. Database Syst. 1, 97-137 (1976).
26. M. W. Blasgen et al., "System R: An Architectural Overview," IBM Syst. J. 20, 41-62 (1981).
27. SQL/DS General Information Manual, Order No. GH24-5012, available through IBM branch offices.
28. R. C. Goldstein and A. L. Strnad, "The MacAIMS Data Management System," Proceedings 1970 ACM SIGFIDET Workshop on Data Description and Access, Association for Computing Machinery, New York, 1970, pp. 201-229.
29. V. K. M. Whitney, "RDMS: A Relational Data Management System," Proceedings Fourth International Symposium on Computer and Information Sciences (COINS IV), Plenum Press, New York, 1972.
30. M. G. Notley, "The Peterlee IS/1 System," Scientific Centre Report UK-SC 0018, IBM Scientific Centre, Peterlee, United Kingdom, 1972.
31. G. Held, M. Stonebraker, and E. Wong, "INGRES: A Relational Data Base System," Proceedings of the National Computer Conference, AFIPS Press, Montvale, NJ, 1975, pp. 409-416.
32. B. Czarnik, S. Schuster, and D. Tsichritzis, "ZETA: A Relational Data Base Management System," Proceedings of the ACM Pacific 75 Regional Conference, Association for Computing Machinery, New York, 1975, pp. 21-25.
33. Query-by-Example Program Description/Operations Manual, Order No. SH20-2077, available through IBM branch offices.
34. H. M. Weiss, “The ORACLE Data Base Management
System,” Mini-Micro Syst. 13, 111-114 (1980).
35. D. C. Tsichritzis and A. Klug, "The ANSI/X3/SPARC DBMS Framework: Report of the Study Group on Data Base Management Systems," Info. Syst. 3, 173-192 (1978).
36. P. Chen, "The Entity-Relationship Model: Toward a Unified View of Data," ACM Trans. Database Syst. 1, 9-36 (1976).
37. DB/DC Data Dictionary General Information Manual, Order No. GH20-9104, available through IBM branch offices.
38. R. E. Wagner, "Indexing Design Considerations," IBM Syst. J. 12, 351-367 (1973).
39. R. Bayer and E. M. McCreight, "Organization and Maintenance of Large Ordered Indexes," Proceedings of the ACM SIGFIDET Workshop on Data Description and Access, Association for Computing Machinery, New York, 1970, pp. 107-141.
40. W. W. Peterson, "Addressing for Random-Access Storage," IBM J. Res. Develop. 1, 130-146 (1957).
41. V. Y. Lum, P. S. T. Yuen, and M. Dodd, "Key to Address Transform Techniques, A Fundamental Performance Study on Large Existing Formatted Files," Commun. ACM 14, 228-239 (1971).
42. S. P. Ghosh and V. Y. Lum, “Analysis of Collision When
Hashing by Division,” Info. Syst. 1, 15-22 (1975).
43. D. D. Chamberlin, M. M. Astrahan, K. P. Eswaran, P. P. Griffiths, R. A. Lorie, J. W. Mehl, P. Reisner, and B. W. Wade, "SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and Control," IBM J. Res. Develop. 20, 560-575 (1976).
44. D. D. Chamberlin, "A Summary of User Experience with the SQL Data Sublanguage," Proceedings of the International Conference on Data Bases, British Computer Society and University of Aberdeen, Aberdeen, Scotland, 1980, pp. 181-203.
45. E. F. Codd, "A Data Base Sublanguage Founded on the Relational Calculus," Proceedings of the ACM SIGFIDET Workshop on Data Description, Access, and Control, Association for Computing Machinery, New York, 1971.
46. D. Bjorner, E. F. Codd, K. L. Deckert, and I. L. Traiger,
“The GAMMA-0 n-ary Relational Data Base Interface
Specification of Objects and Operations,” Research Report
RJ1200, IBM Research Division, San Jose, CA, 1973.
47. R. F. Boyce, D. D. Chamberlin, W. F. King, and M. M. Hammer, "Specifying Queries as Relational Expressions: SQUARE," Data Base Management, J. W. Klimbie and K. L. Koffeman, Eds., North-Holland Publishing Company, Amsterdam, 1974, pp. 169-177.
48. R. F. Boyce, D. D. Chamberlin, W. F. King, and M. M. Hammer, "Specifying Queries as Relational Expressions: the SQUARE Data Sublanguage," Commun. ACM 18, 621-628 (1975).
49. D. D. Chamberlin and R. F. Boyce, "SEQUEL: A Structured English Query Language," Proceedings of the ACM SIGFIDET Workshop on Data Description, Access, and Control, Association for Computing Machinery, New York, 1974, pp. 249-264.
50. R. A. Lorie and B. W. Wade, "The Compilation of a High Level Data Language," Research Report RJ2598, IBM Research Division, San Jose, CA, 1979.
51. M. W. Blasgen and K. P. Eswaran, “Storage and Access in
Relational Data Bases,” IBM Syst. J. 16, 363-377 (1977).
52. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, "Access Path Selection in a Relational Database Management System," Proceedings of the ACM SIGMOD International Conference, Association for Computing Machinery, New York, 1979, pp. 23-34.
53. R. A. Lorie and J. F. Nilsson, "An Access Specification Language for a Relational Data Base System," IBM J. Res. Develop. 23, 286-298 (1979).
54. W. C. McGee, “Generalization: Key to Successful Electronic Data Processing,” J. ACM 6, 1-23 (1959).
55. SHARE 7090 9PAC, Part I: Introduction and General
Principles, Order No. J28-6166, IBM 7090 Programming Systems,
Systems Reference Library, 1961.
56. J. P. Fry and E. H. Sibley, "Evolution of Data-Base Management Systems," ACM Computing Surv. 8, 7-42 (1976).
57. H. Leslie, “The Report Program Generator,” Datamation
13, 26-28 (1967).
58. Introduction to RPG II, Order No. GC21-7514, available through IBM branch offices.
59. J. H. Bryant, "AIDS Experience in Managing Data-Base Operation," Proceedings of the Symposium on Development and Management of a Computer-Centered Data Base, System Development Corporation, Santa Monica, CA, 1964, pp. 36-42.
60. Naval Command Systems Support Activity, "User's Manual for NAVCOSSACT Information Processing System Phase I," NAVCOSSACT Document No. 90S003A, CM-51, IBM Federal Systems Division, Bethesda, MD, July 1963.
61. Intelligence Data Processing System Formatted File System, U.S. Navy Fleet Intelligence Center and IBM Federal Systems Division, Bethesda, MD, May 1963.
62. NMCS Information Processing System 360 Formatted File System (NIPS FFS), National Military Command System Support Center, CSMVM 15-74, IBM Federal Systems Division, Bethesda, MD, October 1974. (Nine volumes.)
63. M. M. Zloof, “Query by Example,” Proceedings of the
National Computer Conference (AFIPS) 44, AFIPS Press,
Montvale, NJ, 1975, pp. 431-437.
64. M. M. Zloof, “Query-by-Example: A Data Base Language,” IBM Syst. J. 16, 324-343 (1977).
65. J. C. Thomas and J. D. Gould, "A Psychological Study of Query by Example," Proceedings of the National Computer Conference (AFIPS) 44, AFIPS Press, Montvale, NJ, 1975, pp. 439-445.
66. K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger,
“On the Notions of Consistency and Predicate Locks in a
Data Base System,” Commun. ACM 19, 624-633 (1976).
67. D. D. Chamberlin, J. N. Gray, and I. L. Traiger, "Views, Authorization, and Locking in a Relational Data Base System," Proceedings of the National Computer Conference (AFIPS) 44, AFIPS Press, Montvale, NJ, 1975, pp. 425-430.
68. P. Griffiths and B. W. Wade, “An Authorization Mechanism
for a Relational Database System,” ACM Trans. Database
Syst. 1, 242-255 (1976).
Received December 23, 1980; revised March 16, 1981

The author is located at the IBM Data Processing Division laboratory, 555 Bailey Avenue, San Jose, California 95150.