Emerald Database—Integrating Transactions,
Queries, and Method Indexing into a system
based on mobile objects
Ph.D. Thesis
Niels Elgaard Larsen,
DIKU,
Department of Computer Science,
Faculty of Science
University of Copenhagen
Universitetsparken 1
DK-2100 Copenhagen Ø
E-mail: [email protected]
May 14, 2006
Abstract
The Emerald database is a distributed, object-oriented database system integrated seamlessly with the Emerald system. It has an associative query language that can exploit indexes on paths of functions (method indexing), respects object encapsulation, and avoids impedance mismatch with the application language.
Significant effort has been put into producing a seamless integration of the
database system into the existing Emerald language: Database operations are
legal wherever Emerald statements are legal, all Emerald objects can be used
in queries, set-expressions in queries are first-class objects, the strong typing
of the Emerald type system also covers the query language (i.e., application
programs and their queries are type-checked at compile-time), the scoping
rules of Emerald are extended to also cover the query language (i.e., the scope of an expression in a query covers both the range variables of the query and identifiers declared outside the query). It is possible to have queries (and other database operations) inside object literals in expressions in queries.
At run-time, every query in an application program is optimized to take
advantage of existing indexes. Queries are compiled and dynamically recompiled when necessary due to changes in the index population.
Objects in the database are mobile. Transactions with ACID properties
on mobile objects are supported. Transactions and normal Emerald processes
can coexist and communicate in the same run-time system without violating
ACID properties.
Any Emerald function in a type (functions cannot have side effects) can be used for indexing, even if it is located in a mobile object or invokes other objects.
We present models and an implementation for method indexing, transactions, and queries.
The database system imposes almost no overhead on the general use of
the Emerald system.
Contents

1 Introduction
  1.1 Our proposed model
  1.2 Work Performed
  1.3 Contributions
  1.4 Technical Contributions
  1.5 Main Thesis
  1.6 Results
  1.7 Limitations
  1.8 Thesis Layout

2 Background
  2.1 Choice of platform
  2.2 Distributed Transactions
  2.3 Persistence
  2.4 Method Indexing
    2.4.1 Examples
    2.4.2 Methods, Computations, and Concurrency

3 The Emerald run-time system
  3.1 The Emerald Language
  3.2 Processes and concurrency
  3.3 Object mobility and distribution
  3.4 The Emerald Kernel

4 The Emerald Language
  4.1 Introduction
  4.2 Emerald Objects
    4.2.1 Object Creation
    4.2.2 One-of-a-kind Objects
    4.2.3 No Metaclasses
  4.3 Abstract Types
    4.3.1 Type Definition
    4.3.2 Type Conformity
    4.3.3 Relationship between Types and Objects
    4.3.4 Separation of Type and Implementation
    4.3.5 Support for Polymorphism
  4.4 Concurrency
  4.5 Other Features
    4.5.1 Block Structure and Nesting
    4.5.2 Syntax
  4.6 Final Remarks
    4.6.1 Current Status
    4.6.2 Conclusions

5 Persistence in the Emerald Database
  5.1 Example: A Persistent Tree

6 Distributed transactions
  6.1 The Transaction Model
  6.2 Emerald Transaction Model
    6.2.1 Creating transactions
    6.2.2 Inter-transaction synchronization
  6.3 Concurrency control
  6.4 Synchronization
    6.4.1 Deadlock detection
    6.4.2 Synchronization of transactions and processes
    6.4.3 Serializing transactions
    6.4.4 Special objects
  6.5 Recovery
  6.6 Emerald Transaction Implementation
  6.7 Interaction between the compiler and the run-time system
  6.8 Transactions and persistence/mobility
  6.9 Performance
  6.10 Summary

7 Queries
  7.1 The Query Model
    7.1.1 Type System
    7.1.2 Collections and Indexes
    7.1.3 Query Language
    7.1.4 Interface between the Programming Language and the Query Language
    7.1.5 An Example
  7.2 Nested Database Operations
    7.2.1 Advantages of Nested Database Operations
  7.3 Architecture of the Runtime System
  7.4 The Emerald Query Model
    7.4.1 Indexes
    7.4.2 Query Architecture Background
  7.5 Architecture of the Current Version
    7.5.1 Interface between the Emerald Language and the Query Language
    7.5.2 Scope
    7.5.3 Incremental Compilation of Queries in Emerald
    7.5.4 Implementation of Nested Database Operations
    7.5.5 Propagation of References in Nested Database Operations
  7.6 Limitations/Further Work
    7.6.1 Index Maintenance
  7.7 Performance

8 Distributed Method Indexing
  8.1 Indexes on constants and local variables
  8.2 Indexes on functions that only depend on the local state
  8.3 Indexes on functions that only depend on the local state and transitively immutable objects
  8.4 Indexes on functions that depend on mutable objects
  8.5 Overhead
    8.5.1 Non-database applications should not pay for database applications
    8.5.2 Objects with simple indexing functions should not pay for objects with complex indexing functions
    8.5.3 Static classification of indexes
  8.6 The Dependency graph
    8.6.1 Structure
  8.7 Designing indexes
    8.7.1 KeyVal objects
    8.7.2 The DB-bit
    8.7.3 Update-objects and update-vectors
  8.8 Deletions
  8.9 Synchronization
    8.9.1 One process, one transaction
    8.9.2 Different transactions
    8.9.3 Multiple processes, one transaction
  8.10 Performance
  8.11 Concurrency control and indexes
    8.11.1 Using the index facilities without using transactions
  8.12 Optimization of the invalidation scheme

9 Evaluation and Conclusion
  9.1 Distributed Method Indexing
  9.2 Distributed Transactions
  9.3 Queries
  9.4 Performance Overheads
  9.5 Future work

List of Figures

2.1 Use of Nested Database Operation
3.1 A Remote Invocation
3.2 Object Mobility
4.1 Example use of an Object Constructor
4.2 Creating several integer node objects
4.3 An object that creates integer nodes
4.4 A Unique Id Server
4.5 "Metaclass hierarchies" in Emerald and Smalltalk
4.6 An example of a type object
4.7 Some compiler types
4.8 Three objects with identical types
4.9 An integer node creator with getSignature
4.10 An alternative implementation of an Id Server
4.11 A polymorphic operation
4.12 A List type generator
4.13 Some polymorphic objects
4.14 A dining philosopher
4.15 An organizer object
4.16 Concurrent programming in Emerald
4.17 Block structure and object constructors
5.1 Operating on a persistent tree
6.1 Database transaction
6.2 Two database transaction
6.3 An Emerald process
6.4 An Emerald transaction
6.5 A failure-handler
6.6 A sample set of transactions
6.7 Transactions operating on a tree
6.8 A parent process waiting for a transaction to complete
6.9 Snapshot of Deadlock detection
6.10 Two transactions execute one after the other
6.11 Wrong way
6.12 Process to transaction signaling (wrong)
6.13 Serializing transactions in Emerald
7.1 Example: Iterators and indexes
7.2 The Student/Teacher/Lunch database
7.3 Number of red parts for supplier
7.4 A Nested Query
7.5 Number of red parts for supplier (II)
7.6 Architecture
8.1 An example. Indexes on C1.Fa, C1.Fb, C2.Fa, C2.Fc, and C3.Fd
8.2 Teacher type
8.3 Teacher type 2
8.4 Teacher, grouped by salary
8.5 Teacher object, depending on mutable salary object
8.6 Representation of teachers and courses
8.7 Calls in Larsen.close colleagues
8.8 Dependency graph for close colleagues
8.9 Set-of-tree dependency graph
8.10 Set-of-tree dependency graph for Larsen and Jensen
8.11 Set-of-set dependency graph for Larsen and Jensen
8.12 A dependency graph
8.13 A dependency graph with DB-flags Update-vectors
8.14 A lazy deletion
8.15 Indexes without transactions
8.16 Indexing and transactions

List of Tables

6.1 Signaling from process to waiting process
6.2 Performance of constant transaction tasks (on a SPARC 10)
6.3 Performance of transaction, tnum == 20, inum == 50
6.4 Performance of distributed transactions
7.1 Performance of basic tasks
7.2 Performance of selected tasks
7.3 Population of test-database
Chapter 1
Introduction
This dissertation is about integrating database concepts such as indexing,
transactions, concurrency control, and queries with object-oriented languages.
In particular, we are interested in distributed and mobile database facilities
integrated with an object-oriented language, specifically, Emerald [Hut87a].
It is our thesis that object orientation will make it simpler to design and implement mobile database facilities. We show this by proposing new facilities
and by implementing them in an object-oriented language, specifically the
Emerald system [Jul88, Hut87b]. Mobile database facilities include transactions, indexes, and queries.
One of the main design principles of our project is to make different database facilities orthogonal to each other. We consider persistence to be orthogonal to transactions: applications might need persistence but not transactions (e.g., single-user systems), transactions but not persistence (e.g., high-performance simulations), or indexing but neither transactions nor persistence (e.g., single-user applications that store their data in their own file format).
For the facilities to be orthogonal we require that an application must only
pay performance-wise for the facilities that it uses. That is, if an application
does not make use of a facility, our system should not incur a significant
overhead compared to a system that does not support such a facility.
Another design principle is to integrate the facilities in the spirit of the
existing language and system, including issues such as strong typing, performance,
and distribution.
We present a proof-of-concept design and implementation. We consider
four main approaches to this:
Adding object facilities to an existing database Most existing DBMSs
are relational or object-relational. We are interested in real objects.
We want to experiment with low-level implementations and therefore
need full control of the DBMS. This can be obtained in two ways:
Hooks, upcalls, triggers, etc. Some database systems have a way
of letting applications handle transactions, distribution etc. We
did not, however, find any existing database system that was powerful enough to handle the facilities that we wanted to research.
Access to source code If we had access to the source code of a database, it would be possible to modify the database to our needs.
But when the project was started we did not have access to the
source code of a database system that was useful to us.
Adding database facilities to an existing object-oriented system Indexes as data-structures can be implemented in any object-oriented
system. But a database system optimizes queries. That is, a query can
be written in the object-oriented language, but the database system
must be able to dynamically optimize and execute it. This limits us
to systems with reflection (e.g., Java or PJama [ADJ+96]) or dynamic binding (e.g., Smalltalk).
When our work was started, we found no existing object-oriented system that could serve as a basis for our system. It would, however, be
of interest to further test our ideas by basing our system on a system
like PJama, LINQ, or Java Native Queries [CR05a, CR05b].
Combining an existing database and object-oriented system It is possible to access existing database systems from object-oriented programming languages (e.g., using JDBC).
But accessing a database using a standard protocol (JDBC, ODBC,
etc.) makes it impossible to integrate the database facilities with the
object-oriented language. This approach would also make it difficult to
make the database facilities orthogonal.
Writing a database system and object-oriented system from scratch
This allows us to experiment with database systems, programming languages, and object-oriented systems.
We have chosen to add database facilities to the Emerald system. This is a
combination of the second and fourth approach, because the Emerald system
does not have facilities that make it possible to implement our prototype in
the Emerald language, but Emerald is a research project that we are very
familiar with. We can therefore use the Emerald system as a building block
that we can change to serve our needs.
1.1 Our proposed model
We propose a model for integrating database concepts into object-oriented
systems with fine-grained mobility such as Emerald.
We have extended the Emerald system (both the Emerald compiler and
run-time system) to support the model and implemented a fully working
database system with mobile transactions, concurrency control, queries, method
indexing, and persistence in the extended Emerald system.
We have evaluated the prototype by demonstrating the facilities, measuring the performance of important operations, and measuring the overhead of
the extended Emerald system relative to the original Emerald system.
This dissertation presents the model and the implementation in a specific
system, Emerald. The main areas are:
Indexing The work on indexing is a consequence of extending the concept of
indexing from relational databases to object-oriented databases. Relational Database Management Systems (RDBMS) [BG81] store records
in relations. Each record consists of a set of attributes, defined by the
domain of the relation. Attributes are simple values. Indexes index
attributes in domains.
Transactions The work on transactions deals with the complications that
mobility adds to concurrency control, recovery, and deadlock detection.
Queries The work on queries deals with how query optimization can be
integrated into an object-oriented system.
We propose models for transactions, distributed method indexing, and
queries with mobile objects. These models are based on systems with object-oriented languages and distributed run-time systems (virtual machines).
The proposed model should be applicable to all fine-grained object-oriented
systems.
We demonstrate the viability of our ideas by implementing these models in
a prototype based on the Emerald system and performing initial performance
measurements.
1.2 Work Performed
The basic architecture, persistence model, and query language are based on
the work presented in [Lar92a].
In this thesis we propose new models for:
• Indexing methods on distributed objects
• Distributed transactions
• Implementing the addition of database querying capabilities and nested
database operations to an object-oriented language.
To evaluate the new models we have:
• Extended the Emerald compiler with the “type as string” primitive,
which converts any Emerald type to a string representation. This is
used at run-time to generate code for abstract types that can be linked
dynamically.
• Integrated the database query language with the Emerald programming
language. We have extended the syntax of the Emerald language and
implemented strong type checking of database queries.
• Modified the Emerald compiler to support concurrency control necessary for transactions.
• Implemented distributed deadlock detection for transactions on mobile
objects in the run-time system.
• Modified the Emerald runtime system to support distributed transactions.
• Modified the Emerald compiler to add functionality that lets a process track the objects it modifies.
• Modified the Emerald runtime system to let objects be notified after
they are modified.
• Implemented indexing of methods on distributed objects.
• Measured performance of simple transactions, queries, etc.
1.3 Contributions
Our contributions are:
• A novel model that allows any expression in the programming language
to be indexed. The proposed model supports mobile objects.
• A model for integrating distributed transactions on mobile objects with standard processes that do not have transaction properties.
1.4 Technical Contributions
• We ported the Emerald compiler and run-time system to the 64-bit Digital Alpha architecture. This was the first time Emerald had been ported to a non-32-bit platform. This included implementing a loader
for the OSF/1 COFF format with code executing in the TEXT segment.
• We have implemented invocation piggy-backing of location information
in Emerald to improve performance in highly mobile environments.
• We have implemented remote moves in Emerald.
• We have implemented the fix, refix, isfixed, and unfix operations in
the Emerald compiler and the Emerald run-time system.
1.5 Main Thesis
Our goal is to propose a model for integrating advanced database facilities
into existing object-oriented systems that support mobility, so as to provide a
system that fully integrates applications and database operations.
We focus on the following areas:
• Transactions with mobile objects.
• Mobile object method indexing.
• Integration of query optimization and object-oriented systems.
We produce a fully working and efficient prototype based on the model.
1.6 Results
We have created a working 32- and 64-bit prototype of a database system
with persistence, distributed transactions, and integrated query language
implemented in Emerald.
We have integrated database queries, transactions, and persistence into the Emerald language and run-time system without compromising the spirit of Emerald.
The overhead on Emerald applications that do not take advantage of the database facilities is low. We have performed basic usability tests by
implementing and executing Emerald database applications.
These results clearly demonstrate our thesis.
1.7 Limitations
The persistence layer is only a proof of concept. Therefore we have not
measured performance of persistence.
We have not been able to do extensive performance measurements on
modern equipment.
1.8 Thesis Layout
In chapter 6 we propose a model for transactions in systems with mobile objects. This includes deadlock detection with mobile objects.
We propose a model for mobile transactions specific to the Emerald system. This includes synchronization between transactions, synchronization between transactions and applications, special Emerald objects, and the Emerald run-time system. The chapter ends with an evaluation of the performance and overhead of mobile transactions in Emerald.
In chapter 7 we propose a model for query optimization in object-oriented
systems and a model for query optimization specific to the Emerald system.
The chapter ends with an evaluation of the performance and overhead of
queries in Emerald.
In chapter 8 we propose a model for method indexing on mobile objects.
The model is based on distributed dependency graphs that represent information about which operations will invalidate which entries in the implementation of indexes. The overhead of the support for Distributed Method
Indexing is measured.
In chapter 9 we summarize the evaluation from chapters 6 to 8.
Chapter 2
Background
2.1 Choice of platform
We were inspired by the Emerald system because:
• It is object-oriented
• It supports distribution
• It supports fine-grained mobility
• The distributed features are fully integrated into the Emerald programming language.
The Emerald Database (EDB) [Lar92b] demonstrated that an ODBMS
based on the Emerald system could provide powerful features. For example:
• A powerful and intuitive integration between programming language
and query language. This also means a natural integration between
application programs and database operations.
• Support for nested database operations. That is, application code can
contain queries. Queries can contain application code, which again can
contain queries etc. This allows application programmers to use database queries in all objects, including objects that will be stored in the
database.
The EDB consisted of a pre-compiler, index implementations, and a synchronization object implemented in the Emerald language. This meant that
the Emerald system (compiler and run-time system) did not have to be
changed.
class person {
  string: name;
  integer: yearOfBirth;
  person: mother, father;

  person function getOldestSibling() {
    return(DB.select(person where person.father=father AND
                     person.mother=mother ORDER by yearOfBirth).getFirst())
  }
}

main {
  Set.of(person): OldestSiblings;
  DB.create(person);
  DB.createIndex(person.father);
  DB.createIndex(person.mother);
  DB.createIndex(person.getOldestSibling);
  ... initialize database
  OldestSiblings = DB.select(person where person.getOldestSibling=person);
}

Figure 2.1: Use of Nested Database Operation
An ODBMS would also need support for transactions. Because we focused
on systems with mobile objects, transactions would have to support mobile
objects.
In the EDB only constant functions (in our terminology, functions that always return the same value) could be indexed. We felt that in an object-oriented database system it should be possible to make an index on any
expression in the programming language.
Based on the experiences with EDB we wanted the query language and
the programming language to be fully integrated. That is, the programming language compiler should be extended to handle the query language
(see 7.4.2).
2.2 Distributed Transactions
We consider an object-oriented system with active, distributed, mobile objects. That is, a system with a set of nodes and a set of objects. Objects
can move from node to node. Objects can invoke methods on objects on
other nodes. The system can have many processes (threads) executing concurrently on each node. An object can move to another node even if it is
part of a running process. Such a system must already have a mechanism for
handling synchronization (semaphores, monitors, Java-like synchronization
etc.).
We extend such a system with atomic transactions [GR93] (recovery and
concurrency control). Compared to non-distributed transactions this introduces new issues:
• Traditional distributed issues such as consistency of commits, deadlock
detection.
• Transaction abort and recovery
• Performance
2.3 Persistence
Most of the work on persistence was done in [Lar92a]. Variables can be
tagged persistent. When an application makes an object persistent (using
the “store” command), objects reachable from this object through persistent
references (variables) are made persistent.
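As a minimal sketch of this reachability-based model, in the pseudo-code style of figure 2.1 (the persistent tag and the DB.store call are only illustrative renderings of the tagging and "store" facilities described above, not the exact syntax used in chapter 5):

  class addressBook {
    persistent Set.of(person): entries;   % persistent reference: persons reachable through it are stored too
  }

  main {
    addressBook: myBook;
    ... build myBook and add person objects to myBook.entries
    DB.store(myBook);   % makes myBook persistent, together with objects reachable via persistent references
  }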
2.4 Method Indexing
This section introduces indexing [KKD89] in object-oriented databases. In
this section we do not rely on any object-oriented language or OODBMS.
The examples are presented in pseudo-Emerald.
OODBs store objects. Objects have methods (also called operations, functions, etc.). Therefore indexes in OODBs should index methods. In some systems objects also have attributes (fields); however, we consider object-oriented attributes a special case of methods. The attribute "A" is equivalent to methods "getA" and "setA" that access and update the variable "A". Object-oriented indexes are a generalization of relational indexes.
In object-oriented terms a relational index is an index on a set of records of simple (direct, atomic) types. This generalization causes several complications: objects can reference each other, so an update of one object could mean that an arbitrarily large part of an index must be invalidated; if an index entry is based on a method that contains non-trivial code, new synchronization issues arise; etc.
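To make the attribute/accessor equivalence mentioned above concrete, here is a sketch in the loose pseudo-code style of figure 2.1 (not Emerald proper, where side-effecting methods are operations rather than functions; the class item is purely illustrative):

  class item {
    integer: a;
    integer function getA() { return(a) }    % read accessor: can be indexed like a plain attribute
    function setA(integer: na) { a = na }    % update accessor: indexes that depend on getA must be invalidated
  }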
In databases we have collections of objects. A collection is an unordered
set.
An index on a collection is defined by an operation that can be performed
on all objects in the collection. The operation returns a value. In general
this operation is not an operation in an object-oriented language. It is an
expression in an OODB language that contains references to an object from
the collection.
For example, assume we have a collection of objects, C, and an index, I,
defined by an indexing operation, IO.
This index, I, is an object that, given a target value T, can efficiently return the collection of objects Ob for which IO(Ob) = T.
For example we might want to make an index on the first name of persons
in a collection:
The collection is a collection of persons.
const Person ==
  type persont
    function firstname -> [string]
    function lastname -> [string]
    function dateOfBirth -> [integer]
    function age -> [integer]
    function yearsalary -> [integer]
    function monthsalary -> [integer]
    function address -> [Address]
  end persont

PersonColl = collection.of[Person]
We want to search the collection on names so we create an index:
nameIndex = Index.create[PersonColl.firstname]
The index operation is PersonColl.firstname. For all objects in the PersonColl collection it is possible to find the first name.
We can now search the PersonColl collection:
Pollys = PersonColl\P where P.firstname = “Polly”
The search engine in the DB can use nameIndex to find all Pollys:
nameIndex.find [“Polly”]
nameIndex finds all objects in PersonColl for which the result of the index
operation PersonColl.firstname is the target value, ”Polly”.
2.4.1 Examples
An OODB can support different levels of indexing.
Constant
Consider an index on PersonColl.dateOfBirth:
Person
  const: dateOfBirth
The constant attribute dateOfBirth is indexed. In the figure this is shown
by underlining dateOfBirth.
This is simple to support because indexes are never invalidated.
Variable Attribute
Consider an index on PersonColl.age:
Person
  const:
  var: age
Variables change values. This means that we must be able to invalidate
indexes that depend on a given value. In this case we must invalidate all
indexes on age on collections that contain this Person-object.
Maintaining the index data-structure is a well-known problem. But in
object-oriented systems there can be two problems not known in traditional
database systems:
• Updates of values used in indexes must be detected. In traditional
database systems such updates are made by explicit operations on the
DBMS. In the EDB the DB layer is transparent. Objects are not aware
that they are in a DB. Therefore they cannot explicitly notify the
DB. Therefore we must have a mechanism that allows the OODBMS
to detect such updates. We consider conservative approaches. For
example, each time an object that contains a variable used in an index
is updated the OODBMS could be notified.
• When an update of an indexed value is detected, all indexes that depend
on this value must be found. In relational databases this is done using
meta-data. There is a collection (”relation”) of indexes. This is also
possible in an OODB.
Local Functions and Expressions
Person
  var: monthSalary
  const: name
  fun: yearSalary
Local functions can be indexed:
YearSalaryIndex = Index.create[PersonColl.yearSalary]
The yearSalary function depends on the variable monthSalary. Therefore
YearSalaryIndex must be invalidated when monthSalary is updated.
It is not necessary to wrap an expression in a function in order to index it. We could make the index:
YearSalaryIndex = Index.create[12*PersonColl.monthSalary]
This is a generalization of indexing in relational databases. In a relational
database it is possible to index on a set of keys in a relation. In an OODB
this can be accomplished by indexing on an expression that contains multiple
attributes and ensures that different sets of keys cannot give the same result.
For example:
name salaryIndex = Index.create[name + monthSalary.tostring]
or
name salaryIndex = Index.create[Pair.create[name, monthSalary.tostring]]
Person
  const: name
  fun: getlastname = return(name.lastname)

Name
  const: firstname
  const: lastname
Multiple Objects
Methods (functions) in objects can be indexed. Functions can reference other
objects.
We can have an index on the method "PersonColl.lastName" or the expression "name.firstname + name.lastname". Because the method and the expression only reach constant (immutable) state, this can be handled like indexes on local constants.
Person
  const: name
  var: address

Address
  var: street
  var: city
This is more complicated. If there is an index on "PersonColl.address.city",
that index can be invalidated not only if an instance of Person is updated,
but also if an instance of Address is updated.
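Following the Index.create and selection notation used earlier in this section, such a path index might be created and used as follows (cityIndex and the query are illustrative):

  cityIndex = Index.create[PersonColl.address.city]
  Copenhageners = PersonColl\P where P.address.city = "Copenhagen"

An update of an Address object must then invalidate the affected entries in cityIndex.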
Person
  const: Polly
  var: address

Address
  var: Strandvej
  var: Copenhagen

Person
  const: Peter
  var: address
The same address can be shared by many persons. This means that an
update to an Address object can invalidate not only many indexes, but also
many entries in the same index.
Summary
In general an entry in an index can depend on any number of objects (variables in objects), and an update of one object can invalidate any number of entries in any number of indexes. We discuss several ways to handle this complexity efficiently.
2.4.2 Methods, Computations, and Concurrency
Consider an index on the method "foo" in the class fooClass:

class fooClass {
  var a: barType
  var b: barType
  fun foo: barType {
    return complexFunction(a, b)
  }
  fun setA(na: barType) {
    a = na;
  }
}
We do not know anything about barType. If we want to sort instances of fooClass on an index on "foo", we must require that "barType" has a partial order.
We must also require that "complexFunction" terminates for all values of "a" and "b".
Furthermore, we should be able to estimate the time cost of "complexFunction". Otherwise a query optimizer cannot decide, e.g., whether or not to use the index on "foo".
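As a sketch in the notation of section 2.4 (the collection fooColl and the comparison value someBar are illustrative), such an index would be created and consulted like this:

  fooColl = collection.of[fooClass]
  fooIndex = Index.create[fooColl.foo]         % only meaningful if barType values are ordered
  matches = fooColl\F where F.foo = someBar    % the optimizer must weigh the cost of foo against using fooIndex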
Chapter 3
The Emerald run-time system
In this chapter we give an overview of the Emerald system and the relevant parts of its implementation. We focus on areas that are of interest for
database systems.
The Emerald system is well suited for supporting persistence and concurrency control. It is already concurrent, distributed, efficient (strongly typed,
compiled), and has a simple object model. This chapter describes aspects of
the Emerald system relevant for this thesis. For further details see [Hut87b],
[HRB+ 87], and [JLHB88].
The Emerald system consists of a compiler and a run-time system. The
compiler generates native code. The run-time system, which runs in a single
UNIX process, dynamically loads programs into its own address space and
executes them. The compiler-generated code calls the run-time system for
certain tasks (e.g., inter-node communication).
3.1 The Emerald Language
The language syntax is inspired by Algol/Simula ([Nau60, BDrMN73]). All
data is represented by objects as in Smalltalk ([GR83]). In contrast to
Smalltalk, Emerald is strongly typed and supports polymorphic types. This means that if a database query language is integrated with the Emerald programming language, the query language will also be strongly typed.
The Emerald programming language has been implemented with an efficiency comparable to traditional compiled languages ([Hut87a]).
Each Emerald object consists of ([Hut87b, RTL+ 91]):
• A name, which uniquely identifies the object within the network.
• A representation, which, except in the case of a primitive object, consists of references to other objects.
• A set of operations, which define the functions and procedures that the
object can execute. Some operations are exported and may be invoked
by other objects, while others may be private to the object. Functions
are operations that return one result and have no side-effects.
• An optional process, which is started after the object is initialized, and
executes in parallel with invocations of the object’s operations. An
object without a process is passive and executes only as a result of
invocations, while an object with a process has an active existence and
executes independently of other objects.
The only mechanism for communication in Emerald is through invocation.
An Emerald object may invoke some operation defined in another object,
passing arguments to the invocation and receiving results. Assuming that
target is an object reference, the phrase:
target.operationName[argument1, argument2 ]
means execute the operation named operationName on the object currently referenced by target, passing argument1 and argument2 as arguments.
Invocations are synchronous; the process performing the invocation is suspended until the operation is completed. ([Hut87b])
3.2 Processes and concurrency
Emerald supports concurrency both between objects and within an object.
Within the network many objects can execute concurrently. Within a single
object, several operation invocations can be in progress simultaneously, and
these can execute in parallel with the object’s internal process.
To control access to variables shared by different operations, the shared
variables and the operations manipulating them can be defined within a monitor. Processes synchronize through built-in condition objects. An object’s
process executes outside of the monitor, but can invoke monitored operations
should it need access to shared state. ([Hut87b])
A database system that supports transactions or persistence needs to control access to shared data. This can introduce a large performance overhead.
In Emerald shared variables can only be accessed from operations or
functions in the monitor in which they are declared (constants and variables in functions do not have to be declared in monitors). Because only one process can execute in a monitor, the database system only needs to know when a process enters or leaves a monitor. For operations that access more than one variable, or access the same variable more than once, this will
result in a lower overhead than if the database system had to know about
every access to every shared variable.
Invocation of a function in a monitor constitutes a read of shared data.
In most cases invocation of an operation in a monitor will constitute a write of shared data (not always: an operation may be used simply to return zero or more than one result, even when there are no side effects). If a database transaction system considers invocations of operations in monitors to be writes, it is safe with regard to transaction properties. It can, however, result in false deadlocks.
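A minimal sketch of this read/write classification, written in the style of the Emerald examples in chapter 4 (the Counter object is purely illustrative):

  const counter ←
    object Counter
      export increment, current
      monitor
        var count : Integer ← 0
        operation increment                  % monitored operation: treated as a write of shared data
          count ← count + 1
        end increment
        function current → [c : Integer]     % monitored function: treated as a read of shared data
          c ← count
        end current
      end monitor
    end Counter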
Each object has an optional initially section that executes exactly once
when the object is created and is used to initialize the object state. When
the initially operation is complete, the object’s process is started and invocations can be accepted. ([Hut87b])
3.3 Object mobility and distribution
Object mobility in the Emerald system is characterized by three principles
([Jul88]):
• A uniform object model for both local and distributed computation
• Fine-grained mobility
• Language support for mobility to achieve an efficient implementation
The existence of distribution and of mobility in Emerald does not interfere with the performance of objects on a single node. ([Jul88])
3.4 The Emerald Kernel
For each object in the system there is an object descriptor on each node that
has references to the object. The node on which the object is located holds
the “object data area”, which represents the state of the object. There are
special representations for built-in, local, and immutable objects.
Each object descriptor has a flag that, if set, causes an object fault (a trap
to the run-time system) when a process on that node tries to access (invoke,
move) its object. This is used to support object mobility, but can also support persistence and concurrency control (see chapter 6). Local objects are
referenced using a pointer to their object descriptor; remote objects are referenced using logical object identifiers (OID’s). The runtime-system maintains
information about the location of objects and supports a location-protocol
so that mobile objects can be accessed.
An object can have a process which is started when the object is created.
Emerald processes are synchronized using monitors. An object can have
at most one monitor. A monitor contains shared variables together with the operations and functions that can access these variables. The Emerald compiler ensures that each operation/function in the monitor checks if the monitor is free (using a lock-bit in
the object) before it executes its body; if not, it traps to the runtime system
and the process executing the operation is placed in a monitor queue. This
facility can be extended to support transactions (chapter 6). The compiler
makes each operation/function in a monitor check if the active process is a
transaction.
Processes are implemented using stack-segments. A stack-segment is a
stack of activation records where each activation record is “bound” to the
object it is invoking.
When a process is created, it gets its own stack-segment. When a process makes a remote invocation (RPC) it gets a new stack-segment on the
remote node. The new stack-segment contains a reference to the previous
(the caller's) stacksegment so that it can return to its caller's stacksegment,
which might have moved to another node during the invocation.
Objects can be moved between nodes. When an object is moved, all
activation records that are invoking it must follow it. This is accomplished
by breaking stacksegments into smaller stacksegments and making them look
like remote invocations. When an object is moved, all stacksegments in its
monitor-queue (all stacksegments waiting to invoke the object) also follow it.
An example of a remote invocation is shown in figure 3.1. Process two
(P2) was started on node 1 with the initial stacksegment S3. It then invoked
an operation in object O21. From this operation P2 invoked an operation
in O22 and from this operation it invoked O23. This is a remote invocation
(object O23 is located on node two). Therefore P2 received a new stacksegment, S4 on node two. The new stacksegment keeps a reference to the old
stacksegment S3 (the thick dotted line) and S3 keeps a reference to the invoked object O23 (not shown on the figure). Both references are valid across
node boundaries because they are implemented using OID’s.
If object O22 moves to node three, we would get the situation shown in
figure 3.2.
A reference in an object can be declared as attached, e.g., "var attached foo: fooType". When the object is moved to another node, all objects that can transitively be reached from it through attached references are also moved to that node [HRB+87, JLHB88]. attached declarations are unidirectional: if the variable foov is declared attached in bar, the object foo is assigned to foov, and bar is moved to another node, then foo is also moved; but if foo is moved, then bar is not moved.

Figure 3.1: A Remote Invocation. (AR: activation records; T1: transaction (ID=1); P2: process (ID=2); S3, S4: stack-segments (IDs 3 and 4); O21, O22, O23: objects with IDs 21, 22, and 23.)
Because any process can move any object, the Emerald system does not
guarantee that the object is actually moved. But in the absence of failure, if
only one process moves an object, the object(s) will be moved as quickly as
possible.
Figure 3.2: Object Mobility. (O22 has moved to computer 3; process P2 now spans stack-segments S3 on computer 1, S5 on computer 3, and S4 on computer 2.)
Chapter 4
The Emerald Language
This chapter is a reproduction of [RTL+91]; it serves to give the reader a background on the Emerald language.
Emerald: A General-Purpose Programming Language
Rajendra K. Raj, Ewan Tempero, Henry M. Levy
Department of Computer Science and Engineering, University of Washington,
Seattle, WA 98195
[rkr,ewan,levy]@cs.washington.edu
Andrew P. Black
Digital Equipment Corporation, Cambridge Research Laboratory
One Kendall Square, Bldg. 700, Cambridge, MA 02139
[email protected]
Norman C. Hutchinson
Department of Computer Science, University of Arizona, Tucson, AZ 85721
[email protected]
Eric Jul
DIKU, University of Copenhagen, Universitetsparken 1, DK-2100, Copenhagen,
Denmark
[email protected]
4.1 Introduction
Emerald[1, 2] is a strongly-typed programming language that supports an
atypical variant of the object-oriented paradigm. Although originally developed to simplify the construction of efficient distributed applications,[3, 4]
Emerald provides a general-purpose programming model. This paper describes the main features of Emerald, comparing and contrasting them with
similar features of other languages.
Although Emerald may be bracketed with module- or object-based languages, it differs in several ways from other such languages. Unlike Ada[5] or
Modula-2,[6] the sole unit of programming is the object, which behaves like
a dynamic instance of a Modula-2 module or Ada package. Unlike C++[7] or
Smalltalk,[GR83] Emerald has no notion of class; Emerald provides object
constructors for run-time creation of objects and abstract typing for object
classification and comparison. Access to Emerald objects is provided via
operations that are invoked by other objects.
Past experience has suggested that language designers must choose between the security and efficiency of static typing and the flexibility of dynamic typing. For example, Smalltalk and Self [9] emphasize flexibility by
delaying type checking until run-time, while Modula-2 and Ada are statically
typed and preclude the safe interaction of existing programs with entities of a
new type. Emerald provides both static typing and flexibility using a simple
and efficiently-implemented notion of type that allows compile-time checking,
polymorphism, encapsulated types, and subtyping. A type in Emerald is a
collection of operations and their signatures; significantly, a type is itself an
object. The Emerald language includes block structure and nesting as found
in Simula-67[10] and Beta[11] but noticeably missing from Smalltalk.
Emerald supports the programming of distributed applications by making the invocation of objects independent of their location. Such location
independent object invocation allows the application programmer to treat
the distributed system as a single machine. Thus, the Emerald programming
model is for the most part non-distributed. This paper focuses on Emerald’s
programming model; we will mention Emerald’s distributed upbringing only
when motivating the design of some of its features. Several aspects of the
distributed nature of Emerald are discussed in the companion paper in this
publication.[12]
This paper discusses, with examples, the main features of Emerald that
help in supporting general-purpose programming. In the next section we
describe the Emerald notion of objects, and how they are constructed and
used. Section 4.3 discusses the Emerald abstract type concept and its usage.
We then examine the Emerald approach to object-based concurrency, and
outline other interesting features of Emerald. Finally, we indicate the current
status of our work and summarize its major contributions.
4.2 Emerald Objects
The Emerald object is an abstraction for the notions of data and type. Emerald objects consist of private data, private operations, and public operations.
An object is entirely responsible for its own behavior and so must contain
everything necessary to support that behavior; we call this property autonomy. Object autonomy, originally motivated by the need for object mobility
in a distributed environment, also enforces the modular approach espoused
in software engineering.[13] An object may be accessed only by its public
operations; this is done in Emerald by the invocation. An Emerald object
may invoke some operation defined in another object, passing arguments to
the invocation and receiving results. Invocations are semantically similar to
procedure calls in traditional languages and to messages in Smalltalk.
4.2.1 Object Creation
In most object-based languages, an object is created by invoking an operation
on a class object; Smalltalk and Eiffel[14] are examples of this model. The
class object has multiple functions: it defines the structure, interface, and
behavior of all its instances; it defines the place of those instances in the graph
of all classes; and it responds to new invocations to create new instances.[15]
In prototype-based languages such as Self, objects are typically cloned from
existing objects that act as templates.
In contrast, object creation in Emerald is done via the object constructor, an Emerald expression that, when evaluated, creates a new object. This
expression defines the object’s representation, and its public and private operations. The syntax of an object constructor is:
object anObject
% private state declarations
% operation declarations
end anObject
where anObject is a local name whose scope is confined to the object constructor itself.
Figure 4.1 illustrates the creation of an object with an example taken
from the Emerald implementation of the Emerald compiler. As part of its
execution, the compiler creates a parse tree whose nodes represent the various constructs in an Emerald program. The compiler creates such nodes as
it discovers each construct, annotates them with attributes relevant to the
construct, and later generates code associated with their semantics. In this
const anIntegerNode ←
  object IntegerLiteral
    export getValue, setValue, generate
    monitor
      var value : Integer ← val
      operation getValue → [v : Integer ]
        v ← value
      end getValue
      operation setValue[v : Integer ]
        value ← v
      end setValue
      operation generate[stream : OutStream]
        stream.printf [“LDI %d\n”, value]
      end generate
    end monitor
  end IntegerLiteral

Figure 4.1: Example use of an Object Constructor
example, the object represents an integer literal whose only attribute is the
value of the literal. The object constructor has the name IntegerLiteral , and
the object it creates has public operations getValue, setValue, and generate.
Its private state consists of a single integer variable named value that is initialized to val , which may be regarded as a literal constant for the present.
The monitor construct, to be discussed in the concurrency section, is used
to serialize concurrent invocations of the object.
When this object constructor expression is evaluated (i.e., executed), it
creates the described object, which is named by the identifier anIntegerNode.
By virtue of the const binding, anIntegerNode always refers to this object,
whose state, however, may be changed by invocations of its setValue operation. Objects may also be declared immutable, which asserts that their
abstract state does not change. The compiler may use such information for
optimization, but the language does not attempt to restrict the operations
that can be performed on the concrete state. This allows an object to reorganize its concrete state—for example, to make future calls more efficient—so
long as doing so does not change the result or effect of the future calls. The
export clause lists the operations that can be called from outside the object;
invocation of one of these operations is the only way in which a client can
examine or modify the object’s state. Thus, objects in Emerald are fully encapsulated, unlike those of Simula; this makes an Emerald object very similar
% Assume count and nodes have been suitably declared
count ← 5
loop
  exit when count = 0
  count ← count − 1
  nodes(count) ←
    object IntegerLiteral
      % as in Figure 4.1
    end IntegerLiteral
end loop

Figure 4.2: Creating several integer node objects
to a module in Modula-2 or a package in Ada.
Object constructors perform the following subset of the functions carried
out by Smalltalk classes:[15]
1. they generate new objects,
2. they describe the representation of objects, and
3. they define the code that implements operations (methods in Smalltalk
terminology).
The essential point to note is that the object constructor is more primitive
than the class; the functions of a class can be provided in Emerald by using
object constructors in combination with other language features.
The control abstractions available in ordinary languages permit expressions to be executed as desired, i.e., never, once, twice, or as many times
as necessary. By treating object constructors as expressions and providing
standard control structures, Emerald allows objects to be created as required.
Multiple similar objects may be created by placing the constructor in a repetitive construct such as a loop or the body of an operation of another object.
Figure 4.2 illustrates how a loop may be used to fill an array with integer
literals. Figure 4.3 presents an example of the usefulness of placing an object
constructor within an operation. In this example, the object named IntegerNodeCreator exports an operation new that returns the object created
by executing the object constructor IntegerLiteral. This object constructor
is the same as that in Figure 4.1, but the initialization of its value variable
is now determined by the argument val of new . Thus, every time new is
invoked on IntegerNodeCreator, a new node is created. In other words, the
IntegerNodeCreator object performs the instance creation function of a class.
const IntegerNodeCreator ←
  immutable object INC
    export new
    const IntegerNodeType ←
      type INType
        function getValue → [Integer ]
        operation setValue[Integer ]
        operation generate[OutStream]
      end INType
    operation new [val : Integer ] → [aNode : IntegerNodeType]
      aNode ←
        object IntegerLiteral
          % as in Figure 4.1
        end IntegerLiteral
    end new
  end INC

Figure 4.3: An object that creates integer nodes
There is nothing magical about the use of the operation name new —
any other name would be equally valid. Note that IntegerNodeCreator has
been declared immutable, an assertion that the object’s local state, which in
this case consists only of the object named by IntegerNodeType, will never
change. Note also that IntegerNodeType actually names a type object; types
are discussed in the next section.
The approach used in Emerald to create class-like objects can be compared and contrasted with the approach used in a more traditional language
such as Ada. An Ada implementation of IntegerNodeCreator could be very
similar to the Emerald version, but there will be two notable differences.
First, because Ada is not object-oriented, the data to be manipulated must
be passed to each function; second, the Ada package that implements the
creator defines both the operations that create instances and the operations
that manipulate those instances.
4.2.2 One-of-a-kind Objects
The absence of a class mechanism from Emerald illustrates the flexibility of
the object constructor. When an object constructor is executed only once, a
“one-of-a-kind” object is created. To our knowledge, the only other languages
that support such a feature are prototype-based languages such as Self.[9]
const idServer ←
object IDS
export getNextId
monitor
var nextId : Integer ← 0
operation getNextId → [newId : Integer ]
newId ← nextId
nextId ← nextId + 1
end getNextId
end monitor
end IDS
Figure 4.4: A Unique Id Server
Class-based languages are suited to situations where many objects with
the same behavior are needed, but they are unnecessarily verbose when only
one instance of an object is needed. For example, consider the Boolean
objects true and false. In Smalltalk, where one-of-a-kind objects cannot
be directly defined, one needs to define a class for each object and then
instantiate it. Two classes, True and False, are defined as subclasses of
Boolean, with the objects true and false as their only instances. However, this arrangement does not by itself guarantee that only a single instance of class True is ever created; ensuring that is rather awkward in Smalltalk. It can be done by defining a class for the required object along with a tailored new class method that permits only a single instantiation, and then instantiating the required object exactly once. In languages such as Simula, where not all entities are
objects and there is no support for metaclasses, one-of-a-kind objects cannot
be created at all.
In practice, it is not just primitive types such as Boolean that use one-of-a-kind objects. Figure 4.4 depicts the creation of an object that provides unique identifiers (integers) used in naming the objects produced by the Emerald compiler. Different invocations of the getNextId operation return different identifiers. Only one such server is required in the compilation environment; indeed, having more than one server managing the same identifier space would be disastrous. As another example, in a typical workstation there is exactly one mouse object, exactly one keyboard object, and perhaps only one console window object. The class abstraction is redundant in such situations, where exactly one object with a given behavior is needed.
The more primitive Emerald object constructor provides programmers with
the flexibility to create exactly as many objects as are required in each situation.

Figure 4.5: “Metaclass hierarchies” in Emerald and Smalltalk
(a) Emerald object/creator structure; (b) Smalltalk instance/class/metaclass structure
4.2.3 No Metaclasses
The fact that Emerald objects are modeled as containing their own operations removes the need for the multi-level instance/class/metaclass hierarchy
found in Smalltalk. Figure 4.5 illustrates the difference. In class-based languages, each object depends upon its class object to describe its structure and
behavior. In the Smalltalk hierarchy, each idServer object is described by its
class IdServer, which is described by its class (metaclass of class IdServer),
which in turn is described by its class (Metaclass).
Object constructors provide an ability to create not just class-like objects
(creators), but also creators of creators, creators of creators of creators, and
so on; multiple levels are indeed useful, as will be illustrated in the section
on Polymorphism. In contrast, the metaclass concept and the termination of the metaregress in Smalltalk have been found to be among the biggest hurdles to overcome in learning Smalltalk.[16] Like Self, Emerald avoids this
metaregress by making each object completely self-describing.
4.3 Abstract Types
The notion of type serves several purposes in conventional programming
languages.[17, 18] Types facilitate the representation independence required
for portability and maintainability of programs. Traditionally, types describe implementations, and the creation of data instances is controlled by
their type. When types describe implementation, having type information
available at compile time also permits the generation of better code. Finally,
types enable the early detection of errors, and permit more meaningful error
reporting.
In Emerald, these functions are performed by specialized constructs.
The object provides representation independence. The object constructor
provides for object creation and encapsulates implementation information.
What in other languages are type-dependent optimizations are performed by
the Emerald compiler based on how objects are used. Since it is freed from
these other concerns, the Emerald type system can concentrate on the single
task of object classification, thus preventing incorrect usage.
Emerald’s type system reflects its design goal of being used for the development of software in constantly running distributed systems. In such
systems, objects may be developed and implemented separately and differently on different machines at different times. Furthermore, to accommodate
situations where the types of the objects to be bound to an identifier are
not known at compile-time, the Emerald type system does not distinguish
between objects based on their implementation.
Emerald allows normal type checking to be performed entirely at compiletime. The only occasion on which types must be checked at run-time is when
the programmer explicitly requests that an object should be treated as if it
had a type that is stronger than the type that could be attributed to it
at compile time. For example, a programmer could state that an object
retrieved from a list of arbitrary objects is an integer; this claim can only be
verified at run-time.
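As a hedged sketch of such an explicit request, assume Emerald’s view expression takes the form view e as T (as described in the Emerald literature); the claimed type is then checked when the expression is evaluated, and the check fails at run-time if the object does not conform.

% anElement carries the weak type Any; the programmer asserts that it is an Integer
var anElement : Any
var i : Integer
anElement ← 17
i ← view anElement as Integer   % conformity is verified at run-time, not at compile-time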
4.3.1 Type Definition
An Emerald type is a collection of operation signatures, where a signature
consists of the operation name and the types of the operation’s arguments
and results. Note that a type contains no information about implementation; it describes only an interface. For this reason, the term abstract type
is often used to refer to an Emerald type. Abstract types allow for the
complete separation of typing and implementation, and mean that multiple
implementations of the same type can co-exist and interoperate.
const IntegerNodeType ←
type INType
function getValue → [Integer ]
operation setValue[Integer ]
operation generate[OutStream]
end INType
Figure 4.6: An example of a type object
Each identifier in an Emerald program—this includes the names of constants, variables, arguments and results—has a declared type, which must be
evaluable at compile time; this is called the syntactic type of the name. Type
checking is the process of ensuring that the object to which a name is bound
always satisfies the syntactic type of the name. For example, the argument
declaration for val in the new operation in Figure 4.3, and the variable declaration for value in the monitor in Figure 4.1, cause both of these names to
be given the syntactic type Integer. An attempt to assign a string to value,
or to pass a file as an argument to new , is detected as a type error at compile
time.
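As an illustration, both of the following hypothetical fragments would be rejected by the compiler; the string literal and the identifier aFile are assumed only for the example.

value ← “forty-two”               % inside Figure 4.1: a String does not conform to Integer
IntegerNodeCreator.new[aFile]     % client code: aFile does not conform to Integer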
Each Emerald type is an object and as such can be manipulated by the
ordinary facilities of the Emerald language, such as assignment, constant
binding, and parameter passing. Type objects may be passed as parameters
to implement polymorphism or inspected at run-time to implement run-time
type checking.
An example of an Emerald type is IntegerNodeType, which has been extracted from Figure 4.3 and shown in Figure 4.6. This code fragment binds
the name IntegerNodeType to the result of evaluating a type constructor,
which is delimited by type INType . . . end INType. The type constructor generates an Emerald object whose type is AbstractType, that is, a type
object. Objects of this type must export the three operations getValue,
setValue, and generate with the indicated argument and result types. IntegerNodeType, being an Emerald object, may be passed as a parameter,
assigned to a variable, or invoked at run-time.
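The following minimal sketch makes this concrete; the operation describeType and its receiver aTool are hypothetical names used only for illustration.

const aTypeAlias ← IntegerNodeType   % constant binding of a type object
var someType : AbstractType
someType ← aTypeAlias                % ordinary assignment of a type object
aTool.describeType[someType]         % a type object passed as an ordinary argument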
4.3.2 Type Conformity
The idea of an object “satisfying” the syntactic type of the name to which it
is bound, as discussed above, is made precise in Emerald by the conformity
relation. If an object O is bound to a name I , then the type of O must
conform to the syntactic type of I .
The motivation behind Emerald’s definition of conformity is the notion of
substitutability. Informally, a type S conforms to a type T (written S ◦> T )
if an object of type S can always be substituted for one of type T , that is,
the object of type S can always be used where one of type T is expected.
For S to be substitutable for T in this way requires that:
1. S provides at least the operations of T (S may have more operations).
2. For each operation in T , the corresponding operation in S has the same
number of arguments and results.
3. The types of the results of S ’s operations conform to the types of the
results of T ’s operations.
4. The types of the arguments of T ’s operations conform to the types of
the arguments of S ’s operations. (Notice the reversal in the order of
conformity for arguments.)
Properties 1 and 2 are fairly straightforward and do not need further discussion. Why conformity needs property 3 is illustrated by these types:
type JunkDeliverer
operation Deliver → [Any]
end JunkDeliverer
type PizzaDeliverer
operation Deliver → [Pizza]
end PizzaDeliverer
The keyword Any denotes the type with no operations; any object may be
assigned to an identifier that has type Any. These types describe delivery
services that may deliver anything and those that deliver only pizzas. Whatever one expects to do with what is delivered by a JunkDeliverer can also
be done with the pizza delivered by a PizzaDeliverer . That is, a PizzaDeliverer can be substituted for a JunkDeliverer , but not vice versa. Formally
stated, PizzaDeliverer conforms to JunkDeliverer , but JunkDeliverer does
not conform to PizzaDeliverer .
As motivation for property 4, consider the following types:
type JunkHeap
operation Deposit[Any]
end JunkHeap
type Bank
operation Deposit[Money]
end Bank
These types define entities into which, respectively, arbitrary objects and money can be deposited (without retrieval!). Intuitively, one knows that where
one can deposit anything, one should be able to deposit money, and conversely, where one deposits only money, one cannot possibly deposit anything
else. That is, JunkHeap can be substituted for Bank but not vice versa.
More formally, JunkHeap conforms to Bank , but Bank does not conform to
JunkHeap.
The above properties 1–4 characterize a number of relations on types.
Conformity is defined as the largest relation that satisfies these rules.[2]
The object type matching rules in Modula-3,[19] for example, define a smaller
relation. Modula-3 requires that the argument and result types of operations
be identical, not merely that they conform appropriately.
Property 4 is known as contravariance. In mathematics, a function from
A to B is also considered to be a function from A′ to B′ if A′ is a subset of A and B is a subset of B′; this is the contravariance in the argument of the
function space operator.[20] In programming, contravariance means that an
operation on objects of a given type should also work on objects of subtypes of
that type. Applied to the deposit operation, contravariance results in Bank
not conforming to JunkHeap. Although such a consequence is sometimes
considered to be a limitation of conformity,[21, 22] property 4 is included in
Emerald not just to make the type system safe and consistent—although this
would be reason enough—but also because contravariance helps to enforce
correct program design. In other words, contravariance is a feature rather
than a limitation.
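For reference, properties 3 and 4 can be restated compactly in the notation used above: if S ◦> T and T contains an operation op[A] → [R], then S must contain an operation op[A′] → [R′] with R′ ◦> R and A ◦> A′; results vary covariantly and arguments contravariantly. This mirrors the set-theoretic observation that the function space A → B is contained in A′ → B′ whenever A′ ⊆ A and B ⊆ B′.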
The utility of contravariance may be illustrated by additional objects
from the Emerald compiler. Nodes corresponding to integer literals were discussed earlier; additional nodes of other types are used to represent declarations, control statements, and expressions. The type TreeNode characterizes
those properties that are common to all the elements of the language; consequently the operations in type TreeNode must be exported by all nodes.
Both Expression nodes and Statement nodes export more operations than
do TreeNodes, but each has operations that are different from those of the
others; these types and the relationships between them are depicted in Figure 4.7. Type Statement does not conform to type Expression because Statement does not implement the getType operation, and Expression does not
conform to Statement because Expression does not implement the typeCheck
operation. However, they both conform to TreeNode.
Now consider the objects codeImprover and expressionImprover :
const TreeNode ←
type TN
operation generate[OutStream]
end TN

const Expression ←
type E
function getType → [Expression]
operation generate[OutStream]
end E

const Statement ←
type S
operation typeCheck
operation generate[OutStream]
end S

(a) Type definitions

(b) Relationship between types: Expression and Statement each conform to TreeNode

Figure 4.7: Some compiler types
const codeImprover ←
object CI
operation optimize[root : TreeNode] → [result : TreeNode]
...
end CI
const expressionImprover ←
object EI
operation optimize[root : Expression] → [result : Expression]
...
operation factor [root : Expression] → [result : Expression]
...
end EI
Should the type of expressionImprover conform to that of codeImprover ?
The former clearly has more operations, and this could be reason enough
to declare that it conforms. However, expressionImprover is clearly not
substitutable for codeImprover in every context. An invocation of codeImprover.optimize may legally be presented with a TreeNode as an argument,
but an attempt to present the same argument to expressionImprover.optimize
is a type error. This is why Emerald’s conformity rules clearly state that expressionImprover does not conform to codeImprover .
Before leaving this example, let us look a little more closely at what
would happen if Emerald were more lenient in its type checking. Then,
it would be possible for expressionImprover.optimize to be invoked with a
TreeNode object as its argument, which would then be bound to the name
root within the optimize operation, even though the syntactic type of root
is Expression. Now there are two situations to consider within the body of
expressionImprover.optimize (denoted by the ellipsis above). If the operation
body invokes only generate operations on root and never getType operations,
the body would execute successfully. If the operation body does invoke the
getType operation on root, the invocation would fail because getType is not
implemented on TreeNodes.
It is of course exactly this sort of run-time failure that Emerald’s type
system prevents. At first glance, it may seem that the cost of this security
is that a legal and useful program—the first case in which getType is never
invoked—cannot be written in Emerald. But a little thought shows that this
is not so. If the body of expressionImprover.optimize never invokes getType
on its argument, then the argument should not be declared to be of type
Expression: it need only be of type TreeNode.
const expressionImprover ←
object EI
operation optimize[root : TreeNode] → [result : TreeNode]
...
operation factor [root : Expression] → [result : Expression]
...
end EI
With this change to the definition of expressionImprover , the type of
expressionImprover conforms to that of codeImprover . The type system now
permits expressionImprover to be substituted for codeImprover ; the “legal
and useful” program mentioned above can be written in Emerald.
The lesson here is that the Emerald programmer should strive to characterize argument and result types as accurately as possible, listing exactly
what is expected of arguments, and describing results as fully as possible.
This is similar to the discipline that must be imposed when writing a program using formal specifications: the declared precondition of a procedure
should characterize the properties of the arguments that are required for the
procedure to work. The precondition should not characterize everything that
happens to be true of the argument in all of the uses of the operation that
have been conceived of to date, for doing so severely limits the reusability of
the procedure.
This discipline is dubbed the use of best-fitting types. Best-fitting
types provide good documentation and encourage good design, as well as
permitting the maximum reusability of code. There is a cost, however: it is
const Stack ←
object S
export insert, remove
operation insert[Element] → [ ]
% Suitable body
operation remove → [Element]
% Suitable body
end S
const Queue ←
object Q
export insert, remove
operation insert[Element] → [ ]
% Suitable body
operation remove → [Element]
% Suitable body
end Q
const Stack2 ←
object S2
export insert, remove
operation insert[Element] → [ ]
% Suitable body
operation remove → [Element]
% Suitable body
end S2
Figure 4.8: Three objects with identical types
harder to reimplement the same operation using a different algorithm. This
is because the new algorithm may want to use some operation of an argument
that is not possessed by the syntactic type of the argument, even though it
is available on the actual argument in all real invocations.
Using best-fitting types works only because argument and result conformity can extend to an arbitrary depth. This makes Emerald’s programming style different from that used in languages without conformity, where
perfect matching of argument and result types is required. Here, the programmer has to decide a priori on “envelope types” that are weak enough
to work correctly in all applications, current and future.
Implicit versus explicit conformity. Implicit conformity can sometimes
lead to “mistaken” type-matches. In Figure 4.8, since insert and remove have
identical signatures in all three objects, Emerald would regard Stack, Queue,
and Stack2 as having the same type. This allows the unintended use of the
Queue object as a Stack. On the other hand, because Emerald is an open
system where objects can be compiled and added at run-time, Stack can be
replaced by Stack2 (which might be a newer version of Stack). In order to
allow the substitution of Stack2 for Stack, but prevent the substitution of
Queue for Stack, the type system would either require the programmer to
explicitly declare the conformity relation between types, or need information
about the semantics of the operations.
The subtyping notion of Trellis[23] is similar to that of Emerald
conformity, but any subtype relationship between types has to be explicitly
declared. The use of such explicit declaration prevents meaningless substitutions of objects. In the above example, the Trellis programmer would declare
Stack2 as a subtype of Stack but not establish any subtype relationship between Queue and either Stack or Stack2 . The price paid in Trellis is that a
type Stack3 defined elsewhere in the type hierarchy cannot be a subtype of
Stack unless that relationship is explicitly stated. Moreover, Stack cannot
also be a subtype of Stack2 or Stack3. Eiffel[14] also requires an explicit
declaration of type conformance (although the use of covariance in Eiffel’s
notion of conformance has been found to make the type system unsafe[24]).
In contrast, the implicit nature of Emerald conformity allows the types of
these three stack objects to conform as desired.
Languages such as Algol 68 that use structural equivalence and conventional type systems have similar problems. For example, stacks and queues, implemented using the same singly-linked list representation and field names, have the
same structure, and hence the same type. Modula-3,[19] which uses structure
equivalence, recognizes this problem and provides the ability to brand types
in a unique way. A branded type cannot match another type, no matter what
its structure is, thus providing a name equivalence option in a system based
primarily on structure equivalence. In Emerald, “mistaken” type-matches
may be reduced by including operations that are specific to the type defined. For example, a Queue object (as in Figure 4.8) supplemented by a
queueLength operation cannot be mistaken for a stack.
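For instance, a sketch of such a disambiguating type for queues, reusing the Element type assumed in Figure 4.8:

const QueueType ←
type QT
operation insert[Element] → [ ]
operation remove → [Element]
function queueLength → [Integer]   % the queue-specific operation named above
end QT

Because the types of the Stack and Stack2 objects of Figure 4.8 do not include queueLength, neither conforms to QueueType, so a stack can no longer be passed where a QueueType is expected.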
Conformity versus inheritance. It is useful to contrast the Emerald
notion of type conformity with that of inheritance provided in other objectoriented languages. In short, conformity is a mechanism for object substitution, and inheritance is a mechanism for code (and representation) sharing
among objects. Although conformity and inheritance are fairly distinct concepts, many object-oriented languages such as C++ have not distinguished
clearly between the two concepts, and the use of the same language structure for providing both conformity and inheritance has caused considerable
confusion.
Emerald supports type conformity (and hence object substitution), but it
does not support inheritance: code cannot be shared among object implementations. The absence of code sharing is due to Emerald’s use in distributed
systems, where object mobility is facilitated by self-contained objects and
hampered by dependencies on other objects. Since code sharing promotes
software reuse, the lack of inheritance is sometimes considered to be a deficiency of Emerald.
However, several aspects of inheritance actually impede software reuse.
Inheritance compromises object encapsulation to varying degrees.[25] The
use of a single explicit structure for both object substitution and code sharing among objects is often restrictive and sometimes unsafe. For example,
Trellis restricts code sharing to its conformity hierarchy, and Eiffel forces
inheritance and conformance to be the same with disastrous consequences
on type safety.[24] Inheritance has other conceptual problems. It violates
the principle of locality,[26] thus preventing the use of local reasoning in understanding and implementing an object. Adequate control is not provided
over the granularity of code sharing; the basic inheritance paradigm requires
inheriting objects to share all their ancestors’ implementations. The need for
explicit removal via overriding to prevent undesired parts from being shared
shows that there is no easy way of inheriting only the needed parts of the
ancestors’ implementations.
To provide inheritance-like software reuse, we have prototyped an Emerald extension called Jade[27] in which object substitution is provided via conformity, and code sharing among objects via composition. The basic building
blocks in Jade are self-describing code components, such as operations and
objects, which may be combined to form larger objects. Since different objects may use the same components, this leads to conceptual sharing of code
among objects but without several of the shortcomings of inheritance. The noteworthy point about Jade is the complete orthogonality of the code sharing
(composition) and object substitution (conformity) mechanisms. A detailed
discussion of the Jade language and environment is beyond the scope of this
chapter, and may be found elsewhere.[27, 28]
4.3.3 Relationship between Types and Objects
Each Emerald object may belong to several types because an object O belongs
to a type T when
typeof O ◦> T
The application of typeof to an object returns its maximal type, that is,
the largest Emerald type that the object can belong to. Because this permits objects to be classified by their types, it now becomes possible to relate
implementations and types via conformity. The anIntegerNode of Figure 4.1
names an object whose type conforms to the type definition given in Figure 4.6; we can thus write
typeof anIntegerNode ◦> IntegerNodeType
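Because conformity requires only that the operations of the target type be present with suitably conforming signatures, the same object also belongs to every weaker type. As a small sketch, with the name ValueSource introduced only for this example:

const ValueSource ←
type VS
function getValue → [Integer]   % a strict subset of the operations of IntegerNodeType
end VS

and we also have typeof anIntegerNode ◦> ValueSource, as well as typeof anIntegerNode ◦> Any.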
A type is any object whose type conforms to:
immutable type AbstractType
function getSignature → [Signature]
end AbstractType
In other words, any object that exports a getSignature operation that returns
a Signature is a type. Objects of type Signature are created by the type constructor syntax (type x ... end x). A Signature is a built-in implementation
of an AbstractType that can be generated only by the compiler. A signature
object exports a getSignature operation (returning itself) so it conforms to
AbstractType. It also exports several secret operations that enable the Emerald implementation to determine the operations provided by the type, and
the signatures of these operations. Because the names of these operations
are secret, no programmer-defined objects will ever have types that conform
to Signature; all signature objects must stem from a type constructor expression in some Emerald program. Consequently, we can guarantee that
the type checker is able to get adequate and consistent information about a
type.
The type value denoted by a type object is the result of its getSignature function. Consider the IntegerNode in Figure 4.9. This object was constructed by the addition of a getSignature function to the IntegerNodeCreator
of Figure 4.3. The code for IntegerNode reveals that typeof IntegerNode is:
type IntegerNodeCreatorType
function getSignature → [Signature]
operation new [Integer ] → [IntegerNodeType]
end IntegerNodeCreatorType
When IntegerNode is treated as an object, the operations that can be invoked
on it are described by IntegerNodeCreatorType. However, when IntegerNode
is treated as a type, its value is IntegerNodeType, the result of the getSignature function.
const IntegerNode ←
immutable object INC
export getSignature, new
const IntegerNodeType ←
type INType
function getValue → [Integer ]
operation setValue[Integer ]
operation generate[OutStream]
end INType
function getSignature → [sig : Signature]
sig ← IntegerNodeType
end getSignature
operation new [val : Integer ] → [aNode : IntegerNodeType]
aNode ←
object IntegerLiteral
% as in Figure 4.1
end IntegerLiteral
end new
end INC
Figure 4.9: An integer node creator with getSignature
4.3.4 Separation of Type and Implementation
The three features discussed above—object constructors for object creation,
abstract types to characterize interfaces, and conformity for type comparison—
allow Emerald to elegantly separate implementations from types. Emerald
allows several different implementations to be used for the same type within
a single program. Much of the flexibility found in untyped languages such
as Smalltalk is thus available within the framework of the strongly-typed
Emerald language. Not only does Emerald’s separation of types and implementations simplify separate compilation of programs, but it also enables
newly defined objects to be added to a running system, and take the place
of previously defined objects, so long as they have conforming types and
semantics.
For example, the reliability of the idServer defined in Figure 4.4 may
be enhanced by implementing a new version, shown in Figure 4.10, that
stores its state in a file in order to survive crashes. The type of this new
server conforms to the type of the original and so may be used anywhere
the original was expected. Note that this substitution can be made into
const anotherIdServer ←
object IDS
export getNextId
monitor
operation getNextId → [newId : Integer ]
var inF : InStream
var outF : OutStream
inF ← InStream.FromUnix [“/usr/emerald/ID”]
newId ← inF.getInteger
inF.close
outF ← OutStream.ToUnix [“/usr/emerald/ID”]
outF.putInteger [newId + 1 ]
outF.close
end getNextId
end monitor
end IDS
Figure 4.10: An alternative implementation of an Id Server
an application that is already executing. Such flexibility is not normally
available in strongly-typed languages.
It is worth noting that the clean separation of type and implementation found in Emerald is not possible in languages whose objects are not so
strongly encapsulated. In CLU or C++, for example, the implementation of
an operation in a class is allowed to access the internals of all the objects that
are the same type as the receiver: these languages assume that ‘same type’
implies ‘same class’. In Emerald, as in Smalltalk, the implementation of each
operation on an object has access to the internals of that object alone.
In Modula-2, separation between abstract typing and implementation is
not provided because DEFINITION and IMPLEMENTATION MODULEs are actually treated as two parts of one module.[6] The coupling of the definition
and implementation modules is done during linking, which precludes multiple implementations at run-time. In Ada, each specification corresponds
to an implementation in the library. When a different implementation for a
specification is needed, the new implementation must be substituted for the old one, and the program recompiled and relinked. Thus,
Ada does not allow multiple implementations of the same abstract type to
coexist.
Object-oriented languages such as C++ and Smalltalk support a programming style that allows multiple implementations of the same abstraction.
This is done via the use of abstract classes, which describe only object interfaces, but may be extended into concrete subclasses for instantiating multiple
implementations. There are some obvious similarities between such abstract
classes and Emerald’s abstract types, but there are important differences.
First, the former represents a conventional use of the class system, while the
latter is a language feature. Second, abstract classes permit the definition
of default behavior. Although such definitions are occasionally useful, they
violate the complete separation of a type from its implementations, as well
as the separation of multiple implementations. Finally, the use of abstract
classes requires each concrete implementation to be explicitly declared a subclass of the abstract class. In other words, the abstract class must exist before
multiple implementations are created. In Emerald, there is no such restriction since the relationship between implementations and abstract types is
deduced by the type system.
4.3.5 Support for Polymorphism
At least three broad varieties of polymorphism have been identified in the
literature.[17, 21, 29, 30]
Inclusion polymorphism refers to the situation where an object may belong to many different types that need not be disjoint, that is, a type may
include one or more of the other types. Subtyping is a good example of
this kind of polymorphism. Objects belonging to a type are manipulable as
belonging not only to that type, but also to its supertypes. In implementation terms, object representations are chosen so that operations can work
uniformly on instances of subtypes and supertypes.
Parametric polymorphism allows a function to have either an implicit or an
explicit type parameter that determines the type of argument required for
each of its applications. This is perhaps the purest form of polymorphism,
permitting the same operation to be applied to arguments of different types.
Implicit type parameters are used in ML[29], and explicit type parameters
appear in Russell.[30]
Ad-hoc polymorphism refers to situations where a procedure works, or
appears to work, on several types. It is usually not considered to be true
polymorphism because these types are not required to exhibit any common
structure; neither are the results of the procedure required to be similar. Ad-hoc polymorphism covers the notions of overloading, where the same name
denotes different functions in different contexts (e.g., the operator “+” in
most programming languages), and coercion, where values are (automatically) converted to the type expected by a function (e.g., in 2 + 3.1, the
integer 2 is coerced to the real 2.0); the difference between these two notions
often disappears in untyped languages.
Relating traditional discussions of polymorphism to object-oriented languages has to be done carefully. Inclusion polymorphism fits naturally into
object-oriented programming. However, consider the following standard example of a polymorphic function:
function length[ t : someType ]
This kind of polymorphism is unnatural in object-oriented languages because
functions are not first-class entities in such languages. That is, rather than
applying functions to objects, operations are invoked on objects. To obtain
the length of an object, one invokes the length operation of that object since
the object obviously knows its own type and does not need the type as an
argument. Instead of applying a length function to a list to find its length,
the list itself is asked what its length is. In this setting there is no need for
length to be polymorphic. If one implementation of length is shared by many
classes, it is polymorphic. If length is separately defined for every object, it
is not polymorphic. One cannot tell from the interface which case applies.
Although Emerald provides an efficient, practical form of polymorphism,
its emphasis on type conformity and object autonomy makes the above classification scheme inappropriate. Based on the role played by the different
polymorphic entities, an Emerald-oriented classification of polymorphism is
proposed below.
1. polymorphic operations, which work “correctly” regardless of the types
of their arguments,
2. operations that return types, and
3. polymorphic objects (values), which can be used in situations requiring
different types.
Notice that 1 and 3 result from Emerald’s notion of conformity which is a
form of inclusion polymorphism.
Polymorphic Operations. All operations are naturally polymorphic because objects (of different types) may be used as arguments provided that their types conform to the types declared for the formal parameters of the operation.
Figure 4.11 illustrates Emerald’s support for polymorphic operations using the operation PrintIt. This operation takes a parameter of type Printable,
and displays its string form on the standard output. Any object that understands the asString operation conforms to the type Printable, thus permitting
any such object to be passed to PrintIt. Notice the use of SPT to represent
the target of the invocation of PrintIt.
const SimplePolyTester ←
object SPT
const Printable ←
type aPrintableType
function asString → [String]
end aPrintableType
operation PrintIt[p : Printable]
stdout.PutString[p.asString || “\n”]
end PrintIt
process
SPT.PrintIt[0 ]
SPT.PrintIt[1.1 ]
SPT.PrintIt[true]
SPT.PrintIt[“This is rather trivial.”]
end process
end SPT
Figure 4.11: A polymorphic operation
Operations that return types. Most programming languages support
the array type constructor, which is usually a built-in operator that takes
a type as an argument and returns a new type as a result, for example,
Array[Integer] and Array[Boolean].
While languages such as Pascal support this kind of polymorphism for
arrays and similar built-in types, such support is not usual for user-defined
types. For example, the programmer cannot describe a List type without also
specifying the type of the elements. Some languages such as Modula-2+[31]
allow elements of any type to be used, but ensure the homogeneity of the
List through a required user-specified run-time check. Other languages such
as ML[29] allow the specification of polymorphic operations through implicit
type variables that are instantiated when type-checking is performed.
Emerald supports user-defined operations that return types subject only
to the constraint that the types of variables, constants, and formal parameters
be evaluable at compile-time. Since types are objects, types may be passed
as arguments to functions that create types. As an example, consider the
polymorphic list creator in Figure 4.12. List is an object that exports a single
function, of. This function takes a type eType as an argument, and returns a
List creator for lists whose elements are of type eType. This resulting class-like object can be used to create homogeneous lists. For example, a list of
integers can be declared and created as follows:
const List ←
immutable object ListCreator
export of
function of [eType : AbstractType] → [result : ListCreator ]
where
ListCreator ←
immutable type LC
function getSignature → [Signature]
operation new → [ListType]
end LC
ListType ←
type LT
operation AddAsNth[eType, Integer ]
operation DeleteNth[Integer ]
function GetNth[Integer ] → [eType]
function Length → [Integer ]
end LT
end where
result ←
immutable object NewListCreator
export getSignature, new
function getSignature → [r : Signature]
r ← ListType
end getSignature
operation new → [result : ListType]
result ←
object theList
export AddAsNth, DeleteNth, GetNth, Length
% Implementations of the various operations
end theList
end new
end NewListCreator
end of
end ListCreator
Figure 4.12: A List type generator
var iList : List.of [Integer ] ← List.of [Integer ].new
The above example reveals the expressiveness of Emerald invocations. List
is an object that exports the operation of , whose single argument must be
an abstract type. The invocation
List.of [Integer ]
returns an object that is used in two ways. First, when used to specify the
abstract type of iList, the denoted type is obtained by invoking getSignature
on List.of [Integer]. Second, when used to initialize iList, the new invocation yields an object that is a list of integers. What is unusual about the of
invocation is that it is a compile-time invocation, that is, the Emerald compiler itself performs this invocation and creates the resulting object. This is
possible because List is immutable, of is a function, and the argument to
the invocation, Integer, is an immutable compile-time constant. As a result,
iList is just as efficient as a list of integers that is directly defined; in fact,
exactly the same code is generated in either case.
It is interesting to examine the various Emerald features that went into
the construction of List in Figure 4.12. The function of uses the fact that
types are objects to specify a type parameter. The separation of type and
implementation allows restriction of the behavior of the elements in the list
without constraining their implementation. Finally, Emerald’s object constructor makes this entire description possible in a self-contained way. Note
that there are three levels of constructor, which is not possible in any other
object-oriented language. Conformity allows List.of [Integer] to act both as
a type and as a creator.
The result type of the operation List.of depends on the type of its argument. Result types of Emerald operations may also depend on the value
of their arguments. For example, the Emerald compiler uses TreeNodes to
represent different constructs of the language. Since constructs are identified
by different keywords, the result type of the compiler operation that creates
TreeNodes depends on the value of the input keyword.
Polymorphic Objects. The notion of value in traditional languages is
subsumed by the notion of object in object-oriented languages. Traditional
languages with typed pointers such as Pascal usually support the notion of
nil, which may be assigned to any pointer variable regardless of its type.
This is an example of a polymorphic value.
Emerald supports polymorphic objects. An object may be assigned to
any identifier provided that the object’s type conforms to that declared for
the identifier. A direct analogue of Pascal’s nil pointer value is the object
nil in Emerald. None, the type of nil, conforms to every Emerald type.
Any identifier in Emerald, regardless of its declared type, may be assigned
const PolyValue ←
object pv
const Printable ←
type Printable
function asString → [String]
end Printable
process
var aStringObj : String
var aPrintableObj : Printable
var anyObj : Any
aStringObj ← “Emeralds are green”
aPrintableObj ← “Emeralds are green”
anyObj ← “Emeralds are green”
end process
end pv
Figure 4.13: Some polymorphic objects
the object nil. Figure 4.13 illustrates a more restricted example. Here, the
identifier aPrintableObj may name any object that understands the asString
function, and the same string object “Emeralds are green” may be assigned
to aStringObj, aPrintableObj, and anyObj because its type conforms to the
types of all three identifiers.
This kind of polymorphism is naturally available in most class-based languages because each object belongs to its class as well as the class’s superclasses. However, in these languages, all the classes that an object can belong
to are known before the instantiation of the object. In Emerald, this determination is made dynamically. As a consequence, Emerald objects are also
polymorphic not merely with respect to existing types, but also to hitherto
unthought-of types to which they conform.
4.4 Concurrency
Emerald’s support for concurrency takes the form of active objects. An active
object contains a process that is started after the object is initialized, and
executes in parallel with invocations of that object’s operations and other
active objects. This process continues to execute its specified instructions
until it terminates. An object with a simple process is shown in Figure 4.14.
The process in this object performs the repetitive actions of a philosopher in
const aPhilosopher ←
object P
% Assume suitable objects Table and Node are defined
% Assume integer objects eatingTime and thinkingTime have been defined
process
loop
Table.pickupForks[p]
Node.sleep[eatingTime]
Table.putdownForks[p]
Node.sleep[thinkingTime]
end loop
end process
end P
Figure 4.14: A dining philosopher
const anOrganizer ←
object Organizer
process
var count : Integer ← 5
const philoArray ← Array.of [Philosopher ].create[5 ]
loop
philoArray[count] ←
object P
% as in Figure 4.14
end P
count ← count − 1
exit when count = 0
end loop
end process
end Organizer
Figure 4.15: An organizer object
the well-known dining philosophers’ problem.[32]
An Emerald process represents an independent thread of control. New
threads may be created dynamically simply by creating new objects. For
example, in Figure 4.15, five instances of the philosopher object are created.
Since the objects assigned to the elements of philoArray export no operations,
the object constructor is used here to achieve an effect similar to the use of
the fork procedure.[33]
In Emerald, operation invocation is synchronous. Thus, a thread of control can be thought of as passing through other objects when it invokes
operations on those objects, that is, the invocation of an operation provides
the thread of execution for that operation. This means there can be multiple
simultaneous invocations of an operation in the same object. Each invocation can proceed independently, providing fine-grained parallelism. In other
distributed object-oriented languages, such as EPL,[34] each object has a
message queue for incoming requests and a set of threads of control to respond to those requests.
Emerald uses monitors to regulate access to an object’s local state that
is shared by the object’s operations and process, and synchronization is
achieved using system-defined condition objects.[35] The Emerald monitor
is similar to the monitor in Concurrent Pascal[36] or Concurrent Euclid,[37]
but is completely enclosed within an object. An object’s process commences
execution outside the monitor, but can enter the monitor by invoking monitored operations when it needs access to shared state.
The choice of the monitor mechanism in Emerald for synchronization
reflects designer prejudice and familiarity rather than its clear superiority
over other mechanisms. However, monitors were found to provide a natural
protection mechanism within the object-oriented paradigm, and they are
also efficient for synchronization within an object especially when there is
no contention—in this case, monitor entry and exit are performed by a few
in-line machine instructions.
Figure 4.16 illustrates the various aspects of Emerald’s concurrent programming model with a naive implementation of a clock. The clock object
uses a simple internal representation (theTime) that is protected by a monitor and can be manipulated by several monitored operations. The internal
representation is constantly updated by the clock object’s process, which synchronizes with a timing pulse provided by the system object. The expensive
operations for converting between the internal and external representations
are defined outside the monitor, thus allowing multiple simultaneous invocations of them to proceed concurrently. The System object demonstrates
a simple use of Emerald condition objects to provide buffering of timing
pulses.
4.5 Other Features
In this section we discuss those remaining features of Emerald that contribute
to its programming model.
const Clock ←
object C
export getTimeOfDay, setTimeOfDay
monitor
var theTime : Integer ← 0
operation IncTime
theTime ← theTime + 1
end IncTime
function getTime → [r : Integer ]
r ← theTime
end getTime
operation setTime[r : Integer ]
theTime ← r
end setTime
end monitor
operation setTimeOfDay[newTime : String]
var t : Integer
% store newTime in some internal form
setTime[t]
end setTimeOfDay
operation getTimeOfDay → [currentTime : String]
var t : Integer ← getTime
% return the String form of t
end getTimeOfDay
process
loop
System.Tock
IncTime
end loop
end process
end C
const System ←
object S
monitor
const timing ←
Condition.Create
% Tick is invoked
% by a hardware clock
operation Tick
signal timing
end Tick
operation Tock
wait timing
end Tock
end monitor
end S
(a) A clock object
(b) A system object
Figure 4.16: Concurrent programming in Emerald
4.5.1 Block Structure and Nesting
The use of object constructors makes block structure and nested object definitions natural. We have already seen several examples of such nested object
constructor definitions, for example, in Figure 4.12, the object constructor
theList nested inside NewListCreator, which in turn is nested inside ListCreator.
Emerald also supports the notion of blocks. Emerald blocks, similar to
those in Algol-60, define a new scope for identifiers. The scoping of Emerald
identifiers is traditional, with the one exception that an identifier is visible
throughout the scope in which it is declared, that is, both before and after
the declaration. Examples of constructs that open new scopes for identifiers
are operation definitions, object and type constructors, monitors, and loop
statement bodies. Non-local identifiers are implicitly imported into nested
scopes unless they are re-defined.
Since object constructors and type constructors create new objects that
persist beyond the end of the enclosing block, identifiers imported into these
constructs are treated specially. Each imported identifier is evaluated when
the constructor is executed and within the constructor the value is bound to
an implicitly declared constant with the same name.
Consider the full form of the integer literal node creator shown in Figure 4.17. All objects created by invocations of IntegerNode.new are identical
except for the binding of the identifier value, which is initialized from the
argument val to each invocation. Once an object is created, it cannot see
changes to the argument val , nor is it bothered when that identifier ceases
to exist upon return of the new invocation.
Emerald blocks therefore have several similarities with those in Algol-60,
and hence Simula and Beta. Smalltalk abandoned block-structure in favor
of the single-level declaration of classes and global access to class definitions.
Multi-level class definitions are not permitted; the Smalltalk programmer
cannot declare a class inside another class.
There has been considerable discussion in the literature both supporting
and opposing the use of block structure in programming languages.[38, 39, 40]
By ensuring locality, block structure permits the confinement of object constructors to the contexts in which they make sense, thus encouraging good
programming style. On the other hand, a nested object constructor has dependencies on enclosing objects, thus making it difficult for it to be reused
in different contexts. Jade, the Emerald extension mentioned in Section 4.3,
improves the reusability of nested object constructors by parameterizing contextual dependencies.[27]
4.5.2 Syntax
In recent years, as parsing techniques have become better understood, the
syntax of a language has usually come to represent the prejudices of its
designers rather than any significant improvement. This has meant that
const IntegerNode ←
immutable object INC
export new
const IntegerNodeType ←
type INType
function getValue → [Integer ]
operation setValue[Integer ]
operation generate[OutStream]
end INType
operation new [val : Integer ] → [aNode : IntegerNodeType]
aNode ←
object IntegerLiteral
export getValue, setValue, generate
monitor
var value : Integer ← val
operation getValue → [v : Integer ]
v ← value
end getValue
operation setValue[v : Integer ]
value ← v
end setValue
operation generate[stream : OutStream]
stream.putString[“LDI %d\n”, value]
end generate
end monitor
end IntegerLiteral
end new
end INC
Figure 4.17: Block structure and object constructors
issues of syntax are generally considered minor and unworthy of comment.
Nevertheless, some aspects of the Emerald syntax are interesting enough to
merit a brief discussion.
The reader may have been pleasantly surprised to find the absence of semicolons in Emerald — semicolons are needed neither to terminate statements
nor to separate statements. Emerald uses square brackets uniformly for invocations and operation definitions. Note that arrays and vectors, although
system-defined, are objects that follow the standard invocation paradigm,
and do not need any special syntax. However, sugared shorthand notations
for common operations (such as subscripting) are provided. The shorthands
are available uniformly for both user- and system-defined objects.
Emerald uses the ← symbol uniformly for all instances of the binding
operation: assignment, constant declaration, and collecting the results of an
invocation. As part of its distributed programming heritage, Emerald clearly
distinguishes between the use of input parameters and output parameters.
Both invocations that return multiple results such as:
var point :
type T
operation getCoordinates → [Real, Real, Real ]
end T
var x, y, z : Real
...
x, y, z ← point.getCoordinates
and multiple assignments such as:
var a, b : Integer
a, b ← b, a
are conveniently expressed.
4.6 Final Remarks
4.6.1 Current Status
An Emerald prototype is implemented under UNIX¹ on SUN-3² and VAX³
processors. The compiler generates native code, which is dynamically loaded
into a pre-existing Emerald object world. Currently, Emerald does not have
a sophisticated programming environment. The success of Smalltalk (and
Trellis to a lesser extent) clearly demonstrates that an excellent programming
environment is a great boon to its user community, while shortcomings in the
programming environment are discouraging to a programmer. The lack of a
proper programming environment is not a criticism of the Emerald language
as such; it reveals that the provision of a complete programming system
is a must for getting the most from the language. Despite this handicap,
the development of various applications (such as a prototype mail system, a multi-user calendar system, and the Emerald compiler itself) reveals the usefulness of the Emerald language features.

¹UNIX is a trademark of AT&T Bell Laboratories.
²SUN-3 is a trademark of Sun Microsystems.
³VAX is a trademark of Digital Equipment Corporation.
The focus of our current work is on improving implementation sharing
and software reuse in Emerald. We are also working on a new, more portable,
implementation of the language based on the reimplementation of the compiler in Emerald.
4.6.2 Conclusions
We have presented and discussed a number of examples that bring out the
flavor of the Emerald approach to programming. Emerald is not just the
sum of its parts: it is also the interaction of those parts. Emerald’s object
constructor and nesting allow the simulation of Smalltalk classes. The separation of typing and implementation, types as objects, and Emerald’s notion
of conformity all combine to allow a very natural form of polymorphism.
A number of factors contribute to the differences between Emerald and
several other object-based languages. These include its emphasis on abstract typing and compile-time type-checking, the clear separation of types
and implementations, and the use of simple mechanisms for each identifiable
paradigm. For example, Emerald has the object constructor for object creation, abstract types and conformity for the classification of objects, and objects that contain their own methods. This unbundling of the functions of Smalltalk classes has been found appealing by other researchers.[21]
Our experience with Emerald has shown that it is an interesting and useful
general-purpose programming language. Indeed, we are now convinced that
the Emerald model of objects is a good foundation on which future objectoriented systems can be built.
ACKNOWLEDGEMENTS
We thank Alan Borning, Jeff Chase, Bjorn Freeman-Benson, Bill Griswold, Niels Christian Juul, John Maloney, and the referees for their feedback
and comments. Rajendra Raj, Ewan Tempero, and Henry Levy were supported in part by the U.S. National Science Foundation under Grants No.
CCR-8619663, CCR-8700106, and CCR-8907666, and by the Digital Equipment Corporation Systems Research Center and External Research Program.
Norman Hutchinson was supported in part by the U.S. National Science
Foundation under Grant CCR-8701516. Eric Jul was supported in part by
the Danish Natural Science Research Council under Grant No. 5.17.5.1.35.
Bibliography
[1] A. Black, N. Hutchinson, E. Jul, and H. Levy. Object Structure in the
Emerald System. In Proceedings of the ACM Conference on Object-Oriented Programming Systems, Languages and Applications, pages 78–
86, October 1986. In SIGPLAN Notices, 21(11), November 1986.
[2] A. Black, N. Hutchinson, E. Jul, H. Levy, and L. Carter. Distribution and Abstract Types in Emerald. IEEE Transactions on Software
Engineering, 13(1):65–76, January 1987.
[3] E. Jul, H. Levy, N. Hutchinson, and A. Black. Fine-grained Mobility in the Emerald System. ACM Transactions on Computer Systems,
6(1):109–133, February 1988.
[4] E. Jul. Object Mobility in a Distributed Object-Oriented System. PhD
thesis, TR 88-12-06, Department of Computer Science, University of
Washington, Seattle, December 1988.
[5] US Department of Defense, MIL-STD-1815, Washington, D.C. Reference Manual for the Ada Programming Language, January 1983.
[6] N. Wirth. Programming in Modula-2. Springer-Verlag, Berlin, 1983.
[7] B. Stroustrup. The C++ Programming Language. Addison-Wesley, Reading, Massachusetts, March 1986.
[8] A. Goldberg and D. Robson. Smalltalk-80: The Language And Its Implementation. Addison-Wesley, Reading, Massachusetts, 1983.
[9] D. Ungar and R. B. Smith. Self: The Power of Simplicity. In Proceedings of the Second ACM Conference on Object-Oriented Programming
Systems, Languages and Applications, pages 227–241, October 1987. In
SIGPLAN Notices, 22(12), December 1987.
[10] G. Birtwistle, O.-J. Dahl, B. Myrhaug, and K. Nygaard. SIMULA Begin.
Petrocelli/Charter, New York, 1973.
[11] B. B. Kristensen, O. L. Madsen, B. Møller-Pedersen, and K. Nygaard. The BETA Programming Language. In B. Shriver and P. Wegner, editors,
Research Directions in Object-Oriented Programming, pages 7–48. The
MIT Press, Cambridge, Massachusetts, July 1987.
[12] H. M. Levy and E. D. Tempero. Modules, Objects, and Distributed
Programming: Issues in RPC and Remote Object Invocation. Software
Practice and Experience, 1990. This issue.
[13] D. L. Parnas. On the Criteria to Be Used in Decomposing Systems into Modules. Communications of the ACM, 15(12):1053–1058, December
1972.
[14] B. Meyer. Object-Oriented Software Construction. Prentice-Hall International, New York, 1988.
[15] A. H. Borning. Classes Versus Prototypes In Object-Oriented Languages. In ACM/IEEE Fall Joint Computer Conference, November
1986.
[16] A. H. Borning and T. O’Shea. Deltatalk: An Empirically and Aesthetically Motivated Simplification of the Smalltalk Language. In Proceedings of the European Conference on Object-oriented Programming, June
1987. In Lecture Notes in Computer Science, Vol. 276, Springer-Verlag,
Berlin, 1987.
[17] L. Cardelli and P. Wegner. On Understanding Types, Data Abstraction,
and Polymorphism. Computing Surveys, 17(4):471–522, December 1985.
[18] N. C. Hutchinson. Emerald: An Object-Based Language for Distributed
Programming. PhD thesis, TR 87-01-01, Department of Computer Science, University of Washington, Seattle, January 1987.
[19] L. Cardelli, J. Donahue, L. Glassman, M. Jordan, B. Kalsow, and G. Nelson. Modula-3 report (revised). Technical Report #52, Systems Research Center, Digital Equipment Corporation, Palo Alto, California,
November 1989.
[20] L. Cardelli. Typeful Programming. Technical Report #45, Systems
Research Center, Digital Equipment Corporation, Palo Alto, California,
May 1989.
[21] S. Danforth and C. Tomlinson. Type Theories and Object-Oriented
Programming. Computing Surveys, 20(1):29–72, March 1988.
[22] B. Meyer. Static Typing for Eiffel. Technical Report TR-EI-18/ST,
Interactive Software Engineering, Inc., Santa Barbara, California, July
1989. See Rationale for the Eiffel Rules.
[23] C. Schaffert, T. Cooper, B. Bullis, M. F. Kilian, and C. Wilpolt. An
Introduction to Trellis/Owl. In Proceedings of the ACM Conference
on Object-Oriented Programming Systems, Languages and Applications,
pages 9–16, Portland, Oregon, September 1986. In SIGPLAN Notices,
21(11), November 1986.
[24] W. R. Cook. A Proposal for Making Eiffel Type-safe. The Computer
Journal, 32(4):305–311, August 1989.
[25] A. Snyder. Inheritance and the Development of Encapsulated Software
Components. In B. Shriver and P. Wegner, editors, Research Directions in Object-Oriented Programming, pages 165–188. The MIT Press,
Cambridge, Massachusetts, July 1987.
[26] B. Liskov. Data Abstraction and Hierarchy. In OOPSLA ’87, Addendum
to the Proceedings, pages 17–34, October 1987. In SIGPLAN Notices, 23(5), 1988.
[27] R. K. Raj and H. M. Levy. A Compositional Model for Software Reuse.
The Computer Journal, 32(4):312–322, August 1989.
[28] R. K. Raj. Composition and Reuse in Object-Oriented Languages. PhD
thesis, University of Washington, Seattle, 1990.
[29] R. Milner. A Theory of Type Polymorphism in Programming. Journal
of Computer and System Sciences, 17:348–375, 1978.
[30] J. Donahue and A. Demers. Data Types are Values. ACM Transactions
on Programming Languages and Systems, 7(3):426–445, July 1985.
[31] P. Rovner, R. Levin, and J. Wick. On Extending Modula-2 For Building
Large, Integrated Systems. Technical Report # 3, Systems Research
Center, Digital Equipment Corporation, Palo Alto, California, January
1985.
[32] E. W. Dijkstra. Cooperating Sequential Processes. In F. Genuys, editor,
Programming Languages. Academic Press, 1968.
[33] G. Agha and C. Hewitt. Concurrent Programming Using Actors. In
A. Yonezawa and M. Tokoro, editors, Object-Oriented Concurrent Programming, pages 37–54. The MIT Press, 1987.
[34] A. P. Black. The Eden Programming Language. Technical Report 85-0901, Department of Computer Science, University of Washington, Seattle,
September 1985.
[35] C. A. R. Hoare. Monitors: An operating system structuring concept.
Communications of the ACM, 17(10):549–557, October 1974.
[36] P. Brinch Hansen. The Programming Language Concurrent Pascal.
IEEE Transactions on Software Engineering, 1(2):199–207, June 1975.
[37] R. C. Holt. Concurrent Euclid, The Unix System, and Tunis. Addison-Wesley, Reading, Massachusetts, 1983.
[38] D. R. Hanson. Is Block Structure Necessary? Software Practice and
Experience, 11:853–866, 1981.
[39] R. D. Tennent. Two Examples of Block Structure. Software Practice
and Experience, 12:385–392, 1982.
[40] O. L. Madsen. Block Structure and Object-Oriented Languages. In
B. Shriver and P. Wegner, editors, Research Directions in Object-Oriented Programming, pages 113–128. The MIT Press, Cambridge,
Massachusetts, July 1987.
Chapter 5
Persistence in the Emerald Database
In this chapter we give an overview of persistence in the Emerald Database.
Most of this work was done in [Lar92a].
In relational database systems all data is persistent, but data in applications written in a host language is not. In an object-oriented system with a single object model, making all objects persistent would impose too big an overhead on applications that do not require persistence of all the objects they access.
Because all objects live in the same object-space and can be invoked by
both transactions and non-transaction processes, it cannot be determined at
compile-time or object-creation time if an object is persistent.
Therefore applications must determine at run-time which objects are persistent. But somehow the system must know which objects to store and when to store them. In [Lar92a] reachability from a root-object [ABC+83] was considered, but explicit storing was chosen.
Emerald objects are made persistent by explicitly storing them—the command “store v” stores a copy of the object “v” on disk [Lar92a].
In the runtime-system stored objects are identified by their Emerald Object ID. We have made no effort to optimize the performance of persistent
storage.
In [Lar92a] the attached facility was used when storing an object as well as when moving it, i.e., when an object was stored, all objects transitively reachable through attached variables in non-stored objects were also stored. Storing an object was viewed as moving it to a node with memory but no facilities to execute Emerald processes. But we found that persistence and mobility are different concepts. Application programmers can use the attached
declarations for several purposes:
Performance For example, if variables that are expected to be accessed often are declared attached, network utilization can be reduced.
Fault tolerance For example, if an object is useless without another object referenced by a variable, that variable can be declared attached to minimize the chance of failure.
We therefore added a persistent declaration to the Emerald language—e.g., “var persistent attached fooType foo” or “var persistent fooType foo”.
Because data-intensive applications (e.g., databases) can operate on more
data than can fit into the internal memory, stored objects are removed from
the internal memory. Objects are retrieved (moved back from disk), using the
Emerald object-fault mechanism—that is, when a stored object is invoked,
it is automatically moved back into internal memory.
Object retrieval is lazy—only the invoked object and its private objects
are retrieved.
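As a rough illustration of this store/fault-in cycle, the following sketch (written in Python rather than Emerald, with invented names such as ObjectStore and a dictionary standing in for stable storage) shows an object being stored explicitly and then faulted back into memory when it is invoked:

import pickle

class ObjectStore:
    """Toy persistent store: objects are stored explicitly and faulted
    back in lazily on first access (cf. the Emerald object-fault idea)."""

    def __init__(self):
        self.disk = {}      # oid -> pickled state (stand-in for stable storage)
        self.resident = {}  # oid -> live object currently in memory

    def store(self, oid, obj):
        # Explicit "store v": write a copy of the object to "disk"
        # and drop it from internal memory.
        self.disk[oid] = pickle.dumps(obj)
        self.resident.pop(oid, None)

    def invoke(self, oid, operation, *args):
        # Invoking a non-resident object triggers an object fault:
        # only the invoked object is brought back (lazy retrieval).
        if oid not in self.resident:
            self.resident[oid] = pickle.loads(self.disk[oid])
        return getattr(self.resident[oid], operation)(*args)

class Counter:
    def __init__(self):
        self.n = 0
    def increment(self):
        self.n += 1
        return self.n

store = ObjectStore()
store.store("counter-1", Counter())            # object becomes persistent
print(store.invoke("counter-1", "increment"))  # faulted in, prints 1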
5.1 Example: A Persistent Tree
Figure 5.1 shows an object that can be stored. It holds a persistent reference to a binary tree, which consists of node objects that are declared persistent and that have persistent references to the objects stored in the tree. An
invocation of the insert operation moves a number of objects, limited by
the depth of the tree, from disk to memory. After the insert invocation all
invoked tree-nodes, and the inserted object, are moved to disk by a single
store operation. The recovery section of the binary-tree object is executed
when the node is brought up after a crash. It brings the tree in a valid state
in case it was in an invalid state when the node crashed. The concurrency
control is very coarse-grained in this solution—only one process can operate
on the tree at a time (the insert operation in the monitor is a bottleneck).
object binary tree
  monitor
    persistent var rootNode : NodeType ← . . .
    export operation insert[elem : elemType]
      rootNode.insert[elem]
      store rootNode
    end insert
    recovery
      rootNode.validate
    end recovery
  end monitor
end binary tree
Figure 5.1: Operating on a persistent tree
Chapter 6
Distributed transactions
We want to add transaction control [RSPML78] to distributed systems with
fine-grained mobile objects, specifically the Emerald system. We consider
transactions with ACID properties [HR83]:
Atomicity
Consistency
Isolation
Durability
These properties are realized by the mechanisms: concurrency control,
recovery, and persistence (durability).
The concurrency control and recovery mechanisms can be considered independent problems, but they are intimately related. The main reasons are locality of reference and shared data structures [AD95].
Furthermore without concurrency control, recovery is reduced to checkpointing. Even if we discount errors, concurrency control without recovery
is not possible, because we cannot avoid killing processes.
6.1 The Transaction Model
In a distributed system we want multiple threads/processes per transaction
to:
• Ensure scalability in the case of few concurrent transactions compared to
the number of nodes in the system.
• Support queries on objects that are distributed over multiple nodes.
The query optimizer can decide that it is more efficient to spawn a
transaction into processes on multiple nodes than to move or copy
objects.
To integrate database operations with application programs, it is necessary that a transaction can communicate with something that is not a
transaction.
In database programming languages, applications usually start a transaction and end the transaction by committing it (figure 6.1). An application
can issue multiple commits, which have the same effect as serially executing
multiple transactions (figure 6.2).
BEGIN TRANSACTION
command1
...
COMMIT
Figure 6.1: Database transaction
BEGIN TRANSACTION
command1
...
COMMIT
command2
...
COMMIT2
Figure 6.2: Two database transactions
An application can execute concurrent transactions by executing statements from multiple contexts using threads. Applications can, however, then break the isolation of transactions, and inter-context dependencies break the support for concurrency control, e.g., by creating deadlocks that the database manager cannot resolve [IBM].
To achieve a high level of concurrency we want transactions to execute asynchronously. To execute transactions in a certain order, synchronization primitives such as semaphores or monitors can be used. This can, however,
be difficult because of the atomicity and isolation of transactions. See section
6.2 for a discussion on how to order transactions in Emerald.
We therefore propose to treat transactions as a special kind of process or thread: instead of creating a process, an application can make the whole process a transaction. That is, a process is either a transaction or a normal process.
Multiple processes can execute in the same transaction. All objects can be accessed by transactions as well as by processes that are not transactions.
Concurrency control must ensure that transactions are isolated (as in
ACID). Concurrency control must take into account that objects can be
accessed by both transactions and processes that are not transactions.
Transactions are atomic and serializable in the sense that a transaction
has the same effect as the processes in the transaction could have had, if they
had executed alone [BG81].
In general, the set of objects accessed by a transaction cannot be determined before the transaction is terminated.
We assume that all transactions terminate and that processes not in transactions are synchronized so that they do not deadlock.
We assume synchronization primitives are well behaved. The meaning of
this depends on the object-oriented system.
To allow concurrency within transactions it must be possible to have multiple threads of control executing concurrently within the same transaction.
We have considered three ways to support this.
Static declaration When an application starts a transaction, a number of threads to execute within this transaction is specified. Each thread could run on a separate node in the system. This is not well suited for our model of a distributed environment, where the number of nodes can change during the execution of a transaction and processes are mobile. In a system where transactions are computationally complete, the resources needed by a transaction are not known in advance; therefore the level of concurrency cannot be maximized.
Dynamic creation of mobile transactions A transaction can spawn new
processes that then execute as part of the transaction. In a system with
mobile objects a thread can execute on other nodes than where it was
created. When one sub-transaction fails, the whole transaction fails.
Nested transactions A transaction can spawn new threads which become
children of the first transaction and siblings to each other. Only leaf
transactions perform operations. All transactions in the nesting tree
are isolated except that a transaction can get a lock on an object if all
holders of the lock are ancestors and that parents inherit locks from
committed children [Mos86].
We consider nested transactions best suited for our project, but we have not yet implemented them. Therefore we chose “dynamic creation of mobile transactions”. The main differences from nested transactions are that no sub-transactions are isolated from each other and that one sub-transaction can cause the whole transaction to fail. The latter is most important in a distributed environment. In a centralized environment transactions can survive the failure of a sub-transaction by executing sub-transactions in a failure-handler that can signal a failure to other sub-transactions. But in a distributed system network or node failures can prevent the failure handler from executing. See section 3.4 for a discussion of this issue in the Emerald implementation.
6.2 Emerald Transaction Model
The goal is to support concurrency control by extending the Emerald language and run-time system.
We propose to make transactions a part of the language. It is possible to implement transaction support in library functions (e.g., a CORBA Object Transaction Service [WRS]). But to allow the same application and the same runtime objects to be used both in transactions and in normal processes, we want transactions, processes, and synchronization to be part of the language. In systems with reflection (e.g., Java) or dynamic binding (e.g., Smalltalk) it might, however, be possible to obtain this without changing the language.
In the Emerald Language [Hut87b] any object can have zero or one process. That is, when an object with a process is created, a new process is created.
The newly created process is not associated with the object that created it,
except that it can access private variables, the object, and variables in its
lexical scope. Emerald processes behave as objects with regard to mobility,
but they cannot be manipulated by Emerald processes (figure 6.3).
We propose a new construct in the Emerald language: A transaction.
Transactions have the same syntax as processes, except that the keyword is
”transaction”. Transactions have the same behavior as processes except that
they can consist of multiple processes and that ACID properties are ensured
(figure 6.4).
object anObject
functions and operations
process
statements
end process
end anObject
Figure 6.3: An Emerald process
object anObject
functions and operations
transaction
statements
end transaction
end anObject
Figure 6.4: An Emerald transaction
6.2.1 Creating transactions
We see no need for transactions to create processes that are not executing
in a transaction. Such processes would have to be born without any state to
not break the isolation of the transaction that created them and would not
be able to change the state of any object in the system.
Because of the integration with Emerald code, any executing code can create both a new process and a new transaction. We therefore have to define the semantics of creating a new process or a new transaction inside an executing transaction.
Sub-transactions not allowed A transaction that creates a new process
adds that process to the transaction. A transaction that creates a new
transaction fails. That is, it is considered an error for transactions to create
transactions. Some of these errors could be detected at compile-time,
but there could also be run-time errors.
Transaction inclusion When a transaction creates a new transaction, the
new transaction becomes a process in the old transaction. This results
in a different behavior than not allowing sub-transactions because in
general an operation that creates a process or a transaction can be
invoked from both a transaction and a process.
Sub-transactions A transaction that creates a new process adds that process to the transaction. The old transaction is allowed to create a new
transaction.
But the new transaction cannot complete before the old transaction
completes because that would break the isolation of the old transaction.
Even though the new transaction cannot read uncommitted writes by
the old transaction, it can have object references to objects that are
created by the old transaction and contain constants initialized by the
old transaction.
If the old transaction fails, the new transaction must also fail to avoid
reading uncommitted values.
Nested transactions A transaction that creates a new transaction creates a sub-transaction. A transaction that creates a process adds a new process to the transaction. Only leaf transactions can perform operations.
In the Emerald system that means that transactions cannot invoke operations in monitors while they have sub-transactions. It would, however, be possible to allow transactions to invoke operations in monitors
in local objects, i.e., objects that the Emerald compiler has determined
cannot be seen from outside the object in which the transaction is
created.
We have implemented “transaction inclusion”.
6.2.2 Inter-transaction synchronization
The definition of isolation means that the effect of a number of transactions
that each consist of a number of processes should be the same as if each
transaction had executed alone in some order. This means that the effect of
executing a multi-process transaction alone should be the same as the effect
of a possible execution of the processes in the transaction in the absence of
failures. Therefore the Emerald monitor synchronization is still necessary for
transactions with multiple processes.
Transactions operate on objects; but in Emerald only operations inside
monitors in objects are relevant for transactions, because only variables that
are accessed through monitors can be updated. In Emerald operations, but
not functions, can have side-effects. Not all operations have side-effects and
not all operations that do have side-effects change the state in the object
in which the monitor is located. It is therefore safe to treat invocations of a function in a monitor as reads and invocations of operations as writes. It would be even less conservative to extend the compiler to flag operations that could change the state of the object to which a monitor belongs.

begin
  anObject1.f[42]
on failure
  anObject2.f[42]
end failure
end

Figure 6.5: A failure-handler
On the other hand, treating all functions as reads allows application programs to take advantage of knowledge of the semantics of functions to increase the level of concurrency, e.g., by adding caches to functions.
6.3 Concurrency control
We have chosen to implement concurrency control with two phase locking
[BG81].
Deadlocks are detected at the process level (section 6.4.1)—when a process is aborted or selected as a victim of a deadlock and therefore fails, its transaction is aborted. Aborted transactions are automatically restarted; however, Emerald failure handlers can intercept deadlock failures as well as other failures. The use of a failure-handler is demonstrated in figure 6.5. A failure in the begin. . . end block will be intercepted by the failure-handler (or by a failure handler on a lower level, i.e., in the anObject1 object).
When a transaction obtains a write-lock on an object, the variables in its
monitor are backed up (i.e., copied to the shadow). The backup (the copy of
data in the monitor) follows its object/monitor, i.e., when an object is moved
to another node the backup moves with it and when an object is stored on
disk, the backup is stored on the same disk.
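The following sketch illustrates the idea of backing up the monitor variables to a shadow when a transaction takes its first write lock on an object; it is a minimal Python illustration, not Emerald code, and the names MonitoredObject, write, commit, and abort are invented for the example.

import copy

class MonitoredObject:
    """Sketch of shadow-based recovery: the monitor's variables are
    copied to a shadow when a transaction first writes the object;
    abort restores the shadow, commit discards it."""

    def __init__(self, **variables):
        self.variables = variables
        self.shadow = None    # backup of the monitor variables
        self.writer = None    # transaction holding the write lock

    def write(self, transaction, name, value):
        if self.writer is None:
            self.writer = transaction
            self.shadow = copy.deepcopy(self.variables)  # back up on first write
        assert self.writer == transaction, "write-locked by another transaction"
        self.variables[name] = value

    def commit(self):
        self.shadow, self.writer = None, None    # keep the new state

    def abort(self):
        self.variables = self.shadow             # roll back to the backup
        self.shadow, self.writer = None, None

account = MonitoredObject(balance=100)
account.write("T1", "balance", 50)
account.abort()
print(account.variables["balance"])   # 100: the shadow was restored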
Concurrency control applies automatically to all objects, except instances of the two built-in types Instream and Outstream and the Emerald database manager object; this is unlike the Argus [Lis88] system, where objects can be declared atomic.
Transactions coexist with normal Emerald processes (non-transaction
processes). Processes are identified by process OIDs. Transactions are identified by transaction OIDs, i.e., a stacksegment contains a stacksegment OID,
a process OID and (if the process is a member of a transaction) a transaction
OID. A process OID is a reference to the first stacksegment of the process.
A transaction OID is a reference to the first stacksegment in the first process
in the transaction.
The first stacksegment of a process is the stacksegment that contains the
first activation record of the process, the activation record on the bottom of
the process-stack. The last stacksegment of a process is the stacksegment
that contains the activation record on top of the process-stack.
The run-time representation of transactions is illustrated in the snapshot of
three processes in two transactions on four nodes shown in figure 6.6. The
figure shows that the first transaction (T1) was initially process two (P2)
which was executing in stack segment three (SS3).
Then transaction T1 created process P4 which therefore became a member of transaction T1. P2 invoked the objects O21 and O22 while P4 invoked
the object O22—it is not a problem that both P2 and P4 invoke object O22
because the two processes are in the same transaction. Process P4 then made
a remote invocation on an operation in object O24 which made a remote invocation on object O26. Process P2 made a remote invocation on object
O23. Then a new transaction T2 was created. Initially it consisted of process P9. Process P9 invoked object O26 and object O23 (through a remote
invocation). Both T1 and T2 have invoked object O26 and O23. The figure
does not show if, for example, P2 invoked O22 before it invoked O23.
Concurrency control is necessary to avoid such situations, where processes from different transactions invoke the same objects. Figure 6.7 demonstrates how a persistent tree can be operated on using transactions.
The treeTransaction object has a process that creates 10 transactions that
update the tree concurrently.
Processes in the same transaction are synchronized using the Emerald
synchronization mechanisms—monitors and monitor conditions (Section 6.4.2).
6.4 Synchronization
Synchronization is done by means of locks. Invoking a monitored function
on an object requires a read-lock, invoking a monitored operation requires
a write-lock. If a transaction holds the only read-lock on an object, it can
upgrade the lock to a write-lock.
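A minimal sketch of these locking rules, in Python and with an invented LockTable abstraction (the real system keeps this information in the monitors and stacksegments themselves), could look as follows:

class LockTable:
    """Sketch of the locking rules: monitored functions take read locks,
    monitored operations take write locks, and a transaction holding the
    only read lock on an object may upgrade it to a write lock."""

    def __init__(self):
        self.readers = {}  # object id -> set of transaction ids
        self.writer = {}   # object id -> transaction id

    def read_lock(self, obj, txn):
        w = self.writer.get(obj)
        if w is not None and w != txn:
            return False                     # must wait for the writer
        self.readers.setdefault(obj, set()).add(txn)
        return True

    def write_lock(self, obj, txn):
        w = self.writer.get(obj)
        readers = self.readers.get(obj, set())
        if w == txn:
            return True                      # already the writer
        if w is None and readers <= {txn}:   # no writer, sole reader (or none)
            self.writer[obj] = txn           # acquire, possibly upgrading
            return True
        return False                         # must wait

locks = LockTable()
assert locks.read_lock("O22", "T1")
assert locks.write_lock("O22", "T1")      # upgrade: T1 is the only reader
assert not locks.write_lock("O22", "T2")  # T2 must wait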
[Figure 6.6: A sample set of transactions. The figure shows a snapshot of three processes (P2, P4, P9) in two transactions (T1, T2) on four nodes; each stack segment is labelled with its transaction, process, and stacksegment OID (e.g., T1,P4,S5), and references connect stack segments, activation records, and the invoked objects (O20 to O27).]
object treeTransaction
  process
    var t : integer
    for(t←0 : t < 10 : t←t+1)
      const aT == object theT
        transaction
          var i : integer
          for(i←0 : i < 100 : i←i+1)
            binary tree.insert[i + 10 ∗ t]
            store binary tree
          end for
        end transaction
      end theT
    end for
  end process
end treeTransaction
Figure 6.7: Transactions operating on a tree
When a process in a transaction starts executing, a message is sent to the transaction initiator (the last stacksegment of the first process in the transaction), where a set of transaction processes is maintained.
When a process (i.e., the process’s first stacksegment) obtains a lock
on an object/monitor, it checks (using Emerald location facilities) if the
transaction initiator is resident. If it is, the locks are added to the transaction
initiator’s stacksegment. If not, the locks are added to the stacksegment that
obtained the lock. When a remote invocation returns, the locks obtained by
the returning stacksegment are sent to the calling stacksegment together with
the results of the invocation. When a process terminates, a message with the
locks obtained by that process is sent to the transaction initiator, where the
process is marked as finished.
Thus, when a transaction terminates (when all processes in the set of processes in the transaction initiator are marked as finished) all locks obtained by
all processes in the transaction will be stored in the initiating stacksegment.
When a transaction terminates, a two phase commit is started unless:
1. All objects locked by the transaction are located on the same node as
the transaction manager for the transaction, which is the node on which
the transaction initiator is located.
2. None of the objects locked by the transaction is persistent.
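The commit decision can be summarized in a small sketch; the Python function below and its finished/persistent/node fields are invented for the example, and it assumes that both conditions above must hold for the two phase commit to be skipped.

def finish_transaction(processes, locked_objects, initiator_node):
    """Sketch of the decision made when a transaction terminates: a two
    phase commit is only started if some locked object is persistent or
    lives on a node other than the transaction manager's node."""
    if not all(p["finished"] for p in processes):
        return "still running"
    needs_2pc = any(o["persistent"] or o["node"] != initiator_node
                    for o in locked_objects)
    return "two phase commit" if needs_2pc else "local commit"

procs = [{"finished": True}, {"finished": True}]
objs = [{"persistent": False, "node": "alpha"},
        {"persistent": True,  "node": "alpha"}]   # a persistent object forces 2PC
print(finish_transaction(procs, objs, "alpha"))   # two phase commit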
6.4.1 Deadlock detection
A transaction consists of a number of processes. Because a transaction does not succeed before all its processes have succeeded, deadlocks between processes are also deadlocks between transactions. If a transaction is deadlocked, at least one of its processes must be deadlocked. We have therefore implemented a deadlock detection algorithm that detects deadlocks between processes in transactions. When a deadlock is detected, a victim is selected (arbitrarily in the current version) and all processes in the victim transaction are forced to fail.
We use Chandy and Misra’s algorithm [Kna87] for deadlock detection.
The stacksegment of a transaction process that is blocked on an invocation
on an object contains a Waiting-For value that is the OID of the object it is waiting to invoke. A locked monitor holds a set of owners—if there is a read-lock on the monitor, it is the set of OIDs of the readers' processes; if there is a write-lock on the monitor, it is the OID of the writer's process.
On each node a set of local transaction stack-segments waiting to get a
lock on an object (the Wait-For set) is maintained. Our version of Chandy
and Misra's algorithm works as follows. For each element in the Wait-For set,
we look at the object the element is waiting for (the object is always located
on the same node because a waiting stacksegment moves along when the object is moved). A probe containing the stacksegment OID of the originator is
sent to the owners (to the first stacksegments of the processes that are owners). When a stacksegment receives a probe with its own OID as originator
a deadlock is detected.
When a stacksegment that is waiting on a remote invocation receives a
probe, it is forwarded to the stacksegment of the invoked object.
To allow a transaction to upgrade a read-lock to a write-lock, we declare
deadlock when a cycle of length > 1 is detected. This does not result in undetected deadlocks: If a process P that is directly dependent on itself (waiting
to write on an object, it has already read) was deadlocked, it must have been
waiting on at least one other process (otherwise it would have acquired the
lock). At least one of these processes must depend on P (otherwise P would
get the write lock when all the readers had terminated). Therefore there
must exist a cycle of length > 1 with P as a member and thus we would
detect the deadlock regardless of our ignoring cycles of length < 2.
Each stacksegment maintains a set of process OIDs for processes which it
already knows are dependent on the stacksegment itself (except process OIDs
that it obtains from probes that come directly from itself). Probes arriving
at stacksegments that are active (not waiting for entry in monitors locked by
other transactions), or those that already know that the originating process
in the probe is dependent on it, are judged meaningless and are ignored.
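The probe mechanism can be illustrated with a small, centralized sketch; the real protocol forwards probes between nodes, whereas the Python function below simply follows the Waiting-For and owner relations recursively and ignores cycles of length one, as described above. All names are invented for the example.

def detect_deadlock(waits_for, owners):
    """Sketch of probe-based deadlock detection: a blocked process sends a
    probe carrying the originator's id to the owners of the object it waits
    for; a probe that returns to its originator signals a deadlock. Cycles
    of length 1 (a sole reader upgrading its own lock) are ignored."""
    def send(probe_origin, process, seen):
        obj = waits_for.get(process)
        if obj is None:             # process is active, probe is dropped
            return False
        for owner in owners.get(obj, set()):
            if owner == probe_origin and owner != process:
                return True         # probe came back: cycle of length > 1
            if owner not in seen:   # forward the probe once per process
                seen.add(owner)
                if send(probe_origin, owner, seen):
                    return True
        return False
    return any(send(p, p, {p}) for p in waits_for)

# P2 waits for O23 owned by P9, P9 waits for O26 owned by P2: a deadlock.
waits = {"P2": "O23", "P9": "O26"}
owns = {"O23": {"P9"}, "O26": {"P2"}}
print(detect_deadlock(waits, owns))   # True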
Deadlock failures, as well as other failures, can be intercepted in an application program by using Emerald failure-handlers. If a deadlock failure in a
transaction is not intercepted, the transaction is automatically aborted and
restarted. There are two ways of using failure handlers to handle deadlocks:
One is to have a failure handler on the top level of the transaction (in the transaction body of the first process in the transaction). When
this handler is called, all the locks obtained by the transaction have been
released and the failure handler can choose a new strategy for the transaction. The other way is to have failure handlers at lower levels. When such
handlers are invoked no locks have been released—i.e., the block covered by
the failure-handler cannot just be restarted as this would result in the same
deadlock again. But if the failure-handler can perform the task of the block
without accessing the object that caused the deadlock, a full roll-back can
be avoided.
The fact that transaction processes can wait on non-transaction processes
in monitors (Section 6.4.2) is not a problem regarding deadlock because we
assume that non-transaction processes do not deadlock.
6.4.2 Synchronization of transactions and processes
Non-transaction processes can invoke operations in monitors that are not
locked by transaction processes (readers or writers). Non-transaction invocations of functions and operations have the same priority as transaction
invocations of operations (writes), i.e., they must wait until a writer or all
readers of the object and all waiting processes in the waiting queue have
terminated.
Processes (whether transactions or not) can synchronize via monitor conditions, i.e., a process can wait on a monitor condition (a condition variable
in a monitor) or signal to a monitor condition. Emerald monitors use Hoare
semantics [Hoa74], i.e., if there are processes waiting on a signalled condition
variable the running process is suspended and a waiting process started.
Both the signal and wait statements have to be executed in the monitor
of the condition variable. That means that there can be no synchronization
between different transactions.
Signaling between non-transaction processes and transaction processes
does not always make sense. The possible combinations of signaling between
Ta Pb denotes process b executing in transaction a.

From\To    T1P2    T1P3    T2P4    P5     P6
T1P2        √       √       ÷      ?      ?
T1P3        √       √       ÷      ?      ?
T2P4        ÷       ÷       √      ?      ?
P5         (÷)     (÷)     (÷)     √      √
P6         (÷)     (÷)     (÷)     √      √

Table 6.1: Signaling from process to waiting process
processes and transactions are illustrated in table 6.1. Non-transaction processes (P5 and P6 in table 6.1) can synchronize as in the original Emerald
system. Processes in transactions (P2, P3, and P4 in table 6.1) can synchronize with other processes in the same transaction but not with processes in
other transactions.
Non-transaction processes could be allowed to signal conditions in monitors that are part of transactions. But if a transaction is restarted due to
failures or deadlocks, it will only receive the signal in one of its executions.
I.e., if the transaction is restarted, it could wait forever on the signal. Even if
the transaction is not restarted it cannot determine if it is safe to wait for a
process without breaking isolation. For example, in figure 6.12 a transaction is at some point in its execution waiting for input from a non-transaction process. If the input is not ready, it must wait on a condition “inputq”. But to determine whether the input data has already arrived and it missed the signal, it must check the variable “inputReady”. If “inputReady” is false, the transaction waits. The process will later produce the data and try to invoke “inputIsReady”. But to ensure isolation it will wait for the transaction to complete before accessing both the itran and dt objects.
Non-transaction processes cannot signal to waiting transaction processes
because they cannot access conditions in monitors locked by transactions.
We did consider allowing non-transaction processes that invoke operations containing only signal statements to bypass the transactions' concurrency control.
But this results in a complicated model that is difficult to implement.
The fields in table 6.1 that are marked “?” (signaling from transaction processes to non-transaction processes) denote combinations that are
allowed but problematic because such signals cannot be rolled back. A solution could be to make non-transaction processes signaled by a transaction
become members of this transaction. If the transaction later fails, each converted process could be rolled back to the point where it converted.
If a transaction signals a condition-variable and a non-transaction process is
waiting for that variable, we cannot use Hoare semantics because then the
non-transaction process would execute in the monitor before the transaction
had committed. This would break the isolation property.
We therefore changed the effect of a signal statement so that a signal statement in a transaction does not immediately start a waiting non-transaction process and suspend itself, but instead moves the waiting process to the
queue of processes waiting for entry on the monitor. The effect is that the
process is no longer waiting on the monitor condition, but on the termination
of the transaction.
If a transaction fails after it has signaled a condition variable with a
waiting non-transaction process, the non-transaction process will continue
execution.
This is the only change in the semantics of the Emerald language.
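A simplified, single-threaded sketch of the changed signal semantics (in Python, with invented queue names and no real scheduling) is shown below; a signal issued by a transaction merely moves the waiter to the monitor's entry queue instead of running it immediately:

from collections import deque

class Monitor:
    """Sketch of the changed signal semantics: a signal issued by a
    transaction does not hand the monitor to a waiting non-transaction
    process (Hoare semantics); instead the waiter is moved to the
    monitor's entry queue and effectively waits for the transaction."""

    def __init__(self):
        self.condition_queue = deque()  # processes waiting on the condition
        self.entry_queue = deque()      # processes waiting to enter the monitor

    def wait(self, process):
        self.condition_queue.append(process)

    def signal(self, signaller_is_transaction):
        if not self.condition_queue:
            return None
        waiter = self.condition_queue.popleft()
        if signaller_is_transaction:
            # Defer the waiter until the monitor is released at commit time.
            self.entry_queue.append(waiter)
            return None
        return waiter   # Hoare semantics: the waiter runs immediately

m = Monitor()
m.wait("parent-process")
print(m.signal(signaller_is_transaction=True))   # None: parent only re-queued
print(list(m.entry_queue))                       # ['parent-process']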
An example of synchronizing between a transaction and a process
A process, called the parent process, creates a transaction. The transaction
is executed asynchronously. At some point the parent process wants to wait
for the transaction to terminate successfully.
The intuitive solution is to let the last command in the transaction be a
signal to a condition variable on which the parent process is waiting. But
the transaction might fail between the signaling and its termination. If we
had chosen the solution of letting non-transaction processes awakened by
transaction processes become a member of the transaction, the parent process
would join the transaction and there would be no problem.
In the current model where the process does not become a transaction, we
have to make a construction where the transaction signals a monitor condition and updates a variable, finished, before it terminates. If the transaction fails, it can be restarted. If it fails and is not restarted, it is because the failure was handled by a failure-handler. This handler must wake the parent process. finished will be false whenever the transaction is restarted.
This is not enough. The transaction can fail after it has signaled the
condition variable that awakens the parent process and then be restarted.
When the transaction fails, its locks are cleared and the parent process
can be scheduled even though the transaction is restarted.
We can solve this by letting the parent process loop on the condition
until the monitor variable finished has been set. The parent process cannot
read the updated value finished before the transaction has been terminated
or restarted.
Figure 6.8 shows a parent process waiting for a transaction to complete.
monitor
  var finished : Boolean ← False
  const TrEnd == condition.create
  operation TrSignal
    signal TrEnd
  end TrSignal
  operation trWait →[finished : Boolean]
    if not finished then wait TrEnd
  end trWait
end monitor
process
  const myTransaction == object tr
    transaction
      finished ← true
      ...
      trSignal
      failure
        ...
      end failure
    end transaction
  end tr
  loop
    exit when trWait
  end loop
end process
Figure 6.8: A parent process waiting for a transaction to complete
The loop will iterate once each time the transaction fails after the trSignal
statement. Because the locks held by a victim of deadlock are released before
it is restarted, it was necessary to consider restarts in the implementation
of the parent process. This is an example of how the decision of allowing
signaling from transaction to non-transaction processes can lead to problems
with atomicity. This problem could be avoided by making the awakened process (the parent process) a part of the transaction, as suggested above. Alternatively, instead of scheduling processes waiting on condition variables when an aborting transaction process exits a monitor, these waiting processes could be marked with the OID of the aborting transaction and restarted only when that transaction eventually exits the monitor after having terminated successfully.
6.4.3 Serializing transactions
A special but common case is executing two transactions one after another (figure 6.10). This is less straightforward in our model because transactions execute asynchronously. For example, in figure 6.11 the monitor variable syn can have the value 1 or 2 after both transactions complete. Having the creation and execution of one transaction wait for the completion of another transaction is an example of synchronizing between processes and transactions (figure 6.13).
6.4.4 Special objects
Operations on the built-in I/O object types Instream, Outstream, and
Node are not considered a part of a transaction because they cannot be rolled back automatically. An alternative is to lock these objects like other
objects, but not roll them back on failure.
In the current implementation this can be obtained by simply implementing new types InstreamT, OutstreamT, and NodeT whose instances reference instances of the I/O objects and that have operations that map directly onto the functions of the I/O objects. A transaction can then
have a failure handler that “manually” undoes its effect on the outside world
through the I/O objects; the failure handler can then return with a failure
and the transaction updates on Emerald objects would be rolled back and
the transaction restarted.
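One possible shape of such a wrapper type is sketched below in Python; the OutstreamT name follows the text, but the compensation-log strategy shown here is only an illustration of the general idea, not the Emerald implementation.

class OutstreamT:
    """Sketch of a transactional wrapper around an output stream: writes
    go straight to the underlying stream, and a compensation log records
    how a failure handler could "manually" undo the external effect."""

    def __init__(self, real_stream):
        self.real_stream = real_stream
        self.undo_log = []

    def put_string(self, text):
        self.real_stream.append(text)              # visible side effect
        self.undo_log.append(text)                 # remember what to compensate

    def compensate(self):
        # Called from the transaction's failure handler before re-raising.
        for text in reversed(self.undo_log):
            self.real_stream.append(f"RETRACTION of {text!r}\n")
        self.undo_log.clear()

console = []
out = OutstreamT(console)
out.put_string("order shipped\n")
out.compensate()          # transaction failed: print a retraction
print("".join(console))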
Another special object that is exempt from concurrency control is the Emerald database query manager. Because the Emerald database manager is mutable (it updates its own state), concurrency control on this object would mean that Emerald database queries could not execute concurrently (all associative queries go through the Emerald query manager). It would be possible to rewrite the query manager so that it becomes immutable—the query manager object could then be treated like other objects.
6.5 Recovery
We use shadows and two phase commit to implement recovery [AD95]. Shadows are co-located with their originals. That is, when an object that has been written by a current transaction is moved to another node, its shadow is moved to the same node, and when such an object is stored on disk, its shadow is also stored on the same disk.
[Figure 6.9: Snapshot of deadlock detection. The figure shows stack segments of processes holding read locks (RL) and write locks (WL) on objects; each locked object records its owners (the OIDs of the readers or of the writer), each blocked stack segment points to the object it waits to invoke, references connect the stack segments of a process, and the stack segments involved in the deadlock are highlighted.]
begin
BEGIN TRANSACTION
cmd1.ops[]
END TRANSACTION
. . .
BEGIN TRANSACTION
cmd2.ops[]
END TRANSACTION
end
Figure 6.10: Two transactions execute one after the other
a ← object t1
  Monitor
    var syn : Integer
  End Monitor
  transaction
    otop$syn←1
    cmd1.ops[]
    const b ← object t2
      transaction
        otop$syn←2
        cmd2.ops[]
      end transaction
    end t2
  end transaction
end t1
Figure 6.11: Wrong way
const data == object dt
  monitor
    field inputReady : boolean ← false
  end monitor
end dt

const it == object itran
  monitor
    var inputq : condition
    operation inputIsReady
      data$inputready ← true
      signal inputq
    end inputIsReady
    operation waitForInput
      if not data.getInputReady then
        wait inputq
      end if
    end waitForInput
  end monitor
  transaction
    ...
    waitForInput
    ...
  end transaction
end itran
Figure 6.12: Process to transaction signaling (wrong)
6.6 Emerald Transaction Implementation
We have extended the Emerald compiler to generate code to support transactions. Time-critical operations are handled completely by the generated code; more complex and less time-critical operations are handled by generating code that calls the run-time system.
a ← object otop
  Monitor
    var syn : Condition
    var done : Integer ← 0
    operation synWait
      wait syn
    end synWait
    operation synSig
      signal syn
    end synSig
  End Monitor
  process
    const a ← object t1
      transaction
        done ← 1
        synSig
        cmd1.ops[]
      end transaction
    end t1
    loop
      exit when done == 1
      synWait
    end loop
    b ← object t2
      transaction
        cmd2.ops[]
      end transaction
    end t2
  end process
end otop
Figure 6.13: Serializing transactions in Emerald
6.7 Interaction between the compiler and the run-time system
The Emerald compiler ensures that each operation/function in a monitor
upon entry checks if the monitor is free (using a lock-bit in the object) before
entering; if not, it traps to the run-time system and the process is placed in
a monitor queue. This facility has been extended to support transactions
(section 6.2). The compiler makes each operation/function in a monitor
check if the active process is a transaction. If it is a transaction, it checks if
it was the last transaction that accessed the object/monitor (using a “last-transaction-here” field in the monitor). If this is the case, it continues with the operation; otherwise it traps to the run-time system. If the current process is not a transaction, the normal monitor check is performed.
This means that the overhead for non-transaction processes is just a few
instructions on each invocation of an operation/function in a monitor (8
instructions for a function on a SPARC-processor) and no overhead on operations/functions not in monitors.
Transactions have to trap to the run-time system the first time they access
an object (to acquire locks and, for writers, to back up the monitor variables)
and when they abort or commit. Transactions must use a few instructions on
each invocation of operations/functions in monitors (12/17 instructions for
a function/operation on a SPARC) and in some cases (readers from different transactions invoking the same object in an interleaved fashion) a trap to the run-time
system.
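The generated entry check can be pictured with the following Python sketch; the field names (locked, last_transaction) and the trap callback are invented stand-ins for the lock-bit, the last-transaction-here field, and the call into the run-time system.

def enter_monitor(process, monitor, runtime_trap):
    """Sketch of the compiler-generated entry check for monitored
    operations: non-transaction processes only test a lock bit, while
    transaction processes also compare a last-transaction-here field;
    anything else traps to the run-time system."""
    if process.get("transaction") is None:
        if not monitor["locked"]:             # normal fast path
            monitor["locked"] = True
            return "enter"
        return runtime_trap("queue the process on the monitor")
    if monitor.get("last_transaction") == process["transaction"]:
        return "enter"                        # lock already held by this transaction
    return runtime_trap("acquire lock, back up monitor variables")

trap = lambda reason: f"trap: {reason}"
mon = {"locked": False, "last_transaction": "T1"}
print(enter_monitor({"transaction": "T1"}, mon, trap))   # enter
print(enter_monitor({"transaction": "T2"}, mon, trap))   # trap: acquire lock...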
6.8 Transactions and persistence/mobility
When a monitored object is moved to another node, its backup and set of
owners are also moved to the other node. When a monitored object is stored,
its backup must be stored. If a stacksegment moves to another node, it must
bring its locks with it.
6.9 Performance
We present performance numbers for non-persistent, distributed and non-distributed transactions. When evaluating the performance of the implementation of transactions, we are interested in the overhead imposed by
concurrency control on normal Emerald programs and the overhead of the
resulting system compared to a sequential, non-distributed implementation
in a common language like “C”.
The overhead compared to the original Emerald system can be divided
into a constant overhead on certain operations (invocations) and an overhead that depends on the behavior of the application programs (deadlock
detection, rollback, restarting of transactions, etc.).
program                                                  time/µs
First function invocation                                    299
First operation invocation                                   208
Subsequent function invocations (average of 1000000)        1.9
Subsequent operation invocations (average of 1000000)       2.0
Empty transaction creation (average of 1000)                2015
One update transaction creation (average of 1000)           2244

Table 6.2: Performance of constant transaction tasks (on a SPARC 10)
The constant overheads are measured on one node without persistence
because these overheads are insignificant compared to remote invocations.
Table 6.2 gives the performance figures for the basic transaction operations: invocation of an operation and a function in a monitor. The function/
operation has one integer as result/argument. The monitor contains one
integer variable. The empty transactions are transactions that do not invoke any objects, but just update a local variable and exit. The one-update
transactions are transactions that invoke an operation in one object and
then terminate.
The execution times are wall-clock times and cover only the invocations; the time to do the timing (55 µs) and to perform the loop (0.2 s for 1000000 iterations) has been deducted.
To evaluate the performance of transactions that are more complex, but
not chaotic, we have made a small program that creates onum objects and
tnum transactions. Each transaction iterates inum times in a for-loop and
in each iteration it selects in a pseudo-random fashion one of the objects
and invokes first a function, then an operation, and then the function again
on that object. The function just returns an integer monitor variable; the
operation updates the monitor variable and runs iter iterations in a dummy
loop to simulate real work. The body in the loop is just complicated enough
to not be optimized away. To get an impression of the performance we
have translated the program to “C”. For the programs that could produce
deadlocks, the times are averages of five executions.
The results can be seen in table 6.3.
To evaluate the performance of distributed transactions we have timed
executions of the test program on one to four SPARC computers (table 6.4).
The results show that there is a small overhead from using the original
Emerald system over a traditional application programming environment like
“C”. We believe this is a question of generating optimized code (the times
program                                   iter    onum   deadlocks   time/s
C program (optimized -O3)               100000    1000       –          124
                                             0    1000       –         0.02
Original Emerald system,                     0   10000       –         0.02
no concurrency                            1000   10000       –         1.34
Original Emerald system,                     0   10000       –        0.094
concurrency                               1000   10000       –         1.45
New Emerald system, no concurrency           0   10000       –         0.24
                                           100   10000       –         1.47
New Emerald system, concurrency,          1000   10000       –         1.49
no concurrency control                    1000   10000       –         1.57
New Emerald system                        1000   10000      1.6        1.93
with concurrency control                     0    1000     73.8         3.5
                                             0   10000      0.6        0.43
                                        100000    1000        0         129

Table 6.3: Performance of transactions, tnum == 20, inum == 50
Nodes   deadlocks   time/s
  1         0         5.7
  2         1         7.0
  3         0         5.4
  4         0         3.9

Table 6.4: Performance of distributed transactions (tnum == 10, inum == 10, onum == 2000, iter == 10000)
for the Emerald system are close to the times for the non-optimized “C” program). Adding concurrency gives a little extra overhead. The new Emerald system is almost as efficient as the original for sequential programs
and a little less efficient for concurrency involving processes.
The times for distributed transactions show a moderate speed-up for transactions with execution times of about half a second, but they also show that deadlocks involving multiple nodes are relatively expensive.
6.10 Summary
We have added support for persistence and distributed multi-process transactions to the Emerald system without changing the semantics of the language
except for operations on monitor conditions.
Chapter 7
Queries
We propose a model for adding database querying capabilities to an object-oriented language.
In this chapter we focus not on the quality of query optimization, but on
the integration of dynamic query optimization in an object-oriented language.
In a system with no query optimizer, the application programmer can
express queries directly using library objects such as sets, indexes, and iterators.
In a system with a static query optimizer the compiler can compute the
query plan and generate code for the query at compile-time.
In an object-oriented system we can implement sets and indexes (chapter 8). We can implement iterators that can be put together to form any
query as described in [Gra93] (see figure 7.1).
Then we need a dynamic query optimizer that takes a query specified in
the query language and produces a query plan. For work on query optimization see [Fis95].
In relational database systems queries are usually pre-compiled for two
reasons ([CAI+ 81]):
1. Much of the work of parsing, name binding, and access path selection can be done once during pre-compilation instead of on every execution of the
program.
2. The pre-compiled access object is smaller and can execute faster than
a general query interpreter.
In an object-oriented database there is a third reason. The query can execute code not only on pre-defined data structures but also on objects that have already been compiled. This can be facilitated either by pre-compilation and dynamic linking or by reflection.
[Figure 7.1: Example: Iterators and indexes. The user consumes the output of a SORT iterator on A.y, which is fed by a JOIN iterator on A.x = B.foo() that uses an index on B.foo(); the iterators draw their input from the collections A, B, and C.]
In a system with a dynamic query optimizer query plans are validated
every time the query is executed. The old plan can be invalid because:
• An index used by the query plan has been deleted.
• A new index that can improve the performance of the query has been
created.
• Statistics on the content of the database imply that another query plan
would be better.
• Statistics on the location of objects in the database imply that another
query plan would be better.
If the old query plan is invalid, a new plan is produced. The new plan
must be compiled and linked at runtime.
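A minimal sketch of this validate-or-recompile cycle is given below in Python; it only checks the index population (ignoring statistics and object location), and the PlanCache and naive_optimizer names are invented for the example.

class PlanCache:
    """Sketch of dynamic recompilation: a cached query plan is validated
    against the current index population on every execution and rebuilt
    when the indexes it was optimized for have changed."""

    def __init__(self, optimizer):
        self.optimizer = optimizer     # (query tree, indexes) -> plan
        self.cache = {}                # query id -> (plan, index snapshot)

    def execute(self, query_id, query_tree, current_indexes, run):
        cached = self.cache.get(query_id)
        if cached is None or cached[1] != current_indexes:
            plan = self.optimizer(query_tree, current_indexes)   # re-optimize
            cached = (plan, set(current_indexes))
            self.cache[query_id] = cached
        return run(cached[0])

def naive_optimizer(tree, indexes):
    return ("index scan" if tree["path"] in indexes else "sequential scan",
            tree["path"])

cache = PlanCache(naive_optimizer)
q = {"path": "B.foo()"}
print(cache.execute("q1", q, {"B.foo()"}, lambda plan: plan))  # index scan
print(cache.execute("q1", q, set(), lambda plan: plan))        # plan rebuilt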
7.1 The Query Model
We propose a query model for a system based on a compiled object-oriented
language and a runtime-system.
7.1.1 Type System
The proposed query model can be used in a variety of existing or new object-oriented systems.
Our main interest is systems with strong typing. In systems without
strong typing we expect it to be simpler to implement the query model, but
we also expect it to be harder to implement it efficiently.
In systems with strong type-checking we need to assume certain properties
of the type system of the programming language.
We assume that the object-oriented type system has subtyping. That
means that a collection can contain objects of different subtypes of the
collection-type.
We assume that the type system has parametric polymorphism. This makes it possible to implement generic collections and iterators. A generic collection implementation allows applications to create collections with collection types that were unknown when the generic collection was compiled, e.g., new Collection(myNewType) and, similarly, new JOINiterator(myNewType, fooType).
We assume that either the system has reflection or that we can modify
the compiler and run-time system.
We assume that the system supports serialization, or that serialization can be obtained by modifying the compiler and run-time system.
We assume static scoping, although our model might also work for languages with dynamic scoping.
Examples of object-oriented languages that do not have strong typing
are:
1. Smalltalk
2. Lisp-based systems, for example LOOM [YJM91].
Examples of strongly typed systems that meet our assumptions are:
1. Emerald
2. Java variants:
(a) Pizza [ORW98].
(b) PJama and GJ [ADJ+ 96].
7.1.2 Collections and Indexes
A collection is a set of objects of a given type, the collection-type—that is, all
objects in a collection must be of a type that conforms to the collection-type.
A collection holds a number of objects and a number of indexes over these
objects.
Collections have operations for insertion, deletion, and generalized sequential scanning. Any object with a type that conforms to the type of a
collection can be used as a collection.
Paths of functions without arguments on objects in a collection can be
indexed. Functions can be indexed on both identity and value.
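The following Python sketch illustrates a collection with an index over a path of argument-less functions; it only maintains the index on insertion (re-indexing after state changes and identity indexes are omitted), and all class and method names are invented for the example.

class Collection:
    """Sketch of a collection with an index over a path of argument-less
    functions (method indexing): the index maps the value of the path,
    evaluated on insertion, to the member objects."""

    def __init__(self):
        self.members = []
        self.indexes = {}   # path (tuple of function names) -> {value: [objects]}

    def add_index(self, *path):
        self.indexes[path] = {}
        for obj in self.members:
            self._index_one(path, obj)

    def _index_one(self, path, obj):
        value = obj
        for fn in path:                 # follow the function path on the object
            value = getattr(value, fn)()
        self.indexes[path].setdefault(value, []).append(obj)

    def insert(self, obj):
        self.members.append(obj)
        for path in self.indexes:
            self._index_one(path, obj)
        return True

    def lookup(self, path, value):
        return self.indexes[tuple(path)].get(value, [])

class Student:
    def __init__(self, name, status):
        self._name, self._status = name, status
    def status(self):
        return self._status

students = Collection()
students.add_index("status")
students.insert(Student("Ann", "active"))
print(len(students.lookup(["status"], "active")))   # 1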
7.1.3 Query Language
The proposed query language has changed little since [Lar92a].
SELECT[C\rc WHERE P]        ≡  { rc | rc ∈ C ∧ P(rc) }
MAP[C\rc, f]                ≡  { f(rc) | rc ∈ C }
JOIN[C1\rc1, C2\rc2, f, p]  ≡  { f(rc1, rc2) | rc1 ∈ C1 ∧ rc2 ∈ C2 ∧ p(rc1, rc2) }
C1 UNION C2                 ≡  { x | x ∈ C1 ∨ x ∈ C2 }
C1 INTERSECT C2             ≡  { x | x ∈ C1 ∧ x ∈ C2 }
C1 MINUS C2                 ≡  { x | x ∈ C1 ∧ ¬(x ∈ C2) }

Aggregate functions: SUM(C, zero-value) and COUNT(C)
Note that functions and predicates can be any legal expressions in the object-oriented programming language.
The following predicates are added to the programming language expressions:
FOREACH C\rc :: P  ≡  ∀ rc (rc ∈ C | P(rc))
EXISTS C\rc :: P   ≡  ∃ rc (rc ∈ C ∧ P(rc))
x in C             ≡  x ∈ C
This algebra is based on an algebra proposed in [ZM90, p. 28]. It is similar to the relational algebra in, e.g., [Dat90]. The DIVIDEDBY operator is omitted, but its functionality is kept through the quantifier predicates. The JOIN operator is a generalized theta-join: for each pair in the Cartesian product of the two collections that satisfies the predicate, the returned value is a user-defined function of the pair. It is not possible to concatenate two objects the way tuples are concatenated in the relational algebra, and it would be inflexible to use a standard function (e.g., a record with two fields) [Lar92a].
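Since the operators are set-oriented, they map naturally onto pipelined (physical) iterators in the style of [Gra93]; the Python generators below are a small illustration of SELECT, MAP, and the generalized theta-join, not the generated Emerald query objects.

def select(collection, predicate):
    # SELECT[C\rc WHERE P]
    return (rc for rc in collection if predicate(rc))

def map_(collection, f):
    # MAP[C\rc, f]  (trailing underscore avoids shadowing the built-in map)
    return (f(rc) for rc in collection)

def join(c1, c2, f, p):
    # JOIN[C1\rc1, C2\rc2, f, p]: generalized theta-join over the Cartesian
    # product, returning a user-defined function of each qualifying pair.
    return (f(a, b) for a in c1 for b in c2 if p(a, b))

students = [{"name": "Ann", "course": 1}, {"name": "Bo", "course": 2}]
courses = [{"number": 1, "title": "Databases"}]
result = join(students, courses,
              lambda s, c: (s["name"], c["title"]),
              lambda s, c: s["course"] == c["number"])
print(list(result))   # [('Ann', 'Databases')]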
All expressions are first-class objects, including collection-expressions. For example, it is possible to use collection-expressions as arguments to standard programming-language operations.
7.1.4 Interface between the Programming Language and the Query Language
Database operations (queries, index maintenance etc.) are expressed as
operation-invocations on the pseudo-object, “edb”. “edb” is not an object.
It is a keyword in the query language.
Alternatively we could use real objects:
Database db1 <- Database.create
Database db2 <- Database.create
db1.addCollection(myCollection)
db1.query[myCollection\mc WHERE mc.foo]
This way databases would be real objects. But operations on database objects would still have to be treated specially to allow a query optimizer to take advantage of current indexes; i.e., db1.query[...] cannot be a normal invocation.
We base queries on physical iterators [Gra93]. The top-level iterator
delivers objects in the result-set to the application program. The application
program can do different things with the result of the query:
• Store the result in a data-structure (list, set, etc.)
• Print out the results.
• Present the results in a graphical user interface, e.g., a scrollbar that can show 20 results at a time.
• Filter the objects.
To allow application programs to use the results in a flexible and efficient
way we introduce accumulation objects. Accumulation objects can be seen as
user-defined iterators. They must implement an interface that has an insert
operation that takes objects of the collection-type of the result of the query.
We specify that collections must be subtypes of accumulation objects.
That makes it possible to store results in collections and later use the results
in new queries.
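A minimal sketch of the accumulation-object protocol is shown below in Python; the insert interface follows the description above, while the class names and the run_query driver are invented for the example.

class Accumulation:
    """Minimal accumulation-object interface: the top-level iterator of a
    query pushes each result into insert."""
    def insert(self, obj):
        raise NotImplementedError

class ListAccumulation(Accumulation):
    def __init__(self):
        self.items = []
    def insert(self, obj):
        self.items.append(obj)
        return True

class CollectionAccumulation(Accumulation):
    """A collection is itself an accumulation object, so query results can
    feed directly into a new collection and be queried again later."""
    def __init__(self):
        self.members = set()
    def insert(self, obj):
        self.members.add(obj)
        return True

def run_query(source, predicate, accumulator):
    # Stand-in for edb.query[...]: evaluate and hand results to the accumulator.
    for obj in source:
        if predicate(obj):
            accumulator.insert(obj)
    return accumulator

acc = run_query(range(10), lambda x: x % 2 == 0, ListAccumulation())
print(acc.items)                          # [0, 2, 4, 6, 8]
coll = run_query(acc.items, lambda x: x > 2, CollectionAccumulation())
print(sorted(coll.members))               # [4, 6, 8]: results usable in new queries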
Queries
Queries are expressed using the operation “query”. Objects selected by a
query are returned via an accumulation-object, passed as an argument to
the “edb.query”-operation. That is, a query in an application program has
the form:
edb.query[query, accumulation-object]
Collections in queries are programming language expressions. Therefore it
is not necessarily the same collections that are queried in each invocation
of the query. For example, in the following example both the collection
(“C.getElement[i]”) and the value (bound) change between executions of the
query:
for (i←0 : i < N : i ← i + 1)
  bound←bound+inc
  edb.query[SELECT[C.getElement[i]\theC WHERE theC.f < bound ], accuObj ]
end for
Interface to the Runtime-Generator
When a query is executed the run-time generator receives a tree-representation of the query called the query-tree. The run-time generator optimizes
the query by manipulating the query-tree and then generates a query-object
that executes the query specified by the query-tree.
The query-tree is “cut off” at the following points: object references and object literals.
That is, in the query-tree given to the run-time generator, object references and object literals are represented as object references, not as tree-representations of the referenced objects. This limits the optimizations that the run-time generator can perform, but it simplifies the generation and manipulation of query-trees.
This means that the run-time generator can optimize expressions (programming-language and query-algebra expressions) but not the bodies of operations and functions.
Constant Sub-Expressions
Expressions in a query that are object references, and expressions that do not contain any range variables or programming-language variables declared in the scope of the query, are constant within the query; they are therefore computed when the query is executed. This limits the size of the query-tree and ensures that constant sub-expressions are only computed once for each execution of the query. E.g., in the query “SELECT aCollection\x WHERE x.h=a.f[y]”, “a.f[y]” is a constant sub-expression since “a” and “y” are defined outside the scope of the query.
Queries are incrementally compiled. This is done by generating source
code for a query object and then compiling this code. New query-objects are
dynamically linked so that they can be invoked by the run-time generator.
But query-objects must be able to access variables in the scope of the query.
This is handled in the following way: The compiler generates tree-representations of the expressions in the queries and determines all greatest constant
sub-expressions. The compiler produces code that calls the run-time generator with a query-tree where nodes for constant sub-expressions contain
the computed value of the sub-expression. The run-time generator can now
store all constant sub-expressions in the query server and generate code for a
query object that for each constant sub-expression has a constant initialized
to:
queryServer.take[queryNo].getev(numOfSubExpression)
where numOfSubExpression is the number of the value within the query (values are numbered by in-order traversal of the query-tree).
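A rough Python analogue of this mechanism may help; QueryServer, take, and getev mirror the names used above, while the stored values are invented for illustration and the sketch is not the Emerald implementation:

# Python sketch: the query server stores, per query number, the values of the
# greatest constant sub-expressions, numbered by in-order traversal of the tree.
class QueryEntry:
    def __init__(self, values):
        self.values = values              # constant sub-expression values
        self.query_object = None          # installed when the query is compiled
    def getev(self, num_of_sub_expression):
        return self.values[num_of_sub_expression]

class QueryServer:
    def __init__(self):
        self.entries = {}
    def put(self, query_no, values):
        self.entries[query_no] = QueryEntry(values)
    def take(self, query_no):
        return self.entries[query_no]

# A generated query object then refers to its constants like this:
query_server = QueryServer()
query_server.put(17, ["value of a.f[y]", 42])
c0 = query_server.take(17).getev(0)       # queryServer.take[queryNo].getev(n)
print(c0)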
Updates
Updates are also handled by the “edb”-object. For example, indexes are
added as follows:
edb.addindex [Collection, identity functionName]
edb.addindex [Collection, functionName]
“functionName” is an expression of type string. The keyword identity denotes an identity index.
Insertions and deletions are performed as invocations on collections:
ok ← theCollection.insert[theObject]
where ok is assigned true, if the insertion is successful.
Figure 7.2: The Student/Teacher/Lunch database. The diagram shows the collections Student, Teacher, Course, Attends (with a Result attribute), and Lunch (with Account, Coffee, Cheese, Sandwich, Bread, and Prices), together with the 1:m and 1:0,1 relationships between them.
7.1.5
An Example
We have implemented a simple test database (Figure 7.2): Students attend
courses taught by teachers. Both students and teachers have a “Lunch”account. The database is implemented as a collection for each box in the
figure. “1:m” relationships (owner/member) are implemented by references
from members to owners. The dotted “1:0,1” relationships are however implemented as inverse relationships by having lunch-objects perform the operation: theCustomer.setLunch[self] when created with theCustomer as
an argument.
The query:
Which students have the same coffee consumption
as one of their teachers?
can in the Emerald database be expressed as:
edb.query[
MAP[SELECT[
attends\f WHERE
f.getStudent.getLunch.getCoffee =
f.getCourse.getTeacher.getLunch.getCoffee]\r,
r.getStudent.getName],
accumulationObject]
Although this query is a join over all five collections in the database, it is
not necessary to express the joins explicitly. The SQL [Lar92a] formulation
of the query requires seven comparisons on foreign keys.
It is possible to have nested quantified expressions. For example:
Find students that attend all courses
can be expressed as:
edb.query[
MAP[SELECT[student\s WHERE FOREACH course\c ::
EXISTS attends\f ::
f.getStudent == s and
f.getCourse == c]\theStudent,
theStudent.getName],
ResultAccu]
7.2
Nested Database Operations
Consider a Supplier/Parts database[Dat90] designed so that each supplier has
a function, “getParts” that returns a collection of parts supplied by him/her.
The query, “For each supplier, how many red parts does this supplier supply?” can be expressed as:
MAP[S\rs, COUNT[SELECT[rs.getParts\rc WHERE
rc.getColor == color.red]]]
The Emerald database does not support a schema, although our design does not preclude adding one to the current implementation. Therefore it is unknown which functions are indexed on the collections in the SELECT-expression (one for each “rs”), and it cannot be assumed that all of these collections have the same indexes. The result is that no index is used to optimize the SELECT-expression.
By allowing object literals in queries to contain database operations (in
this case a query) the query above can be expressed as shown in figure 7.3.
const cdir == . . .
edb.query[MAP[S\rs, (object countObj
export function f → [cn : integer]
edb.query[SELECT[rs\c WHERE
c.getColor == color.red], cdir]
cn ← cdir.getElement
end f
end countObj).f], accu]
Figure 7.3: Number of red parts for supplier
Here cdir is an accumulation object that counts the selected objects.
It is possible to have an arbitrary number of nested query layers. It is
possible to use all database operations in objects in queries (e.g., insertion
into collections and addition of indexes).
7.2.1
Advantages of Nested Database Operations
Nested database operations allow object constructors in queries to contain
queries. This is more powerful than allowing queries to contain references to
objects that contain database operations because:
• A query in an object can exploit range variables in surrounding queries.
• A query can reference variables from surrounding queries and objects.
It would be possible to simulate these effects by passing all variables and range variables as parameters to operations in objects declared outside the query. But, as in other programming languages, this would be a very awkward method for non-trivial tasks.
Besides, it is a goal to integrate the query language with the programming language. Therefore the scope rules also apply in nested queries, so that, from an application programmer's point of view, queries are seen as normal invocations on the pseudo-object “edb”.
Figure 7.4 demonstrates how nested database operations can be used to
perform complex operations on collections. The query takes a collection of
records. For each record the field “coll” is initialized to an empty collection
if it is nil. An index on the function “g” is added to all the “coll”-collections.
A collection of all the “coll”-collections is returned.
It would be possible to obtain the same effect without nested database
operations, using temporary collections; but that would result in a more
complicated query.
var globalObj : mType ← initval
...
edb.query[MAP[M\rm,
  (object ob
    export operation f → [newElem : mType]
      if rm.getColl == nil then
        rm.setColl ← collection.of[mType].create
        rm.getColl.insert[globalObj]
      end if
      edb.query[SELECT[N\rn WHERE rn !== rm.g], rm.getColl]
      edb.addIndex[rm.getColl, "g"]
      newElem ← rm
    end f
  end ob).f],
accu]
Figure 7.4: A Nested Query
As noted in section 7.2, nested database operations can be used to exploit indexes. Another use of nested database operations is to control the distribution of the query. For example, in the query “For each supplier, how many red parts does this supplier supply?” from section 7.2, the sub-queries for supplier S42 can be made to execute on the computer computer2. This is shown in figure 7.5.
const cdir == . . .
edb.query[MAP[S\rs, (object countObj
export function f → [cn : integer]
if rs == S42 then
move self to computer2
end if
edb.query[SELECT[rs\c WHERE
c.getColor == color.red], cdir]
cn ← cdir.getElement
end f
end countObj).f], accu]
Figure 7.5: Number of red parts for supplier (II)
7.3
Architecture of the Runtime System
The architecture of the database system is illustrated in figure 7.6.
Users of the database write application programs that contain queries
expressed in the Emerald query language. When the application programs
are compiled, each query will generate a call to the run-time generator with
the query represented as a query-tree as an argument. Another argument to
the run-time generator is a unique query number (queryNo) obtained from
the query server. A query-tree is a simple tree-representation of a query, except that programming-language expressions that are independent of the database are folded.
The query server (qs) is a simple database that stores an ordered set of values (section 7.1.4) for each query.
The run-time generator (rtg) optimizes the query, transforms the query-tree into code that computes the query, compiles this code, and executes it.
Figure 7.6 shows a snapshot of the database system with three collections (collection1, collection2, and collection3) and two query-objects implementing queries number 17 and 42. Query number 17 has been compiled, while number 42 is being compiled or re-compiled. The three compiler circles represent the
same compiler. The figure shows that the application program can create
new collections by invoking the collection-class, perform operations directly
on the collections of objects, and perform queries that are handled by the
run-time generator.
Figure 7.6: Overview of the database system. The figure shows the application source program and the compiled application, the collection class, three collections (collection1, collection2, and collection3), the database manager (run-time generator), the compiler, query objects 17 and 42, and the query server. Thin lines represent invocations; thick lines represent streams of source code (input to and output from objects through Unix).
7.4
The Emerald Query Model
7.4.1
Indexes
Indexes are implemented as B-trees.
7.4.2
Query Architecture Background
In the first implementation of the EDB [Lar92a] the Emerald Compiler
[Hut87b] was not changed. Instead, EDB application programs were precompiled. The pre-compiler parsed the EDB queries and generated the calls
to the runtime-generator.
In the current implementation the database facilities are completely integrated with the Emerald Compiler. This was done for the following reasons:
• It provides full type-checking of the application program. All type errors are detected when the application program is compiled (for queries
that do not use dynamic type checking).
The pre-compiler approach was also type-safe, but the query-objects were only checked when the particular queries were executed. Because query-objects are generated by the run-time generator, error messages for type errors in the query-objects are of little use to the application programmer.
Furthermore, the EDB query language allows nested queries. Which subset of the nested sub-queries is executed in a query depends on the state of the database. With the pre-compiler approach, only the nested queries that are actually executed are type-checked.
• Better typing.
The compiler can take advantage of the type information for the database operations. For example, consider a PLUS expression (PLUS is a “union” operator; it is called PLUS because “union” is a keyword in the Emerald language):
Collection1 UNION Collection2
The type of the UNION expression must be a collection with an element-type that is a super-type of both the element-types of Collection1 and Collection2. In the current implementation, if one of the element-types is a super-type of the other, this element-type is selected; otherwise nil is selected as the super-type. But it would be possible to compute the greatest lower bound of the element-types of the two collections and use it as the type of the UNION expression.
• Full support for nested queries.
In the EDB query language, queries can contain expressions in the Emerald language. Only a subset of the Emerald language in expressions in queries was understood by the pre-compiler in the original version of the EDB; in particular, object literals were not supported.
In the integrated version, all EDB expressions (i.e., Emerald-language and database expressions) are allowed.
• Performance.
Fewer resources are required to compile an application program in the
current version than in the original implementation.
The performance of compilation is important not only for ad-hoc query applications: the EDB allows nested queries, which means that query-objects can contain queries, and during the evaluation of one query many queries may need to be compiled (or recompiled).
7.5
Architecture of the Current Version
7.5.1
Interface between the Emerald Language and the Query Language
Database operations (queries, index maintenance, etc.) are expressed as operation-invocations on the pseudo-object “edb”.
7.5.2
Scope
The scope rules of Emerald have been extended so that an expression in the query algebra opens a new scope that contains its range variables.
The interface between the Emerald language and the query language is
treated specially—identifiers imported into the query using a statement in the
form, “edb.query[query-expression,accu]” are made constant each time the
query is executed. This makes it possible to (re)compile queries separately.
7.5.3
Incremental Compilation of Queries in Emerald
The run-time generator executes queries by generating query-objects that have an operation, “_O”, that executes the query. The run-time generator must be able to compile and execute these query-objects. A query is re-compiled (i.e., a new query-object is generated) when the query is executed and it is detected that an index used by the query-object has been deleted or that new indexes have been added to collections used in the query.
In Emerald, new objects can be compiled separately and linked dynamically. Objects can therefore be compiled by invoking the Emerald compiler with the source code of the new objects. The run-time generator gets references to incrementally compiled objects by having them signal their existence when they are compiled and executed.
The run-time generator traverses the query-tree, finds all Emerald constants and stores them in the query server under the key, “queryNo”. The
query has to be (re-)compiled if:
• The query has never been compiled before—there is no query-object
stored in the query server under the key “queryNo”.
• Indexes used by the query have been deleted.
• Indexes, that could be used, have been added to one or more of the
collections used in the query.
Query-objects contain a process, so that they can pass the query server a
reference to themselves and signal that the compilation has been completed.
If the query has to be (re-)compiled, the run-time generator waits on a
condition in the query server until the query-object has been compiled and
installed itself in the query server.
Now the query-object is “up to date” either because it has been compiled
or because it did not need to be re-compiled.
The query is executed by simply executing the operation “_O” in the query object:
queryServer.take[queryNo].getQueryObject._O
In our implementation the task of incremental compilation is a bottleneck
because only one program can be compiled at a time. It would, however, be
simple to allow incremental compilation of query-objects to be concurrent.
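The control flow described in this section can be summarised with a Python sketch (threads and an event stand in for Emerald processes and conditions; all class and method names other than “_O” are illustrative, and the compiler is reduced to a stub):

import threading

class QueryObject:                        # stub for a generated query object
    def __init__(self, tree):
        self.tree = tree
    def uses_deleted_index(self):         # would inspect the index population
        return False
    def misses_new_index(self):
        return False
    def _O(self):                         # the generated operation that runs the query
        print("executing", self.tree)

class RunTimeGenerator:
    def __init__(self):
        self.installed = {}               # query number -> installed QueryObject

    def execute(self, query_no, query_tree):
        qobj = self.installed.get(query_no)
        needs_compile = (qobj is None                 # never compiled
                         or qobj.uses_deleted_index() # an index was removed
                         or qobj.misses_new_index())  # a useful index was added
        if needs_compile:
            ready = threading.Event()
            def compile_and_install():    # stands in for the external compilation
                self.installed[query_no] = QueryObject(query_tree)
                ready.set()               # the query object signals its existence
            threading.Thread(target=compile_and_install).start()
            ready.wait()                  # run-time generator waits on a condition
        self.installed[query_no]._O()

RunTimeGenerator().execute(17, "SELECT[C\\rc WHERE rc.f = y]")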
7.5.4
Implementation of Nested Database Operations
A nested database operation is handled in the following way: The application program is compiled and each database operation is compiled into an
invocation of the run-time generator with a tree-representation of the database operation as an argument. When the program is executed the run-time
generator generates query-objects for the outer-most queries. Inner queries
(nested queries) are copied directly—that is, their query-trees are not translated to query-objects.
When a query-object is compiled, the next layer of queries is peeled off.
The compiled query-objects invoke the run-time generator when they execute
nested database operations and so on until the nested queries on the lowest
level are reached.
Note that nested database operations can be executed many times. Each time, the run-time generator re-compiles the code for that operation if relevant indexes have been added or deleted. This means that if the collection-values in the inner queries do not have indexes on the same functions on each execution, the performance will suffer. It is the application programmer's responsibility to take care of this. If sub-queries on the same level in a nested
query operate on collections with indexes on the same functions, indexes will
be exploited and no query or sub-query will have to be re-compiled unless
indexes are added or deleted.
7.5.5
Propagation of References in Nested Database
Operations
Consider an application program with nested database operations. A variable identifier in one of the inner operations may refer to a variable in any of the outer database-objects.
Each nested database operation is compiled separately and the value of
an identifier must be imported into the query-object for the query when the
query is executed (section 7.5.2).
This is handled in the following way:
When a database operation is executed, a compiler-generated list of the values referenced from the database operation (including any database operations nested in the query) is added to the query-tree. The run-time generator
stores all these values in the query server and produces a query-object. When
the query-object is executed, all values referenced from nested queries in the
query-object can be supplied by the query-object.
If an object does not contain database operations or references to range variables or mutable objects, it is just a constant sub-expression value (section 7.1.4).
7.6
Limitations/Further Work
In [Lar92a], a number of shortcomings of our work are discussed. Solutions
are given, but not implemented. In the following we mention the most important shortcoming.
7.6.1
Index Maintenance
In the implemented system indexes are not automatically updated. If an object is to be updated, and the update affects the result of an indexed function, the object must be deleted from the collection, updated, and then re-inserted. The problem with updates is that the value of a function in an object can depend on functions in other objects. The solution is to maintain a data structure that, for each index on each function, registers which mutable objects the function depends on. The compiler is then changed so that each time an object is updated, all affected objects are re-placed in the relevant indexes. If all objects in such a data structure are marked with a special bit, the overhead on non-database operations can be minimized.

Task                                      Note                     Time/s   Comment
5000 insertions                           collection.of[integer]   328      0.066 per insertion
5000 deletions                            collection.of[integer]   359      0.072 per insertion
Selecting ≈ 10% of 1000 random integers                            0.24
collection creation                       of real                  206
Add index, first time                     string.length            19.6
Remove index                              string.length            0.02
Compile DB-program                        simple query             23.9
Query, first execution                    simple query             37.9

Table 7.1: Performance of basic tasks
7.7
Performance
In [Lar92a], the implemented database was found to be efficient compared to Commercial Ingres. The Student/Teacher/Lunch database was used as the test database. The cardinalities of the collections are stated in table 7.3. Relations between objects in different collections were generated using a simple random-number generator. All performance figures are measured as wall-clock time on an unloaded machine.
Table 7.2 shows some essential performance figures. The times for the Emerald database were measured on a Sun SLC Sparc, while the Ingres times were measured on an HP9000s300. Because both database systems used a local disk, the results are comparable.
Task                                            Note                                             Emerald Time/s   Ingres Time/s
Which students have the same coffee
consumption as one of their teachers?           No index used                                    65.7             136.5
Find courses attended by Niels Elgaard Larsen   Index on “attends.getStudent, student.getName”   1.6              3.5
Find students that attend all courses           Index on “attends.getStudent”                    478              53

Table 7.2: Performance of selected tasks
Collection   Cardinality
Student      1162
Teacher      26
Course       49
Attends      3996
Lunch        1188

Table 7.3: Population of the test database
The overhead on the first execution of a query (and on the first executions after additions or deletions of relevant indexes) is on the order of 40 seconds (table 7.1). Compiling an application program accounts for at least 25 seconds (more for larger programs). This makes ad hoc queries possible, even though they were not a primary goal in the design of the system.
Chapter 8
Distributed Method Indexing
Object-oriented databases that support associative queries on objects (not
just abstract data-types) need indexes on these objects to efficiently perform
queries.
We consider an object-oriented database that supports associative queries on sets of objects, called collections. A collection can store objects of a given type, the “collection-type”, and objects of subtypes of this type. The database consists of all objects that belong to at least one collection.
In general, a query can contain arbitrary expressions containing references to objects inside and outside the database (see figure 8.1).
We consider a simple query: A search in one collection for objects that
have a given value for a given function. In the Emerald query language this
is expressed as:
SELECT[C\rc where rc.f = y]
To improve the performance of this query, an index on the function “f” on objects in the collection “C” is created. We denote this index object “C.f”.
There are 4 cases:
1. Indexes on constants and local variables
2. Indexes on functions that only depend on the local state
3. Indexes on functions that only depend on the local state and transitively immutable objects
4. Indexes on functions that depend on mutable objects
Figure 8.1: An example: indexes on C1.Fa, C1.Fb, C2.Fa, C2.Fc, and C3.Fd. The dotted arrows represent references to objects inserted in collections; full arrows represent invocations performed to compute the key that the objects are indexed under. Dotted lines represent borders between three computers.
We consider the relations between indexing and persistence (see chapter 6). We use the Emerald transaction model: a transaction is a group of one or more processes. A process does not have to be part of a transaction.
A process that is part of a transaction we call a “transaction process”. A process that is not part of a transaction we call an “ordinary process”.
8.1
Indexes on constants and local variables
This is equivalent to relational database systems: a constant corresponds to a key and a variable to a non-key attribute.
For example in figure 8.2 the name of a teacher is constant but the salary
can be changed.
object aTeacher1
const name : String
field salary : integer
end aTeacher1
Figure 8.2: Teacher type
The difference from relational database systems is that in relational database systems, all updates to the database are performed by the database
system. In the Emerald system all updates to variables are performed in the
same way. It does not matter if the variable is located in an object that is
used in the database, i.e., is a member of a collection.
It is not possible to determine which objects could become members of
collections. It is therefore necessary to give all objects a way to invalidate
relevant indexes.
8.2
Indexes on functions that only depend on
the local state
For example, some teachers could be represented by an object like the one in figure 8.3. This object has the same type as “aTeacher1”, but an index on “salary” on a set including aTeacher2 would be invalidated if either salary per lesson or lessons per month were updated.
In general, an index can be invalidated by the update of one of many variables in an object, and many operations in the object can update these variables. It would be very expensive to check for index invalidation on each
object aTeacher2
const name : String
var salary per lesson : integer
var lessons per month : integer
export function salary → [sal : integer]
sal ← salary per lesson * lessons per month
end salary
end aTeacher2
Figure 8.3: Teacher type 2
variable update: each check takes several clock cycles¹, while many updates, such as assignments to integer variables or array elements, take only a couple of clock cycles. This means that programs that make many updates to variables in objects could run many times slower.
Not all programs make frequent updates of object variables. Typically, heavy computations update local variables or make use of recursion, i.e., they only update the state of their own stack, which does not require an index-invalidation check.
A more conservative approach would be to determine at compile time which functions read which variables and which operations in an object transitively update which variables. From this it can be deduced which operations should invalidate indexes on which functions.
An even more conservative approach would be to let any update of a variable in an object invalidate all indexes on all functions in the object.
¹On multi-scalar architectures such as the Alpha it could be possible to add certain check instructions at no cost, but we have not found a way to add a complete check at no cost.
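The first conservative approach can be sketched in Python: given, for one object implementation, which variables each function reads and which variables each operation (transitively) updates, the operations that must invalidate indexes on each function can be deduced. All names in the sketch are invented for illustration:

# Python sketch: deducing which operations invalidate indexes on which functions.
reads = {                                  # function -> variables it reads
    "salary": {"salary_per_lesson", "lessons_per_month"},
    "name":   {"name"},
}
writes = {                                 # operation -> variables it (transitively) updates
    "setSalaryPerLesson": {"salary_per_lesson"},
    "setName":            {"name"},
}

def invalidations(reads, writes):
    result = {}
    for op, written in writes.items():
        result[op] = {fn for fn, read in reads.items() if read & written}
    return result

print(invalidations(reads, writes))
# {'setSalaryPerLesson': {'salary'}, 'setName': {'name'}}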
8.3
Indexes on functions that only depend
on the local state and transitively immutable objects
Teachers can be divided into groups that all have the same salary, as in figure 8.4. Here salary table is an immutable object, shared by many teacher objects.
object aTeacher3
var level : integer
export function salary → [sal : integer]
sal ← salary table.getMonthly[level]
end salary
end aTeacher3
Figure 8.4: Teacher, grouped by salary
When using the first conservative approach from section 8.2, this makes it more difficult to determine which operations read which variables: in the example, salary table may not have been implemented when aTeacher3 is compiled, so it is not known whether salary table.getMonthly[level] actually reads level.
8.4
Indexes on functions that depend on mutable objects
Suppose salary table is a mutable object (figure 8.5).
object salary table
field salaryUnit : integer
export function getMonthly[level : integer] → [monthly : integer]
if level = 1 then
monthly ← salaryUnit * 10
elseif level = 2 then
monthly ← salaryUnit * 12
end if
end getMonthly
end salary table
object aTeacher4
var level : integer
export function salary → [sal : integer]
sal ← salary table.getMonthly[level]
end salary
end aTeacher4
Figure 8.5: Teacher object, depending on mutable salary object.
An index on salary on a set including aTeacher4 will be invalidated if either salaryUnit in salary table or level in aTeacher4 is updated.
In general, an index can be invalidated by updates in any of the objects that were used to compute each key.
The first three kinds of indexes (sections 8.1, 8.2, and 8.3) can be handled with well-known indexing techniques (see [BK89]). We call such functions simple indexing functions. The fourth kind, which we call complex indexing functions, requires more complicated and expensive techniques.
8.5
Overhead
There are two major considerations with regard to overhead:
8.5.1
Non-database applications should not pay for database applications
When an object is compiled it is not known if it will ever be used in the computation of an index. An overhead on invocations of operations in objects will therefore apply to all objects. Because it is not acceptable for “non-database” applications to pay a large overhead, the overhead for objects that have never been used in the computation of indexes must be very small.
8.5.2
Objects with simple indexing functions should
not pay for objects with complex indexing functions
Collection-types are abstract types. This means that objects in the same
collection can have different implementations of the same functions.
We expect that most runtime invocations of indexed functions, will be
on simple indexing functions. But when creating an index, it cannot be
guaranteed that an object with a complex indexing function will not be
inserted into the index.
We therefore need an algorithm that, on indexes over collections containing only objects with simple indexing functions, performs almost as well as the algorithms used for indexing abstract data types, but that also allows complex indexing functions. This can be achieved statically or dynamically.
8.5.3
Static classification of indexes
For some indexes it is possible to determine that all indexed functions are
simple. It is therefore possible to use an efficient, well-known algorithm to
maintain the index. But if this cannot be determined, a more expensive,
general algorithm must be used.
One way of classifying indexes is to extend the type system to also deal with simple versus complex functions. For example, the Emerald
system uses a contra-variant type system. The subtyping relations require, in addition to the usual rules about arguments and parameters [Car97], that a type not declared immutable does not conform to a type declared immutable.
The type system could be extended so that functions could be declared simple
and that a type T1 is a subtype of T2 only if for each simple function in T2 the
corresponding function in T1 is also simple. It can be statically determined whether “simple” declarations of functions are correct, i.e., whether the bodies of such functions only invoke local functions on object variables of types that are immutable.
This would make it possible to use traditional indexes for indexes on local functions and general indexes for indexes on global functions. This approach does, however, have several drawbacks:
1. The database is meant to be fully integrable with the application programming language, and the distinction between local and global functions does not seem useful outside the database system.
2. We would still like to have only a small overhead for indexes that contain only a few objects with global index-functions.
8.6
The Dependency graph
A dependency graph is a graph that stores information about which objects,
if updated, will invalidate which index entries. The index entries are represented by KeyVal objects. The two main issues are the logical structure of
the graph and the representation.
8.6.1
Structure
For each index entry there is a set of objects that can invalidate the entry.
Therefore the dependency graph could be a set of sets. The advantage is that
it takes only one invocation to start the index invalidation when an object
is updated. The disadvantage is that several index entries can be dependent
on the same object and this object can be dependent on other objects.
For example if N index entries are dependent on an object D which is
dependent on M other objects, this requires M × N edges in the graph.
If instead dependencies between objects are represented, the dependency
graph is a set of trees, called dependency trees, although there can be references between the trees.
In the example above M + N edges would be required. The disadvantage
is that an overhead proportional to the depth of the tree is introduced for
invalidating an index entry.
We expect that dependency trees will be the most efficient, but we have
implemented both structures.
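The two structures can be contrasted in a small Python sketch (illustrative only; the names follow the teacher/course example below). In the set-of-sets form every index entry lists every object it depends on, while in the set-of-trees form each object only records its parents, so an update walks towards the KeyVal roots:

# Python sketch of the two dependency-graph representations.

# Set-of-sets: index entry -> set of objects that can invalidate it.
set_of_sets = {
    "KeyVal_Larsen": {"Larsen", "Course101", "Course102"},
}

def invalidated_entries_sets(updated_obj, graph):
    return [e for e, deps in graph.items() if updated_obj in deps]

# Set-of-trees: each object knows its parents; KeyVal objects are the roots.
parents = {
    "Course101": ["Larsen"],
    "Course102": ["Larsen"],
    "Larsen":    ["KeyVal_Larsen"],       # KeyVal root
    "KeyVal_Larsen": [],
}

def invalidated_entries_trees(updated_obj, parents, roots):
    found, stack, seen = [], [updated_obj], set()
    while stack:
        node = stack.pop()
        if node in seen:                   # marks guard against cycles
            continue
        seen.add(node)
        if node in roots:
            found.append(node)
        stack.extend(parents.get(node, []))
    return found

print(invalidated_entries_sets("Course102", set_of_sets))                 # ['KeyVal_Larsen']
print(invalidated_entries_trees("Course102", parents, {"KeyVal_Larsen"})) # ['KeyVal_Larsen']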
An example
Teachers teach a number of courses. A course can be taught by a number of
teachers. Figure 8.6 shows the representation of three teachers and three courses.
Figure 8.6: Representation of teachers and courses (teachers Larsen, Jensen, and Hansen; courses 101, 102, and 103).
Teachers have a function close colleagues. When invoked on a teacher object, close colleagues returns the number of other teachers that teach a course that the teacher teaches. For example, Larsen.close colleagues returns one, because Jensen teaches course 102 along with Larsen.
The actual calls are shown in figure 8.7, and the dependency graph is shown in figure 8.8.
Teachers also have a function colleagues. When invoked on a teacher object, colleagues returns the number of other teachers that transitively teach a course that the teacher teaches. I.e., Larsen.colleagues returns two, because both Larsen and Jensen teach course 102 and Jensen and Hansen teach course 103.
Figure 8.7: Calls in Larsen.close colleagues.
Figure 8.8: Dependency graph for close colleagues (the KeyVal entry with key 1 for Larsen, depending on Teacher Larsen and Courses 101 and 102).
There are at least two possible representations of the dependency graph:
set-of-trees and set-of-sets. The set-of-trees representation is shown in figure 8.9.
Now suppose “Jensen.colleagues” is also inserted into the index. The new
set-of-trees representation is shown in figure 8.10.
Figure 8.9: Set-of-trees dependency graph (the KeyVal entry with key 2 for Larsen, depending on Teacher Larsen, Teacher Jensen, and Courses 101, 102, and 103).
Figure 8.10: Set-of-trees dependency graph for Larsen and Jensen. The dashed lines are dependencies added when Jensen was added to the index.
Figure 8.10 shows that the set-of-tree representation can lead to cycles.
We can handle this in the following ways:
• Allow cycles in the dependency graph. When an object in the graph is updated, a graph-traversal algorithm that works with cycles in the graph must be used.
This can be done by marking each object in the
dependency graph with a unique mark. The unique mark (as opposed
to just a mark bit) is necessary because we assume that multiple processes in the same transaction can manipulate the dependency graph
concurrently.
• Detect and prevent cycles when the graph is updated. When cycles are
detected the graph is restructured to avoid cycles. A simple algorithm
is: If a new edge from object A to object B introduces a cycle, replace
it with edges from A to all the parents of B; for each of these edges the algorithm is applied again (see the sketch after this list).
This avoids cycles but introduces a new problem: updates could generate deadlocks if transactions are performing the updates. But if transactions perform the updates, they can also resolve deadlocks.
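A Python sketch of the cycle-avoidance rule (replace a cycle-introducing edge A→B by edges from A to all parents of B, recursively); the graph maps each node to its parents, i.e., edges point towards the KeyVal roots, and all names are illustrative. Locking and termination for pathological graphs are not addressed here:

# Python sketch of cycle-free edge insertion into a set-of-trees dependency graph.
def reaches(graph, start, target):
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

def add_edge(graph, a, b):
    # adding a -> b would close a cycle if b can already reach a
    if a == b or reaches(graph, b, a):
        for parent in list(graph.get(b, [])):   # redirect to b's parents instead
            add_edge(graph, a, parent)
    else:
        graph.setdefault(a, [])
        if b not in graph[a]:
            graph[a].append(b)

graph = {"Jensen": ["KeyVal_Jensen"], "Course102": ["Jensen"]}
add_edge(graph, "Larsen", "Course102")    # no cycle: added as is
add_edge(graph, "Course102", "Larsen")    # would be cyclic: redirected, no cycle created
print(graph)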
The set-of-set representation does not lead to deadlock (figure 8.11).
Figure 8.11: Set-of-sets dependency graph for Larsen and Jensen. The dashed lines are dependencies added when Jensen was added to the index.
8.7
Designing indexes
An index on a function is a pre-computation of a selection on the function [Gra93], i.e., a map e→f(e) for all elements “e” in the collection. The naive solution is therefore, when an object “o” is inserted into a collection, to insert, for each index “f” on the collection, the pair “(o, n(o))” into the index-map representing the index, where “n(o)” is a reference to the position in the index data structure for “f” for the key “f(o)”.
When an object “o” is updated this can be detected (e.g., using an object-fault mechanism [Jul88]); all positions in indexes representing “(o, f(o))” can then be retrieved from the index-map, and all indexes can be updated.
We propose a design that is based on the following principles:
1. There should be a clean interface between the implementation of indexes, the underlying system, and the runtime database system.
2. As much as possible should be written in a high-level object-oriented system.
The design is divided into these parts:
8.7.1
KeyVal objects
The KeyVal objects are the interface between the index implementations and
the rest of the system.
A KeyVal object has two fields: Key and Val. Both are references to
objects. The Val object is a member of a set on which an index is created.
Key is the result of the indexing function invoked on Val. There is a special
facility that allows index implementations to signal to the underlying system that they are computing Key. KeyVal objects also implement an update operation. update operations on KeyVal objects are invoked “magically” by the underlying system; an invocation means that the Key might no longer be valid.
8.7.2
The DB-bit
The compiler generates code so that all mutable objects have a “DB” bit, initially set to false. While a process/thread is computing a Key in a KeyVal object, all mutable objects accessed by this process/thread get their DB-bit set to true, and a relevant update-object is inserted into the accessed object's updatevector (see section 8.7.3).
In the Emerald system, when a key is being computed, the ElemKey objects call the kernel before and after the computation of the key. The kernel registers whether a process is currently generating a key in a process-specific variable, called the computingKey variable, in the stack segments of the process.
At the end of each monitored function (not operation), computingKey is checked. If a key is being computed, an indexDependency kernel call is executed. indexDependency inserts the invoked object into the dependency graph (see section 8.6.1). This causes overhead on invocations of functions in monitors; the overhead is three Alpha instructions.
For certain structures of the dependency tree, computingKey is also checked at the start of each monitored function. This is necessary to build tree structures and avoid cycles. This causes additional overhead on invocations of functions in monitors; the overhead is another three Alpha instructions.
At the end of each monitored operation (not function), the DB-bit is checked. If it is set, an indexUpdate kernel call is executed. indexUpdate executes the update operation in the invoked object. This causes overhead on invocations of operations in monitors; the overhead is three Alpha instructions.
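The logic of these checks can be summarised in a Python sketch (the real checks are a few inline Alpha instructions and kernel calls; the names indexDependency and indexUpdate follow the text, everything else is illustrative):

# Python sketch of the hooks around monitored invocations.
class Kernel:
    def __init__(self):
        self.computing_key = {}          # process id -> currently computing a key?
    def index_dependency(self, process, obj):
        # insert the invoked object into the dependency graph (section 8.6.1)
        print("dependency registered for", obj.name)
    def index_update(self, process, obj):
        # run the object's update operation (section 8.7.3)
        obj.update()

kernel = Kernel()

class Monitored:
    def __init__(self, name):
        self.name = name
        self.db_bit = False
    def update(self):
        print("propagating invalidation from", self.name)

def monitored_function_exit(process, obj):
    # emitted at the end of every monitored function (not operation)
    if kernel.computing_key.get(process, False):
        kernel.index_dependency(process, obj)

def monitored_operation_exit(process, obj):
    # emitted at the end of every monitored operation (not function)
    if obj.db_bit:
        kernel.index_update(process, obj)

obj = Monitored("salary_table")
kernel.computing_key["p1"] = True        # p1 is computing an index key
monitored_function_exit("p1", obj)       # -> dependency registered
obj.db_bit = True
monitored_operation_exit("p1", obj)      # -> invalidation propagated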
8.7.3
Update-objects and update-vectors
In the database version of the Emerald system all mutable objects have
an update-operation and an update-vector . An updatevector is a vector of
update-objects. Update-objects are objects that conform to the type:
type bupd
operation update
end bupd
The update operation invokes update on all objects in the updatevector .
updatevector is initially nil. It is allocated by the kernel the first time an
object is inserted. If it overflows, it is expanded by the kernel.
When an operation in the monitor of an object is invoked, the DB-bit is checked. If the DB-bit is set, the update operation is invoked. The system of update-vectors forms a set of trees with KeyVal objects as roots. KeyVal objects already have an update operation, which is then invoked.
8.8
Deletions
When an object is deleted from a collection, the object's entries in all indexes on the collection are deleted. But the KeyVal object is still there.
If an index key is invalidated and recomputed, a new KeyVal object is
generated, but the old one is still there. If instead the Key and Val were
updated in the original KeyVal object it could result in false invalidations.
A KeyVal object that is not in use is still the root of a dependency tree.
This is problematic for two reasons:
• We cannot rely on garbage collection, because the leaves in the dependency graph could be alive.
• If an object in an unused dependency graph is updated, resources are
wasted.
We handle these problems by lazy deletion (see figure 8.14). All update
methods return a boolean. It is true if the dependency tree is in use. Update
operations on KeyVal objects return false if the object is not in use.
Update operations on other objects call update on all objects in the updatevector. Objects whose update invocations return false are deleted from the updatevector. If all update invocations return false (so that the updatevector becomes empty), the update operation itself returns false.

Figure 8.14: A lazy deletion.
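A Python sketch of this update propagation with lazy deletion (update returns whether the subtree is still in use; a KeyVal whose index entry has been dropped returns false and is pruned; all names apart from KeyVal and update are illustrative):

# Python sketch of update-vectors and lazy deletion.
class KeyVal:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.in_use = True                  # cleared when the index entry is dropped
    def update(self):
        if not self.in_use:
            return False                    # prune this root lazily
        print("invalidate index entry for", self.val)
        return True

class DBObject:
    def __init__(self):
        self.update_vector = []             # update-objects depending on this object
    def update(self):
        still_used = [u for u in self.update_vector if u.update()]
        self.update_vector = still_used     # drop dead dependency trees
        return bool(still_used)

# usage: an object that two KeyVal entries depend on
kv1, kv2 = KeyVal("k1", "Larsen"), KeyVal("k2", "Jensen")
obj = DBObject()
obj.update_vector = [kv1, kv2]
kv2.in_use = False                          # entry deleted from its index
obj.update()                                # invalidates kv1, prunes kv2
print(len(obj.update_vector))               # -> 1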
8.9
Synchronization
Method Indexing is designed for use in database systems with concurrent
transactions.
8.9.1
One process, one transaction
When an object in the dependency graph is updated, the index is updated
before the execution of the transaction continues.
If the re-computation of a key involves a query that uses the invalidated
index, inconsistency could occur.
8.9.2
Different transactions
Deadlocks can occur. They are handled by the transaction system (section 6.4.1).
8.9.3
Multiple processes, one transaction
If a transaction has multiple processes that concurrently update data and query the same data using indexes, then extra synchronization is needed. The reason is that indexes are redundant (derived) data. Method indexing does not require extra synchronization compared to traditional value-indexing.
8.10
Performance
Overhead for non-DB invocations:
• Monitored function invocation: 3/6 Alpha instructions (3 on exit, plus 3 on entry if dependency trees are built). Measured: 36/70% for dummy global invocations.
• Monitored operation invocation: 3 Alpha instructions on exit. Measured: 40% for dummy global invocations.
Update-vectors are only necessary in mutable objects that have functions.
Figure 8.12: A dependency graph. Objects that have been used to compute a “key” transitively reference “KeyVal” objects.
8.11
Concurrency control and indexes
When transactions update data in the database, the relevant indexes are also
updated. Therefore access to indexes must be controlled in the same way as
real (not derived) data.
In traditional database systems all updates are performed by the database system. Updates are typically high-level update operations that update many records, e.g., the UPDATE operation in SQL.
The DBMS handles both indexing and concurrency control. To improve efficiency, extra facilities are often provided; for example, an update operation in a transaction can lock a whole index or part of an index (an interval in a hash table or a subtree in a B-tree implementation).
Figure 8.13: A dependency graph with DB-flags and update-vectors.
In the Emerald database, objects are updated individually. This means that there is no advantage in handling indexes as a special case.
Instead indexes are built on top of concurrency control (see chapter 6).
In the B-tree implementation each node in the tree is implemented as an
object with functions/operations to search in its sub-tree, delete elements
etc. Transactions that search using an index will therefore get read-locks
on parts of the index because the search functions are functions and not
operations. Insertions and deletions can lead to restructuring of a part of the
tree. Transactions that insert or delete, acquire write-locks on the part of
the tree that is restructured, because they perform operations on the node objects. This can lead to deadlocks, which are handled by the transaction system.
The advantage is that insertions and deletions in multiple transactions can
be executed concurrently. This does, however, pose some restrictions on the implementation of indexes and collections. For example, it is not possible to keep track
of the number of objects in a collection without counting them each time the
number is used. A count variable in the root node that is incremented on
each insertion would reduce the level of concurrency to that of a readers-writers
protocol.
We expect that it would be possible to avoid this type of deadlock without
reducing concurrency by altering the tree-operations to always write (invoke
an operation on) parent nodes before writing their child nodes. If the tree is
distributed, this could lead to more communication.
Updates can be performed by deleting the entry from the tree, updating the key, and re-inserting the entry into the tree. More efficient algorithms exist,
but they can also lead to deadlocks.
Although objects in the database can be updated without using database
facilities, the runtime system might recognize a query as an update of a
number of objects. The query generator could then apply a more coursegrained concurrency control, e.g. by generating code that acquires write-locks
on the relevant indexes before executing the rest of the update. Such writelocks on indexes can be obtained by invoking a dummy monitored operation
on the root node of a tree implementation or on the object that is the interface
to the index.
8.11.1
Using the index facilities without using transactions
If the indexing facilities are used with ordinary processes or with multiple processes in the same transaction, an alternative index implementation must be used. We call these indexes serial indexes. Serial indexes implement a readers-writers protocol. This limits concurrency but guarantees the consistency of the B-trees.
It also guarantees the consistency of queries on indexes. For instance, if we have a teacher t1 and a collection of teachers, teachers, with an index on salary, the scenario in figure 8.15 could occur:
Time   Process 1              Process 2
1      t1.salary ← 18000
2      t1.salary ← 20000
3                             s ← t1.salary
4                             edb.query[SELECT[teachers\t WHERE t.salary > 19000], . . . ]

Figure 8.15: Indexes without transactions
Process 2 expects the result of the query to include t1 , because it already
knows the result of the latest assignment to t1 . After the two assignments
at time 1 and 2, the index on salary must be updated. The index is updated
by Process 1. While Process 1 performs the update of the index, it holds the
monitor of t1 . Therefore at time 3, Process 2 cannot assign the new salary
to s before the index is updated and the result of the query at time 4 will
include t1.
There are four combinations of processes and transactions (figure 8.16):

Ordinary processes                                      works with serial collections
Ordinary processes and transactions                     impossible
Processes in different transactions                     ok
Processes of which some are in the same transaction     works with serial collections

Figure 8.16: Indexing and transactions
8.12
Optimization of the invalidation scheme
The simplest way to handle the invalidation of an index entry is to delete
the entry, compute the new key and insert the new (object,key) entry into
the index. For certain index implementations (trees) efficient algorithms for
updates of keys exist.
A natural optimization is to compute the new key and do nothing more
if the new key is identical to the old key.
A more radical solution is to just mark the index entry as invalid. When
the entry is used, the entry key must be computed and the entry relocated
in the index. For typical queries such as “SELECT[Collection
c WHERE c.indexFunction=42]” all invalid entries must be recomputed and
relocated for each query. For other queries (e.g. membership and existence
queries) some or all of the invalid entries might not have to be recomputed.
The advantages are:
1. More efficient algorithms can be used if a substantial fraction of the
index entries are invalid.
In traditional database systems updates are performed in special update-operations that can each update many objects. The system can then validate the relevant indexes based on all the updates of an update-operation. In our system updates can be performed on one object at a time. Lazy invalidation is a way to take advantage of the index
invalidation techniques used in traditional systems.
2. The integration of application and database programs means that application programs might use object variables as temporary variables.
These variables could be used by an indexed function. If such an object
variable were updated in a loop and no database queries were initiated
from the loop, then lazy invalidation could reduce overhead.
The disadvantage is that it complicates concurrency control. If a process
in a transaction updates an object and this leads to an invalidation of an
index entry, then the transaction must perform a write to set the invalidation
mark. If a process in a second transaction performs a query that uses this
index, this transaction must perform the computation of the new key and
the relocation of the index entry. With our model for concurrency control of
indexes, the second transaction would block and wait for the first to complete.
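A Python sketch of lazy invalidation for a simple equality index (an update only marks the entry invalid; the key is recomputed and the entry relocated the next time the index is used; concurrency control is ignored and all names are illustrative):

# Python sketch of lazy invalidation of index entries.
class LazyIndex:
    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.entries = {}                  # obj -> (key, valid)
    def insert(self, obj):
        self.entries[obj] = (self.key_fn(obj), True)
    def invalidate(self, obj):
        key, _ = self.entries[obj]
        self.entries[obj] = (key, False)   # just mark; do not recompute yet
    def lookup(self, key):
        for obj, (k, valid) in list(self.entries.items()):
            if not valid:                  # recompute and relocate lazily
                k = self.key_fn(obj)
                self.entries[obj] = (k, True)
            if k == key:
                yield obj

class Teacher:
    def __init__(self, salary):
        self.salary_field = salary
    def salary(self):
        return self.salary_field

t = Teacher(18000)
idx = LazyIndex(lambda o: o.salary())
idx.insert(t)
t.salary_field = 20000
idx.invalidate(t)                          # cheap mark on update
print([o.salary() for o in idx.lookup(20000)])   # recomputed on use -> [20000]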
Chapter 9
Evaluation and Conclusion
Our goal was to propose a model for integrating advanced database facilities into existing object-oriented systems that support mobility, so as to provide a system that fully integrates applications and database operations.
To show our thesis we have implemented a working prototype.
We have demonstrated that a database query language can be fully integrated in an object-oriented language and runtime system. In particular we
demonstrated that it can be done in a type-safe way.
We provided a model for transactions with mobile objects and implemented a working prototype with a low overhead for applications that do
not use transactions.
We provided a model and working prototype for indexing methods on
mobile objects.
9.1
Distributed Method Indexing
We have proposed a model for Distributed Method Indexing. We have shown
how to integrate Distributed Method Indexing into existing object-oriented
systems.
In our model, Distributed Method Indexing is orthogonal to
• Transactions
• Queries
• Persistence
We have implemented a prototype that demonstrates that Distributed
Method Indexing can be implemented efficiently:
• The overhead is low for applications that do not take advantage of
Distributed Method Indexing—Even for non-persistent applications.
• It is possible to use Distributed Method Indexing to implement certain
tasks in a simple and efficient way.
• It is possible to create distributed method indexes that will make the whole system perform poorly, e.g., by indexing objects that are dependent on objects that are updated often.
This can happen because in an object-oriented system, the value of an
expression can depend on any number of objects, whereas in a traditional RDBMS, where sets of attributes are indexed, an update to one
attribute in one tuple in one relation will only invalidate indexes on
that relation and only indexes that include that attribute.
This means that in an object-oriented database system with Distributed
Method Indexing great care must be taken when adding indexes. But
such “bad” indexes would not have been possible in a relational system
or in an OODBMS without Distributed Method Indexing.
9.2
Distributed Transactions
We have proposed a model for distributed transactions for systems with fine-grained mobile objects. The model is based on two-phase locking, two-phase commit, and Chandy and Misra's deadlock-detection algorithm.
We have implemented a prototype that supports distributed transactions.
9.3
Queries
We have proposed a model for queries in an object-oriented system. The
model integrates an object-oriented query language and an object-oriented programming language.
9.4
Performance Overheads
We started (page 1) by saying that if an application does not make use of
a facility, our system should not incur a significant overhead compared to a
system that did not support that facility.
We will discuss the overheads that we introduced:
One language, Emerald All applications have to be written in Emerald.
We use the version of Emerald that compiles programs to machine code
(as opposed to the version that compiles to byte code). This version is
fairly efficient. But because Emerald is a research project and this is a proof-of-concept project, the code optimization of the Emerald compiler is of lower quality than that of, e.g., widespread C compilers.
The choice of Emerald did, however, introduce two fundamental performance overheads:
• Different applications can be written optimally in different systems. E.g., many Fortran applications will perform worse if written in any other language. Therefore, requiring all applications in the system to be written in the same language does mean that some applications will perform less than optimally. If the applications
were written in different languages they would, however, incur a
performance overhead when communicating.
• To support mobility the Emerald compiler generates code that for
each invocation checks if the invoked object is local.
If the compiler can determine that the called object is always local,
the code for the check is not generated. It is therefore possible to
write a non-distributed application and not incur the overhead of
these checks.
Our libraries and the part of our system written in Emerald can,
however, make use of mobility and therefore contain the invocation checks. This means that an application that uses indexes
or queries, but not mobility, incurs an overhead in the form of
invocation-checks.
Transactions Support for transactions adds an overhead for all invocations
on operations/functions in monitors because we need to add code that
checks if the current process is part of a transaction. On the Alpha
architecture the overhead is two instructions.
Persistence Persistence adds no significant overhead to the system because
it uses existing facilities for mobility.
Method indexing Adds an overhead of a few instructions on invocation of
functions in monitors, and a few more if dependency trees are built. The overhead on operations in monitors is a few instructions.
In addition to the overhead on invocations, the size of monitors and stack segments is increased by about 20 bytes by both the support for transactions and method indexing.
9.5
Future work
To demonstrate the performance of the prototype, extensive performance measurements on modern equipment are needed.
To evaluate the usability of the proposed models we would in particular
investigate:
• Caching and I/O scheduling for persistence.
• The use of transaction and non-transaction processes in the same application.
• The use of mobility in queries and query optimization.
• The use of method indexing.
Since the start of this project, more work has been done on virtual machines, reflection, persistence, and program loading. An obvious area for
future work is to implement our models on more modern and widespread
platforms, such as Java and .Net.
Bibliography
[ABC+ 83]
M. P. Atkinson, P. J. Bailey, K. J. Chisholm, P. W. Cockshott,
and R. Morrison. An approach to persistent programming. The Computer Journal, 26(4):360–365, 1983.
[AD95]
Rakesh Agrawal and David J. DeWitt. Integrated concurrency
control and recovery mechanisms: design and performance evaluation. ACM Transactions on Database Systems, 10(4):529–564,
December 1995.
[ADJ+ 96]
Malcolm P. Atkinson, Laurent Daynes, Mick J. Jordan, Tony
Printezis, and Susan Spence. An orthogonally persistent java.
SIGMOD Record, 25(4):68–75, 1996.
[BDrMN73] Graham M. Birtwistle, Ole-Johan Dahl, Bjørn Myhrhaug, and
Kristian Nygaard. SIMULA BEGIN. Studentlitteratur, Lund,
Sweden, 1973. Published in the U.S.A. by Auerbach Publishers
Inc., Philadelphia, PA.
[BG81]
Philip A. Bernstein and Nathan Goodman. Concurrency control in distributed database systems. ACM Computing Surveys,
13(2):185–221, June 1981.
[BHJ+ 87]
Andrew Black, Norman Hutchinson, Eric Jul, Henry Levy, and
Larry Carter. Distribution and abstract types in Emerald.
IEEE Transactions on Software Engineering, 13(1):65–76, January 1987.
[BK89]
Elisa Bertino and Won Kim. Indexing techniques for queries
on nested objects. IEEE Transactions on Knowledge and Data Engineering, 1(2):196–225, June 1989.
[CAI+ 81]
Donald D. Chamberlin, Morton M. Astrahan, W. Frank King
III, Raymond A. Lorie, James W. Mehl, Thomas G. Price, Mario
Schkolnick, Patricia G. Selinger, Donald R. Slutz, Bradford W.
Wade, and Robert A. Yost. Support for repetitive transactions and ad hoc queries in System R. ACM Transactions on Database Systems, 6(1):70–94, 1981.
[Car97]
Luca Cardelli. Type systems. In Allen B. Tucker, editor, The
Computer Science and Engineering Handbook. CRC Press, Boca
Raton, FL, 1997.
[CR05a]
William R. Cook and Siddhartha Rai. Safe query objects: statically typed objects as remotely executable queries. In ICSE
’05: Proceedings of the 27th international conference on Software engineering, pages 97–106, New York, NY, USA, 2005.
ACM Press.
[CR05b]
William R. Cook and Carl Rosenberger. Native queries for persistent objects. Technical report, The University of Texas at
Austin and db4objects Inc., August 2005.
[Dat90]
C. J. Date. An Introduction to Database Systems, volume I.
Addison-Wesley, 5 edition, 1990.
[DD85]
James Donahue and Alan Demers. Data types are values.
ACM Transactions on Programming Languages and Systems,
7(3):426–445, July 1985.
[Fis95]
Jörg Fischer. Query optimization for the Emerald database. Master’s thesis, TU Braunschweig and Københavns Universitet, 1995. Diplomarbeit and Speciale.
[GR83]
Adele Goldberg and David Robson. Smalltalk-80: the language
and its implementation. Addison-Wesley Publishing Company,
Reading, Massachusetts, 1983.
[GR93]
Jim Gray and Andreas Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[Gra93]
Goetz Graefe. Query evaluation techniques for large databases.
ACM Computing Surveys, 25(2):73–170, June 1993.
[Hoa74]
C. A. R. Hoare. Monitors: an operating system structuring
concept. Commun. ACM, 17(10):549–557, 1974.
[HR83]
T. Haerder and A. Reuter. Principles of transaction-oriented
database recovery. ACM Computing Surveys, 15(4):287–317,
December 1983.
[HRB+ 87]
Norman C. Hutchinson, Rajendra K. Raj, Andrew P. Black,
Henry M. Levy, and Eric Jul. The Emerald Programming Language Report. Technical Report 87-10-07, Department of Computer Science, University of Washington, Seattle, Washington,
October 1987. Also available as DIKU Report no. 87/22, Dept.
of Computer Science, University of Copenhagen, Copenhagen,
Denmark and as TR no. 87-29, Dept. of Computer Science, University of Arizona, Tucson, Arizona. (Revised August 1988).
[Hut87a]
Norman C. Hutchinson. Emerald: A Language to Support Distributed Programming. In Mario R. Barbacci, editor, Proceedings from the Second Workshop on Large-Grained Parallelism,
pages 45–47. Carnegie-Mellon University Software Engineering
Institute, Pittsburgh, PA, November 1987. This appears in Special Report CMU/SEI-87-SR-5.
[Hut87b]
Norman C. Hutchinson. Emerald: An Object-Based Language
for Distributed Programming. PhD thesis, Department of Computer Science, University of Washington, Seattle, Washington,
January 1987. Technical Report 87-01-01.
[IBM]
IBM. DB2 Application Development Concepts.
[JLHB88]
Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black.
Fine-Grained Mobility in the Emerald System. ACM Transactions on Computer Systems, 6(1):109–133, February 1988.
An extended abstract appeared in Proceedings of the Eleventh
ACM Symposium on Operating System Principles.
[Jul88]
Eric Jul. Object Mobility in a Distributed Object-Oriented System. PhD thesis, Department of Computer Science, University
of Washington, Seattle, Washington, 1988. UW Technical Report no. 88-12-6, also DIKU report 89/1.
[KKD89]
Won Kim, Kyung-Chang Kim, and Alfred Dale. Indexing techniques for obejct-oriented databases. In Object-Oriented Concepts, Databases, and Applications, chapter 15. ACM press,
1989.
[Kna87]
Edgar Knapp. Deadlock detection in distributed databases.
ACM Computing Surveys, 19(4):303–328, December 1987.
[Lar92a]
Niels Elgaard Larsen. An object-oriented database in Emerald. Master’s thesis, DIKU, 1992.
[Lar92b]
Niels Elgaard Larsen. An object-oriented database in Emerald. Master’s thesis, DIKU, Department of Computer Science,
University of Copenhagen, 1992.
[Lis88]
Barbara Liskov. Distributed programming in ARGUS. Communications of the ACM, 31(3):301–312, March 1988.
[Mos86]
J. Eliott B. Moss. An introduction to nested transactions. Technical Report 80-41, COINS, sept 1986.
[Nau60]
Peter Naur. Report on the algorithmic language ALGOL 60.
Communications of the ACM, 3(5):299–314, May 1960.
[ORW98]
Martin Odersky, Enno Runne, and Philip Wadler. Two ways to bake your pizza - translating parameterised types into Java. In Generic Programming, pages 114–132, 1998.
[RSPML78] Daniel J. Rosenkrantz, Richard E. Stearns, and Philip M. Lewis II. System level concurrency control for distributed database systems. ACM Transactions on Database Systems, 3(2):178–198, 1978.
[RTL+ 91]
Rajendra K. Raj, Ewan D. Tempero, Henry M. Levy, Andrew P.
Black, Norman C. Hutchinson, and Eric Jul. Emerald: A
general-purpose programming language. Softw., Pract. Exper.,
21(1):91–118, 1991.
[SCB+ 86]
Craig Schaffert, Topher Cooper, Bruce Bullis, Mike Kilian, and
Carrie Wilpolt. An introduction to Trellis/Owl. In Proceedings
of the ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 17–29, October 1986.
ACM SIGPLAN Notices 21(11):9-16, November 1986.
[WRS]
S. Wheater, F. Ranno, and S. Shrivastava. A CORBA compliant transactional workflow system for Internet applications.
[YJM91]
John Yen, Hsiao-Lei Juang, and Robert MacGregor. Using polymorphism to improve expert system maintainability. IEEE Expert: Intelligent Systems and Their Applications, 6(2):48–55, 1991.
[ZM90]
Stanley B. Zdonik and David Maier, editors. Readings in Object-Oriented Database Systems. Morgan Kaufmann, 1990.