# Vector Time and Causality among Abstract Events in Distributed Computations

Twan Basten¹, Thomas Kunz², James P. Black², Michael H. Coffin², and David J. Taylor²

¹ Department of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands

² Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

## Abstract

An important problem in analyzing distributed computations is the amount of information. In event-based models, even for simple applications, the number of events is large and the causal structure is complex. Event abstraction can be used to reduce the apparent complexity of a distributed computation. This paper discusses one important aspect of event abstraction: causality among abstract events. Following Lamport [24], two causality relations are defined on abstract events, called weak and strong precedence. A general theoretical framework based on logical vector time is developed in which several meaningful timestamps for abstract events are derived. These timestamps can be used to efficiently determine causal relationships between arbitrary abstract events. The class of convex abstract events is identified as a subclass of abstract events that is general enough to be widely applicable and restricted enough to simplify timestamping schemes used for characterizing weak precedence. We explain why such a simplification seems not possible for strong precedence.

Key words: Distributed systems – Event abstraction – Causality – Precedence relation – Partial order – Vector time – Logical time

## 1 Introduction

A distributed application consists of a number of autonomous sequential processes, cooperating to achieve a common goal. Cooperation includes both communication and synchronization, and is achieved by exchanging messages. A distributed computation is modeled as a set of events. An event represents some activity performed by some process and is considered to take place at an instant in time.
Typically, the lowest-level observable events, or primitive events, are computations local to processes and interprocess-communication events. What is important in an event-based view of distributed computations is how events are causally related to each other. Causality can be expressed in terms of precedence. Sending a message, for example, always precedes receiving the message. However, sending a message might be unrelated to a write action on a local file in another process. Neither event precedes the other and they are said to be concurrent. In [23], Lamport argues that causality among primitive events is a partial order. To determine causal relationships between events, logical-timestamp schemes have been proposed [16, 23, 27]. Logical time has been used for many different purposes: implementing causal broadcasts [9], measuring concurrency [10], detecting global predicates [14, 26], implementing distributed breakpoints [4, 19], computing consistent global snapshots [27], and visualizing program behavior [32]. A good starting point for an introduction to several of these issues that play an important role in distributed computing is [3].

(This work was supported in part by the Natural Sciences and Engineering Research Council of Canada.)

Experience shows that even for simple distributed applications, the amount of behavioral information is very large, and the causality structure is very complex. It is well known that human beings have difficulties managing too much information at once. Therefore, in analyzing distributed applications, it is desirable to reduce the amount of information that must be considered at a single point in time and, thus, to reduce the apparent complexity of a computation. A powerful way to achieve such a reduction is abstraction. This paper focuses on one type of abstraction, namely event abstraction.
Primitive events are grouped together into high-level abstract events, hiding their internal structure and creating an abstract view of the computation. Given a hierarchy of abstract views of program behavior, a distributed application can be analyzed at different levels of abstraction.

As Schwarz and Mattern have observed [30], to date, there has been no sound treatment in the literature of causality and logical time for arbitrary abstract events. Therefore, in this paper, we study vector time, which is one particular type of logical time, and causality among abstract events. Our goal is to present a general theoretical framework that is useful for a wide variety of applications using vector time. Following Lamport [24], two precedence relations on abstract events are defined, called weak and strong precedence. Together they capture all important aspects of causality among abstract events. The main contribution of this paper is that timestamp schemes and accompanying precedence tests are derived to efficiently determine causal relationships between abstract events. Each timestamp scheme is formally proven correct. Using timestamps, program behavior can be visualized at any level of abstraction while faithfully depicting causal relations between abstract events.

The main motivation for this paper comes from the area of analyzing and debugging distributed programs, in particular, visualizing abstract views of distributed computations. Although we do not think that the theoretical framework presented in this paper is restricted to this particular application, Section 2 discusses some results achieved in this area to show the practical applicability of the theory.

The remainder of the paper is organized as follows. Section 3 presents a formal model of distributed computations and recalls some basic definitions and results about logical vector time. Section 4 discusses causality among abstract events. The weak and strong precedence relations on abstract events are defined.
Section 5 explains the basic issues that are important when timestamping abstract events. In general, at least two timestamps are necessary to characterize precedence relations between arbitrary abstract events. In Section 6, two timestamps and two precedence tests that characterize strong precedence among arbitrary abstract events are derived. Section 7 gives timestamps and precedence tests for determining weak precedence relations among abstract events. Section 8 deals with an important subclass of abstract events, called convex abstract events. It is shown that for this class of abstract events, a single timestamp is sufficient to characterize weak precedence relations. Unfortunately, no such result seems to exist for the strong precedence relation. Finally, Section 9 summarizes the results.

## 2 Motivating Example

The research presented in this paper originated in the area of monitoring, analyzing, and debugging distributed applications, in particular, the visualization of the causality structure of a distributed computation. For this purpose, an event-based representation of distributed computations is most convenient, whereas for other purposes, such as distributed-predicate detection, a state-based representation is more appropriate. We do not discuss the advantages and disadvantages of both representations in detail. In [17], timestamps and causality relations are defined for a state-based representation of computations. The two representations yield very similar formulas. The results presented in this paper can easily be translated to a state-based representation of distributed computations.

Our approach to debugging and analyzing distributed applications is one of post-mortem analysis, which conceptually consists of the following three steps. First, a minimum amount of event information is collected with the least possible perturbation of the distributed computation. Second, vector timestamps for the collected events are calculated separately.
Finally, the causal relationships between events in the computation can be visualized and analyzed. This approach guarantees that the program behavior is influenced as little as possible by the monitoring and analysis process. In the actual debugger [32], steps are not as clearly separated as described above. Only timestamps needed for visualizing the part of the computation under consideration are calculated. A checkpoint mechanism is used to allow a fast reconstruction of timestamps for other parts of the computation when needed.

One of the main features of the debugger is that it allows the user to construct abstract visualizations of program behavior consisting of abstract events and causal relations between these events. For this purpose, it is important to have a faithful representation of causal relations among abstract events as well as an efficient way of determining and visualizing such relations. The remainder of this paper studies these two aspects in more detail. First, however, we show the abstract visualization of a small distributed computation to motivate the use of event abstraction and to show an actual implementation of the results presented in this paper.

The computation visualized here is an execution of the boundedbuffer application described in the Hermes tutorial [31]. It implements a simple bounded buffer for text strings and conceptually consists of two processes. Process boundedbuffer implements the bounded buffer, and process bbintf provides a line-oriented user interface to the bounded buffer. In the sample computation, one string is put into the buffer, fetched from the buffer and displayed, after which the computation is terminated. During the execution, four additional processes are created and a number of processes within the Hermes runtime system are used. The execution creates an event file containing 1874 primitive events from a total of 126 processes. (The first 120 processes form the standard Hermes runtime system.)
Figure 1: The boundedbuffer computation.

Figure 1 depicts part of the sample computation, using the display provided by the original Hermes debugger [32]. The display is similar to a standard process-time diagram with slightly more information about an execution, such as event types (indicated by the symbol used to draw a primitive event) and an approximation of the process states (indicated by the line style). Since even for this small example the numbers of processes and events are already very large, Figure 1 shows only a subset of all processes and events, namely that part of the computation where a string is entered and put into the buffer.

Figure 2 shows a visualization of the computation that starts with the same events as the ones shown in Figure 1 at an intermediate event-abstraction level. In general, abstract events contain primitive events from different processes. We therefore chose the following representation to visualize abstract events. An abstract event is depicted by an open, vertical rectangle, stretching over the range of all processes involved. The intersection of this rectangle with a process is drawn as a filled square if primitive events from this process are constituents of the abstract event (see also [21]).

Figure 2: An intermediate event-abstraction view of the boundedbuffer computation.

The first abstract event corresponds to the action "put a string into the buffer." The second abstract event corresponds to getting the next command from the user. The third abstract event fetches a string from the buffer and displays it. The fourth obtains the next user command, and the fifth abstract event contains the termination activities. Because of the use of event abstraction, Figure 2 depicts a much larger sequence of the execution history than Figure 1. Moreover, it depicts the computation in meaningful "units of work," which makes the program behavior easier to understand.
In Figure 2, two abstract events are connected if and only if the leftmost event weakly precedes the rightmost event. As explained in more detail in Section 4, this means that part of the first abstract event causally affects part of the second event. In our opinion, such causal relationships are very important in debugging distributed applications, which is confirmed by, for example, [20]. The abstract visualization shown in Figure 2 is built using the timestamp scheme of Section 8.

The above example was deliberately kept simple. However, we have traced distributed applications that generate many thousands of events and built event-abstraction hierarchies with many hundreds of abstract events. We are currently in the process of applying the visualization tool to long-running distributed applications, exploring issues such as managing the growth of the trace files. A more detailed explanation of the current state of the implementation of our debugger for distributed programs can be found in [22].

## 3 Basic Definitions and Results

In this paper, a distributed system is a collection of many loosely-coupled machines. These machines do not share any system resources and are only connected by a communication network. Communication channels may be lossy and delivery order may or may not be guaranteed. A distributed application is a set of independent, cooperating program modules. Information is exchanged only by message passing. Both synchronous and asynchronous communication are allowed. It is assumed that communication is point-to-point. However, it is straightforward to extend the results to multicast and broadcast schemes. At runtime, program modules are instantiated as processes which do not share memory. For the sake of simplicity, it is assumed that the number of processes is fixed and known in advance. Each process performs a local computation. A distributed computation is the collection of all local computations.
### 3.1 Distributed Computations

The model of distributed computations used in this paper is based on the notion of primitive events. Primitive events are considered to be atomic. Therefore, a primitive event is modeled as if it occurred instantaneously. A distributed computation is a pair $(E, \prec)$, where $E$ is the set of primitive events and $\prec$ is an irreflexive partial order on the set of events that models causal precedence. The detailed model given in the remainder of this subsection is based mainly on previous definitions of Mattern [27, 28], Charron-Bost [10, 11, 12], and Fidge [16]. Their definitions are, in turn, based on the "happens before" relation introduced by Lamport [23].

The set of primitive events $E$ is the union of $N$ mutually disjoint sets of events, $E_0, \ldots, E_{N-1}$, where $N$ is the number of processes. Each of these sets represents a local computation. It is assumed that $E$ is finite. This is not a real restriction since this paper discusses timestamp schemes and, in practice, only finite (prefixes of) computations can be timestamped. The set of process identifiers, $\{0, \ldots, N-1\}$, is denoted $P$.

As mentioned, both synchronous and asynchronous communication are allowed. Every communication is modeled by a send event and a corresponding receive event. The sets of send and receive events are denoted by $S$ and $R$ respectively. The two sets are disjoint subsets of the set of events $E$. A relation $\Gamma \subseteq S \times R$ relates send events to receive events. This relation is left- and right-unique. Furthermore, it is required that for every receive event in $R$, there is a corresponding send event in $S$. The absence of the converse condition means that messages might be lost or might still be in transit. A subset of $\Gamma$, $\Gamma_s$, denotes the set of synchronous message communications.

For every $i \in P$, the set $E_i$ is totally ordered by a relation $\prec_i$. This models the fact that processes are sequential. The relation $\prec_l$ is defined as the union of all $\prec_i$. It expresses the local ordering of events.
The precedence relation $\prec$ that models the causal ordering of events is defined as the smallest transitive relation that satisfies the following two conditions.

C1 The relation $\prec_l \cup \Gamma$ is a subset of $\prec$.

C2 For every $(s, r) \in \Gamma_s$ and $e \in E \setminus \{s, r\}$, $e \prec s \Leftrightarrow e \prec r$ and $s \prec e \Leftrightarrow r \prec e$.

The definitions given so far model a distributed computation if and only if the precedence relation $\prec$ is an irreflexive partial order. The relation $\prec$ extends the "happens before" relation as defined by Lamport [23] to synchronous communication in a natural way. Condition C2, originally given by Fidge [16], means that a synchronous communication can be interpreted as if it occurred atomically. That is, no other events can occur causally between the two events participating in a synchronous communication. Distinguishing a send and a receive event such that the send event precedes the corresponding receive conforms to physical reality: a synchronous communication is initiated by one process and received by the other after a small but non-zero delay. We prefer this model of synchronous communication over another model of synchronous communication in the literature [12, 13, 16], which models a synchronous communication as two unrelated events; this model has the disadvantage that a synchronous communication cannot be distinguished from a pair of concurrent events.

Let $\preceq$ denote the reflexive closure of the precedence relation $\prec$. The relation $\preceq$ can be used to express concurrency among events. Two events $e_0, e_1 \in E$ are concurrent if and only if $e_0 \not\preceq e_1$ and $e_1 \not\preceq e_0$. That is, two events are concurrent if and only if they are unrelated by (the reflexive closure of) the precedence relation.

Using the definitions above, it is possible to formalize the notion of cuts. A cut is the event-based equivalent of a global state. Formalizing the notion of cuts is useful to better understand the causality structure of a distributed computation. The following definitions and theorems are due to Mattern [27].

Definition 3.1.
(Cut) A set $C \subseteq E$ is called a cut of $E$ if and only if for all events $e_0 \in C$ and $e_1 \in E$, $e_1 \prec_l e_0 \Rightarrow e_1 \in C$. A cut is said to be left-closed under $\prec_l$. The set of all cuts is denoted by $\mathcal{C}_l$.

Theorem 3.2. (Structure of cuts) The set of all cuts of a distributed computation, with the ordering defined by the subset relation $\subseteq$, is a complete lattice. The infimum and supremum of sets of cuts are defined by set intersection and set union respectively.

In distributed computing, the subset of consistent cuts is of particular interest. Consistent cuts characterize the set of global states that might actually occur during a distributed computation.

Definition 3.3. (Consistent cut) A set $C \subseteq E$ is called a consistent cut of $E$ if and only if for all events $e_0 \in C$ and $e_1 \in E$, $e_1 \prec e_0 \Rightarrow e_1 \in C$. A consistent cut is left-closed under $\prec$. The set of all consistent cuts of a distributed computation is denoted by $\mathcal{C}$.

Theorem 3.4. (Structure of consistent cuts) The set of consistent cuts, with the ordering defined by $\subseteq$, is a complete lattice.

### 3.2 Vector Time

For many applications in distributed computing, it is useful to have a characterization of causality. Since the precedence relation is a partial order, it is not possible to use physical time or any other totally ordered set as a characterization. For this reason, Mattern [27] and Fidge [16] independently introduced partially ordered vector time. Vector time extends the idea of logical clocks introduced by Lamport [23]. In this subsection, we summarize some definitions and results given by Mattern [27, 28] and Schwarz and Mattern [30]. (Note that [28] is a revised version of [27]. The reason for mentioning it here is its clear presentation; it also contains some results which are not present in [27].) The definitions and results given in this subsection form the theoretical framework which is needed to prove the correctness of the timestamp schemes and accompanying precedence tests for abstract events given in Sections 6 through 8.
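The model of Section 3.1 can be made concrete with a small sketch. The Python code below is an illustration only, not part of the original framework: the function names, the event labels, and the two-process example are assumptions chosen for this sketch. It builds the precedence relation as the transitive closure of the local orders and the message relation (condition C1), treats every message as asynchronous (condition C2 is not modeled), and then checks the cut and consistent-cut properties of Definitions 3.1 and 3.3.

```python
from itertools import product


def precedence(local_orders, messages):
    """Strict precedence as a set of (earlier, later) event pairs.

    local_orders: one list of events per process, in local order.
    messages: (send, receive) pairs, all treated as asynchronous.
    """
    events = [e for proc in local_orders for e in proc]
    rel = set(messages)
    for proc in local_orders:
        rel.update(zip(proc, proc[1:]))  # consecutive local events
    # Warshall-style transitive closure; k is the outermost loop variable.
    for k, i, j in product(events, repeat=3):
        if (i, k) in rel and (k, j) in rel:
            rel.add((i, j))
    return rel


def is_cut(cut, local_orders):
    """Definition 3.1: in each process, the cut holds a prefix of the local order."""
    for proc in local_orders:
        n = sum(e in cut for e in proc)
        if not all(e in cut for e in proc[:n]):
            return False
    return True


def is_consistent_cut(cut, rel):
    """Definition 3.3: the cut is left-closed under the precedence relation."""
    return all(a in cut for (a, b) in rel if b in cut)


def concurrent(x, y, rel):
    """Two events are concurrent if neither precedes or equals the other."""
    return x != y and (x, y) not in rel and (y, x) not in rel


# Hypothetical computation: process 0 runs a0, a1; process 1 runs b0, b1;
# a0 sends a message that is received at b1.
local_orders = [["a0", "a1"], ["b0", "b1"]]
rel = precedence(local_orders, [("a0", "b1")])

receive_without_send = {"b0", "b1"}     # a cut, but it omits the send a0
with_send = {"a0", "b0", "b1"}          # includes every predecessor of b1
```

In this example, `receive_without_send` is a cut but not a consistent cut, because it contains the receive event `b1` without the matching send `a0`, whereas `with_send` is consistent. The events `a1` and `b1` are concurrent.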
The intuition of vector timestamps can be best explained using the reflexive variant of the precedence relation, $\preceq$. An event $e_0$ causally precedes another event $e_1$ if and only if all predecessors of $e_0$ are also predecessors of $e_1$, where predecessors are defined by means of $\preceq$. That is, $e_0$ precedes $e_1$ if and only if the cut containing all predecessors of $e_0$ is a subset of the cut containing all predecessors of $e_1$. The idea behind timestamps is to associate with each event $e$ a value $T.e$, the timestamp of $e$, and, in addition, to define a relation $\le$ on timestamps in such a way as to ensure that for any $e_0, e_1 \in E$, $e_0 \preceq e_1 \Leftrightarrow T.e_0 \le T.e_1$. The intent is to make $\le$ relatively inexpensive to calculate, thus avoiding expensive set-inclusion calculations. Figure 3 illustrates this interpretation of precedence between two events. Definitions of the function $p$, which defines the cut containing all predecessors of an event, and $T$, the timestamp of an event, are given below. Event $e_0$ precedes event $e_1$ since all the predecessors of $e_0$ are also predecessors of $e_1$.

Figure 3: Precedence between primitive events.

Definition 3.5. (Causal past [10, 28, 30]) The function $p : E \to 2^E$ defines the causal past of an event as follows. For any $e \in E$, $p\,e = \{e' \in E \mid e' \preceq e\}$. Note that $p\,e$ is a consistent cut.

Definition 3.6. (Strict causal past) The function $p^- : E \to 2^E$ defines the strict causal past of an event. For any $e \in E$, $p^- e = \{e' \in E \mid e' \prec e\}$. Note that $p^- e = p\,e \setminus \{e\}$.

The causal past in some process $i$ of an event is the set of all its predecessors in $i$.

Definition 3.7. (Causal past in a process [10]) For any $i \in P$, the function $p_i : E \to 2^{E_i}$ defines the causal past in process $i$ of an event. For any $e \in E$, $p_i e = p\,e \cap E_i = \{e' \in E_i \mid e' \preceq e\}$.

Definition 3.8. (Strict causal past in a process) For any $i \in P$, the function $p_i^- : E \to 2^{E_i}$ defines the strict causal past in process $i$ of an event. For any $e \in E$, $p_i^- e = p^- e \cap E_i = \{e' \in E_i \mid e' \prec e\}$.
Note that $p_i^- e = p_i e \setminus \{e\}$; if $e \notin E_i$, then $p_i^- e = p_i e$. The following corollaries are a direct result of the above definitions.

Corollary 3.9. For any events $e_0, e_1 \in E$, $e_0 \preceq e_1 \Leftrightarrow p\,e_0 \subseteq p\,e_1$.

Corollary 3.10. For any events $e_0, e_1 \in E$, $e_0 \prec e_1 \Leftrightarrow p\,e_0 \subseteq p^- e_1 \Leftrightarrow p\,e_0 \subset p\,e_1$.

Corollary 3.11. For any process $i \in P$ and events $e_0 \in E_i$ and $e_1 \in E$, $e_0 \preceq e_1 \Leftrightarrow p_i e_0 \subseteq p_i e_1$.

Corollary 3.12. For any process $i \in P$ and events $e_0 \in E_i$ and $e_1 \in E$, $e_0 \prec e_1 \Leftrightarrow p_i e_0 \subseteq p_i^- e_1$.

Observe that Corollary 3.12 does not have a counterpart in terms of only the causal past in a process, as Corollary 3.10 has. For any process $i \in P$ and events $e_0$ and $e_1$ as above, $e_0 \prec e_1 \not\Rightarrow p_i e_0 \subset p_i e_1$. It is easy to choose $e_0$ and $e_1$ such that $p_i e_0 = p_i e_1$.

The introduction of the causal past is sufficient to formalize the notion of vector timestamps. A vector timestamp of size $N$ is assigned to every event such that component $i \in P$ of the timestamp is equal to the number of predecessors of the event in process $i$.

Definition 3.13. (Timestamp function [11, 28]) The function $T : E \to \mathbb{N}^N$ defines a timestamp for every event as follows. For any event $e \in E$ and process $i \in P$, $T.e.i = |p_i e|$.

The vector representation of timestamps is possible only because the number of processes is known. However, vector representation of timestamps is not essential. If the number of processes is not known, the timestamp of an event can be defined as a set of pairs, where each pair consists of a process identifier and the corresponding timestamp component [16].

An example of the assignment of vector timestamps to events is given in Figure 4, showing a standard process-time diagram. Horizontal lines represent processes. Time increases from left to right. Events are depicted as dots. Arrows represent the communication relation $\Gamma$. A synchronous communication is represented by a vertical arrow, an asynchronous communication by a slanted arrow.
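Definition 3.13 can be evaluated directly from the causal past, as the following Python sketch shows. It is an illustration under stated assumptions rather than the counter-based algorithm used in practice: the function names, the event labels, and the tiny two-process computation are hypothetical, and the strict precedence relation is supplied as an explicit, already transitively closed set of pairs.

```python
def causal_past_in_process(e, proc, rel):
    """Events of one process that precede or equal e (Definition 3.7)."""
    return {x for x in proc if x == e or (x, e) in rel}


def timestamp(e, local_orders, rel):
    """Definition 3.13: component i counts the predecessors of e in process i."""
    return tuple(
        len(causal_past_in_process(e, proc, rel)) for proc in local_orders
    )


# Hypothetical computation: process 0 runs a0, a1; process 1 runs b0, b1;
# a0 sends an asynchronous message that is received at b1.
local_orders = [["a0", "a1"], ["b0", "b1"]]
rel = {("a0", "a1"), ("b0", "b1"), ("a0", "b1")}  # strict precedence, closed

stamps = {e: timestamp(e, local_orders, rel)
          for proc in local_orders for e in proc}
# stamps == {"a0": (1, 0), "a1": (2, 0), "b0": (0, 1), "b1": (1, 2)}
```

The receive event `b1` gets the timestamp (1, 2): it has one predecessor in process 0 (the send `a0`) and two in its own process (`b0` and `b1` itself, since the causal past is reflexive).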
Note that in process $P_1$, the increment of the third component of the vector timestamp from 0 to 1 and from 1 to 2 is the result of the synchronous communication between $P_1$ and $P_2$.

Figure 4: Timestamping events in a distributed computation. (Process-time diagram for processes $P_0$, $P_1$, and $P_2$; the timestamps shown include (1,0,0), (2,0,0), (3,0,0), (0,1,1), (2,2,2), (2,3,3), (0,0,1), (0,1,2), and (0,1,3).)

The following two well-known theorems show that timestamps can be used to determine the causal relation between primitive events. For two vectors $t_0, t_1 \in \mathbb{N}^N$, we define $t_0 \le t_1$ if and only if $t_0.i \le t_1.i$ for all $i$, $0 \le i < N$, and $t_0 < t_1$ if and only if $t_0 \le t_1$ and $t_0 \ne t_1$.

Theorem 3.14. (Precedence test [27, 28, 30]) For any events $e_0, e_1 \in E$, $e_0 \preceq e_1 \Leftrightarrow T.e_0 \le T.e_1$ and $e_0 \prec e_1 \Leftrightarrow T.e_0 < T.e_1$.

This precedence test formalizes the visualization of precedence given in Figure 3. It provides an efficient way to determine precedence among primitive events; at most $N$ integer comparisons are necessary. Precedence can be determined even more efficiently if it is known in which process an event occurs.

Theorem 3.15. (Precedence test [27, 30]) For any $i \in P$ and events $e_0 \in E_i$ and $e_1 \in E$, $e_0 \preceq e_1 \Leftrightarrow T.e_0.i \le T.e_1.i$.

This theorem shows that only one integer comparison is needed to decide whether an event precedes another event if the process in which the event occurs is known. For the example of Figure 4, the validity of the precedence tests given in Theorems 3.14 and 3.15 is easily checked.

Theorem 3.15 does not have a counterpart for the irreflexive precedence relation $\prec$. Corollary 3.12 suggests a solution to this problem. We introduce another timestamp function, defined in terms of the strict causal past of events.

Definition 3.16. (Timestamp function $T^-$) The function $T^- : E \to \mathbb{N}^N$ defines a timestamp for every event as follows. For any event $e \in E$ and process $i \in P$, $T^-.e.i = |p_i^- e|$.

Corollary 3.17.
For any $i, j \in P$ and any event $e \in E_i$,

$$T^-.e.j = \begin{cases} T.e.i - 1, & \text{for } j = i, \\ T.e.j, & \text{otherwise.} \end{cases}$$

This corollary shows that the timestamp $T^-$ of an event can be easily calculated from the timestamp $T$, provided that it is known in which process the event occurs. The introduction of timestamp $T^-$ and Corollary 3.12 yield the following theorem.

Theorem 3.18. (Precedence test) For any $i \in P$, $e_0 \in E_i$, and $e_1 \in E$, $e_0 \prec e_1 \Leftrightarrow T.e_0.i \le T^-.e_1.i$.

The precedence test in this theorem is a characterization of the irreflexive precedence relation such that only one integer comparison is needed to determine whether an event precedes another event. In order to use the test, it is necessary to know in which process events occur. Therefore, by Corollary 3.17, timestamp $T^-$ can be simply calculated from timestamp $T$; it is not necessary to keep track of two sets of timestamps. We have introduced $T^-$ for reasons of clarity. It proves to be convenient in Section 6. (Note that it is also possible to formulate a variant of the precedence test in Theorem 3.14 in terms of $T^-$. However, such a test is not of any practical use.)

For implementation purposes, it is important to know that vector timestamps can be calculated algorithmically during or after the execution of a distributed program. There exists a well-known, straightforward algorithm based on counters. In [30], Schwarz and Mattern present this algorithm and they discuss techniques to implement vector timestamps efficiently.

In the remainder of this subsection, the notion of global time is formalized. Global-time vectors have a structure that is isomorphic to the structure of cuts.

Definition 3.19. (Global time of a cut [28]) Function $\mathcal{T} : \mathcal{C}_l \to \mathbb{N}^N$ defines the global time of a cut. For any cut $C$, component $i$, with $0 \le i < N$, of the time vector is defined as $\mathcal{T}.C.i = |C \cap E_i|$. The set of global-time vectors of a computation, $\{\mathcal{T}.C \mid C \in \mathcal{C}_l\}$, is denoted $\mathbb{T}_l$.

The following corollaries follow immediately from the definitions given so far.
Corollary 3.20 states that the timestamp of an event reflects its causal past (see Figure 3).

Corollary 3.20. [28] For any event $e \in E$, $\mathcal{T}.p\,e = T.e$.

Corollary 3.21. For any event $e \in E$, $\mathcal{T}.p^- e = T^-.e$.

Function $\mathcal{T}$ is an isomorphism between the two lattices $(\mathcal{C}_l, \subseteq)$ and $(\mathbb{T}_l, \le)$, i.e., for any cuts $C_0, C_1 \in \mathcal{C}_l$, $C_0 \subseteq C_1 \Leftrightarrow \mathcal{T}.C_0 \le \mathcal{T}.C_1$. This yields the following theorems.

Theorem 3.22. (Structure of time vectors [28]) The set of time vectors, $\mathbb{T}_l$, with the ordering defined by $\le$, forms a complete lattice. It is isomorphic to the lattice $(\mathcal{C}_l, \subseteq)$.

Theorem 3.23. (Structure of consistent time vectors [27]) The set of consistent time vectors $\mathbb{T} = \{\mathcal{T}.C \mid C \in \mathcal{C}\}$, with the ordering $\le$, forms a complete lattice. It is isomorphic to $(\mathcal{C}, \subseteq)$.

The last two results are established by defining an isomorphism between the lattices of cuts and time vectors. The infimum and supremum of a set of global-time vectors corresponding to a set $C$ of cuts are therefore implicitly defined as $\mathcal{T}.(\cap\, c : c \in C : c)$ and $\mathcal{T}.(\cup\, c : c \in C : c)$. Let the quantifier SUP (and the corresponding binary operator "sup") on time vectors be defined as the componentwise maximum, and the quantifier INF (and binary operator "inf") as the componentwise minimum. It follows from Definitions 3.1 (Cut) and 3.19 (Global time) that $\mathcal{T}.(\cap\, c : c \in C : c) = (\mathrm{INF}\ c : c \in C : \mathcal{T}.c)$ and $\mathcal{T}.(\cup\, c : c \in C : c) = (\mathrm{SUP}\ c : c \in C : \mathcal{T}.c)$. In other words, the infimum and supremum of sets of time vectors are defined by INF and SUP respectively.

## 4 Abstract Events and Causality

An abstract description of program behavior is a set of abstract events plus a characterization of causality among abstract events. In a hierarchy of such abstract descriptions, an abstract event is described uniquely by its constituents in the previous level. However, to avoid recursive definitions and inductive proofs, in the following, abstract events are represented by non-empty sets of primitive events. Obviously, the latter representation can always be derived from the former.
Definition 4.1. (Abstract event) An abstract event is a non-empty set of primitive events.

The causality structure in an abstract description of program behavior is defined in terms of precedence. An important question is what is a meaningful definition of precedence among abstract events. In Section 3.1, where precedence among primitive events has been discussed, we have already seen a characteristic property of a precedence relation, namely that it is a(n irreflexive) partial order. This implies that the precedence relation has the desirable properties of anti-symmetry (or asymmetry) and transitivity. It also provides a very natural way to express concurrency among events. Two events are concurrent if and only if they are unrelated by the precedence relation. Intuitively, a precedence relation on abstract events should also have these properties.

An important observation when trying to define causality among abstract events is that, as opposed to primitive events, abstract events are no longer atomic. Abstract events are composed of primitive events (or lower-level abstract events). It seems natural to define causality among abstract events in terms of the causal relations between their primitive events.

Let us return to the intuition behind the precedence relation on primitive events. One plausible interpretation of this relation is that an event precedes another event if and only if the latter causally depends on the completion of the former. In terms of abstract events this can be phrased as follows. An abstract event precedes another abstract event if and only if all the primitive events in the first abstract event precede all the primitive events in the other one. Only then is it guaranteed that the second abstract event cannot start before the first one is completed. This leads to the following definition of a precedence relation on abstract events, first defined by Lamport in [24].

Definition 4.2.
(Strong precedence relation on abstract events) For any abstract events $A$ and $B$, $A \ll B \Leftrightarrow (\forall a : a \in A : (\forall b : b \in B : a \prec b))$.

Property 4.3. The strong precedence relation is an irreflexive partial order on abstract events.

Proof. It is easy to verify that the strong precedence relation is irreflexive and transitive. □

Note that it is essential that the above definition is formulated in terms of the irreflexive precedence relation on primitive events. If it had been defined in terms of the reflexive precedence relation, the relation would no longer be irreflexive, which can be seen by considering a singleton abstract event. Also note that no matter what variant of the precedence relation on primitive events is used, the strong precedence relation is not reflexive.

At first glance, the strong precedence relation might seem to satisfy the properties of the precedence relation on primitive events mentioned above. However, this is not the case. It does not conform to the intuitive meaning of concurrency. Concurrency between primitive events is defined as their being unrelated by the precedence relation. If a similar definition is used for abstract events, two abstract events can be concurrent while some primitive events in one abstract event precede some primitive events in the other, which is clearly counterintuitive.

This observation inspires another definition of causality among abstract events. An abstract event precedes another abstract event if and only if some primitive events in the former precede some primitive events in the latter. This definition conforms to an interpretation of the precedence relation on primitive events that is subtly different from the interpretation given above, namely that a primitive event precedes another primitive event if and only if it can causally affect the other event. Seen in the light of our earlier observation that abstract events are no longer atomic, this second definition of causality among abstract events should not come as a surprise.
In the words of Lamport [24]: "Nonatomicity introduces the possibility that an operation execution A can influence an operation execution B without preceding it; it is necessary only that some action of A precede some action of B. Hence in addition to the precedence relation [...], one needs an additional relation [...] 'can affect,' where A can affect B means that some action of A precedes some action of B."

Definition 4.4. (Weak precedence relation on abstract events) For any abstract events $A$ and $B$, $A \rightarrow B \Leftrightarrow (\exists a : a \in A : (\exists b : b \in B : a \preceq b))$.

Note that the weak precedence relation allows a natural definition of concurrency among abstract events: Two abstract events are concurrent if and only if they are unrelated by the weak precedence relation. Only then is there absolutely no causal relation between two concurrent abstract events. Note that this would not be true if weak precedence had been defined in terms of the irreflexive variant of the precedence relation on primitive events.

Unfortunately, the weak precedence relation also has a drawback. Figure 5 shows that the weak precedence relation is neither an irreflexive partial order nor a reflexive one. In Figure 5(a), $A \rightarrow B$ and $B \rightarrow A$, which is a violation of the asymmetry requirement of an irreflexive partial order. Note that the asymmetry requirement is a direct consequence of the irreflexivity and transitivity requirements. Since clearly $A$ is not equal to $B$, the fact that $A \rightarrow B$ and $B \rightarrow A$ also contradicts the anti-symmetry requirement of a reflexive partial order. In Figure 5(b), $A \rightarrow B$ and $B \rightarrow C$. However, we do not have $A \rightarrow C$, which implies that the weak precedence relation is also not transitive.

Figure 5: The weak precedence relation is not a partial order on abstract events: (a) a violation of the asymmetry resp. anti-symmetry requirement; (b) a violation of the transitivity requirement.
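The two definitions can be checked directly on small examples. The following Python sketch is only an illustration, not part of the framework: it assumes that the precedence relation on primitive events is given explicitly as a set of (earlier, later) pairs, and all names (`transitive_closure`, `strong_precedes`, `weak_precedes`) as well as the concrete events are hypothetical. The two example computations are chosen in the spirit of Figure 5; the actual events of that figure are not reproduced here.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(closure), repeat=2):
            if y == u and (x, v) not in closure:
                closure.add((x, v))
                changed = True
    return closure


def strong_precedes(A, B, prec):
    """Definition 4.2: every event of A strictly precedes every event of B."""
    return all((a, b) in prec for a in A for b in B)


def weak_precedes(A, B, prec):
    """Definition 4.4: some event of A precedes or equals some event of B."""
    return any(a == b or (a, b) in prec for a in A for b in B)


# Assumed example in the spirit of Figure 5(b): weak precedence is not transitive.
prec_b = transitive_closure({("a", "b1"), ("b0", "c")})
A, B, C = {"a"}, {"b0", "b1"}, {"c"}
print(weak_precedes(A, B, prec_b), weak_precedes(B, C, prec_b),
      weak_precedes(A, C, prec_b))

# Assumed example in the spirit of Figure 5(a): weak precedence in both directions.
prec_a = transitive_closure({("a0", "b0"), ("b1", "a1")})
A2, B2 = {"a0", "a1"}, {"b0", "b1"}
print(weak_precedes(A2, B2, prec_a), weak_precedes(B2, A2, prec_a))

# Strong precedence holds only if all pairs are ordered.
prec_s = transitive_closure({("p0", "p1"), ("p1", "q")})
P, Q = {"p0", "p1"}, {"q"}
print(strong_precedes(P, Q, prec_s), strong_precedes(A2, B2, prec_a))
```

Checking the definitions by enumerating all pairs of primitive events is quadratic in the sizes of the abstract events, which is one reason why timestamp-based tests are of interest.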
Although the strong and weak precedence relations each have shortcomings, their combination seems to be a good characterization of precedence among abstract events. It is possible to express that an abstract event as a whole precedes another abstract event. It is possible to express that part of an abstract event precedes part of another abstract event. Finally, concurrency among abstract events can be expressed in a natural way.

The conclusion that weak and strong precedence are indeed meaningful is supported by the fact that (variants of) both weak and strong precedence appear in many places in the literature. As already mentioned, the definitions given above are taken from the work of Lamport [24, 25], in which an extensive motivation for both relations is given. A variant of the weak precedence relation, applied in the context of debugging distributed programs, appears in [20]. As already mentioned in Section 2, we also believe that weak precedence plays an important role in this area. Another variant of the weak precedence relation appears in the area of distributed databases [29], where abstract events correspond to transactions. Restricted versions of both weak and strong precedence formulated in terms of states instead of events appear in [15, 17, 18]. They are restricted in the sense that, in the terminology of this paper, each abstract event can contain primitive events from only a single process.

A different approach is taken in [1, 2]. In these papers, the notions of atoms and molecules as abstract events are introduced, as well as precedence relations on such abstract events. Without going into detail, we will explain what we believe to be a serious shortcoming of this approach to event abstraction. Consider the example of Figure 5(b). In the terminology of [1, 2], abstract events A, B, and C are all atoms.
Since the precedence relation in [1, 2] is transitive, this implies that A precedes C, although none of the primitive events in A is related to any of the events in C. In our opinion, this is undesirable. An abstract representation of program behavior should not suggest causal relations that are not present when considering a lower level of abstraction. Note that it follows immediately from the definitions given above that both weak and strong precedence satisfy this requirement. The price that we have paid is that weak precedence is not a partial order.

So far, we have only mentioned work that explicitly addresses the question of causality among (relatively) general abstract events. Some papers describing event abstraction simply ignore the issue of causality [6, 7, 19]. Others limit their attention to abstract events with specific structural properties [8, 33]. While this allows them to prove certain desirable properties for their abstract events, it severely limits the modeling power.

5 Timestamping Abstract Events

5.1 Timestamps and Precedence Tests for Abstract Events

This subsection discusses some basic issues with respect to timestamps and precedence tests for abstract events. Two criteria for timestamps and precedence tests are introduced: efficiency and hierarchical applicability. Furthermore, it is argued that, in general, one timestamp is not sufficient to determine precedence among abstract events.

The basic work on timestamping abstract events with vector timestamps is the paper of Haban and Weigel [19]. However, as mentioned, this work lacks a good analysis of causality among abstract events. The causality relation among abstract events is defined implicitly by their timestamps. As shown by Schwarz and Mattern in [30], this leads in some cases to counterintuitive and undesirable precedences among abstract events.
They believe that the reason for these counterintuitive precedences is the fact that abstract events are assigned only a single timestamp, which denies their non-atomic nature. We agree with this conclusion and, below, we give an example showing that indeed at least two timestamps are needed to faithfully characterize precedence among arbitrary abstract events. Furthermore, Sections 6 and 7 show that each of the two precedence relations defined in the previous section can be characterized by two timestamps. Since the two characterizations share one timestamp, three timestamps are sufficient to characterize the combination of weak and strong precedence among arbitrary abstract events. Hence, Sections 6 and 7 present a solution (or at least a partial one) to one of the open problems stated in [30], namely that of assigning meaningful timestamps to arbitrary abstract events. The main reason that we did not encounter the problems stated by Schwarz and Mattern is that we separated the issues of specification and detection of abstract events, on the one hand, and assigning timestamps to abstract events, on the other hand. In our work, timestamps of abstract events are not derived from their specifications, as in the work of Haban and Weigel, but solely from the timestamps of their constituents. For a more detailed discussion of specifying, detecting, and timestamping abstract events, as well as an overview of some related work, see [30].

Before explaining our criteria for timestamps and precedence tests in more detail and substantiating our claim that a single timestamp is not sufficient to characterize causality among arbitrary abstract events, we make one assumption about precedence tests and a few assumptions about timestamps for abstract events.

A very obvious, but nonetheless important assumption about any test for strong or weak precedence among abstract events is that it should be a correct and complete characterization of (strong or weak) precedence.
That is, a causal relation between two abstract events is derivable from a precedence test if and only if it is derivable from the corresponding definition (Definition 4.2 or 4.4). In particular, we do not consider incomplete tests, that is, tests that do not yield all causal relations among abstract events. This assumption does not differ from the assumption made at the beginning of Section 3.2 for precedence tests for primitive events.

For timestamps, we make the following four assumptions. First, for the sake of simplicity, any timestamp should contain at most one integer entry per process (see footnote 1). Second, when calculating a timestamp for an abstract event, no information about any primitive event is used other than the process in which the event occurs, whether it is part of the abstract event, and its timestamp(s). Third, the only information about a process that may be used is whether (part of) the abstract event occurs in it. Finally, all abstract events are assigned timestamps that are calculated by means of the same timestamping algorithm(s). The last three assumptions ensure that no specific information about the structure of a distributed computation, its processes, or its primitive events is used. Thus, any timestamping scheme discussed in this paper is as general as possible and applicable to a wide variety of applications.

Timestamps and precedence tests not satisfying the above assumptions are not discussed in this paper. However, timestamps and tests that do satisfy the assumptions can be better or worse than others. Therefore, we introduce the following two criteria to evaluate timestamps and precedence tests.

The first criterion is efficiency. Timestamps and precedence tests must be reasonably efficient in storage and computation time. That is, storage and computation time must be similar to the storage and computation time needed for determining precedence between primitive events.
Given the first assumption for timestamps above, the amount of storage needed for a precedence test is determined by the number of timestamps needed. Computation time is determined by the algorithms used to calculate timestamps and the number of comparisons needed to determine a causal relation between any two abstract events given their timestamps.

The second criterion is hierarchical applicability. In a hierarchy of true abstract descriptions of program behavior, a characterization of causality should not depend on any other levels than the level immediately below. This means that it must be possible to calculate a timestamp for an abstract event from the timestamps of its constituents in the previous level. It also means that precedence tests must be defined in terms of the timestamps in the level being described. If tests depend on lower levels, determining precedence between two abstract events becomes computationally expensive, because the abstraction hierarchy must be traversed to a level that contains the desired information. Although for reasons of mathematical simplicity we have defined an abstract event as a set of primitive events and not as a set of lower-level abstract events, for most of the definitions and results in the remaining sections, it is clear whether they satisfy the second criterion. If not, some additional explanation is given.

Footnote 1. Since the timestamps in any timestamp scheme are countable, it is always possible to encode timestamps by single integers. However, we assume that the timestamps are derived from the timestamps of primitive events in a reasonable way, which excludes complex encodings into the integers. Moreover, such an encoding would complicate the comparison of timestamps for the purpose of determining causality, making any precedence test virtually impossible to use in practice.
The example in Figure 6 shows that under the above assumptions, for both weak and strong precedence, at least two timestamps are needed to properly characterize precedence relations among arbitrary abstract events. Note that this implies a lower bound on the amount of storage needed for a characterization of precedence, which is twice as high as the amount of storage needed to characterize precedence among primitive events.

Figure 6: Why at least two timestamps are necessary. (Abstract events $A$ and $B$ with primitive events $a_0$, $a_1$ and $b_0$, $b_1$ in a single process.)

It is obvious that in the simple sequential computation of Figure 6, consisting of only a single process, both $A \rightarrow B$ and $B \rightarrow A$. The first assumption for timestamps above implies that any timestamp for abstract events is simply an integer. In order to reflect the two weak causal precedences between $A$ and $B$ with only a single integer timestamp for each abstract event, the timestamps for $A$ and $B$ should be equal. However, under the above assumptions, it is not difficult to see that any reasonable timestamp for $A$ is always smaller than the same timestamp for $B$ (see footnote 2). Hence, one needs at least two timestamps to characterize weak precedence among abstract events. For strong precedence, the reasoning is similar. Obviously, $A \not\ll B$ and $B \not\ll A$. That is, $A$ and $B$ are unrelated by the strong precedence relation. To reflect this with only a single timestamp, the timestamps for $A$ and $B$ must be unordered, which is impossible under the above assumptions. One might argue that the example of Figure 6 is a degenerate case, but it is not hard to construct more complex examples for distributed computations with abstract events having constituents in more than one process.

Footnote 2. It is possible to assign equal timestamps to $A$ and $B$ that satisfy all four assumptions: For example, any abstract event gets timestamp 481. However, this timestamp does not characterize weak precedence correctly for abstract events other than $A$ and $B$. It is also possible to assign timestamps to $A$ and $B$ satisfying all the assumptions such that the timestamp of $A$ is larger than the timestamp of $B$. For example, if for any abstract event $C$ in the computation of the example, timestamp $T$ is defined as $481 - (\mathrm{MAX}\ c : c \in C : T.c)$, then $T.A > T.B$. Apart from the fact that this timestamp does not solve the problem, we do not think it is reasonable. Note that Section 6 shows that $(\mathrm{MAX}\ c : c \in C : T.c)$ is a reasonable timestamp to determine precedence in the example computation.

5.2 Basic Definitions and Results

This subsection introduces some new definitions and adapts some previous definitions to abstract events. It also presents some basic results that are useful in the next sections.

Definition 5.1. (Location set) The location set of a primitive or abstract event is defined by a function $l. : E \cup 2^E \rightarrow 2^P$ as the set of processes in which the event occurs. For any $e \in E$, $le = \{ i \in P \mid e \in E_i \}$. For any $A \subseteq E$, $lA = \{ i \in P \mid A \cap E_i \neq \emptyset \}$.

Note that the location set of a primitive event is always a singleton.

As mentioned before, a property of primitive events that no longer holds for abstract events is atomicity. This is expressed by the following two functions.

Definition 5.2. (Beginning and end of an abstract event) The beginning of an abstract event $A$ is defined by a function $\lfloor . \rfloor : 2^E \rightarrow 2^E$ as $\lfloor A \rfloor = \{ a_0 \in A \mid \neg(\exists a_1 : a_1 \in A : a_1 \prec a_0) \}$. The end of an abstract event $A$ is defined by a function $\lceil . \rceil : 2^E \rightarrow 2^E$ as $\lceil A \rceil = \{ a_0 \in A \mid \neg(\exists a_1 : a_1 \in A : a_0 \prec a_1) \}$.

Note that to determine precedence among abstract events, it is sufficient to consider the beginning and end of the events instead of the events in their entirety.

The following two definitions lift the notions of causal past and strict causal past to abstract events. The question is when a primitive event is an element of the (strict) causal past of an abstract event.
The most general answer to this question leads to the definition of the causal past: A primitive event is in the causal past of an abstract event if and only if it precedes at least one of the primitive events in the abstract event. That is, the causal past of an abstract event is defined as the union of the causal pasts of all its primitive events. The second, more restricted answer to the above question yields the definition of the strict causal past: A primitive event is an element of the strict causal past of an abstract event if and only if it precedes all primitive events in the abstract event. The strict causal past of an abstract event is defined as the intersection of all strict causal pasts of its primitive events. Using the strict causal past of primitive events instead of their causal past yields a more natural result. If the latter had been used, the causal past of an abstract event might overlap with the abstract event itself.

Definition 5.3. (Causal past of an abstract event) The causal past of an abstract event is defined by a function $p. : 2^E \rightarrow 2^E$ as follows. For any $A \subseteq E$, $pA = (\cup\, a : a \in A : pa)$.

Definition 5.4. (Strict causal past of an abstract event) The strict causal past of an abstract event is defined by a function $p^-. : 2^E \rightarrow 2^E$ as follows. For any $A \subseteq E$, $p^-A = (\cap\, a : a \in A : p^-a)$.

Note that $p^-A$ is not necessarily equal to $pA \setminus A$.

Definition 5.5. (Causal past of an abstract event in a process) For any $i \in P$, the function $p_i. : 2^E \rightarrow 2^{E_i}$ defines the causal past in process $i$ of an abstract event as follows. For any $A \subseteq E$, $p_iA = pA \cap E_i = (\cup\, a : a \in A : p_ia)$.

Definition 5.6. (Strict causal past of an abstract event in a process) For any $i \in P$, the function $p^-_i. : 2^E \rightarrow 2^{E_i}$ defines the strict causal past in process $i$ of an abstract event as follows. For any $A \subseteq E$, $p^-_iA = p^-A \cap E_i = (\cap\, a : a \in A : p^-_ia)$.

The following two corollaries are a direct result of the previous definitions.
They show that the definitions of the beginning and end of abstract events are intuitively correct. They also show that the causal past and strict causal past are useful notions in reasoning about causality among abstract events, just as they proved to be useful in characterizing precedence among primitive events. Corollary 5.7 states that the past of the end of an abstract event corresponds to the past of the completed event. Corollary 5.8 shows that each event that precedes all events in the beginning of an abstract event precedes all the events in the abstract event.

Corollary 5.7. For an abstract event $A$, $p\lceil A \rceil = pA$.

Corollary 5.8. For an abstract event $A$, $p^-\lfloor A \rfloor = p^-A$.

The next corollary gives an expression for the global time of the causal past of an event in some process in terms of the global time of its causal past. It is a direct consequence of Definition 5.5 (Causal past of an abstract event in a process) and Definition 3.19 (Global time of a cut).

Corollary 5.9. For any process $i \in P$ and abstract event $A$,

$$\mathcal{T}.p_iA.j = \begin{cases} \mathcal{T}.pA.i, & \text{for } j = i \\ 0, & \text{otherwise} \end{cases}$$

We have a similar result for the global time of the strict causal past of an abstract event in some process.

Corollary 5.10. For any process $i \in P$ and abstract event $A$,

$$\mathcal{T}.p^-_iA.j = \begin{cases} \mathcal{T}.p^-A.i, & \text{for } j = i \\ 0, & \text{otherwise} \end{cases}$$

The last two results of this section give expressions for the global time of the causal past and strict causal past of abstract events. The global time of the causal past of an abstract event is equal to the componentwise maximum of the timestamps of its constituents; the global time of its strict causal past is equal to the componentwise minimum.

Property 5.11. For any abstract event $A$,
i) $\mathcal{T}.pA = (\mathrm{SUP}\ a : a \in A : T.a)$
ii) $\mathcal{T}.p^-A = (\mathrm{INF}\ a : a \in A : T^-.a)$

Proof. We prove only Property i). The proof of Property ii) is similar.
$$
\begin{array}{ll}
 & \mathcal{T}.pA \\
= & \{\ \text{Definition 5.3 (Causal past of an abstract event)}\ \} \\
 & \mathcal{T}.(\cup\, a : a \in A : pa) \\
= & \{\ \text{Theorem 3.23 (Structure of consistent time vectors)}\ \} \\
 & (\mathrm{SUP}\ a : a \in A : \mathcal{T}.pa) \\
= & \{\ \text{Corollary 3.20}\ \} \\
 & (\mathrm{SUP}\ a : a \in A : T.a)
\end{array}
$$
□

6 Characterizations of Strong Precedence

This section presents two timestamps and two precedence tests for the strong precedence relation on abstract events. One test does not use information about the location set of abstract events; the other does and is, therefore, more efficient.

Consider the following derivation. Note that in this derivation, the introduction of timestamp $T^-$ for primitive events is very convenient.

$$
\begin{array}{ll}
 & A \ll B \\
\Leftrightarrow & \{\ \text{Definition 4.2 (Strong precedence)}\ \} \\
 & (\forall a : a \in A : (\forall b : b \in B : a \prec b)) \\
\Leftrightarrow & \{\ \text{Corollary 3.10}\ \} \\
 & (\forall a : a \in A : (\forall b : b \in B : pa \subseteq p^-b)) \\
\Leftrightarrow & \{\ \text{Definition 5.3 (Causal past)}\ \} \\
 & (\forall b : b \in B : pA \subseteq p^-b) \\
\Leftrightarrow & \{\ \text{Definition 5.4 (Strict causal past)}\ \} \\
 & pA \subseteq p^-B \\
\Leftrightarrow & \{\ \text{Theorem 3.23 (Structure of consistent time vectors)}\ \} \\
 & \mathcal{T}.pA \leq \mathcal{T}.p^-B \\
\Leftrightarrow & \{\ \text{Property 5.11}\ \} \\
 & (\mathrm{SUP}\ a : a \in A : T.a) \leq (\mathrm{INF}\ b : b \in B : T^-.b)
\end{array}
$$

Summarizing, $A \ll B \Leftrightarrow (\mathrm{SUP}\ a : a \in A : T.a) \leq (\mathrm{INF}\ b : b \in B : T^-.b)$.

This result shows that it is useful to extend the two timestamp functions $T$ and $T^-$ on primitive events to abstract events as follows.

Definition 6.1. (Timestamps for abstract events) The functions $T, T^- : 2^E \rightarrow \mathbb{N}^N$ define timestamps for abstract events as follows. For any $A \subseteq E$, $T.A = (\mathrm{SUP}\ a : a \in A : T.a)$ and $T^-.A = (\mathrm{INF}\ a : a \in A : T^-.a)$.

Timestamp $T$ is an efficient encoding of the causal past and, hence, the end of an abstract event (see also Corollary 5.7). Timestamp $T^-$ is an encoding of the strict causal past of an abstract event. It represents the cut containing all events that precede the beginning of the abstract event and, hence, all the primitive events in the abstract event (Corollary 5.8).
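The two timestamps and the resulting comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: a primitive event is represented as a pair of its process index and its vector timestamp, and the strict timestamp of a primitive event is assumed to equal its ordinary vector timestamp with the component of its own process decremented by one (the event itself is removed from its causal past). The function names and the example events are hypothetical; the single-process example assumes that the four events of Figure 6 occur in the order $a_0$, $b_0$, $a_1$, $b_1$.

```python
def sup(vectors):
    """Componentwise maximum of a non-empty list of equal-length vectors."""
    return tuple(max(column) for column in zip(*vectors))


def inf(vectors):
    """Componentwise minimum of a non-empty list of equal-length vectors."""
    return tuple(min(column) for column in zip(*vectors))


def strict_timestamp(event):
    """Assumed strict timestamp of a primitive event (process, vector)."""
    process, vector = event
    return tuple(x - 1 if i == process else x for i, x in enumerate(vector))


def T(A):
    """Timestamp T of an abstract event: supremum over its constituents."""
    return sup([vector for _, vector in A])


def T_minus(A):
    """Strict timestamp of an abstract event: infimum of strict timestamps."""
    return inf([strict_timestamp(event) for event in A])


def leq(u, v):
    """Componentwise comparison of two vectors."""
    return all(x <= y for x, y in zip(u, v))


def strong_precedes(A, B):
    """Strong precedence test: compare T of A with the strict timestamp of B."""
    return leq(T(A), T_minus(B))


# Single process, assumed order a0, b0, a1, b1 (cf. Figure 6).
A = [(0, (1,)), (0, (3,))]
B = [(0, (2,)), (0, (4,))]
print(T_minus(A), T(A), T_minus(B), T(B))
print(strong_precedes(A, B), strong_precedes(B, A))

# Two processes: e1 in process 0 sends a message received by f1 in process 1.
C = [(0, (1, 0))]
D = [(1, (1, 1)), (1, (1, 2))]
print(strong_precedes(C, D), strong_precedes(D, C))
```

Because both timestamps are folds with an associative operator, the same functions could be applied to the timestamps of lower-level abstract events instead of primitive events.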
In case of a hierarchy of abstract descriptions of program behavior, the associativity of the quantifiers SUP and INF implies that the timestamps of an abstract event are equal to the supremum and infimum of the timestamps of its constituents in the previous level of abstraction, which means that they satisfy the requirement of hierarchical applicability. Another criterion for timestamps is that their construction must be efficient. Given an abstract event $A$ which consists of $k$ primitive or lower-level abstract events, calculating $T.A$ needs $(k - 1) \cdot N$ (binary) max operations on integers; calculating $T^-.A$ takes $(k - 1) \cdot N$ min operations. Hence, the construction of both timestamps is indeed efficient.

The introduction of the timestamps and the derivation above yield the following precedence test on abstract events. Figure 7 visualizes the meaning of both timestamps for abstract events and the precedence test.

Theorem 6.2. (Precedence test) For any abstract events $A$ and $B$, $A \ll B \Leftrightarrow T.A \leq T^-.B$.

Figure 7: The meaning of the timestamps and the precedence test for abstract events.

As a simple example, consider the computation of Figure 6. The timestamps for events $A$ and $B$ are as follows: $T^-.A = 0$, $T.A = 3$, $T^-.B = 1$, and $T.B = 4$. It follows that $A \not\ll B$, because $T.A \not\leq T^-.B$. Since $T.B \not\leq T^-.A$, it also follows that $B \not\ll A$. Note that $T.A$ is smaller than $T.B$; the same is true for timestamp $T^-$, which conforms to the observation made in Section 5.1 that any reasonable timestamp for event $A$ is always smaller than the corresponding timestamp for $B$.

Since the timestamps $T$ and $T^-$ for abstract events satisfy the requirement of hierarchical applicability, so does the precedence test of Theorem 6.2. Concerning the efficiency of the test, the following can be said. First, it uses two different timestamps, which is the minimum number of timestamps. Hence, the required amount of storage is minimal.
Second, we have already seen that the construction of $T$ and $T^-$ is very efficient. Finally, the maximum number of integer comparisons needed to determine whether some abstract event $A$ precedes some abstract event $B$ is equal to the number of processes $N$. To illustrate the meaning of this upper bound, simply checking whether all primitive events in $A$ precede all primitive events in $B$ leads to, using the precedence test of Theorem 3.14, an upper bound of $|A| \cdot |B| \cdot N$ integer comparisons. Keeping track of the beginning and end of $A$ and $B$ and only comparing primitive events in the end of $A$ to events in the beginning of $B$ leads to $|\lceil A \rceil| \cdot |\lfloor B \rfloor| \cdot N$ comparisons, which implies an upper bound of $|lA| \cdot |lB| \cdot N$. However, calculating the beginning and end of an abstract event is computationally expensive, so the last test is not as efficient as it may seem.

Summarizing, the test of Theorem 6.2 satisfies the two criteria for precedence tests. It is hierarchically applicable and its efficiency is very reasonable. However, the test in Theorem 6.2 does not use the location set of an abstract event. It is possible to reduce the maximum number of integer comparisons if this information is available.

Assume $A$ and $B$ are abstract events. The derivation below yields a precedence test using location information. The second step might need some extra explanation. Let $a$ and $b$ be events in $A$ and $B$ respectively; let $a$ occur in process $j$. If $a \prec b$, then Corollary 3.10 implies that $pa \subseteq p^-b$ and, hence, that for all $i \in P$, $p_ia \subseteq p^-_ib$. On the other hand, if we know that for all $i \in lA$, $p_ia \subseteq p^-_ib$, then obviously $p_ja \subseteq p^-_jb$, which by Corollary 3.12 implies that $a \prec b$. The other steps are all very similar to the derivation above.
$$
\begin{array}{ll}
 & A \ll B \\
\Leftrightarrow & \{\ \text{Definition 4.2 (Strong precedence)}\ \} \\
 & (\forall a : a \in A : (\forall b : b \in B : a \prec b)) \\
\Leftrightarrow & \{\ \text{Corollaries 3.10 and 3.12}\ \} \\
 & (\forall i : i \in lA : (\forall a : a \in A : (\forall b : b \in B : p_ia \subseteq p^-_ib))) \\
\Leftrightarrow & \{\ \text{Definition 5.5 (Causal past in a process)}\ \} \\
 & (\forall i : i \in lA : (\forall b : b \in B : p_iA \subseteq p^-_ib)) \\
\Leftrightarrow & \{\ \text{Definition 5.6 (Strict causal past in a process)}\ \} \\
 & (\forall i : i \in lA : p_iA \subseteq p^-_iB) \\
\Leftrightarrow & \{\ \text{Theorem 3.23 (Structure of time vectors)}\ \} \\
 & (\forall i : i \in lA : \mathcal{T}.p_iA \leq \mathcal{T}.p^-_iB) \\
\Leftrightarrow & \{\ \text{Corollaries 5.9 and 5.10}\ \} \\
 & (\forall i : i \in lA : \mathcal{T}.pA.i \leq \mathcal{T}.p^-B.i) \\
\Leftrightarrow & \{\ \text{Property 5.11; Definition 6.1 (Timestamps } T \text{ and } T^-)\ \} \\
 & (\forall i : i \in lA : T.A.i \leq T^-.B.i)
\end{array}
$$

This derivation leads to the following precedence test for strong precedence among abstract events, which is a generalization of the precedence test for primitive events given in Theorem 3.18.

Theorem 6.3. (Precedence test) For any abstract events $A$ and $B$, $A \ll B \Leftrightarrow (\forall i : i \in lA : T.A.i \leq T^-.B.i)$.

Obviously, this test also satisfies the two criteria for timestamps and precedence tests. As promised, it is even more efficient in terms of the maximum number of integer comparisons than the previous test of Theorem 6.2. The maximum number of integer comparisons needed to determine whether abstract event $A$ precedes abstract event $B$ is equal to $|lA|$. Of course, the price to be paid for the improvement in the number of comparisons is that one has to keep track of the location information for primitive and abstract events.

To conclude, the formal framework presented in the previous sections leads in a fairly straightforward way to two characterizations of strong precedence that satisfy the two criteria of Section 5. One characterization uses location information, whereas the other does not. The question of which precedence test is most useful can be answered only in the context of a particular application.

7 Characterizations of Weak Precedence

7.1 Another Timestamp for Abstract Events

In this subsection, we give two characterizations of weak precedence.
The first one is the result of a straightforward derivation and uses the timestamp $T$ introduced in the previous section. It does not use location information for events. Unfortunately, this test does not satisfy the two criteria for precedence tests. The second half of this subsection introduces a new timestamp for abstract events and a precedence test in terms of this timestamp. This second test does satisfy the criteria. However, both the new timestamp and the precedence test depend on location information for events. In the framework developed so far, we have not been able to find a test for weak precedence that does not use location information. In the next subsection, we return to this point.

Let $A$ and $B$ be two abstract events. Consider the following derivation.

$$
\begin{array}{ll}
 & A \rightarrow B \\
\Leftrightarrow & \{\ \text{Definitions 4.4 (Weak precedence) and 5.2 (Beginning/end of abstract events)}\ \} \\
 & (\exists a : a \in \lfloor A \rfloor : (\exists b : b \in \lceil B \rceil : a \preceq b)) \\
\Leftrightarrow & \{\ \text{Corollary 3.9}\ \} \\
 & (\exists a : a \in \lfloor A \rfloor : (\exists b : b \in \lceil B \rceil : pa \subseteq pb)) \\
\Leftrightarrow & \{\ \text{Definition 5.3 (Causal past); Corollary 5.7}\ \} \\
 & (\exists a : a \in \lfloor A \rfloor : pa \subseteq pB)
\end{array}
$$

For any primitive event $a$,

$$
\begin{array}{ll}
 & pa \subseteq pB \\
\Leftrightarrow & \{\ \text{Theorem 3.23 (Structure of consistent time vectors)}\ \} \\
 & \mathcal{T}.pa \leq \mathcal{T}.pB \\
\Leftrightarrow & \{\ \text{Corollary 3.20; Property 5.11; Definition 6.1 (Timestamp } T)\ \} \\
 & T.a \leq T.B
\end{array}
$$

This derivation leads to the following precedence test for weak precedence, illustrated in Figure 8.

Theorem 7.1. (Precedence test) For any abstract events $A$ and $B$, $A \rightarrow B \Leftrightarrow (\exists a : a \in \lfloor A \rfloor : T.a \leq T.B)$.

Figure 8: The meaning of the precedence test for weak precedence.

The test in Theorem 7.1 has two disadvantages. First, it is not very efficient. In the worst case, the timestamp of each primitive event in the beginning of abstract event $A$ must be compared to the timestamp of the other abstract event, yielding $|\lfloor A \rfloor| \cdot N$ comparisons, which implies an upper bound of $|lA| \cdot N$ comparisons. Second, the test depends on the primitive level of the computation.
Two or more abstract events cannot be merged into a higher-level abstract event without using information from the primitive level to compute the beginning of the newly formed abstract event. Therefore, the test does not satisfy either of the criteria for precedence tests.

Note that the test in Theorem 7.1 uses only a single timestamp for abstract events. However, this does not contradict our conclusion of Section 5 that at least two timestamps are needed to determine precedence among abstract events. The use of timestamps from the primitive level of the computation is an implicit second timestamp.

The most important contribution of the above derivation and the resulting precedence test of Theorem 7.1 is that they illustrate the following interesting point. There is an asymmetry in the way the beginning and the end of the abstract events are used. The beginning of an abstract event is used explicitly. The end of an abstract event is encoded nicely in its timestamp $T$. If the asymmetry can be resolved, this might lead to a precedence test that does not depend on the primitive level of the computation. So the question is whether we can find an encoding of the beginning of an abstract event. Note that timestamp $T^-$ is too restrictive. It is possible to formulate a precedence test in terms of $T^-$ (and $T$) that only yields a causal relation between two abstract events if there is such a relation, but that does not always give a relation if there is one. As mentioned in Section 5.1, we do not consider such precedence tests in this paper.

We have not been able to answer the above question in the current framework without using location information. In the next subsection, the framework is extended with so-called reversed vector time, which can be considered as the dual of vector time. With this extension, it is possible to find an encoding of the beginning of an abstract event, similar to the encoding of the end by timestamp $T$, that does not use location information.
In the remainder of this section, we derive another timestamp for abstract events and a precedence test for weak precedence in terms of this timestamp. The new timestamp as well as the precedence test use information about the location set of abstract events; they (partially) resolve the asymmetry between the use of the beginning and the end of abstract events in the test of Theorem 7.1. The result is a precedence test which is independent of the primitive level of the computation and which is more efficient than the test of Theorem 7.1.

Let $A$ and $B$ be abstract events. It follows immediately from Definitions 5.1 (Location set) and 4.4 (Weak precedence) that

$$A \rightarrow B \Leftrightarrow (\exists i : i \in lA : (\exists a : a \in A \cap E_i : (\exists b : b \in B : a \preceq b))).$$

Figure 8 may be helpful in understanding the following derivation, which is numbered for the purpose of future reference.

Derivation 7.2. Let $i$ be a process in $lA$.

$$
\begin{array}{ll}
 & (\exists a : a \in A \cap E_i : (\exists b : b \in B : a \preceq b)) \\
\Leftrightarrow & \{\ \text{Corollary 3.11}\ \} \\
 & (\exists a : a \in A \cap E_i : (\exists b : b \in B : p_ia \subseteq p_ib)) \\
\Leftrightarrow & \{\ E_i \text{ is totally ordered by } \prec \text{; Definition 3.7 (Causal past in a process)}\ \} \\
 & (\exists b : b \in B : (\cap\, a : a \in A \cap E_i : p_ia) \subseteq p_ib) \\
\Leftrightarrow & \{\ \text{Definition 5.5 (Causal past in a process)}\ \} \\
 & (\cap\, a : a \in A \cap E_i : p_ia) \subseteq p_iB \\
\Leftrightarrow & \{\ \text{Theorem 3.22 (Structure of time vectors)}\ \} \\
 & \mathcal{T}.(\cap\, a : a \in A \cap E_i : p_ia) \leq \mathcal{T}.p_iB \\
\Leftrightarrow & \{\ \text{Lemma 7.3 (see below); Corollary 5.9}\ \} \\
 & (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i) \leq \mathcal{T}.pB.i \\
\Leftrightarrow & \{\ \text{Property 5.11; Definition 6.1 (Timestamp } T)\ \} \\
 & (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i) \leq T.B.i
\end{array}
$$

Lemma 7.3. For any abstract event $A$, process $i \in lA$, and process $j \in P$,

$$\mathcal{T}.(\cap\, a : a \in A \cap E_i : p_ia).j = \begin{cases} (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i), & \text{for } j = i \\ 0, & \text{otherwise} \end{cases}$$

Proof. First, observe that for any $i \in P$ and any event $e \in E$,

$$\mathcal{T}.p_ie.j = \begin{cases} \mathcal{T}.pe.i, & \text{for } j = i \\ 0, & \text{otherwise} \end{cases}$$

The following derivation proves the desired result.
$$
\begin{array}{ll}
 & \mathcal{T}.(\cap\, a : a \in A \cap E_i : p_ia).j \\
= & \{\ \text{Theorem 3.22 (Structure of time vectors)}\ \} \\
 & (\mathrm{INF}\ a : a \in A \cap E_i : \mathcal{T}.p_ia).j \\
= & \{\ \text{Definition of INF (componentwise minimum); observation above}\ \} \\
 & \begin{cases} (\mathrm{MIN}\ a : a \in A \cap E_i : \mathcal{T}.pa.i), & \text{for } j = i \\ 0, & \text{otherwise} \end{cases} \\
= & \{\ \text{Corollary 3.20}\ \} \\
 & \begin{cases} (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i), & \text{for } j = i \\ 0, & \text{otherwise} \end{cases}
\end{array}
$$
□

Summarizing, the above derivations yield the following test: $A \rightarrow B \Leftrightarrow (\exists i : i \in lA : (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i) \leq T.B.i)$.

This result suggests the introduction of another timestamp for abstract events. The components corresponding to a process in the location set of an abstract event must be equal to the minimum calculated above; the other components can be chosen arbitrarily, since they are not used in the test. In order to define the new timestamp on abstract events, we also define it on primitive events.

Definition 7.4. (Timestamp $T^w$) The function $T^w : E \cup 2^E \rightarrow \mathbb{N}^N$ defines a timestamp for primitive and abstract events as follows. For any $e \in E$,

$$T^w.e.i = \begin{cases} T.e.i, & \text{if } e \in E_i \\ \infty, & \text{otherwise} \end{cases}$$

For any $A \subseteq E$, $T^w.A = (\mathrm{INF}\ a : a \in A : T^w.a)$.

The value of all the components of timestamp $T^w$ corresponding to processes outside the location set of a primitive event is set to infinity. The reason for this is mathematical convenience. In an actual implementation, one would choose some large integer value, for example, MaxInt. It follows from the following derivation that timestamp $T^w$ satisfies the above requirement concerning timestamp components corresponding to a process in the location set of some abstract event.

Derivation 7.5. For any abstract event $A$ and process $i \in lA$,

$$
\begin{array}{ll}
 & T^w.A.i \\
= & \{\ \text{Definition 7.4 (Timestamp } T^w \text{); Definition of INF (componentwise minimum)}\ \} \\
 & (\mathrm{MIN}\ a : a \in A : T^w.a.i) \\
= & \{\ \text{Algebra}\ \} \\
 & (\mathrm{MIN}\ a : a \in A \cap E_i : T^w.a.i)\ \min\ (\mathrm{MIN}\ a : a \in A \setminus E_i : T^w.a.i) \\
= & \{\ \text{Definition 7.4 (Timestamp } T^w)\ \} \\
 & (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i)
\end{array}
$$

The definition of $T^w$ immediately yields that for processes outside the location set of $A$, the corresponding component of $T^w.A$ equals infinity.
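The test summarized above, combined with the new timestamp, can be sketched as follows. This Python fragment is an illustration under assumptions rather than a reference implementation: primitive events are again represented as pairs of a process index and a vector timestamp, infinity is represented by `math.inf` (an implementation might use a large integer instead), and all function names and example events are hypothetical. The single-process example assumes that the events of Figure 6 occur in the order $a_0$, $b_0$, $a_1$, $b_1$.

```python
import math


def location_set(A):
    """Processes in which at least one constituent of A occurs."""
    return {process for process, _ in A}


def T(A):
    """Timestamp T of an abstract event: componentwise maximum."""
    return tuple(max(column) for column in zip(*[v for _, v in A]))


def weak_timestamp(event):
    """Weak timestamp of a primitive event: own component, infinity elsewhere."""
    process, vector = event
    return tuple(x if i == process else math.inf for i, x in enumerate(vector))


def T_w(A):
    """Weak timestamp of an abstract event: componentwise minimum."""
    return tuple(min(column) for column in zip(*[weak_timestamp(e) for e in A]))


def weak_precedes(A, B):
    """Weak precedence test: one comparison per process in the location set of A."""
    tw_a, t_b = T_w(A), T(B)
    return any(tw_a[i] <= t_b[i] for i in location_set(A))


# Single process, assumed order a0, b0, a1, b1 (cf. Figure 6).
A = [(0, (1,)), (0, (3,))]
B = [(0, (2,)), (0, (4,))]
print(T_w(A), T_w(B), weak_precedes(A, B), weak_precedes(B, A))

# Two processes with one event each and no communication: concurrent.
C = [(0, (1, 0))]
D = [(1, (0, 1))]
print(weak_precedes(C, D), weak_precedes(D, C))
```

Only the components indexed by the location set of the first argument are inspected, so the infinite components never take part in a comparison that decides the result.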
The following precedence test follows from the calculations above and the introduction of the new timestamp.

Theorem 7.6. (Precedence test) For abstract events $A$ and $B$,

$$A \rightarrow B \;\Leftrightarrow\; (\exists\, i : i \in lA : T^w.A.i \leq T.B.i).$$

Let us consider again the example of Figure 6. For abstract events $A$ and $B$, timestamp $T^w$ is equal to 1 and 2 respectively. Since $T.A$ equals 3 and $T.B$ equals 4, it follows from the above precedence test that $A \rightarrow B$ (because $1 \leq 4$) and $B \rightarrow A$ (because $2 \leq 3$).

It remains to be verified whether this test satisfies the two criteria for timestamps and precedence tests. It follows from the associativity of the operator INF that, in a hierarchy of abstract descriptions of program behavior, for any abstract event, timestamp $T^w$ can be calculated from the timestamps of its constituents in the level immediately below. Hence, the timestamp and the test satisfy the criterion of hierarchical applicability.

Concerning the efficiency of the test, the following can be said. As for the two characterizations of strong precedence, it is necessary to maintain two timestamps for every abstract event. Note that it is not necessary to maintain a second set of timestamps for primitive events. For primitive events, timestamp $T^w$ can be calculated immediately from timestamp $T$, provided, of course, that it is known in what process the events occur. However, in order to use the new timestamp in a precedence test for abstract events, this information is necessary anyway, so it is not a real restriction. The construction of the new timestamp $T^w$ is as efficient as the construction of $T$ and $T^{-}$. As for the test in Theorem 6.3, the maximum number of integer comparisons needed to determine whether some abstract event $A$ precedes another event is $|lA|$.

7.2 Reversed Vector Time

This section presents a weak-precedence test for abstract events that does not use location information for events. For this purpose, we extend the framework developed so far with the causal future of events and reversed vector time.
These notions are the duals of the causal past and ordinary vector time, respectively. They lead to an encoding of the beginning of an abstract event in terms of a reversed vector timestamp which is similar to the encoding of the end of an abstract event in terms of its timestamp $T$. The notion of the causal future of an event was already mentioned in [28]. However, the idea to use it as the basis for a timestamp is new.

A drawback of reversed vector time is that it is only suitable for post-mortem analysis of distributed computations. The whole set of primitive events is needed to calculate reversed timestamps. Thus, precedence tests in terms of reversed vector time are computationally more expensive than any of the tests presented so far. However, as explained in Section 2, in applications such as distributed debugging, the restriction to post-mortem analysis is not necessarily a limitation. More practical experience with event abstraction is needed to determine whether reversed vector time is a practically useful notion. For now, its main contribution is a better insight into the meaning of causality among abstract events. It also completes the results presented in this paper in the sense that it yields a precedence test for weak precedence which does not use location information and which satisfies the criterion of hierarchical applicability. A more extensive treatment of reversed vector time than presented in this subsection can be found in [5]. An application of reversed vector time to distributed breakpoints is described in [4].

The notions of causal future and reversed vector time are based on the successor relation $\succ$, which is defined as the dual of $\prec$. That is, for any $e_0, e_1 \in E$, $e_0 \succ e_1$ if and only if $e_1 \prec e_0$. The local successor relation $\succ_l$ is the dual of $\prec_l$. The relations $\succeq$ and $\succeq_l$ are the reflexive closures of $\succ$ and $\succ_l$ respectively. The following definition introduces the dual notion of cuts.

Definition 7.7.
((Consistent) successor cut) A set $C \subseteq E$ is called a successor cut, or $\succ$-cut, of $E$ if and only if for all events $e_0 \in C$ and $e_1 \in E$, $e_1 \succ_l e_0 \Rightarrow e_1 \in C$. A $\succ$-cut is left-closed under $\succ_l$. The set of all $\succ$-cuts is denoted by $\mathcal{C}^{\succ}_l$. Set $C$ is called a consistent $\succ$-cut of $E$ if and only if for all events $e_0 \in C$ and $e_1 \in E$, $e_1 \succ e_0 \Rightarrow e_1 \in C$. A consistent $\succ$-cut is left-closed under $\succ$. The set of all consistent $\succ$-cuts is denoted by $\mathcal{C}^{\succ}$.

The next corollary states the obvious relation between cuts and $\succ$-cuts.

Corollary 7.8. For any $C \subseteq E$, $C \in \mathcal{C}_l \Leftrightarrow E \setminus C \in \mathcal{C}^{\succ}_l$ and $C \in \mathcal{C} \Leftrightarrow E \setminus C \in \mathcal{C}^{\succ}$.

The counterpart of the causal past of events is the so-called causal future.

Definition 7.9. (Causal future) Function $f. : E \cup 2^E \hookrightarrow 2^E$ defines the causal future of primitive and abstract events as follows. For any $e \in E$, $fe = \{e' \in E \mid e' \succeq e\}$. For any $A \subseteq E$, $fA = (\cup\, a : a \in A : fa)$.

Note that $fe$ and $fA$ are consistent $\succ$-cuts.

Definition 7.10. (Causal future in a process) For any process $i \in P$, function $f_i. : E \hookrightarrow 2^E$ defines the causal future in process $i$ of an event. For any $e \in E$, $f_i e = \{e' \in E_i \mid e' \succeq e\}$.

Since we are only interested in finite (prefixes of) computations, it is appropriate to define the following.

Definition 7.11. (Reversed vector time of a successor cut) Function $T^R : \mathcal{C}^{\succ}_l \rightarrow \mathbb{N}^N$ defines the reversed vector time of a $\succ$-cut. For any $\succ$-cut $C$, component $i$, where $0 \leq i < N$, of the reversed time vector is defined as $T^R.C.i = |C \cap E_i|$.

The following corollary is a direct result of this definition, the definition of the time of a cut (3.19), and Corollary 7.8. It states the relation between vector time and reversed vector time. The binary operator "$-$" on vectors denotes componentwise subtraction. Time vector $\mathbf{E}$ is a constant vector whose $i$th component is equal to the number of events in process $i$, i.e., $|E_i|$.

Corollary 7.12. For any successor cut $C \in \mathcal{C}^{\succ}_l$, $T.(E \setminus C) = \mathbf{E} - T^R.C$. For any cut $C \in \mathcal{C}_l$, $T^R.(E \setminus C) = \mathbf{E} - T.C$.

Finally, we define reversed vector timestamps for primitive events.

Definition 7.13.
(Timestamp function $T^R$) The function $T^R : E \hookrightarrow \mathbb{N}^N$ defines a timestamp in reversed vector time for primitive events. For any $e \in E$ and $i \in P$, $T^R.e.i = |f_i e|$.

The function $T^R$ encodes exactly the set of timestamps that is obtained by applying a timestamp algorithm for ordinary vector timestamps while traversing event information backwards. As mentioned, this is computationally expensive and it is restricted to post-mortem analysis of distributed computations. Also, it is necessary to maintain two sets of timestamps for primitive events, which requires substantial extra storage. (Recall that, for primitive events, timestamps $T^{-}$ and $T^w$ can be easily expressed in terms of timestamp $T$, which means that it is not necessary to store them separately.)

The following results are a direct consequence of the duality of the relations $\prec$ and $\succ$. Therefore, they are given without proof.

Corollary 7.14. For any event $e \in E$, $T^R.fe = T^R.e$.

Corollary 7.14 shows that the reversed timestamp of an event encodes its causal future. Corollary 7.15 is the dual of Corollary 5.7. It states that the beginning of an abstract event and the event itself share the same causal future, which is a hint that the causal future is useful for determining precedence relations among abstract events.

Corollary 7.15. For an abstract event $A$, $f\lfloor A \rfloor = fA$.

The following property gives a simple expression in terms of reversed vector time for the beginning of an abstract event.

Property 7.16. For any abstract event $A$, $T^R.fA = (\mathrm{SUP}\ a : a \in A : T^R.a)$.

The following derivation shows the meaning of precedence between abstract events in terms of causal past and causal future. Let $A$ and $B$ be abstract events. Informally, $A$ weakly precedes $B$ if and only if the causal future of $A$ shares some events with the causal past of $B$. Figure 9 clarifies the derivation.
$$
\begin{array}{cl}
 & A \rightarrow B \\
\Leftrightarrow & \{\ \text{The relation } \preceq \text{ is reflexive and transitive; Definition 4.4 (Weak precedence)}\ \} \\
 & (\exists\, e : e \in E : (\exists\, a : a \in A : a \preceq e) \wedge (\exists\, b : b \in B : e \preceq b)) \\
\Leftrightarrow & \{\ \text{Definitions 7.9 (Causal future) and 3.5 (Causal past)}\ \} \\
 & (\exists\, e : e \in E : (\exists\, a : a \in A : e \in fa) \wedge (\exists\, b : b \in B : e \in pb)) \\
\Leftrightarrow & \{\ \text{Definitions 7.9 (Causal future) and 5.3 (Causal past)}\ \} \\
 & (\exists\, e : e \in E : e \in fA \wedge e \in pB) \\
\Leftrightarrow & \{\ \text{Definition of set intersection}\ \} \\
 & fA \cap pB \neq \emptyset \\
\Leftrightarrow & \{\ \text{Set calculus; } fA \subseteq E \text{ and } pB \subseteq E\ \} \\
 & E \setminus fA \not\supseteq pB \\
\Leftrightarrow & \{\ \text{Corollary 7.8; Theorem 3.23 (Structure of consistent time vectors)}\ \} \\
 & T.(E \setminus fA) \not\geq T.pB \\
\Leftrightarrow & \{\ \text{Corollary 7.12}\ \} \\
 & \mathbf{E} - T^R.fA \not\geq T.pB \\
\Leftrightarrow & \{\ \text{Property 7.16; Property 5.11; Definition 6.1 (Timestamp } T)\ \} \\
 & \mathbf{E} - (\mathrm{SUP}\ a : a \in A : T^R.a) \not\geq T.B
\end{array}
$$

Property 7.16 gives an expression for the reversed time of the beginning of an abstract event. Expression $\mathbf{E} - (\mathrm{SUP}\ a : a \in A : T^R.a)$ is an expression for the beginning of an abstract event in terms of ordinary vector time. It is not meaningful to compare times in the two different representations of time. The above result leads to the introduction of the following timestamp for abstract events.

Definition 7.17. (Reversed timestamp of an abstract event) Function $T^R : 2^E \hookrightarrow \mathbb{N}^N$ defines the reversed timestamp of an abstract event. For any abstract event $A$, $T^R.A = (\mathrm{SUP}\ a : a \in A : T^R.a)$.

[Figure 9: Weak precedence and reversed vector time.]

The introduction of the reversed vector timestamp yields the following precedence test, which is illustrated in Figure 9.

Theorem 7.18. For any abstract events $A$ and $B$, $A \rightarrow B \;\Leftrightarrow\; \mathbf{E} - T^R.A \not\geq T.B$.

Consider the computation of Figure 6 one last time. Reversed timestamp $T^R.A$ is equal to 4 and, hence, $\mathbf{E} - T^R.A$ equals 0; reversed timestamp $T^R.B$ equals 3, which implies that $\mathbf{E} - T^R.B$ is equal to 1. Recall that $T.A$ and $T.B$ are equal to 3 and 4 respectively. As before, this yields that $A \rightarrow B$ (because $0 \not\geq 4$) and $B \rightarrow A$ (because $1 \not\geq 3$).
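The numbers in this example can be reproduced with a small post-mortem sketch of reversed timestamps and the test of Theorem 7.18. It is an illustration only: the `Event` class and all function names are hypothetical, the timestamp $T$ of an abstract event is assumed to be the componentwise maximum over its constituents, and causal precedence between primitive events is decided with the usual vector-time check (an event $e$ in process $p$ precedes or equals $e'$ exactly when $T.e.p \leq T.e'.p$), which assumes that $T.e.i$ counts the events of process $i$ in the causal past of $e$. The single-process layout a0, b0, a1, b1 is an assumption chosen to be consistent with the values quoted above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """A primitive event (hypothetical representation)."""
    proc: int              # process in which the event occurs
    T: Tuple[int, ...]     # ordinary vector timestamp T.e


def precedes_or_equal(e0: Event, e1: Event) -> bool:
    """Assumed vector-time check for e0 preceding or being equal to e1."""
    return e0.T[e0.proc] <= e1.T[e0.proc]


def T_R_primitive(e: Event, events: Sequence[Event], n: int) -> tuple:
    """T^R.e.i = |f_i e|: events of process i in the causal future of e."""
    return tuple(
        sum(1 for x in events if x.proc == i and precedes_or_equal(e, x))
        for i in range(n)
    )


def T_R(A: Sequence[Event], events: Sequence[Event], n: int) -> tuple:
    """T^R.A: componentwise maximum (SUP) of T^R.a over all a in A."""
    stamps = [T_R_primitive(a, events, n) for a in A]
    return tuple(max(s[i] for s in stamps) for i in range(n))


def T_end(A: Sequence[Event], n: int) -> tuple:
    """T.A, assumed here to be the componentwise maximum of T.a over a in A."""
    return tuple(max(a.T[i] for a in A) for i in range(n))


def weakly_precedes(A, B, events, n) -> bool:
    """A weakly precedes B iff E - T^R.A is not componentwise >= T.B."""
    size = tuple(sum(1 for x in events if x.proc == i) for i in range(n))
    t_r = T_R(A, events, n)
    begin_a = tuple(size[i] - t_r[i] for i in range(n))
    end_b = T_end(B, n)
    return not all(begin_a[i] >= end_b[i] for i in range(n))


# Assumed layout: one process, events a0, b0, a1, b1 in that order.
a0, b0, a1, b1 = (Event(0, (k,)) for k in (1, 2, 3, 4))
computation = [a0, b0, a1, b1]
A, B = [a0, a1], [b0, b1]
print(T_R(A, computation, 1), T_R(B, computation, 1))  # (4,) (3,)
print(weakly_precedes(A, B, computation, 1))           # True
print(weakly_precedes(B, A, computation, 1))           # True
```

The full list of primitive events is an explicit argument of every reversed-timestamp function, which makes the restriction to post-mortem analysis visible in the code.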
Note that the fact that $T^R.A$ is larger than $T^R.B$ does not contradict our argument of Section 5.1 that any reasonable timestamp for $A$ is always smaller than the same timestamp for $B$. Timestamps $T^R.A$ and $T^R.B$ are times in reversed vector time. The corresponding timestamps in ordinary vector time, $\mathbf{E} - T^R.A$ and $\mathbf{E} - T^R.B$, confirm the argument of Section 5.1.

The precedence test of Theorem 7.18 is independent of the computation at the level of primitive events and does not use location information for primitive events. In this sense, it fills a gap which was left open in the previous subsection. As before, it is not difficult to see that it satisfies the criterion of hierarchical applicability. In terms of integer comparisons, it is reasonably efficient. Checking whether an abstract event precedes some other abstract event requires at most $N$ comparisons. As already explained, it is more expensive in storage and computation time than any of the other tests given so far. It is also restricted to post-mortem analysis. Its main contribution is that it has a very clear intuitive meaning: Event $A$ weakly precedes event $B$ if and only if $A$ begins before $B$ ends.

8 Timestamping Convex Abstract Events

8.1 Convex Abstract Events

Up to this point, no restrictions have been imposed on the structure of abstract events. However, applications do not necessarily use arbitrary subsets of events. In this section, the subclass of convex abstract events is studied. The main result is that for mutually disjoint, convex abstract events, a single timestamp is sufficient to characterize weak precedence. This result has been used in the implementation of the tool that was used to visualize the sample computation in Section 2.

Definition 8.1. (Convex abstract events) An abstract event $A$ is called convex if and only if $(\forall\, a_0, a_1, e : a_0, a_1 \in A \wedge e \in E : a_0 \preceq e \wedge e \preceq a_1 \Rightarrow e \in A)$.

Convexity is a meaningful requirement for abstract events for the following reason.
For a convex abstract event $A$, there is no (primitive or abstract) event $x$ in the previous level of abstraction that is not a constituent of $A$ but that depends on the completion of part of $A$ such that, in turn, the completion of $A$ depends on $x$. In other words, there is no outside interference; a convex abstract event describes a complete unit of work.

Convexity is useful as well. First, convex abstract events are easier to recognize automatically than arbitrary abstract events, because it is not necessary to filter out interfering events. Second, they are more general and therefore more widely applicable than, for example, contractions [8, 13]. A contraction is an abstract event whose internal structure is restricted in such a way that it may be considered to occur atomically. Third, since there are no interfering events, convex abstract events are considerably easier to display than arbitrary abstract events, which is very important in an application such as distributed debugging. Finally, this section shows that determining weak precedence relations among mutually disjoint, convex abstract events requires less timestamping effort than determining weak precedence relations among arbitrary abstract events. For disjoint, convex abstract events, a single timestamp proves to be sufficient to characterize the weak precedence relation. Although weaker conditions than convexity and disjointness might exist, convexity and disjointness are sufficient.

For most applications, mutual disjointness of abstract events is not a real restriction. On the contrary, it is often a useful requirement. For example, in distributed debugging, it is not meaningful if an abstract view of a computation has overlapping abstract events. There does not seem to be any other meaningful class of abstract events that is as general as the class of convex abstract events and that combines so many useful properties. Hence, in this section, we investigate timestamps and precedence tests for convex abstract events.
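Definition 8.1 translates directly into a check over a recorded computation. The following Python sketch is one possible reading, not an implementation from this paper: the `Event` class and function names are hypothetical, and causal precedence between primitive events is decided with the usual vector-time check, which assumes that $T.e.i$ counts the events of process $i$ in the causal past of $e$.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """A primitive event (hypothetical representation)."""
    proc: int              # process in which the event occurs
    T: Tuple[int, ...]     # ordinary vector timestamp T.e


def precedes_or_equal(e0: Event, e1: Event) -> bool:
    """Assumed vector-time check for e0 preceding or being equal to e1."""
    return e0.T[e0.proc] <= e1.T[e0.proc]


def is_convex(A: Iterable[Event], all_events: Sequence[Event]) -> bool:
    """True iff no event outside A lies causally between two members of A."""
    members = set(A)
    for e in all_events:
        if e in members:
            continue  # constituents of A trivially satisfy the implication
        after_some = any(precedes_or_equal(a0, e) for a0 in members)
        before_some = any(precedes_or_equal(e, a1) for a1 in members)
        if after_some and before_some:
            return False  # e interferes with A
    return True


# Hypothetical computation: one process with four consecutive events.
e1, e2, e3, e4 = (Event(0, (k,)) for k in (1, 2, 3, 4))
computation = [e1, e2, e3, e4]
print(is_convex([e1, e3], computation))      # False: e2 lies between e1 and e3
print(is_convex([e1, e2, e3], computation))  # True
```

The check is quadratic in the worst case, which is acceptable for a sketch; a tool that recognizes convex events automatically would presumably use a more incremental scheme.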
The example of Figure 6 in Section 5.1 already shows that for arbitrary non-convex abstract events, a single timestamp is not sufficient to characterize weak precedence. Note that abstract events $A$ and $B$ are indeed not convex. To show that convexity alone is not a sufficient condition, consider the same computation, but with abstract events $A' = \{a_0, b_0, a_1\}$ and $B' = \{b_0, a_1, b_1\}$. It is clear that $A'$ and $B'$ are convex. However, a single timestamp is still not sufficient. Since both $A' \rightarrow B'$ and $B' \rightarrow A'$, the timestamps for $A'$ and $B'$ should be equal. Given the assumptions of Section 5.1, this is not possible, because, as for $A$ and $B$, any reasonable timestamp for $A'$ is smaller than the same timestamp for $B'$.

Unfortunately, for the strong precedence relation, a single timestamp does not seem to be sufficient, as can be explained by means of the example in Figure 10.

[Figure 10: Why a single timestamp appears to be insufficient for characterizing strong precedence among disjoint, convex abstract events. The figure shows abstract events $A$ and $B$ with primitive events $a_0$, $b_0$, $a_1$, and $b_1$.]

Obviously, abstract events $A$ and $B$ are convex. Furthermore, because events $a_0$ and $b_1$ are concurrent, $A$ and $B$ are unrelated by the strong precedence relation. Hence, if abstract events have only a single timestamp to characterize strong precedence, the timestamps of $A$ and $B$ must also be unrelated. Assuming that the only operators available to calculate timestamps are minimum and maximum operators such as "min," "max," "inf," and "sup," in combination with the assumptions about timestamps mentioned in Section 5.1, any reasonable timestamp assigned to $A$ is always smaller than or equal to the timestamp of $B$. Hence, the timestamps are not unrelated and, thus, we may conclude that a single timestamp cannot be sufficient.
Of course, there may be some ingenious timestamping scheme which is sufficient to characterize strong precedence among convex abstract events by only a single timestamp, but such a timestamp, in all likelihood, would have to be fundamentally different from the ones discussed here. This example raises the question of what conditions on abstract events are needed to characterize strong precedence by only a single timestamp. Although this is an interesting question, we do not try to answer it in this paper, but leave it for future work.

8.2 Characterizing Weak Precedence among Convex Abstract Events

It is a nice exercise for the reader to give examples showing that any one of the timestamps for abstract events given so far, $T$, $T^{-}$, $T^w$, and $T^R$, cannot be used to characterize weak precedence among mutually disjoint, convex abstract events. Instead, we introduce a new timestamp $T^c$ which is a combination of $T$ and $T^w$.

Definition 8.2. (A single timestamp for convex abstract events) The function $T^c : E \cup 2^E \hookrightarrow \mathbb{N}^N$ defines a timestamp for primitive and abstract events as follows. For any $x \in E \cup 2^E$, $T^c.x = T.x\ \mathrm{inf}\ T^w.x$.

For processes inside the location set of a convex abstract event, timestamp $T^c$ more or less conforms to the beginning of the event. For processes outside the location set, it represents the end. This is formalized in the following two corollaries, which follow immediately from the definitions of $T$, $T^w$, and $T^c$. Note that the corollaries are true for arbitrary primitive or abstract events.

Corollary 8.3. For any primitive or abstract event $x \in E \cup 2^E$ and any process $i \in lx$, $T^c.x.i = T^w.x.i$.

Corollary 8.4. For any primitive or abstract event $x \in E \cup 2^E$ and process $i \notin lx$, $T^c.x.i = T.x.i$.

It follows from these corollaries and the definition of $T^w$ (Definition 7.4) that for primitive events, timestamp $T^c$ is equal to timestamp $T$. Timestamp $T^c$ can be used to formulate the following precedence test for mutually disjoint, convex abstract events.

Theorem 8.5.
(Precedence test for disjoint, convex abstract events) For any disjoint, convex abstract events $A$ and $B$, $A \rightarrow B \;\Leftrightarrow\; (\exists\, i : i \in lA : T^c.A.i \leq T^c.B.i)$.

Proof. The implication from right to left follows immediately from Theorem 7.6, Corollaries 8.3 and 8.4, and the observation that for any abstract event $C$, by definition, $T^c.C$ is always at most $T.C$. The other implication is more involved.

$$
\begin{array}{cl}
 & A \rightarrow B \\
\Rightarrow & \{\ \text{Derivation 7.2}\ \} \\
 & (\exists\, i : i \in lA : (\exists\, b : b \in B : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b)) \\
\Rightarrow & \{\ \text{Set calculus}\ \} \\
 & (\exists\, i : i \in lA \setminus lB : (\exists\, b : b \in B : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b)) \\
 & \vee\ (\exists\, i : i \in lA \cap lB : (\exists\, b : b \in B : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b))
\end{array}
$$

Let $i$ be a process in $lA \setminus lB$.

$$
\begin{array}{cl}
 & (\exists\, b : b \in B : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b) \\
\Rightarrow & \{\ \text{Derivation 7.2; Definition 7.4 (Timestamp } T^w)\ \} \\
 & T^w.A.i \leq T.B.i \\
\Rightarrow & \{\ \text{Definition 8.2 (Timestamp } T^c);\ \text{Corollaries 8.3 and 8.4}\ \} \\
 & T^c.A.i \leq T^c.B.i
\end{array}
$$

Let $i$ be a process in $lA \cap lB$.

$$
\begin{array}{cl}
 & (\exists\, b : b \in B : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b) \\
\Rightarrow & \{\ E_i \text{ is totally ordered; } B \cap E_i \neq \emptyset;\ B \text{ is convex}\ \} \\
 & (\exists\, b : b \in B \cap E_i : (\cap\, a : a \in A \cap E_i : p_i a) \subseteq p_i b) \\
\Rightarrow & \{\ B \text{ is convex; } A \text{ and } B \text{ disjoint}\ \} \\
 & (\cap\, a : a \in A \cap E_i : p_i a) \subseteq (\cap\, b : b \in B \cap E_i : p_i b) \\
\Rightarrow & \{\ \text{Similar to Derivation 7.2; Definition 7.4 (Timestamp } T^w)\ \} \\
 & T^w.A.i \leq T^w.B.i \\
\Rightarrow & \{\ \text{Definition 8.2 (Timestamp } T^c);\ \text{Corollary 8.3; } i \in lA \cap lB\ \} \\
 & T^c.A.i \leq T^c.B.i
\end{array}
$$

Hence, $A \rightarrow B \Rightarrow (\exists\, i : i \in lA : T^c.A.i \leq T^c.B.i)$, which completes the proof. Note that both convexity and disjointness are indeed used in the proof, although only convexity of $B$ is needed. □

Timestamp $T^c$ cannot be used to formulate a precedence test for disjoint, convex abstract events that does not use location information. Consider again the computation in Figure 5(a). Since $A \rightarrow B$, we would like to have that $T^c.A \leq T^c.B$. That is, the timestamps should reflect the weak precedence relation between $A$ and $B$. However, it is not difficult to see that for abstract event $A$, $T.A = T^w.A = (1, 2)$. Hence, also $T^c.A = (1, 2)$.
For abstract event $B$, $T.B = T^w.B = T^c.B = (2, 1)$, which means that $T^c.A \not\leq T^c.B$. Note that we do have that $T^c.A.1 \leq T^c.B.1$, which conforms to the precedence test of Theorem 8.5 that does use location information.

It is an interesting question whether there exists a precedence test for weak precedence formulated in terms of a single timestamp that does not use location information. The above example suggests that the answer is negative. Since both $A \rightarrow B$ and $B \rightarrow A$, any such timestamping scheme should assign equal timestamps to $A$ and $B$. It is very unlikely that such a scheme exists which is also correct for any other computation.

It remains to be shown that the test of Theorem 8.5 satisfies the two criteria for precedence tests. Since timestamp $T^c$ is defined in terms of $T$ and $T^w$, this is not immediately clear. It would be inefficient if it were necessary to maintain both $T$ and $T^w$ for all abstract events. Even worse, it would invalidate our claim that a single timestamp is sufficient to determine weak precedence among abstract events. Therefore, we give an inductive definition of timestamp $T^c$ that is independent of $T$ and $T^w$. For this purpose, assume we have an abstraction hierarchy where $x\ \mathrm{co}\ A$ denotes that primitive or abstract event $x$ is a constituent of abstract event $A$ in the abstraction level immediately below the level containing $A$.

Definition 8.6. (Inductive definition of $T^c$) The function $T^{ci} : E \cup 2^E \hookrightarrow \mathbb{N}^N$ defines a timestamp for primitive and abstract events as follows. For any $e \in E$, $T^{ci}.e = T.e$. For any $A \subseteq E$,

$$T^{ci}.A.i = \begin{cases} (\mathrm{MIN}\ x : x\ \mathrm{co}\ A \wedge i \in lx : T^{ci}.x.i), & \text{for } i \in lA \\ (\mathrm{MAX}\ x : x\ \mathrm{co}\ A : T^{ci}.x.i), & \text{otherwise} \end{cases}$$

Property 8.7. $T^c = T^{ci}$.

Proof. We already observed that for primitive events, $T^c$ is equal to $T$. Hence, it follows from the definition of $T^{ci}$ that for primitive events, $T^c$ is equal to $T^{ci}$.
It follows from Corollaries 8.3 and 8.4, Definitions 6.1 (Timestamp $T$) and 7.4 (Timestamp $T^w$), and Derivation 7.5 that for any abstract event $A$,

$$T^c.A.i = \begin{cases} (\mathrm{MIN}\ a : a \in A \cap E_i : T.a.i), & \text{for } i \in lA \\ (\mathrm{MAX}\ a : a \in A : T.a.i), & \text{otherwise} \end{cases}$$

By means of induction, it is not difficult to show that $T^{ci}.A.i$ can be rewritten to this expression as well. Hence, also for abstract events $T^c$ is equal to $T^{ci}$, which concludes the proof. (See [5] for the details of the induction proof.) □

Definition 8.6 and Property 8.7 show that timestamp $T^c$ and, hence, the precedence test of Theorem 8.5 satisfy the criterion of hierarchical applicability. In addition, they show that indeed a single timestamp is sufficient to characterize weak precedence. For primitive events, it is sufficient to maintain timestamp $T$. For abstract events, timestamp $T^c$ is sufficient. Definition 8.6 gives an efficient algorithm to compute $T^c$, which uses the same number of min/max operations as needed for the construction of any of the other timestamps for abstract events given in this paper. The maximum number of integer comparisons to check whether a convex abstract event $A$ weakly precedes another disjoint, convex abstract event $B$ is equal to $|lA|$.

9 Conclusions

In this paper, we have studied causality among abstract events and its characterization in terms of vector time. As for primitive events, causality among abstract events can be expressed by means of precedence relations. Following Lamport [24], in Section 4, we introduced two precedence relations on abstract events, namely strong precedence and weak precedence. An abstract event strongly precedes another abstract event if and only if all its constituents precede all constituents of the other event. That is, the abstract event as a whole precedes the other abstract event as a whole. The strong precedence relation on abstract events has the nice property that it is a partial order.
However, it is not well suited to express concurrency among abstract events or to express the fact that only part of some abstract event precedes part of some other abstract event. The weak precedence relation complements the strong precedence relation in the sense that it expresses that part of an abstract event causally affects part of another event. It also allows for a natural definition of concurrency. Unfortunately, the weak precedence relation is not a partial order. The combination of strong and weak precedence seems to be a proper characterization of causality among abstract events (see also [24]).

In Section 5, we explained the main goal of this paper, namely finding characterizations of strong and weak precedence in terms of vector time. The characterization must make it possible to determine causal relationships between two abstract events efficiently in a hierarchy of abstract descriptions of program behavior. We have argued that, for both weak and strong precedence, a single timestamp cannot be sufficient (see also [30], where this conjecture is made as well).

In Sections 6 and 7, we have studied characterizations of strong and weak precedence among abstract events. Both precedence tests using location information of events and tests not using such information have been given. The following table gives an overview of the results.

| | Location-independent precedence test | Location-dependent precedence test |
|---|---|---|
| Strong precedence | Thm 6.2 ($T$, $T^{-}$) | Thm 6.3 ($T$, $T^{-}$) |
| Weak precedence | Thm 7.18 ($T$, $T^R$) | Thm 7.6 ($T$, $T^w$) |

It proved to be relatively straightforward to arrive at the results for strong precedence. Strong precedence can be characterized efficiently by means of two timestamps, $T$ and $T^{-}$, both with and without using location information. Timestamp $T$ is an encoding of the end of an abstract event. Timestamp $T^{-}$ of an abstract event represents the set of primitive events preceding all constituents of the abstract event.
It proved to be more difficult to achieve equivalent results for weak precedence. Using location information, it is possible to define a timestamp $T^w$, whose components correspond more or less to the beginning of an abstract event. In combination with $T$, timestamp $T^w$ gives an efficient characterization of weak precedence. In order to find a representation for weak precedence not using location information, we had to introduce yet another timestamp, namely $T^R$. This so-called reversed timestamp has a serious drawback. Since it is the dual of timestamp $T$, it is calculated by applying a timestamp algorithm while traversing event information backwards. This means that it is restricted to post-mortem analysis of distributed computations. While this may not be a problem for some applications, it may be for others. Despite this drawback, timestamp $T^R$ is useful in obtaining a better understanding of causality among abstract events. It is a very intuitive encoding of the beginning of an abstract event, similar to the way timestamp $T$ encodes the end. It is an interesting open problem whether it is possible to characterize weak precedence without using location information in a way that is suitable for on-the-fly analysis of distributed computations.

The results of this paper show that two timestamps are sufficient to characterize the strong or weak precedence relation on abstract events in isolation. Note, however, that three timestamps are needed to characterize the combination of strong and weak precedence. Either $T$, $T^{-}$, and $T^w$ are needed when location information is available, or $T$, $T^{-}$, and $T^R$ when location information is not available. The question of which timestamps and which precedence tests should be used can only be answered in a specific context. Note that some uses might even require (slightly) different formalizations of precedence.
Although the results of this paper are then no longer directly applicable, the framework is general enough that it may be adapted to other precedence relations.

Finally, in Section 8 we studied convex abstract events. The class of convex abstract events is a meaningful class of events that is widely applicable and restricted enough to simplify timestamping. A single timestamp is sufficient to characterize weak precedence among mutually disjoint, convex abstract events, provided at least that location information is available. An example showing an implementation of this result in the context of distributed debugging is discussed in Section 2. Unfortunately, for strong precedence there is no such result. A simple example shows that it is unlikely that a single timestamp can be found characterizing strong precedence among mutually disjoint, convex abstract events. This raises the interesting question of what restrictions on abstract events are necessary to characterize strong precedence among abstract events by only a single timestamp. A related question is what restrictions are sufficient to allow a characterization of strong or weak precedence by means of only a single timestamp while not using location information.

Summarizing, the results presented in this paper are a step towards the solution of one of the open problems stated in [30], namely that of assigning meaningful timestamps to arbitrary abstract events. Some questions have been answered; some others have been raised.

Acknowledgments. We are grateful to the anonymous referees of an earlier version of this paper, whose comments improved our insight into the matter of causality among abstract events. This led to substantial changes and, more important, to substantial simplifications of the theory presented in this paper.

References

1. M. Ahuja, A.D. Kshemkalyani, and T. Carlson. A basic unit of computation in distributed systems. In IEEE Proceedings of the 10th.
International Conference on Distributed Computing Systems, pages 12–19, Paris, France, May/June 1990. IEEE Computer Society Press, Los Alamitos, CA.
2. M. Ahuja and S. Mishra. Units of computation in fault-tolerant distributed systems. In IEEE Proceedings of the 14th. International Conference on Distributed Computing Systems, pages 626–633, Poznan, Poland, June 1994. IEEE Computer Society Press, Los Alamitos, CA.
3. Ö. Babaoglu and K. Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. In S.J. Mullender, editor, Distributed Systems (2nd. edition), chapter 4, pages 55–96. Addison-Wesley, 1993.
4. T. Basten. Breakpoints and time in distributed computations. In G. Tel and P.M.B. Vitányi, editors, Distributed Algorithms, 8th. International Workshop, WDAG '94, Proceedings, volume 857 of Lecture Notes in Computer Science, pages 340–355, Terschelling, The Netherlands, September/October 1994. Springer-Verlag, Berlin, Germany, 1994.
5. T. Basten, T. Kunz, J.P. Black, M.H. Coffin, and D.J. Taylor. Time and the order of abstract events in distributed computations. Computing Science Note 94/06, Eindhoven University of Technology, Department of Mathematics and Computing Science, Eindhoven, The Netherlands, February 1994.
6. P.C. Bates. Debugging heterogeneous distributed systems using event-based models of behavior. ACM Transactions on Computer Systems, 13(1):1–31, February 1995.
7. P.C. Bates and J.C. Wileden. High-level debugging of distributed systems: The behavioral abstraction approach. The Journal of Systems and Software, 3(4):255–264, December 1983.
8. E. Best and B. Randell. A formal model of atomicity in asynchronous systems. Acta Informatica, 16:93–124, 1981.
9. K. Birman, A. Schiper, and P. Stephenson. Lightweight causal and atomic group multicast. ACM Transactions on Computer Systems, 9(3):272–314, 1991.
10. B. Charron-Bost. Combinatorics and geometry of consistent cuts: Application to concurrency theory. In J.-C. Bermond and M. Raynal, editors, Distributed Algorithms, 3rd. International Workshop, WDAG '89, Proceedings, volume 392 of Lecture Notes in Computer Science, pages 45–56, Nice, France, September 1989. Springer-Verlag, Berlin, Germany.
11. B. Charron-Bost. Concerning the size of logical clocks in distributed systems. Information Processing Letters, 39:11–16, July 1991.
12. B. Charron-Bost, F. Mattern, and G. Tel. Synchronous, asynchronous, and causally ordered communication. Distributed Computing, 9(4):173–191, February 1996.
13. W.-H. Cheung. Process and event abstraction for debugging distributed programs. PhD thesis, University of Waterloo, Department of Computer Science, Waterloo, Ontario, Canada, 1989. Also appeared as CCNG Technical Report T-189, 1989.
14. R. Cooper and K. Marzullo. Consistent detection of global predicates. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 163–173, Santa Cruz, CA, May 1991. The proceedings appeared also as ACM SIGPLAN Notices, 26(12), December 1991.
15. C.J. Fidge. Partial orders for parallel debugging. ACM SIGPLAN Notices, 24(1):183–194, January 1989.
16. C.J. Fidge. Logical time in distributed computing systems. IEEE Computer, 24(8):28–33, August 1991.
17. E. Fromentin and M. Raynal. Local states in distributed computations: A few relations and formulas. ACM Operating Systems Review, 28(2):65–72, 1994.
18. E. Fromentin and M. Raynal. Characterizing and detecting the set of global states seen by all observers of a distributed computation. In IEEE Proceedings of the 15th. International Conference on Distributed Computing Systems, pages 431–438. IEEE Computer Society Press, Los Alamitos, CA, 1995.
19. D. Haban and W. Weigel. Global events and global breakpoints in distributed systems. In Proceedings of the 21st. Annual Hawaii International Conference on System Sciences, Volume II, pages 166–175, Kailua-Kona, Hawaii, January 1988.
20. J. Kundu and J.E. Cuny. A scalable, visual interface for debugging with event-based behavioral abstraction. In Frontiers '95. Proceedings of the 5th. Symposium on the Frontiers of Massively Parallel Computation, pages 472–479, 1995.
21. T. Kunz. Visualizing abstract events. In Proceedings of the 1994 CAS Conference, pages 334–343, Toronto, Ontario, Canada, November 1994. IBM Canada Ltd. Laboratory, Centre for Advanced Studies.
22. T. Kunz, J.P. Black, D.J. Taylor, and T. Basten. Target-system-independent visualizations of complex distributed-application executions. The Computer Journal, special issue on software engineering for distributed systems, 1997. To appear.
23. L. Lamport. Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.
24. L. Lamport. On interprocess communication, part I: Basic formalism. Distributed Computing, 1:77–85, 1986.
25. L. Lamport. On interprocess communication, part II: Algorithms. Distributed Computing, 1:86–101, 1986.
26. K. Marzullo and L.S. Sabel. Efficient detection of a class of stable properties. Distributed Computing, 8:81–91, 1994.
27. F. Mattern. Virtual time and global states of distributed systems. In M. Cosnard et al., editor, Parallel and Distributed Algorithms, International Workshop, Proceedings, pages 215–226, Gers, France, October 1988. Elsevier Science Publishers B.V., Amsterdam, North-Holland, The Netherlands, 1989.
28. F. Mattern. On the relativistic structure of logical time in distributed systems. Bigre, 78:3–20, March 1992. Proceedings of the workshop: Datation et Contrôle des Exécutions Réparties, December 1991, Rennes, France. This paper is also available at URL: http://www.informatik.thdarmstadt.de/VS/Publikationen/.
29. S. Pilarski and T. Kameda. Checkpointing for distributed databases: Starting from the basics. IEEE Transactions on Parallel and Distributed Systems, 3(5):602–610, 1992.
30. R. Schwarz and F. Mattern. Detecting causal relationships in distributed computations: In search of the holy grail. Distributed Computing, 7(3):149–174, March 1994.
31. R.E. Strom, D.F. Bacon, A.P. Goldberg, A. Lowry, B. Silvermann, D. Yellin, J. Russell, and S. Yemini. Hermes: Unix user's guide, version 0.8alpha. Technical report, IBM T.J. Watson Research Center, Yorktown Heights, NY, March 1992.
32. D.J. Taylor. A prototype debugger for Hermes. In Proceedings of the 1992 CAS Conference, Volume I, pages 29–42, Toronto, Ontario, Canada, November 1992. IBM Canada Ltd. Laboratory, Centre for Advanced Studies.
33. D. Zernik, M. Snir, and D. Malki. Using visualization tools to understand concurrency. IEEE Software, 9(3):87–92, May 1992.
