
ATTENTION, INTENTIONS, AND THE STRUCTURE OF DISCOURSE

Barbara J. Grosz

Artificial Intelligence Center and

Center for the Study of Language and Information

SRI International

Menlo Park, CA 94025

Candace L. Sidner

BBN Laboratories Inc.

Cambridge, MA 02238

In this paper we explore a new theory of discourse structure that stresses the role of purpose and processing in discourse. In this theory, discourse structure is composed of three separate but interrelated components: the structure of the sequence of utterances (called the linguistic structure), a structure of purposes (called the intentional structure), and the state of focus of attention (called the attentional state). The linguistic structure consists of segments of the discourse into which the utterances naturally aggregate. The intentional structure captures the discourse-relevant purposes, expressed in each of the linguistic segments as well as relationships among them. The attentional state is an abstraction of the focus of attention of the participants as the discourse unfolds. The attentional state, being dynamic, records the objects, properties, and relations that are salient at each point of the discourse. The distinction among these components is essential to provide an adequate explanation of such discourse phenomena as cue phrases, referring expressions, and interruptions.

The theory of attention, intention, and aggregation of utterances is illustrated in the paper with a number of example discourses. Various properties of discourse are described, and explanations for the behavior of cue phrases, referring expressions, and interruptions are explored.

This theory provides a framework for describing the processing of utterances in a discourse. Discourse processing requires recognizing how the utterances of the discourse aggregate into segments, recognizing the intentions expressed in the discourse and the relationships among intentions, and tracking the discourse through the operation of the mechanisms associated with attentional state. This processing description specifies in these recognition tasks the role of information from the discourse and from the participants' knowledge of the domain.


This paper presents the basic elements of a computational theory of discourse structure that simplifies and expands upon previous work. By specifying the basic units a discourse comprises and the ways in which they can relate, a proper account of discourse structure provides the basis for an account of discourse meaning. An account of discourse structure also plays a central role in language processing because it stipulates constraints on those portions of a discourse to which any given utterance in the discourse must be related.

An account of discourse structure is closely related to two questions: What individuates a discourse? What makes it coherent? That is, faced with a sequence of utterances, how does one know whether they constitute a single discourse, several (perhaps interleaved) discourses, or none? As we develop it, the theory of discourse structure will be seen to be intimately connected with two nonlinguistic notions: intention and attention. Attention is an essential factor in explicating the processing of utterances in discourse. Intentions play a primary role in explaining discourse structure, defining discourse coherence, and providing a coherent conceptualization of the term "discourse" itself.

Copyright 1986 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission.

0362-613X/86/030175-204$03.00

Computational Linguistics, Volume 12, Number 3, July-September 1986 175


The theory is a further development and integration of two lines of research: work on focusing in discourse (Grosz 1978a, 1978b, 1981) and more recent work on intention recognition in discourse (Sidner and Israel 1981; Sidner 1983; 1985; Allen 1983, Litman 1985; Pollack 1986). Our goal has been to generalize these constructs properly to a wide range of discourse types.

Grosz (1978a) demonstrated that the notions of focusing and task structure are necessary for understanding and producing task-oriented dialogue. One of the main generalizations of previous work will be to show that discourses are generally in some sense "task-oriented," but the kinds of "tasks" that can be engaged in are quite varied - some are physical, some mental, others linguistic. Consequently, the term "task" is misleading; we therefore will use the more general terminology of intentions (e.g., when speaking of discourse purposes) for most of what we say.

Our main thesis is that the structure of any discourse is a composite of three distinct but interacting components:

• the structure of the actual sequence of utterances in the discourse;

• a structure of intentions;

• an attentional state.

The distinction among these components is essential to an explanation of interruptions (see Section 5), as well as to explanations of the use of certain types of referring expressions (see Section 4.2) and various other expressions that affect discourse segmentation and structure (see Section 6). Most related work on discourse structure (including Reichman-Adar 1984, Linde 1979, Linde and Goguen 1978, Cohen 1983) fails to distinguish among some (or, in some cases, all) of these components. As a result, significant generalizations are lost, and the computational mechanisms proposed are more complex than necessary. By carefully distinguishing these components, we are able to account for significant observations in this related work while simplifying both the explanations given and computational mechanisms used.

In addition to explicating these linguistic phenomena, the theory provides an overall framework within which to answer questions about the relevance of various segments of discourse to one another and to the overall purposes of the discourse participants. Various properties of the intentional component have implications for research in natural-language processing in general. In particular, the intentions that underlie discourse are so diverse that approaches to discourse coherence based on selecting discourse relationships from a fixed set of alternative rhetorical patterns (e.g., Hobbs 1979, Mann and Thompson 1983, Reichman 1981) are unlikely to suffice. The intentional structure introduced in this paper depends instead on a small number of structural relations that can hold between intentions. This study also reveals several problems that must be confronted in expanding speech-act-related theories (e.g., Allen and Perrault 1980, Cohen and Levesque 1980, Allen 1983) from coverage of individual utterances to coverage of extended sequences of utterances in discourse.

Although a definition of discourse must await further development of the theory presented in this paper, some properties of the phenomena we want to explain must be specified now. In particular, we take a discourse to be a piece of language behavior that typically involves multiple utterances and multiple participants. A discourse may be produced by one or more of these participants as speakers or writers; the audience may comprise one or more of the participants as hearers or readers. Because in multi-party conversations more than one participant may speak (or write) different utterances within a segment, the terms speaker and hearer do not differentiate the unique roles that the participants maintain in a segment of a conversation. We will therefore use the terms initiating conversational participant (ICP) and other conversational participant(s) (OCP) to distinguish the initiator of a discourse segment from its other participants. The ICP speaks (or writes) the first utterance of a segment, but an OCP may be the speaker of some subsequent utterances. By speaking of ICPs and OCPs, we can highlight the purposive aspect of discourse. We will use the terms speaker and hearer only when the particular speaking/hearing activity is important for the point being made.

In most of this paper, we will be concerned with developing an abstract model of discourse structure; in particular, the definitions of the components will abstract away from the details of the discourse participants. Whether one constructs a computer system that can participate in a discourse (i.e., one that is a language user) or defines a psychological theory of language use, the task will require the appropriate projection of this abstract model onto properties of a language user, and specification of additional details (e.g., specifying memory for linguistic structure, means for encoding attentional state, and appropriate representations of intentional structure). We do, however, address ourselves directly to certain processing issues that are essential to the computational validity of the [abstract] model and to its utilization for a language-processing system or psychological theory.

Finally, it is important to note that although discourse meaning is a significant, unsolved problem, we will not address it in this paper. An adequate theory of discourse meaning needs to rest at least partially on an adequate theory of discourse structure. Our concern is with providing the latter.

The next section examines the basic theory of discourse structure and presents an overview of each of the components of discourse structure. Section 3 analyzes two sample discourses - a written text and a fragment of task-oriented dialogue - from the perspective of the theory being developed; these two examples are also used to illustrate various points in the remainder of the paper. Section 4 investigates various processing issues that the theory raises. The following two sections describe the role of the discourse structure components in explaining various properties of discourse, thereby corroborating the necessity of distinguishing among its three components. Section 7 describes the generalization from utterance-level to discourse-level intentions, establishes certain properties of the latter, and contrasts them with the rhetorical relations of alternative theories. Finally, Section 8 poses a number of outstanding research questions suggested by the theory.


Discourse structure is a composite of three interacting constituents: a linguistic structure, an intentional structure, and an attentional state. These three constituents of discourse structure deal with different aspects of the utterances in a discourse. Utterances - the actual saying or writing of particular sequences of phrases and clauses - are the linguistic structure's basic elements. Intentions of a particular sort and a small number of relationships between them provide the basic elements of the intentional structure. Attentional state contains information about the objects, properties, relations, and discourse intentions that are most salient at any given point. It is an abstraction of the focus of attention of the discourse participants; it serves to summarize information from previous utterances crucial for processing subsequent ones, thus obviating the need for keeping a complete history of the discourse.

Together the three constituents of discourse structure supply the information needed by the CPs to determine how an individual utterance fits with the rest of the discourse - in essence, enabling them to figure out why it was said and what it means. The context provided by these constituents also forms the basis for certain expectations about what is to come; these expectations play a role in accommodating new utterances. The attentional state serves an additional purpose: namely, it furnishes the means for actually using the information in the other two structures in generating and interpreting individual utterances.


The first component of discourse structure is the structure of the sequence of utterances that comprise a discourse.1 Just as the words in a single sentence form constituent phrases, the utterances in a discourse are naturally aggregated into discourse segments. The utterances in a segment, like the words in a phrase, serve particular roles with respect to that segment. In addition, the discourse segments, like the phrases, fulfill certain functions with respect to the overall discourse. Although two consecutive utterances may be in the same discourse segment, it is also common for two consecutive utterances to be in different segments. It is also possible for two utterances that are nonconsecutive to be in the same segment.

The factoring of discourses into segments has been observed across a wide range of discourse types. Grosz (1978a) showed this for task-oriented dialogues. Linde (1979) found it valid for descriptions of apartments; Linde and Goguen (1978) describe such structuring in the Watergate transcripts. Reichman-Adar (1984) observed it in informal debates, explanations, and therapeutic discourse. Cohen (1983) found similar structures in essays in rhetorical texts. Polanyi and Scha (1986) discuss this feature of narratives.

Although different researchers with different theories have examined a variety of discourse types and found discourse-level segmentation, there has been very little investigation of the extent of agreement about where the segment boundaries lie. There have been no psychological studies of the consistency of recognition of section boundaries. However, Mann (Mann et al. 1975) asked several people to segment a set of dialogues. He has reported [personal communication] that his subjects segmented the discourses approximately the same; their disagreements were about utterances at the boundaries of segments.2 Several studies of spontaneously produced discourses provide additional evidence of the existence of segment boundaries, as well as suggesting some of the linguistic cues available for detecting boundaries. Chafe (1979, 1980) found differences in pause lengths at segment boundaries. Butterworth (1975) found speech rate differences that correlated with segments; speech rate is slower at the start of a segment than toward the end.

The linguistic structure consists of the discourse segments and an embedding relationship that can hold between them. As we discuss in Sections 2.2 and 5, the embedding relationships are a surface reflection of relationships among elements of the intentional structure. It is important to recognize that the linguistic structure is not strictly decompositional. An individual segment may include a combination of subsegments and utterances only in that segment (and not members of any of its embedded subsegments). Both of the examples in Section 3 exhibit such nonstrict decompositionality. Because the linguistic structure is not strictly decompositional, various properties of the discourse (most notably the intentional structure) are functions of properties of individual utterances and properties of segments.
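The nonstrict decomposition described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names `Utterance` and `Segment`, the method names, and the four-utterance example are hypothetical and are not part of the theory's own notation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Utterance:
    text: str


@dataclass
class Segment:
    name: str
    # Children are kept in discourse order. A child is either an utterance
    # that belongs directly to this segment or an embedded subsegment.
    children: List[Union[Utterance, "Segment"]] = field(default_factory=list)

    def own_utterances(self) -> List[Utterance]:
        """Utterances in this segment that are in none of its subsegments."""
        return [c for c in self.children if isinstance(c, Utterance)]

    def all_utterances(self) -> List[Utterance]:
        """Every utterance in this segment, in discourse order."""
        result: List[Utterance] = []
        for child in self.children:
            if isinstance(child, Utterance):
                result.append(child)
            else:
                result.extend(child.all_utterances())
        return result


# Hypothetical discourse: u1 and u4 belong directly to DS1, while u2 and u3
# belong to the embedded segment DS2.
u1, u2, u3, u4 = (Utterance(f"u{i}") for i in range(1, 5))
ds2 = Segment("DS2", [u2, u3])
ds1 = Segment("DS1", [u1, ds2, u4])
```

In this sketch `ds1.own_utterances()` yields u1 and u4, two nonconsecutive utterances in the same segment, while u1 and u2 are consecutive utterances in different segments.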

There is a two-way interaction between the discourse segment structure and the utterances constituting the discourse: linguistic expressions can be used to convey information about the discourse structure; conversely, the discourse structure constrains the interpretation of expressions (and hence affects what a speaker says and how a hearer will interpret what is said). Not surprisingly, linguistic expressions are among the primary indicators of discourse segment boundaries. The explicit use of certain words and phrases (e.g., "in the first place") and more subtle cues, such as intonation or changes in tense and aspect, are included in the repertoire of linguistic devices that function, wholly or in part, to indicate these boundaries (Grosz 1978a, Reichman-Adar 1984, Cohen 1983, Polanyi and Scha 1983, Hirschberg and Pierrehumbert 1986). Reichman (1981) discusses some words that function in this way and coined the term clue words. We will use the term cue phrases to generalize on her observation as well as many others because each one of these devices cues the hearer to some change in the discourse structure.

As discussed in Section 6, these linguistic boundary markers can be divided according to whether they explicitly indicate changes in the intentional structure or in the attentional state of the discourse. The differential use of these linguistic markers provides one piece of evidence for considering these two components to be distinct.

Because these linguistic devices function explicitly as indicators of discourse structure, it becomes clear that they are best seen as providing information at the discourse level, and not at the sentence level; hence, certain kinds of questions (e.g., about their contribution to the truth conditions of an individual sentence) do not make sense. For example, in the utterance "Incidentally, Jane swims every day," the "incidentally" indicates an interruption of the main flow of discourse rather than affecting in any way the meaning of "Jane swims every day." Jane's swimming every day could hardly be fortuitous.

Just as linguistic devices affect structure, so the discourse segmentation affects the interpretation of linguistic expressions in a discourse. Referring expressions provide the primary example of this effect.3 The segmentation of discourse constrains the use of referring expressions by delineating certain points at which there is a significant change in what entities (objects, properties, or relations) are being discussed. For example, there are different constraints on the use of pronouns and reduced definite-noun phrases within a segment than across segment boundaries. While discourse segmentation is obviously not the only factor governing the use of referring expressions, it is an important one.


A rather straightforward property of discourses, namely, that they (or, more accurately, those who participate in them) have an overall purpose, turns out to play a fundamental role in the theory of discourse structure. In particular, some of the purposes that underlie discourses, and their component segments, provide the means of individuating discourses and of distinguishing discourses that are coherent from those that are not. These purposes also make it possible to determine when a sequence of utterances comprises more than one discourse.

Although typically the participants in a discourse may have more than one aim in participating in the discourse (e.g., a story may entertain its listeners as well as describe an event; an argument may establish a person's brilliance as well as convince someone that a claim or allegation is true), we distinguish one of these purposes as foundational to the discourse. We will refer to it as the discourse purpose (DP). From an intuitive perspective, the discourse purpose is the intention that underlies engaging in the particular discourse. This intention provides both the reason a discourse (a linguistic act), rather than some other action, is being performed and the reason the particular content of this discourse is being conveyed rather than some other information. For each of the discourse segments, we can also single out one intention - the discourse segment purpose (DSP). From an intuitive standpoint, the DSP specifies how this segment contributes to achieving the overall discourse purpose. The assumption that there are single such intentions will in the end prove too strong. However, this assumption allows us to describe the basic theory more clearly. We must leave to future research (and a subsequent paper) the exploration and discussion of the complications that result from relaxing this assumption.

Typically, an ICP will have a number of different kinds of intentions that lead to initiating a discourse. One kind might include intentions to speak in a certain language or to utter certain words. Another might include intentions to amuse or to impress. The kinds of intentions that can serve as discourse purposes or discourse segment purposes are distinguished from other intentions by the fact that they are intended to be recognized (cf. Allen and Perrault 1980, Sidner 1985), whereas other intentions are private; that is, the recognition of the DP or DSP is essential to its achieving its intended effect. Discourse purposes and discourse segment purposes share this property with certain utterance-level intentions that Grice (1969) uses in defining utterance meaning (see Section 7).

It is important to distinguish intentions that are intended to be recognized from other kinds of intentions that are associated with discourse. Intentions that are intended to be recognized achieve their intended effect only if the intention is recognized. For example, a compliment achieves its intended effect only if the intention to compliment is recognized; in contrast, a scream of "boo" typically achieves its intended effect (scaring the hearer) without the hearer having to recognize the speaker's intention.

Some intention that is private and not intended to be recognized may be the primary motivation for an ICP to begin a discourse. For example, the ICP may intend to impress someone or may plan to teach someone. In neither case is the ICP's intention necessarily intended to be recognized. Quite the opposite may be true in the case of impressing, as the ICP may not want the OCP to be aware of his intention. When teaching, the ICP may not care whether the OCP knows the ICP is teaching him or her. Thus, the intention that motivates the ICP to engage in a discourse may be private. By contrast, the discourse segment purpose is always intended to be recognized.



DPs and DSPs are basically the same sorts of intentions. If an intention is a DP, then its satisfaction is a main purpose of the discourse, whereas if it is a DSP, then its satisfaction contributes to the satisfaction of the DP. The following are some of the types of intentions that could serve as DP/DSPs, followed by one example of each type.

1. Intend that some agent intend to perform some physical task. Example: Intend that Ruth intend to fix the flat tire.

2. Intend that some agent believe some fact. Example: Intend that Ruth believe the campfire has started.

3. Intend that some agent believe that one fact supports another. Example: Intend that Ruth believe the smell of smoke provides evidence that the campfire is started.

4. Intend that some agent intend to identify an object (existing physical object, imaginary object, plan, event, event sequence). Example: Intend that Ruth intend to identify my bicycle.

5. Intend that some agent know some property of an object. Example: Intend that Ruth know that my bicycle has a flat tire.

We have identified two structural relations that play an important role in discourse structure: dominance and satisfaction-precedence. An action that satisfies one intention, say DSP1, may be intended to provide part of the satisfaction of another, say DSP2. When this is the case, we will say that DSP1 contributes to DSP2; conversely, we will say that DSP2 dominates DSP1 (or DSP2 DOM DSP1). The dominance relation invokes a partial ordering on DSPs that we will refer to as the dominance hierarchy. For some discourses, including task-oriented ones, the order in which the DSPs are satisfied may be significant, as well as being intended to be recognized. We will say that DSP1 satisfaction-precedes DSP2 (or, DSP1 SP DSP2) whenever DSP1 must be satisfied before DSP2.4
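The two relations can be sketched as a small data structure. The sketch below is illustrative only: the class `IntentionalStructure`, its method names, and the three-DSP example are hypothetical, and treating dominance as the transitive closure of direct "contributes to" links is an assumption made here to obtain a partial ordering.

```python
from typing import Dict, Set, Tuple


class IntentionalStructure:
    """Hypothetical container for DSPs and the relations between them."""

    def __init__(self) -> None:
        # dsp -> set of DSPs that it directly contributes to
        self.contributes_to: Dict[str, Set[str]] = {}
        # (first, second) pairs: first must be satisfied before second
        self.precedence: Set[Tuple[str, str]] = set()

    def add_contribution(self, dsp: str, dominating: str) -> None:
        self.contributes_to.setdefault(dsp, set()).add(dominating)

    def add_precedence(self, first: str, second: str) -> None:
        self.precedence.add((first, second))

    def dominates(self, a: str, b: str) -> bool:
        """True if b contributes to a, directly or through a chain.

        Assumption: dominance is taken to be transitive.
        """
        seen: Set[str] = set()
        frontier = [b]
        while frontier:
            current = frontier.pop()
            for parent in self.contributes_to.get(current, ()):
                if parent == a:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
        return False

    def satisfaction_precedes(self, a: str, b: str) -> bool:
        return (a, b) in self.precedence


# Hypothetical example: DSP2 and DSP3 each contribute to DSP1, and DSP2
# must be satisfied before DSP3.
structure = IntentionalStructure()
structure.add_contribution("DSP2", "DSP1")
structure.add_contribution("DSP3", "DSP1")
structure.add_precedence("DSP2", "DSP3")
```

Storing only the direct links and computing dominance on demand keeps the structure easy to extend as new segments are recognized during a discourse.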

Any of the intentions on the preceding list could be either a DP or a DSP. Furthermore, a given instance of any one of them could contribute to another, or to a different, instance of the same type. For example, the intention that someone intend to identify some object might dominate several intentions that she or he know some property of that object; likewise, the intention to get someone to believe some fact might dominate a number of contributing intentions that that person believe other facts.

As the above list makes clear, the range of intentions that can serve as discourse, or discourse segment, purposes is open-ended (cf. Wittgenstein 1953: paragraph 23), much like the range of intentions that underlie more general purposeful action. There is no finite list of discourse purposes, as there is, say, of syntactic categories. It remains an unresolved research question whether there is a finite description of the open-ended set of such intentions. However, even if there were finite descriptions, there would still be no finite list of intentions from which to choose. Thus, a theory of discourse structure cannot depend on choosing the DP/DSPs from a fixed list (cf. Reichman-Adar 1984, Schank et al. 1982, Mann and Thompson 1983), nor on the particulars of individual intentions. Although the particulars of individual intentions, like a wide range of common sense knowledge, are crucial to understanding any discourse, such particulars cannot serve as the basis for discourse structure.

What is essential for discourse structure is that such intentions bear certain kinds of structural relationships to one another. Since the CPs can never know the whole set of intentions that might serve as DP/DSPs, what they must recognize is the relevant structural relationships among intentions. Although there is an infinite number of intentions, there are only a small number of relations relevant to discourse structure that can hold between them.

In this paper we distinguish between the determination of the DSP and the recognition of it. We use the term determination to refer to a semantic-like notion, namely, the complete specification of what is intended by whom; we use the term recognition to refer to a processing notion, namely, the processing that leads a discourse participant to identify what the intention is. These are obviously related concepts; the same information that determines a DSP may be used by an OCP to recognize it. However, some questions are relevant to only one of them. For example, the question of when the information becomes available is not relevant to determination but is crucial to recognition. An analogous distinction has been drawn with respect to sentence structure; the parse tree (determination) is differentiated from the parsing process (recognition) that produces the tree.


The third component of discourse structure, the attentional state, is an abstraction of the participants' focus of attention as their discourse unfolds. The attentional state is a property of the discourse itself, not of the discourse participants. It is inherently dynamic, recording the objects, properties, and relations that are salient at each point in the discourse. The attentional state is modeled by a set of focus spaces; changes in attentional state are modeled by a set of transition rules that specify the conditions for adding and deleting spaces. We call the collection of focus spaces available at any one time the focusing structure and the process of manipulating spaces focusing.
The focusing process associates a focus space with each discourse segment; this space contains those entities that are salient - either because they have been mentioned explicitly in the segment or because they became salient in the process of producing or comprehending the utterances in the segment (as in the original work on focusing: Grosz 1978a). The focus space also includes the DSP; the inclusion of the purpose reflects the fact that the CPs are focused not only on what they are talking about, but also on why they are talking about it.

To understand the attentional state component of discourse structure, it is important not to confuse it with two other concepts. First, the attentional state component is not equivalent to cognitive state, but is only one of its components. Cognitive state is a richer structure, one that includes at least the knowledge, beliefs, desires, and intentions of an agent, as well as the cognitive correlates of the attentional state as modeled in this paper. Second, although each focus space contains a DSP, the focus structure does not include the intentional structure as a whole.

Figure 1 illustrates how the focusing structure, in addition to modeling attentional state, serves during processing to coordinate the linguistic and intentional structures. The discourse segments (to the left of the figure) are tied to focus spaces (drawn vertically down the middle of the figure). The focusing structure is a stack. Information in lower spaces is usually accessible from higher ones (but less so than the information in the higher spaces); we use a line with intersecting hash marks to denote when this is not the case. Subscripted terms are used to indicate the relevant contents of the focus spaces because the spaces contain representations of entities (i.e., objects, properties, and relations) and not linguistic expressions.

Part one of Figure 1 shows the state of focusing when discourse segment DS2 is being processed. Segment DS1 gave rise to FS1 and had as its discourse purpose DSP1. The properties, objects, relations, and purpose represented in FS1 are accessible but less salient than those in FS2. DS2 yields a focus space that is stacked relative to FS1 because DSP1 of DS1 dominates DS2's DSP, DSP2. As a result of the relationship between FS1 and FS2, reduced noun phrases will be interpreted differently in DS2 than in DS1. For example, if some red balls exist in the world, one of which is represented in FS2 and another in FS1, then "the red ball" used in DS2 will be understood to mean the particular red ball that is represented in FS2. If, however, there is also a green truck (in the world) and it is represented only in FS1, "the green truck" uttered in DS2 will be understood as referring to that green truck.
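The red-ball and green-truck example can be sketched as a search down the focus stack, most salient space first. This is a simplified illustration, not the paper's mechanism: the `FocusSpace` class, the `resolve` function, the entity identifiers, and matching by exact description string are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FocusSpace:
    name: str
    dsp: str
    # entity identifier -> description, e.g. "ball_2" -> "red ball".
    # Identifiers stand in for the subscripted terms used in Figure 1.
    entities: Dict[str, str] = field(default_factory=dict)


def resolve(description: str, stack: List[FocusSpace]) -> Optional[str]:
    """Return the most salient entity matching the description.

    The stack is searched from the top (end of the list) downward, so an
    entity in a higher space wins over one in a lower space.
    """
    for space in reversed(stack):
        for entity_id, entity_description in space.entities.items():
            if entity_description == description:
                return entity_id
    return None


# State of focusing while DS2 is being processed: FS2 is stacked on FS1.
fs1 = FocusSpace("FS1", "DSP1", {"ball_1": "red ball", "truck_1": "green truck"})
fs2 = FocusSpace("FS2", "DSP2", {"ball_2": "red ball"})
stack = [fs1, fs2]

red_ball = resolve("red ball", stack)        # found in FS2
green_truck = resolve("green truck", stack)  # found only in FS1
```

The sketch ignores the case, marked in Figure 1 by a line with hash marks, in which lower spaces are not accessible from higher ones.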

Part two of Figure 1 shows the state of focusing when segment DS3 is being processed. FS2 has been popped from the stack and FS3 has been pushed onto it because the DSP of DS3, DSP3, is dominated solely by DSP1, not by DSP2. In this example, the intentional structure includes only dominance relationships, although it may, in general, also include satisfaction-precedence relationships.

The stacking of focus spaces reflects the relative salience of the entities in each space during the corresponding segment's portion of the discourse. The stack relationships arise from the ways in which the various DSPs relate; information about such relationships is represented in the dominance hierarchy (depicted on the right in the figure). The spaces in Figure 1 are snapshots illustrating the results of a sequence of operations, such as pushes onto and pops from a stack. A push occurs when the DSP for a new segment contributes to the DSP for the immediately preceding segment. When the DSP contributes to some intention higher in the dominance hierarchy, several focus spaces are popped from the stack before the new one is inserted.
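The push and pop behavior described above can be sketched as a single transition function. The sketch makes several simplifying assumptions: each focus space is represented only by the label of its DSP, the function name `enter_segment` is hypothetical, and the direct "contributes to" links are supplied in advance, whereas in actual discourse processing they must be recognized as the discourse proceeds.

```python
from typing import Dict, List, Optional


def enter_segment(
    stack: List[str],
    dsp: str,
    contributes_to: Dict[str, Optional[str]],
) -> List[str]:
    """Update the focus stack for a new segment and return the popped spaces.

    Spaces are popped until the space on top belongs to the DSP that the
    new DSP directly contributes to; the new space is then pushed. If no
    such space is on the stack, every space is popped (an assumption of
    this sketch).
    """
    target = contributes_to.get(dsp)
    popped: List[str] = []
    while stack and stack[-1] != target:
        popped.append(stack.pop())
    stack.append(dsp)
    return popped


# Dominance relationships of Figure 1: DSP1 dominates both DSP2 and DSP3.
contributes_to = {"DSP1": None, "DSP2": "DSP1", "DSP3": "DSP1"}

stack: List[str] = []
enter_segment(stack, "DSP1", contributes_to)
enter_segment(stack, "DSP2", contributes_to)
part_one = list(stack)            # snapshot while DS2 is being processed

popped = enter_segment(stack, "DSP3", contributes_to)
part_two = list(stack)            # snapshot while DS3 is being processed
```

Under these assumptions, `part_one` and `part_two` correspond to the two snapshots in Figure 1: the space for DS2 is stacked on that for DS1, and is then popped before the space for DS3 is pushed.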

Two essential properties of the focusing structure are now clear. First, the focusing structure is parasitic upon the intentional structure, in the sense that the relationships among DSPs determine pushes and pops. Note, however, that the relevant operation may sometimes be indicated in the language itself. For example, the cue word

first

often indicates the start of a segment whose DSP contributes to the DSP of the preceding segment.

Second, the focusing structure, like the intentional and linguistic structures, evolves as the discourse proceeds. None of them exists a priori. Even in those rare cases in which an ICP has a complete plan for the discourse prior to uttering a single word, the intentional structure is constructed by the CPs as the discourse progresses. This discourse-time construction of the intentional structure may be more obviously true for speakers and hearers of spoken discourse than for readers and writers of texts, but, even for the writer, the intentional structure is developed as the text is being written.

Figure 1 illustrates some fundamental distinctions between the intentional and attentional components of discourse structure. First, the dominance hierarchy provides, among other things, a complete record of the discourse-level intentions and their dominance (as well as, when relevant, satisfaction-precedence) relationships, whereas the focusing structure at any one time can essentially contain only information that is relevant to purposes in a portion of the dominance hierarchy. Second, at the conclusion of a discourse, if it completes normally, the focus stack will be empty, while the intentional structure will have been fully constructed. Third, when the discourse is being processed, only the attentional state can constrain the interpretation of referring expressions directly.

We can now also clarify some misinterpretations of focus-space diagrams and task structure in our earlier work (Grosz 1978a, 1981, 1974). The focus-space hierarchies in that work are best seen as representing attentional state. The task structure was used in two ways:

1. to represent common knowledge about the task;

2. as a special case of the intentional structure we posit in this paper.

Although the same representational scheme was used for encoding the focus-space hierarchies and the task structure (partitioned networks: Hendrix 1979), the two structures were distinct.

180 Computational Linguistics, Volume 12, Number 3, July-September 1986

Barbara J. Grosz and Candace L. Sidner Attention, Intentions, and the Structure of Discourse















Figure 1. Discourse Segments, Focus Spaces and Dominance Hierarchy.


Several researchers (e.g., Linde and Goguen 1978, Reichman-Adar 1984) misinterpreted the original research in an unfortunate and unintended way: they took the focus-space hierarchy to include (or be identical to) the task structure. The conflation of these two structures forces a single structure to contain information about attentional state, intentional relationships, and general task knowledge. It prevents a theory from accounting adequately for certain aspects of discourse, including interruptions (see Section 5).

A second instance of confusion was to infer (incorrectly) that the task structure was necessarily a prebuilt tree. If the task structure is taken to be a special case of intentional structure, it becomes clear that the tree structure is simply a more constrained structure than one might require for other discourses; the nature of the task related to the task-oriented discourse is such that the dominance hierarchy of the intentional structure of the dialogue has both dominance and satisfaction-precedence relationships,⁵ while other discourses may not exhibit significant precedence constraints among the DSPs.

Furthermore, there has never been any reason to assume that the task structures in task-oriented dialogues are prebuilt, any more than the intentional structure of any other kind of discourse. It is rather that one objective of discourse theory (not a topic considered here, however) is to explain how the OCP builds up a model of the task structure by using information supplied in the discourse.

However, it is important to note that conflating the aforementioned two roles of information about the task itself (as a portion of general commonsense knowledge and as a special case of intentional structure) was regrettable, as it fails to make an important distinction.

Furthermore, as is clear when intentional structures are considered more generally, such a conflation of roles does not allow for differences between what one knows about a task and one's intentions for (or what one makes explicit in discourse about) performing a task.

In summary, the focusing structure is the central repository for the contextual information needed to process utterances at each point in the discourse. It distinguishes those objects, properties, and relations that are most salient at that point and, moreover, has links to relevant parts of both the linguistic and intentional structures. During a discourse, an increasing amount of information, only some of which continues to be needed for the interpretation of subsequent utterances, is discussed. Hence, it becomes more and more necessary to be able to identify relevant discourse segments, the entities they make salient, and their DSPs. The role of attentional state in delineating the information necessary for understanding is thus central to discourse processing.


To illustrate the basic theory we have just sketched, we will give a brief analysis of two kinds of discourse: an argument from a rhetoric text and a task-oriented dialogue. For each example we discuss the segmentation of the discourse, the intentions that underlie this segmentation, and the relationships among the various DSPs. In each case, we point out some of the linguistic devices used to indicate segment boundaries as well as some of the expressions whose interpretations depend on those boundaries. The analysis is concerned with specifying certain aspects of the behavior to be explicated by a theory of discourse; the remainder of the paper provides a partial account of this behavior.


Our first example is an argument taken from a rhetoric text (Holmes and Gallagher 1917⁶). It is an example used by Cohen (1983) in her work on the structure of arguments. Figure 2 shows the dialogue and the eight discourse segments of which it is composed. The division of the argument into separate (numbered) clauses is Cohen's, but our analysis of the discourse structure is different, since in Cohen's analysis, every utterance is directly subordinated to another utterance, and there is only one structure to encode linguistic segmentation and the purposes of utterances. Although both analyses segment utterance (4) separately from utterances (1-3), some readers place this utterance in DS1 with utterances (1) through (3); this is an example of the kind of disagreement about boundary utterances found in Mann's data (as discussed in Section 2.1). The two placements lead to slightly different DSPs, but not to radically different intentional structures. Because the differences do not affect the major thrust of the argument, we will discuss only one segmentation.












1. The "movies" are so attractive to the great American public,

2. especially to young people,

3. that it is time to take careful thought about their effect on mind and morals.

4. Ought any parent to permit his children to attend a moving picture show often or without being quite certain of the show he permits them to see?

5. No one can deny, of course, that great educational and ethical gains may be made through the movies

6. because of their astonishing vividness.

7. But the important fact to be determined is the total result of continuous and indiscriminate attendance on shows of this kind.

8. Can it be other than harmful?

9. In the first place the character of the plays is seldom of the best.

10. One has only to read the ever-present "movie" billboard to see how cheap, melodramatic and vulgar most of the photoplays are.

11. Even the best plays, moreover, are bound to be exciting and over-emotional.

12. Without spoken words, facial expression and gesture must carry the meaning:

13. but only strong emotion, or buffoonery can be represented through facial expression and gesture.

14. The more reasonable and quiet aspects of life are necessarily neglected.

15. How can our young people drink in through their eyes a continuous spectacle of intense and strained activity and feeling without harmful effects?

16. Parents and teachers will do well to guard the young against overindulgence in the taste for the "movie".

Figure 2. The Movies Essay.


Figure 3 lists the primary component of the DSP for each of these segments and Figure 4 shows the dominance relationships that hold among these intentions. In Section 7 we discuss additional components of the discourse segment purpose; because these additional components are more important for completeness of the theory than for determining the essential dominance and satisfaction-precedence relationships between DSPs, we omit such details here. Rather than commit ourselves to a formal language in which to express the intentions of the discourse, we will use a shorthand notation and English sentences that are intended to be a gloss for a formal statement of the actual intentions.

I0: (Intend ICP (Believe OCP P0)) where P0 = the proposition that parents and teachers should guard the young from overindulgence in the movies.

I1: (Intend ICP (Believe OCP P1)) where P1 = the proposition that it is time to consider the effect of movies on mind and morals.

I2: (Intend ICP (Believe OCP P2)) where P2 = the proposition that young people cannot drink in through their eyes a continuous spectacle of intense and strained activity without harmful effects.

I3: (Intend ICP (Believe OCP P3)) where P3 = the proposition that it is undeniable that great educational and ethical gains may be made through the movies.

I4: (Intend ICP (Believe OCP P4)) where P4 = the proposition that although there are gains, the total result of continuous and indiscriminate attendance at movies is harmful.

I5: (Intend ICP (Believe OCP P5)) where P5 = the proposition that the content of movies (i.e., the character of the plays) is not the best.

I6: (Intend ICP (Believe OCP P6)) where P6 = the proposition that the stories (i.e., the plays) in movies are exciting and over-emotional.

I7: (Intend ICP (Believe OCP P7)) where P7 = the proposition that movies portray strong emotion and buffoonery while neglecting the quiet and reasonable aspects of life.

Figure 3. Primary intentions of the DSPs for the Movies essay.

I0 DOM I1

I0 DOM I2

I2 DOM I3

I2 DOM I4

I4 DOM I5

I4 DOM I6

I6 DOM I7

Figure 4. Dominance relationships for the DSPs of the Movies essay.
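The dominance relationships of Figure 4 can be written down as plain data and queried. The sketch below is an illustration only: the pair encoding and the helper names `dominators` and `is_tree` are assumptions made here, not notation from the theory.

```python
# Dominance pairs transcribed from Figure 4: (dominating, dominated).
DOM = [("I0", "I1"), ("I0", "I2"), ("I2", "I3"), ("I2", "I4"),
       ("I4", "I5"), ("I4", "I6"), ("I6", "I7")]

def dominators(intent, pairs):
    """Return the chain of intentions dominating `intent`, nearest first."""
    parent = {child: par for par, child in pairs}
    chain = []
    while intent in parent:
        intent = parent[intent]
        chain.append(intent)
    return chain

def is_tree(pairs):
    """True if the pairs form a single rooted tree of immediate dominance."""
    parent = {}
    for par, child in pairs:
        if child in parent:              # two immediate dominators
            return False
        parent[child] = par
    nodes = {n for pair in pairs for n in pair}
    roots = nodes - set(parent)
    if len(roots) != 1:
        return False
    for node in nodes:                   # every node must reach the root without cycling
        seen = set()
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True

chain_for_i7 = dominators("I7", DOM)
```

For the essay, `chain_for_i7` lists I6, I4, I2, and I0 in that order, which matches the nesting of segments that an OCP would have to track to see how P7 ultimately supports P0.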


All the primary intentions for this essay are intentions that the reader (OCP) come to believe some proposition.

Some of these propositions, such as P5 and P6, can be read off the surface utterances directly. Other propositions and the intentions of which they are part, such as P2 and I2, are more indirect. Like the Gricean utterance-level intentions (the analogy with these will be explored in Section 7), DSPs may or may not be directly expressed in the discourse. In particular, they may be expressed in any of the following ways:

1. explicitly, as in "I intend for you to believe that it's time to consider the effects of movies on mind and morals." [which would produce I1]

2. directly, in one utterance, as in (3) [which does produce I1]

3. directly, through multiple utterances, as in using (7) and the utterance "It can only be harmful" to produce I4

4. by derivation, in one or more utterances with an associated context, as in (15) to produce I2.

Not only may information about the DSP be conveyed by a number of features of the utterances in a discourse, but it also may come in any utterance in a segment. For example, although I0 is the DP, it is stated directly only in the last utterance of the essay. This leads to a number of questions about the ways in which OCPs can recognize discourse purposes, and about those junctures at which they need to do so. We turn to these matters directly in

Subsection 4.1.

This discourse also provides several examples of the different kinds of interactions that can hold between the linguistic expressions in a discourse and the discourse structure. It includes examples of the devices that may be used to mark overtly the boundaries between discourse segments - examples of the use of aspect, mood, and particular cue phrases - as well as of the use of referring expressions that are affected by discourse segment boundaries.

The use of cue phrases to indicate discourse boundaries is illustrated in utterances (9) and (11); in (9) the phrase "in the first place" marks the beginning of DS5, while in (11) "moreover" ends DS5 and marks the start of DS6. These phrases also carry information about the intentional structure, namely, that DSP5 and DSP6 are dominated by DSP4. In some cases, cue phrases have multiple functions; they convey propositional content as well as marking discourse segment boundaries. The "but" in utterance (7) is an example of such a multiple-function use.

The boundaries between DS1 and DS2, DS4 and DS5, and DS4 and DS2 reflect changes of aspect and mood.

The switch from declarative, present tense to interrogative modal aspect does not in itself seem to signal the boundary (for recognition purposes) in this discourse unambiguously, but it does indicate a possible line of demarcation which, in fact, is valid.

The effect of segmentation on referring expressions is shown by the use of the generic noun phrase "a moving picture show" in (4). Although a reference to the movies was made with a pronoun ("their") in (3), a full noun phrase is used in (4). This use reflects, and perhaps in part marks, the boundary between the segments DS1 and DS2.


Finally, this discourse has an example of the trade-off between explicitly marking a discourse boundary, as well as the relationship between the associated DSPs, and reasoning about the intentions themselves. There is no overt linguistic marker of the beginning of DS7; its separation must be inferred from DSP7 and its relationship to DSP6.



The second example is a fragment of a task-oriented dialogue taken from Grosz (1981; it is from the same corpus that was used by Grosz 1974). Figure 5 contains the dialogue fragment and indicates the boundaries for its main segments.⁷ Figure 6 gives the primary component of the DSPs for this fragment and shows the dominance relationships between them.

In contrast with the movies essay, the primary components of the DSPs in this dialogue are mostly intentions of the segment's ICP that the OCP intend to perform some action. Also, unlike the essay, the dialogue has two agents initiating the different discourse segments. In this particular segment, the expert is the ICP of DS1 and DS5, while the apprentice is the ICP of DS2-4. To furnish a complete account of the intentional structure of this discourse, one must be able to say how the satisfaction of one agent's intentions can contribute to satisfying the intentions of another agent. Such an account is beyond the scope of this paper, but in Section 7 we discuss some of the complexities involved in providing one (as well as its role in discourse theory).

For the purposes of discussing this example, though, we need to postulate two properties of the relationships among the participants' intentions. These properties seem to be rooted in features of cooperative behavior and depend on the two participants' sharing some particular knowledge of the task. First, it is a shared belief that, unless he states otherwise, the OCP will adopt the intention to perform an action that the ICP intended him to. Second, in adopting the intention to carry out that action, the OCP also intends to perform whatever subactions are necessary. Thus, once the apprentice intends to remove the flywheel, he also commits himself to the collateral intentions of loosening the setscrews and pulling the wheel off. Note, however, that not all the subactions need to be introduced explicitly into the discourse. The apprentice may do several actions that are never mentioned, and the expert may assume that these are being undertaken on the basis of other information that the apprentice obtains. The partiality of the intentional structure stems to some extent from these characteristics of intentions and actions.







(1) E: First you have to remove the flywheel.

(2) A: How do I remove the flywheel?

(3) E: First, loosen the two allen head setscrews holding it to the shaft, then pull it off.

(4) A: OK.

(5) I can only find one screw. Where's the other one?

(6) E: On the hub of the flywheel.

(7) A: That's the one I found. Where's the other one?

(8) E: About ninety degrees around the hub from the first one.

(9) A: I don't understand. I can only find one. Oh wait, yes I think I was on the wrong wheel.

(10) E: Show me what you are doing.

(11) A: I was on the wrong wheel and I can find them both now.

(12) The tool I have is awkward. Is there another tool that I could use instead?

(13) E: Show me the tool you are using.

(14) A: OK.

(15) E: Are you sure you are using the right size key?

(16) A: I'll try some others.

(17) I found an angle I can get at it.

DS4 (18) The two screws are loose, but I'm having trouble getting the wheel off.

(19) E: Use the wheelpuller. Do you know how to use it?

(20) A: No.

(21) E: Do you know what it looks like?

(22) A: Yes.

(23) E: Show it to me please.

(24) A: OK.

(25) E: Good, Loosen the screw in the center and place the jaws around the hub of the wheel, then tighten the screw onto the center of the shaft. The wheel should slide off.

Figure 5. A segment of a task-oriented dialogue.

As in the movies essay, some of the DSPs for this dialogue are expressed directly in utterances. For instance, utterances (1), (5), and (12) directly express the primary components of DSP1, DSP2 and DSP3, respectively. The primary component of DSP4 is a derived intention. The surface intention of

but I'm having trouble getting the wheel off

is that the apprentice intends the expert to believe that the apprentice is having trouble taking off the flywheel. I4 is derived from the utterance and its surface intention, as well as from features of discourse, conventions about what intentions are associated with the

I am having trouble doing X

type of utterance, and what the ICP and OCP know about the task they have undertaken.

The dominance relationship that holds between I1 and I2, as well as the one that holds between I1 and I3, may seem problematic at first glance. It is not clear how locating any single setscrew contributes to removing the flywheel. It is even less clear how, in and of itself, identifying another tool does. Two facts provide the link: first, that the apprentice (the OCP of DS1) has taken on the task of removing the flywheel; second, that the apprentice and expert share certain knowledge about the task. Some of this shared task knowledge comes from the discourse per se [e.g., utterance (3)], but some of it



comes from general knowledge, perceptual information, and the like. Thus, a combination of information is relevant to determining I2 and I3 and their relationships to I1, including all of the following: the fact that I1 is part of the intentional structure, the fact that the apprentice is currently working on satisfying I1, the utterance-level intentions of utterances (5) and (12), and general knowledge about the task.

The satisfaction-precedence relations among I2, I3, and I4 are not communicated directly in the dialogue, but, like dominance relations, depend on domain knowledge. One piece of relevant knowledge is that a satisfaction-precedence relation exists between loosening the setscrews and pulling off the flywheel. That relation is shared knowledge that is stated directly ("First loosen ... then pull"). The relation, along with the fact that both I2 and I3 contribute to loosening the setscrews, and that I4 contributes to pulling off the flywheel, makes it possible to conclude I3 SP I4 and I2 SP I4. To conclude that I2 SP I3, the apprentice must employ knowledge of how to go about loosening screw-like objects.

The dominance and satisfaction-precedence relations for this task-oriented fragment form a tree of intentions rather than just a partial ordering. In general, however, for any fragment, task-oriented or otherwise, this is not necessary.

It is essential to notice that the intentional structure is neither identical to nor isomorphic to a general plan for removing the flywheel. It is not identical because a plan encompasses more than a collection of intentions and relationships between them (compare Pollack's (1986) critique of AI planning formalisms as the basis for inferring intentions in discourse). It is not isomorphic because the intentional structure has a different substructure from the general plan for removing the flywheel. In addition to the intentions arising from steps in the plan, the intentional structure typically contains DSPs corresponding to intentions generated by the particular execution of the task and the dialogue. For example, the general plan for the disassembly of a flywheel includes subplans for loosening the setscrews and pulling off the wheel; it might also include subplans (of the loosening step) for finding the setscrews, finding a tool with which to loosen the screws, and loosening each screw individually. However, this plan would not contain contingency subplans for what to do when one cannot find the screws or realizes that the available tool is unsatisfactory. Intentions I2 and I3 stem from difficulties encountered in locating and loosening the setscrews. Thus, the intentional structure for this fragment is not isomorphic to the general plan for removing the flywheel.

Utterance (18) offers another example of the difference between the intentional structure and a general plan for the task. This utterance is part of DS4 - not just part of DS1 - even though it contains references to more than one single part of the overall task (which is what I1 is about). It functions to establish a new DSP, I4, as most salient. Rather than being regarded as a report on the overall status of the task, the first clause is best seen as modifying the DSP.⁸ With it, the apprentice tells the expert that the trouble in removing the wheel is not with the screws. Thus, although general task knowledge is used in determining the intentional structure, it is not identical to it.

In this dialogue, there are fewer instances in which cue phrases are employed to indicate segment boundaries than occur in the movies essay. The primary example is the use of "first" in (1) to mark the start of the segment and to indicate that its DSP is the first of several intentions whose satisfaction will contribute to satisfying the larger discourse of which they are a part.

Primary Intentions:

I1: (Intend Expert (Intend Apprentice (Remove A flywheel)))

I2: (Intend A (Intend E (Tell E A (Location other setscrew))))

I3: (Intend A (Intend E (Identify E A another tool)))

I4: (Intend A (Intend E (Tell E A (How (Getoff A wheel)))))

I5: (Intend E (Know-How-to A (Use A wheelpuller)))

Dominance Relationships:

I1 DOM I2

I1 DOM I3

I1 DOM I4

I4 DOM I5

Satisfaction-Precedence Relationships:

I2 SP I3

I2 SP I4

I3 SP I4

Figure 6. Intentional structure for the task-oriented dialogue segment.
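As an illustration of how the two kinds of relationship interact, the sketch below encodes the dominance relationships of Figure 6 together with the satisfaction-precedence relations concluded in the text (I2 SP I3, I2 SP I4, I3 SP I4) and derives an ordering consistent with them. The pair encoding and the function names are assumptions made for this example, and the topological sort is simply one convenient way to read an order off the SP relation; it is not proposed in the paper as a processing model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (dominating, dominated) pairs from Figure 6.
DOM = [("I1", "I2"), ("I1", "I3"), ("I1", "I4"), ("I4", "I5")]
# (earlier, later) pairs: satisfaction-precedence relations concluded in the text.
SP = [("I2", "I3"), ("I2", "I4"), ("I3", "I4")]

def satisfaction_order(sp_pairs):
    """Return one ordering of the intentions consistent with every SP pair."""
    graph = {}
    for earlier, later in sp_pairs:
        graph.setdefault(later, set()).add(earlier)   # `later` waits on `earlier`
        graph.setdefault(earlier, set())
    return list(TopologicalSorter(graph).static_order())

def dominated_by(dominator, dom_pairs):
    """Return the intentions immediately dominated by `dominator`, sorted by name."""
    return sorted(child for par, child in dom_pairs if par == dominator)

order = satisfaction_order(SP)
children_of_i1 = dominated_by("I1", DOM)
```

Here the SP relation totally orders the three intentions dominated by I1, so `order` is I2, I3, I4. In general SP yields only a partial order, and several orderings would be consistent with it.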


The dialogue includes a clear example of the influence of discourse structure on referring expressions. The phrase

the screw in the center

is used in (25) to refer to the center screw of the wheelpuller, not one of the two setscrews mentioned in (18). This use of the phrase is possible because of the attentional state of the discourse structure at the time the phrase is uttered.


In previous sections of the paper, we abstracted from the cognitive states of the discourse participants. The various components of discourse structure discussed so far are properties of the discourse itself, not of the discourse participants. To use the theory in constructing computational models requires determining how each of the individual components projects onto the model of an individual discourse participant. In this regard, the principal issues include specifying

1. how the ICP indicates and the OCP recognizes the beginning and end of a discourse segment,

2. how the OCP recognizes the discourse segment purposes, and

3. how the focus space stack operates.

In essence, the OCP must judge for each utterance whether it starts a new segment, ends the current one (and possibly some of its embedding segments), or contributes to the current one. The information available to the OCP for recognizing that an utterance starts a new segment includes any explicit linguistic cues contained in the utterance (see Section 6⁹) as well as the relationship between its utterance-level intentions and the active DSPs (i.e., those in some focus space that is still on the stack). Likewise, the fact that an utterance ends a segment may be indicated explicitly by linguistic cues or implicitly from its utterance-level intentions and their relationship to elements of the intentional structure. If neither of these is the case, the utterance is part of the current segment. Thus, intention recognition and focus space management play key roles in processing. Moreover, they are also related: the intentional structure is a primary factor in determining focus space changes, and the focus space structure helps constrain the intention recognition process.
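The three-way judgment described above can be sketched as a small decision procedure. Everything in this sketch is hypothetical: the `Evidence` fields stand in for the results of cue-phrase detection and intention recognition, which the theory does not reduce to boolean flags, and the precedence given to "start" over "end" is an arbitrary choice made only to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Assumed summary of what the OCP has recognized about one utterance."""
    cue_starts_segment: bool = False       # explicit linguistic cue for a new segment
    cue_ends_segment: bool = False         # explicit linguistic cue closing the segment
    intention_needs_new_dsp: bool = False  # utterance-level intention calls for a new DSP
    intention_closes_dsp: bool = False     # intention indicates the current segment is done

def classify(e):
    """Return 'start', 'end', or 'contribute' for one utterance."""
    if e.cue_starts_segment or e.intention_needs_new_dsp:
        return "start"
    if e.cue_ends_segment or e.intention_closes_dsp:
        return "end"
    # Neither explicit cues nor intentions signal a boundary:
    # the utterance is part of the current segment.
    return "contribute"

verdicts = [
    classify(Evidence(cue_starts_segment=True)),
    classify(Evidence(intention_closes_dsp=True)),
    classify(Evidence()),
]
```

A fuller model would also have to allow a single utterance both to end one segment and to start another, as "moreover" does in utterance (11) of the movies essay.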


The recognition of DP/DSPs is the central issue in the computational modeling of intentional structure. If, as we have claimed, for the discourse to be coherent and comprehensible, the OCP must be able to recognize both the DP/DSPs¹⁰ and relationships (dominance and satisfaction-precedence) between them, then the question of how the OCP does so is a crucial issue.

For the discourse as a whole, as well as for each of its segments, the OCP must identify both the intention that serves as the discourse segment purpose and its relationship to other discourse-level intentions. In particular, the OCP must be able to recognize which other DSPs that specific intention dominates and is dominated by, and, where relevant, with which other DSPs it has satisfaction-precedence relationships. Two issues that are central to the recognition problem are what information the OCP can utilize in effecting the recognition and at what point in the discourse that information becomes available.

An adequate computational model of the recognition process depends critically on an adequate theory of intention and action; this, of course, is a large research problem in itself and one not restricted to matters of discourse. The need to use such a model for discourse, however, adds certain constraints on the adequacy of any theory or model. Pollack (1986) describes several properties such theories and models must possess if they are to be adequate for supporting recognition of intention in single-utterance queries; she shows how current AI planning models are inadequate and proposes an alternative planning formalism. The need to enable recognition of discourse-level intentions leads to yet another set of requirements.

As will become clear in what follows, the information available to the OCP comes from a variety of sources. Each of these can typically provide partial information about the DSPs and their relationships. These sources are each partially constraining, but only in their ensemble do they constrain in full. To the extent that more information is furnished by any one source, commensurately less is needed from the others. The overall processing model must be one of constraint satisfaction that can operate on partial information. It must allow for incrementally constraining the range of possibilities on the basis of new information that becomes available as the segment progresses.
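The idea of incrementally constraining a range of possibilities can be made concrete with a toy filter. This is only a sketch under strong assumptions: candidate hypotheses are reduced to DSP names, each information source is modeled as a predicate that rules candidates in or out, and the particular constraints shown are invented for illustration.

```python
def narrow(candidates, constraints):
    """Apply each partial constraint in turn; record the survivors after each step."""
    remaining = set(candidates)
    history = []
    for constraint in constraints:
        remaining = {c for c in remaining if constraint(c)}
        history.append(sorted(remaining))
    return history

# Hypothetical question: which active DSP dominates the new segment's DSP?
candidates = {"DSP1", "DSP2", "DSP3"}

def cue_phrase(c):
    return c != "DSP3"       # assumed: a cue phrase rules out DSP3

def domain_knowledge(c):
    return c != "DSP1"       # assumed: task knowledge rules out DSP1

history = narrow(candidates, [cue_phrase, domain_knowledge])
```

Neither assumed source settles the question alone, but together they leave a single candidate, which mirrors the point that the sources constrain fully only in their ensemble.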


At least three different kinds of information play a role in the determination of the DSP: specific linguistic markers, utterance-level intentions, and general knowledge about actions and objects in the domain of discourse. Each plays a part in the OCP's recognition of the DSP and can be utilized by the ICP to facilitate this recognition.

Cue phrases are the most distinguished linguistic means that speakers have for indicating discourse segment boundaries and conveying information about the DSP. Recent evidence by Hirschberg and Pierrehumbert (1986) suggests that certain intonational properties of utterances also provide partial information about the DSP relationships. Because some cue phrases may be used as clausal connectors, there is a need to distinguish their discourse use from their use in conveying propositional content at the utterance level. For example, the word

but

functions as a boundary marker in utterance (7) of the discourse in Section 3.1, but it can also be used solely (as in the current utterance) to convey propositional content (e.g., the conjunction of two propositions) and serve to connect two clauses within a segment.

188 Computational Linguistics, Volume 12, Number 3, July-September 1986

Barbara J. Grosz and Candace L. Sidner Attention, Intentions, and the Structure of Discourse

As discussed in Section 6, cue phrases can provide information about dominance and satisfaction-precedence relationships between segments' DSPs. However, they may not completely specify which DSP dominates or satisfaction-precedes the DSP of the segment they start.

Furthermore, cue phrases that explicitly convey information only about the attentional structure (see Section 6) may be ambiguous about the state to which attention is to shift. For example, if there have been several interruptions (see Section 5), the phrase "but anyway" indicates a return to some previously interrupted discourse, but does not specify which one. Although cue phrases do not completely specify a DSP, the information they provide is useful in limiting the options to be considered.

The second kind of information the OCP has available is the utterance-level intention of each utterance in the discourse. As the discussion of the movies example (Section 3.1) pointed out, the DSP may be identical to the utterance-level intention of some utterance in the segment. Alternatively, the DSP may combine the intentions of several utterances, as is illustrated in the following discourse segment:

I want you to arrange a trip for me to Palo Alto.

It will be for two weeks.

I only fly on TWA.

The DSP for this segment is, roughly, that the ICP intends for the OCP to make (complete) trip arrangements for the ICP to go to Palo Alto for two weeks, under the constraint that any flights be on TWA. The Gricean intentions for these three utterances are as follows:

Utterance 1: ICP intends that OCP believe that ICP intends that OCP intend to make trip plans for ICP to go to Palo Alto

Utterance 2: ICP intends that OCP believe that ICP intends OCP to believe that the trip will last two weeks

Utterance 3: ICP intends that OCP believe that ICP intends OCP to believe that ICP flies only on TWA


These intentions must be combined in some way to produce the DSP. The process is quite complex, since the OCP must recognize that the reason for utterances 2 and 3 is not simply to have some new beliefs about the ICP, but to use those beliefs in arranging the trip. While this example fits the schema of a request followed by two informings, schemata will not suffice to represent the behavior as a general rule. A different sequence of utterances with different utterance-level intentions can have the same DSP; this is the case in the following segment:

S1: Have I told you yet to arrange my trip to Palo Alto?

Remember that I will fly only on TWA. OK?

S2: OK.

S3: I'm planning on staying for two weeks.

It is possible for a sequence that consists of a request followed by two informings not to result in a modification of the trip plans. For example, in the following sequence the third utterance results in changing the way the arrangements are made, rather than constraining the nature of the arrangements themselves.

I want you to arrange a two-week trip for me to Palo Alto. I fly only on TWA. The rates go up tomorrow, so you'll want to call today.

Not only is the contribution of utterance-level intentions to DSPs complicated, but in some instances the DSP for a segment may both constrain and be partially determined by the Gricean intention for some utterance in the segment. For example, the Gricean intention for utterance (15) in the movies example (Section 3.1) is derived from a combination of facts about the utterance itself, and from its place in the discourse. On the surface, (15) appears to be a question addressed to the OCP; its intention would be roughly that the ICP intends the OCP to believe that the ICP wants to know how young people, etc. But (15) is actually a rhetorical question and has a very different intention associated with it - namely, that the ICP intends the OCP to believe proposition P2 (namely, that young people cannot drink in through their eyes a continuous spectacle of intense and strained activity without harmful effects). In this example, this particular intention is also the primary component of the DSP.

The third kind of information that plays a role in determining the DP/DSPs is shared knowledge about actions and objects in the domain of discourse. This shared knowledge is especially important when the linguistic markers and utterance-level intentions are insufficient for determining the DSP precisely.

In Section 7 we introduce two relations, a supports relation between propositions and a generates relation between actions, and present two rules stating equivalences; one links a dominance relation between two DSPs with a supports relation between propositions and the other links a dominance relation between DSPs to a generates relation between actions. Use of these rules in one direction allows for (partially) determining what supports or generates relationship holds from the dominance relationship. But the rules can be used in the opposite direction also: if, from the content of utterances and reasoning about the domain of discourse, a supports or generates relationship can be determined, then the dominates relationship between DSPs can be determined. In such cases it is important to derive the dominance relationship so that the appropriate intentional and attentional structures are available for processing or determining the interpretation of the subsequent discourse.

From the perspective of recognition, a trade-off implicit in the two equivalences is important. If the ICP makes the dominance relationship between two DSPs explicit (e.g., with cue phrases), then the OCP can use this information to help recognize the (ICP's beliefs about the) supports relationship. Conversely, if the ICP's utterances make clear the (ICP's beliefs about the) supports or generates relationship, then the OCP can use this information to help recognize the dominance relationship. Although it is most helpful to use the dominance relationships to constrain the search for appropriate supports and generates relationships, sometimes these latter relationships can be inferred reasonably directly from the utterances in a segment using general knowledge about the objects and actions in the domain of discourse. It remains an open question what inferences are needed and how complex it will be to compute supports and generates relationships if the dominance relationship is not directly indicated in a discourse.

Utterances from the movies essay illustrate this trade-off. In utterance (9), the phrase "in the first place" expresses the dominance relationship between DSPs of the new segment DS5 and the parent segment DS4 directly. Because of the dominance relationship (as well as the intentions expressed in the utterances), the OCP can determine that the ICP believes that the proposition that the content of the plays is not the best provides support for the proposition that the result of indiscriminate movie going is harmful. Hence determining dominance yields the support relation. The support relation can also yield dominance. Utterances (12)-(14), which comprise DS7, are not explicitly marked for a dominance relation. It can be inferred from the fact that the propositions in (12)-(14) provide support for the proposition embedded in DSP6 (that is, that the stories in movies are exciting and over-emotional) that DSP6 dominates DSP7.
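The two directions of this trade-off can be sketched in a few lines. This is a minimal illustration, not the rules of Section 7 themselves: it assumes every DSP is an intention that the OCP believe a single proposition, and the `Recognized` class, the `apply_equivalence` function, and the proposition labels P4 to P7 are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Recognized:
    """Relations the OCP has recognized so far (hypothetical representation)."""
    dominates: set = field(default_factory=set)  # (a, b): DSP a dominates DSP b
    supports: set = field(default_factory=set)   # (p, q): proposition p supports q


def apply_equivalence(state: Recognized, prop_of: dict) -> None:
    """Use the assumed dominance/supports pairing in both directions.

    "DSP a dominates DSP b" is paired with
    "the proposition of b supports the proposition of a".
    """
    dsp_of = {prop: dsp for dsp, prop in prop_of.items()}
    # Direction 1: known dominance yields a supports relation.
    for a, b in list(state.dominates):
        state.supports.add((prop_of[b], prop_of[a]))
    # Direction 2: known support yields a dominance relation.
    for p, q in list(state.supports):
        if p in dsp_of and q in dsp_of:
            state.dominates.add((dsp_of[q], dsp_of[p]))


prop_of = {"DSP4": "P4", "DSP5": "P5", "DSP6": "P6", "DSP7": "P7"}
state = Recognized()
state.dominates.add(("DSP4", "DSP5"))  # signalled by "in the first place"
state.supports.add(("P7", "P6"))       # inferred from content and domain knowledge
apply_equivalence(state, prop_of)
print(sorted(state.dominates))  # [('DSP4', 'DSP5'), ('DSP6', 'DSP7')]
print(sorted(state.supports))   # [('P5', 'P4'), ('P7', 'P6')]
```

The first seeded fact mirrors the explicitly marked case of DS5, and the second mirrors the unmarked case of DS7; the sketch says nothing about how the supports relation itself would be inferred, which the text leaves as an open question.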

Finally, the more information an ICP supplies explicitly in the actual utterances of a discourse, the less reasoning about domain information an OCP has to do to achieve recognition. Cohen (1983) has made a similar claim regarding the problem of recognizing the relationship between one proposition and another.

4.1.2 WHEN IS THE INTENTION RECOGNIZED?

As discussed in Section 2.2, the intentional structure evolves as the discourse does. By the same token, the discourse participants' mental-state correlates of the intentional structure are not prebuilt; neither participant may have a complete model of the intentional structure "in mind" until the discourse is completed. The dominance relationships that actually shape the intentional structure cannot be known a priori, because the specific intentions that will come into play are not known (never by the OCP, hardly ever by the ICP) until the utterances in the discourse have been made. Although it is assumed that the participants' common knowledge includes¹¹ enough information about the domain to determine various relationships such as supports and generates, it is not assumed that, prior to a discourse, they actually had inferred and are aware of all the relationships they will need for that discourse.

Because any of the utterances in a segment may contribute information relevant to a complete determination of the DSP, the recognition process is not complete until the end of the segment. However, the OCP must be able to recognize at least a generalization of the DSP so that he can make the proper moves with respect to the attentional structure. That is, some combination of explicit indicators and intentional and propositional content must allow the OCP to ascertain where the DSP will fit in the intentional structure at the beginning of a segment, even if the specific intention that is the DSP cannot be determined until the end of the segment.

Utterance (15) in the movies example illustrates this point. The author writes, "How can our young people drink in through their eyes a continuous spectacle of intense and strained activity and feeling without harmful effects?" The primary intention I2 is derived from this utterance, but this cannot be done until very late in the discourse segment [since (15) occurs at the end of DS2]. Furthermore, the segment for which I2 is primary has complex embedding of other segments. Utterance (16), intention I0, and DS0 constitute another example of the expression of a primary intention late in a discourse segment. In that case, I0 cannot be computed until (16) has been read, and (16) is not only the last utterance in DS0, but is one that covers the entire essay. If an OCP must recognize a DSP to understand a segment, then we ask: how does the OCP recognize a DSP when the utterance from which its primary intention is derived comes so late in the segment?

We conjecture with regard to such segments as D2 of the movies essay that the primary intention (e.g., I2) may be determined partially (and hence a generalized version become recognizable) before the point at which it is actually expressed in the discourse. While the DP/DSP may not be expressed early, there is still partial information about it. This partial information often suffices to establish dominance (or satisfaction-precedence) relationships for additional segments. As these latter are placed in the hierarchy, their DSPs can provide further partial information for the underspecified DSP. For example, even though the intention I0 is expressed directly only in the last utterance of the movies essay, utterance (4) expresses an intention to know whether p or ~p is true (i.e., whether or not parents should let children see movies often and without close monitoring). I0 is an intention to believe, whose proposition is a generalization of the ~p expressed in (4). Consider also the primary intention I4. It occurs in a segment embedded within DS2, is more general than I2, but is an approximation to it. It would not be surprising to discover that OCPs can in fact predict something close to I2 on the basis of I4, utterances (9)-(14), and the partial dominance hierarchy available at each point in the discourse.



The focus space structure enables certain processing decisions to be made locally. In particular, it limits the information that must be considered in recognizing the DSP as well as that considered in identifying the referents of certain classes of definite noun phrases.


A primary role of the focus space stack is to constrain the range of DSPs considered as candidates for domination or satisfaction-precedence of the DSP of the current segment. Only those DSPs in some space on the focusing stack are viable prospects. As a result of this use of the focusing structure, the theory predicts that this decision will be a local one with respect to attentional state. Because two focus spaces may be close to each other in the attentional structure without the discourse segments they arise from necessarily being close to one another and vice versa, this prediction corresponds to a claim that locality in the focusing structure is what matters to determination of the intentional structure.
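The restriction on candidates can be pictured with a short sketch. The `FocusSpace` record and `candidate_dsps` function are hypothetical names, and the stack contents are an invented state chosen only to show that DSPs whose spaces have been popped drop out of consideration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusSpace:
    segment: str  # discourse segment the space belongs to
    dsp: str      # that segment's discourse segment purpose


def candidate_dsps(focus_stack: list[FocusSpace]) -> list[str]:
    """DSPs that may dominate or satisfaction-precede the current segment's DSP.

    Only DSPs of spaces currently on the stack are returned, most salient
    first. The last element of the list is treated as the top of the stack.
    """
    return [space.dsp for space in reversed(focus_stack)]


# Hypothetical state: DS2 and DS3 were closed earlier and their spaces popped.
intentional_structure = ["DSP1", "DSP2", "DSP3", "DSP4", "DSP5"]
focus_stack = [
    FocusSpace("DS1", "DSP1"),
    FocusSpace("DS4", "DSP4"),
    FocusSpace("DS5", "DSP5"),
]

viable = candidate_dsps(focus_stack)
excluded = [d for d in intentional_structure if d not in viable]
print(viable)    # ['DSP5', 'DSP4', 'DSP1']
print(excluded)  # ['DSP2', 'DSP3']
```

Note that DSP1 remains a candidate even though DS1 may have begun long before the current utterance: what counts is position on the stack, not distance in the linear sequence of utterances.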

A second role of the focusing structure is to constrain the OCP's search for possible referents of definite noun phrases and pronouns. To illustrate this role, we will consider the phrase "the screw in the center" in utterance (25) of the task-oriented dialogue of Section 3. The focus stack configuration when utterance (25) is spoken is shown in Figure 7. The stack contains (in bottom-to-top order) focus spaces FS1, FS4, and FS5 for segments DS1, DS4, and DS5, respectively. For DS5 the wheelpuller is a focused entity, while for DS4 the two setscrews are (because they are explicitly mentioned). The entities in FS5 are considered before those in FS4 as potential referents. The wheelpuller has three screws: two small screws fasten the side arms, and a large screw in the center is the main functioning part. As a result, this large screw is implicitly in focus in FS5 (Grosz 1977) and thus identified as the referent without the two setscrews ever being considered.
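A minimal sketch of this top-down search follows. It assumes that an entity's parts count as implicitly focused whenever the entity itself is in a focus space, and that a definite description can be modeled as a set of required properties; the classes, the property names, and the contents given for FS1 are illustrative assumptions rather than the representation used in the original system.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Entity:
    name: str
    properties: frozenset
    parts: list = field(default_factory=list)  # parts are implicitly in focus


@dataclass
class FocusSpace:
    name: str
    entities: list


def in_focus(space: FocusSpace) -> Iterator[Entity]:
    """Yield explicitly focused entities first, then their parts."""
    yield from space.entities
    for entity in space.entities:
        yield from entity.parts


def resolve(description: set, stack: list, trace: list) -> Optional[tuple]:
    """Search from the top of the stack down and stop at the first match.

    Every entity examined is appended to `trace` so the search order is visible.
    """
    for space in reversed(stack):
        for entity in in_focus(space):
            trace.append(entity.name)
            if description <= entity.properties:
                return entity.name, space.name
    return None


wheelpuller = Entity("wheelpuller", frozenset({"wheelpuller"}), parts=[
    Entity("side screw 1", frozenset({"screw", "small", "side"})),
    Entity("side screw 2", frozenset({"screw", "small", "side"})),
    Entity("center screw", frozenset({"screw", "large", "center"})),
])
stack = [  # bottom-to-top order
    FocusSpace("FS1", [Entity("flywheel", frozenset({"flywheel"}))]),
    FocusSpace("FS4", [Entity("setscrew 1", frozenset({"screw", "setscrew"})),
                       Entity("setscrew 2", frozenset({"screw", "setscrew"}))]),
    FocusSpace("FS5", [wheelpuller]),
]

trace = []
referent = resolve({"screw", "center"}, stack, trace)
print(referent)  # ('center screw', 'FS5')
print(trace)     # ['wheelpuller', 'side screw 1', 'side screw 2', 'center screw']
```

Because the search stops inside FS5, the setscrews in FS4 never appear in the trace, which corresponds to the claim that they are never considered.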

Attentional state also constrains the search for referents of pronouns. Because pronouns contain less explicit information about their referents than definite descriptions, additional mechanisms are needed to account for what may and may not be pronominalized in the discourse. One such mechanism is centering (which we previously called immediate focusing; Grosz, Joshi, and Weinstein 1983; Sidner 1979).

Centering, like focusing, is a dynamic behavior, but is a more local phenomenon. In brief, a backward-looking center is associated with each utterance in a discourse segment; of all the focused elements the backward-looking center is the one that is central in that utterance (i.e., the uttering of the particular sequence of words at that point in the discourse). A combination of syntactic, semantic, and discourse information is used to identify the backward-looking center. The fact that some entity is the backward-looking center is used to constrain the search for the referent of a pronoun in a subsequent utterance. Note that unlike the DSP, which is constant for a segment, the backward-looking center may shift: different entities may become more salient at different points in the segment.

The presence of both centers and DSPs in this theory leads us to an intriguing conjecture: that "topic" is a concept that is used ambiguously for both the DSP of a segment and the center. In the literature the concept of "topic" has appeared in many guises. In syntactic form it is used to describe the preposing of syntactic constituents in English and the "wa" marking in Japanese. Researchers have used it to describe the sentence topic (i.e., what the sentence is about; Firbas 1971, Sgall, Hajičová, and Benešová 1973), and as a pragmatic notion (Reinhart 1981); others want to use the term for discourse topic, either to mean what the discourse is about, or to be defined as those proposition(s) the ICP provides or requests new information about (see Reinhart (1981) for a review of many of the notions of aboutness and topic). It appears that many of the descriptions of sentence topic correspond (though not always) to centers, while discourse topic corresponds to the DSP of a segment or of the discourse.

Figure 7. Focus Stack Transitions Leading up to Utterance (25).


Interruptions in discourses pose an important test of any theory of discourse structure. Because processing an utterance requires ascertaining how it fits with previous discourse, it is crucial to decide which parts of the previous discourse are relevant to it, and which cannot be.

Interruptions, by definition, do not fit; consequently their treatment has implications for the treatment of the normal flow of discourse. Interruptions may take many forms - some are not at all relevant to the content and flow of the interrupted discourse, others are quite relevant, and many fall somewhere in between these extremes. A theory must differentiate these cases and explain (among other things) what connections exist between the main discourse and the interruption, and how the relationship between them affects the processing of the utterances in both.

The importance of distinguishing between intentional structure and attentional state is evident in the three examples considered in Subsections 5.2, 5.3, and 5.4. The distinction also permits us to explain a type of behavior deemed by others to be similar - so-called semantic returns - an issue we examine in a later subsection.


These examples do not exhaust the types of interruptions that can occur in discourse. There are other ways to vary the explicit linguistic (and nonlinguistic) indicators used to indicate boundaries, the relationships between DSPs, and the combinations of focus space relationships present. However, the examples provide illustrations of interruptions at different points along the spectrum of relevancy to the main discourse. Because they can be explained more adequately by the theory of discourse structure presented here than by previous theories, they support the importance of the distinctions we have drawn.


From an intuitive view, we observe that interruptions are pieces of discourse that break the flow of the preceding discourse. An interruption is in some way distinct from the rest of the preceding discourse; after the break for the interruption, the discourse returns to the interrupted piece of discourse. In the example below, from Polanyi and Scha (forthcoming), there are two (separate) discourses, D1 indicated in normal type, and D2 in italics. D2 is an interruption that breaks the flow of D1 and is distinct from D1.

D1: John came by and left the groceries

D2: Stop that you kids

D1: and I put them away after he left

Using the theory described in previous sections, we can capture the above intuitions about the nature of interruptions with two slightly different definitions. The strong definition holds for those interruptions we classify as "true interruptions" and digressions, while the weaker form holds for those that are flashbacks. The two definitions are as follows:

Strong definition: An interruption is a discourse segment whose DSP is neither dominated nor satisfaction-preceded by the DSP of any preceding segment.

Weak definition: An interruption is a discourse segment whose DSP is neither dominated nor satisfaction-preceded by the DSP of the immediately preceding segment.
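The two definitions translate directly into predicates over the dominance and satisfaction-precedence relations. The sketch below is only illustrative: the function names and the pair-set encoding are hypothetical, the purpose of the whole discourse (DP) is treated as the DSP of a preceding segment, and a segment with no predecessors is assumed not to count as an interruption.

```python
Relation = set[tuple[str, str]]  # pairs (a, b) meaning "a dominates b" or "a satisfaction-precedes b"


def constrained_by(dsp: str, other: str, dominates: Relation, sat_precedes: Relation) -> bool:
    """True if `other` dominates or satisfaction-precedes `dsp`."""
    return (other, dsp) in dominates or (other, dsp) in sat_precedes


def is_strong_interruption(dsp: str, preceding: list[str],
                           dominates: Relation, sat_precedes: Relation) -> bool:
    """No preceding segment's DSP dominates or satisfaction-precedes `dsp`."""
    return bool(preceding) and not any(
        constrained_by(dsp, p, dominates, sat_precedes) for p in preceding
    )


def is_weak_interruption(dsp: str, preceding: list[str],
                         dominates: Relation, sat_precedes: Relation) -> bool:
    """The immediately preceding segment's DSP does not constrain `dsp`."""
    return bool(preceding) and not constrained_by(dsp, preceding[-1], dominates, sat_precedes)


# True interruption: the DSPs of D1 and D2 are unrelated.
print(is_strong_interruption("DSP_D2", ["DSP_D1"], set(), set()))  # True

# Flashback-like case: an earlier purpose dominates the new DSP, but the
# immediately preceding segment's DSP does not constrain it.
dom = {("DP", "DSP_ABC"), ("DP", "DSP_Bill")}
sp = {("DSP_ABC", "DSP_Bill")}
print(is_strong_interruption("DSP_ABC", ["DP", "DSP_Bill"], dom, sp))  # False
print(is_weak_interruption("DSP_ABC", ["DP", "DSP_Bill"], dom, sp))    # True

# Normal embedding: DSP4 dominates DSP5, so DS5 is not an interruption.
print(is_weak_interruption("DSP5", ["DSP4"], {("DSP4", "DSP5")}, set()))  # False
```

Every strong interruption also satisfies the weak predicate, so the weak definition is the more inclusive of the two.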

Neither of the above definitions includes an explicit mention of our intuition that there is a "return" to the interrupted discourse after an interruption. The return is an effect of the normal progress of a conversation. If we assume a focus space is normally popped from the focus stack if and only if a speaker has satisfied the DSP of its corresponding segment, then it naturally follows both that the focus space for the interruption will be popped after the interruption, and that the focus space for the interrupted segment will be at the top of the stack because its DSP is yet to be satisfied.

There are other kinds of discourse segments that one may want to consider in light of the interruption continuum and these definitions. Clarification dialogues (Allen 1979) and debugging explanations (Sidner 1983) are two such possibilities. Both of them, unlike the interruptions discussed here, share a DSP with their preceding segment and thus do not conform to our definition of interruption. These kinds of discourses may constitute another general class of discourse segments that, like interruptions, can be abstractly defined.


The first kind of interruption is the true interruption, which follows the strong definition of interruptions. It is exemplified by the interruption given in the previous subsection. Discourses D1 and D2 have distinct, unrelated purposes and convey different information about properties, objects, and relations. Since D2 occurs within D1, one expects the discourse structures for the two segments to be somehow embedded as well. The theory described in this paper differs from Polanyi and Scha's (1984; and other more radically different proposals as well; e.g., Linde and Goguen 1978, Cohen 1983, Reichman-Adar 1984) because the "embedding" occurs in the attentional structure. As shown in Figure 8, the focus space for D2 is pushed onto the stack above the focus space for D1, so that the focus space for D2 is more salient than the one for D1, until D2 is completed.

The intentional structures for the two segments are distinct. There are two DP/DSP structures for the utterances in this sequence - one for those in D1 and the other for those in D2. It is not necessary to relate these two; indeed, from an intuitive point of view, they are not related.

The focusing structure for true interruptions is different from that for the normal embedding of segments, because the focusing boundary between the interrupted discourse and the interruption is impenetrable.¹² (This is depicted in the figure by a line with intersecting hash marks between focus spaces.) The impenetrable boundary between the focus spaces prevents entities in the spaces below the boundary from being available to the spaces above it. Because the second discourse shifts attention totally to a new purpose (and may also shift the identity of the intended hearers), the speaker cannot use any referential expressions during it that depend on the accessibility of entities from the first discourse. Since the boundary between the focus space for D1 and the one for D2 is impenetrable, if D2 were to include an utterance such as "put them away," the pronoun would have to refer deictically, and not anaphorically, to the groceries.

Figure 8. The structures of a true interruption.
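The effect of an impenetrable boundary on accessibility can be sketched as follows. The `impenetrable_below` flag and the `accessible_entities` function are hypothetical devices for the illustration; the theory itself specifies only that entities below the boundary are unavailable to the spaces above it.

```python
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    name: str
    entities: set = field(default_factory=set)
    impenetrable_below: bool = False  # True if the spaces beneath are sealed off


def accessible_entities(stack: list) -> set:
    """Entities available for anaphoric reference, searching from the top down."""
    found = set()
    for space in reversed(stack):
        found |= space.entities
        if space.impenetrable_below:
            break  # nothing under an impenetrable boundary is reachable
    return found


stack = [FocusSpace("FS_D1", {"John", "groceries"})]

# "Stop that you kids" starts a true interruption: push with a sealed boundary.
stack.append(FocusSpace("FS_D2", {"kids"}, impenetrable_below=True))
during = accessible_entities(stack)

# The interruption ends: its space is popped and the space for D1 is on top again.
stack.pop()
after = accessible_entities(stack)

print(sorted(during))  # ['kids']
print(sorted(after))   # ['John', 'groceries']
```

While D2 is on top, "them" has no anaphoric access to the groceries; once the space for D2 is popped, the groceries are accessible again and the kids are not.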

In this sample discourse, however, D1 is resumed almost immediately. The pronoun "them" in "and I put them away" cannot refer to the children (the focus space for D2 has been popped from the stack), but only to the groceries. For this to be clear to the OCP, the ICP must indicate a return to D1 explicitly. One linguistic indicator in this example is the change of mood from imperative.

Indicators that the "stop that" utterance is an interruption include the change to imperative mood and the use of the vocative (Polanyi and Scha 1983). Two other indicators may be assumed to have been present at the time of the discourse - a change of intonation (imagine a slightly shrill tone of command with an undercurrent of annoyance) and a shift of gaze (toward and then away from the kids). It is also possible that the type of pause present in such cases is evidence of the interruption, but further research is needed to establish whether this is indeed the case.

In contrast to previous accounts, we are not forced to integrate these two discourses into a single grammatical structure, or to answer questions about the specific relationship between segments D2 and D1, as in Reichman's model (Reichman-Adar 1984). Instead, the intuition that readers have of an embedding in the discourse structure is captured in the attentional state by the stacking of focus spaces. In addition, a reader's intuitive impression of the distinctness of the two segments is captured in their different intentional (DP/DSP) structures.


Sometimes an ICP interrupts the flow of discussion because some purposes, propositions, or objects need to be brought into the discourse but have not been: the ICP forgot to include those entities first, and so must now go back and fill in the missing information. A flashback segment occurs at that point in the discourse. The flashback is defined as a segment whose DSP satisfaction-precedes the interrupted segment and is dominated by some other segment's DSP. Hence, it is a specialization of the weak definition of interruptions. This type of interruption differs from true interruptions both intentionally and linguistically: the DSP for the flashback bears some relationship to the DP for the whole discourse. The linguistic indicator of the flashback typically includes a comment about something going wrong. In addition the audience always remains the same, whereas it may change for a true interruption (as in the example of the previous section).

In the example below, taken from Sidner (1982), the ICP is instructing a mock-up system (mimicked by a person) about how to define and display certain information in a particular knowledge-representation language. Again the interruption is indicated by italics.

OK. Now how do I say that Bill is

Whoops I forgot about ABC.

I need an individual concept for the company ABC

...[remainder of discourse segment on ABC]...

Now back to Bill. How do I say that Bill is an employee of ABC?

The DP for the larger discourse from which this sequence was taken is to provide information about various companies (including ABC) and their employees.

The outer segment in this example - DBill - has a DSP - DSPBill - to tell about Bill, while the inner segment - DABC - has a DSP - DSPABC - to convey certain information about ABC. Because of the nature of the information being told, there is order in the final structure of the DP/DSPs: information about ABC must be conveyed before all of the information about Bill can be. The ICP in this instance does not realize this constraint until after he begins. The "flashback" interruption allows him to satisfy DSPABC while suspending satisfaction of DSPBill (which he then resumes). Hence, there is an intentional structure rooted at DP and with DSPABC and DSPBill as sister nodes. The following three relationships hold between the different DSPs:¹⁴

DP dominates DSPABC
DP dominates DSPBill
DSPABC satisfaction-precedes DSPBill




This kind of interruption is distinct from a true interruption because there is a connection, although indirect, between the DSPs for the two segments. Furthermore, the linguistic features of the start of the interruption signify that there is a precedence relation between these DSPs (and hence that the correction is necessary). Flashbacks are also distinct from normally embedded discourses because of the precedence relationship between the DSPs for the two segments and the order in which the segments occur.

The available linguistic data permit three possible attentional states as appropriate models for flashback-type interruptions: one is identical to the state that would ensue if the flashback segment were a normally embedded segment, the second resembles the model of a true interruption, and the third differs from the others by requiring an auxiliary stack. An example of the stack for a normally embedded sequence is given in Section 4.2.

Figure 9 illustrates the last possibility. The focus space for the flashback - FSABC - is pushed onto the stack after an appropriate number of spaces, including the focus space for the outer segment - FSBill - have been popped from the main stack and pushed onto an auxiliary stack. All of the entities in the focus spaces remaining on the main stack are normally accessible for reference, but none of those on the auxiliary stack are. In the example in the figure, entities in the spaces from FSA to FSB are accessible as well as (though less salient than) those in space FSABC. Evidence for this kind of stack behavior could come from discourses in which phrases in the segment about ABC could refer to entities represented in FSB, but not to those in FSBill or FSC. After an explicit indication that there is a return to DSPBill (e.g., the "Now back to Bill" used in this example), any focus spaces left on the stack from the flashback are popped off, and all spaces on the auxiliary stack (including FSBill) are returned to the main stack. Note, however, that this model does not preclude the possibility of a return to some space between FSA and FSB before popping the auxiliary stack. Whether there are discourses that include such a return and are deemed coherent is an open question.
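A minimal sketch of the auxiliary stack model is given below, assuming that focus spaces can be represented by their names alone. The `FlashbackStack` class and its `keep` parameter (how many spaces stay on the main stack) are hypothetical; the text leaves open how the "appropriate number" of spaces is determined.

```python
class FlashbackStack:
    """Main focus stack plus an auxiliary stack for suspended spaces."""

    def __init__(self, spaces):
        self.main = list(spaces)  # bottom-to-top order
        self.auxiliary = []
        self._base = None         # height of the main stack under the flashback

    def begin_flashback(self, flashback_space, keep):
        """Move all but the bottom `keep` spaces aside, then push the flashback."""
        while len(self.main) > keep:
            self.auxiliary.append(self.main.pop())
        self._base = len(self.main)
        self.main.append(flashback_space)

    def end_flashback(self):
        """Pop what the flashback left behind and restore the suspended spaces."""
        if self._base is None:
            raise RuntimeError("no flashback in progress")
        del self.main[self._base:]
        while self.auxiliary:
            self.main.append(self.auxiliary.pop())
        self._base = None

    def accessible(self):
        """Spaces open to reference, most salient first (auxiliary excluded)."""
        return list(reversed(self.main))


stack = FlashbackStack(["FS_A", "FS_B", "FS_C", "FS_Bill"])

stack.begin_flashback("FS_ABC", keep=2)  # "Whoops I forgot about ABC."
print(stack.accessible())  # ['FS_ABC', 'FS_B', 'FS_A']
print(stack.auxiliary)     # ['FS_Bill', 'FS_C']

stack.end_flashback()      # "Now back to Bill."
print(stack.main)          # ['FS_A', 'FS_B', 'FS_C', 'FS_Bill']
```

During the flashback the spaces for Bill and C are unreachable, while those from A to B remain accessible beneath the flashback space. Restoring from the auxiliary stack reverses the order in which spaces were set aside, so the original configuration is recovered exactly.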

The auxiliary stack model differs from the other two models by the references permitted and by the spaces that can be popped to. Given the initial configuration in Figure 9, if the segment with DSPABC were normally embedded, FSABC would just be added to the top of the stack. If it were a true interruption, the space would also be added to the stack, but with an impenetrable boundary between it and FSBill. In the normal stack model, entities in the spaces lower in the stack would be accessible; in the true interruption they would not. In either of these two models, however, FSBill would be the space returned to first. The auxiliary stack model is obviously more complicated than the other two alternatives. Whether it (or some equivalent alternative) is necessary depends on facts of discourse behavior that have not yet been determined.


The third type of interruption, which we call a digression, is defined as a strong interruption that contains a reference to some entity that is salient in both the interruption and the interrupted segment. For example, if while discussing Bill's role in company ABC, one conversational participant interrupts with, "Speaking of Bill, that reminds me, he came to dinner last week," Bill remains salient, but the DP changes. Digressions commonly begin with phrases such as "speaking of John" or "that reminds me," although no cue phrase need be present, and "that reminds me" may also signal other stack and intention shifts.

In the processing of digressions, the discourse-level intention of the digression forms the base of a separate intentional structure, just as in the case of true interruptions. A new focus space is formed and pushed onto the stack, but it contains at least one, and possibly other, entities from the interrupted segment's focus space.
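The one constraint that distinguishes a digression's focus space, namely that it carries over at least one entity from the interrupted segment, can be stated as a small check. The function name `open_digression` and its parameters are hypothetical, chosen only for this sketch.

```python
def open_digression(interrupted_entities, carried, introduced=()):
    """Build the entity set of the focus space pushed for a digression.

    interrupted_entities: entities salient in the interrupted segment's space.
    carried: entities taken over into the digression (must be non-empty).
    introduced: entities that are new in the digression.
    """
    carried = set(carried)
    if not carried:
        raise ValueError("a digression carries over at least one entity")
    missing = carried - set(interrupted_entities)
    if missing:
        raise ValueError(f"not salient in the interrupted segment: {sorted(missing)}")
    return carried | set(introduced)


# "Speaking of Bill, that reminds me, he came to dinner last week"
digression_space = open_digression({"Bill", "ABC"}, {"Bill"}, {"dinner last week"})
```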

Like the flashback-type interruption, the digression must usually be closed with an explicit utterance such as

getting back to ABC...















Figure 9. The auxiliary stack model for flashbacks.


Computational Linguistics, Volume 12, Number 3, July-September 1986



One case of discourse behavior that we must distinguish comprises the so-called "semantic returns" observed by Reichman (1981) and discussed by Polanyi and Scha (1983). In all the interruptions we have considered so far, the stack must be popped when the interruption is over and the interrupted discourse is resumed. The focus space for the interrupted segment is "returned to." In the case of semantic returns, entities and DSPs that were salient during a discourse in the past are taken up once again, but are explicitly reintroduced. For example, suppose that yesterday two people discussed how badly Jack was behaving at the party; then today one of them says

Remember our discussion about Jack at the party? Well, a lot of other people thought he acted just as badly as we thought he did.

The utterances today recall, or return to, yesterday's conversation to help satisfy the intention that more be said about Jack's poor behavior.

Anything that can be talked about once can be talked about again. However, if there is no focus space on the stack corresponding to the segment and DSP being discussed further, then, as Polanyi and Scha (1983) point out, there is no popping of the stack. There need not be any discourse underway when a semantic return occurs; in such cases, the focus stack will be empty. Thus, unlike the returns that follow normal interruptions, semantic returns involve a push onto the stack of a new space containing, among other things, representations of the reintroduced entities.
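The difference between an ordinary return and a semantic return can be sketched as two stack operations. The list-of-dicts representation and the function names below are assumptions made for illustration only.

```python
def resume_after_interruption(stack):
    """Ordinary return: pop the interrupting space; the old space is on top again."""
    stack.pop()
    return stack[-1]


def semantic_return(stack, dsp, reintroduced):
    """Semantic return: the old space is gone, so a new space is pushed instead."""
    space = {"dsp": dsp, "entities": set(reintroduced)}
    stack.append(space)
    return space


# Ordinary return: the interrupted space is still on the stack.
ongoing = [
    {"dsp": "interrupted segment", "entities": {"Bill"}},
    {"dsp": "interruption", "entities": {"dinner"}},
]
resumed = resume_after_interruption(ongoing)

# Semantic return: no discourse is underway, so the stack starts out empty.
today = []
pushed = semantic_return(today, "say more about Jack's behavior", {"Jack", "the party"})
```

Because the entities arrive in a newly pushed space, they have to be named explicitly; nothing already on the stack supplies them.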

The separation of attentional state from intentional structure makes clear not only what is occurring in such cases, but also the intuitions underlying the term semantic return. In reintroducing some entities from a previous discourse, conversational participants are establishing some connection between the DSP of the new segment and the intentional structure of the original discourse. It is not a return to a previous focus space because the focus space for the original discourse is gone from the stack, and the items to be referred to must be re-established explicitly. For example, the initial reference to Jack in the preceding example cannot be accomplished with a pronoun; with no prior mention of Jack in the current discussion, one cannot say,

Remember our discussion about him at the party.

The intuitive impression of a return in the strict sense is only a return to a previous intentional structure.


Both attentional state and intentional structure change during a discourse. ICPs rarely change attention by directly and explicitly referring to attentional state (e.g., using the phrase "Now let's turn our attention to..."). Likewise, discourses only occasionally include an explicit reference to a change in purpose (e.g., with an utterance such as "Now I want to explain the theory of dynamic programming"). More typically, ICPs employ indirect means of indicating that a change is coming and what kind of change it is. Cue phrases provide abbreviated, indirect means of indicating these changes.

In all discourse changes, the ICP must provide information that allows the OCP to determine all of the following:

1. that a change of attention is imminent;

2. whether the change returns to a previous focus space or creates a new one;

3. how the intention is related to other intentions;

4. what precedence relationships, if any, are relevant;

5. what intention is entering into focus.

Cue phrases can pack in all of this information, except for (5). In this section, we explore the predictions of our discourse structure theory about different uses of these phrases and the explanations the theory offers for their various roles.

We use the configuration of attentional state and intentional structure illustrated in Figure 10 as the starting point of our analysis. In the initial configuration, the focus space stack has a space with DSP X at the bottom and another space with DSP A at the top. The intentional structure includes the information that X dominates A.

From this initial configuration, a wide variety of moves may be made. We examine several changes and the cue phrases that can indicate each of them. Because these phrases and words in isolation may ambiguously play either discourse or other functional roles, we also discuss the other uses whenever appropriate. Furthermore, cue phrases do not function unambiguously with respect to a particular discourse role. Thus, for example, "first" can be used for two different moves that we discuss below.

First, consider what happens when the ICP shifts to a new DSP, B, that is dominated by A (and correspondingly by X). The dominance relationship between A and B becomes part of the intentional structure. In addition, the change in DSP results in a change in the focus stack.

The focus stack models this change, which we call new dominance, by having a new space pushed onto the stack with B as the DSP of that space (as illustrated in Figure 11). The space containing A is salient, but less so than the space with B. Cue phrase(s) to signal this case, and only this one, must communicate two pieces of information: that there is a change to some new purpose (resulting in a new focus space being created in the attentional state model rather than a return to one on the stack) and that the new purpose (DSP B) is dominated by DSP A.

Typical cue phrases for this kind of change are "for example" and "to wit," and sometimes "first" and "second."

Cue phrases can also exhibit the existence of a satisfaction-precedence relationship. If B is to be the first in a list of DSPs dominated by A, then words such as "first" and "in the first place" can be used to communicate this fact. Later in the discourse, cue phrases such as "second," "third," and "finally" can be used to indicate DSPs that are dominated by A and satisfaction-preceded by B. In these cases, the focus space containing B would be popped from the stack and the new focus space inserted above the one containing A.
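The two moves just described, new dominance and a later sibling that is satisfaction-preceded by the first, can be sketched as operations on a combined state. The class `DiscourseState` and its method names are assumptions made for illustration, and only direct dominance pairs are recorded (the transitive fact that X also dominates B is left implicit).

```python
class DiscourseState:
    def __init__(self, stack, dominates):
        self.stack = list(stack)            # DSP labels, bottom of stack first
        self.dominates = set(dominates)     # pairs (dominating DSP, dominated DSP)
        self.satisfaction_precedes = set()  # pairs (earlier DSP, later DSP)

    def new_dominance(self, new_dsp):
        """Push a space for new_dsp, dominated by the DSP currently on top."""
        parent = self.stack[-1]
        self.dominates.add((parent, new_dsp))
        self.stack.append(new_dsp)

    def next_sibling(self, new_dsp):
        """Pop the finished sibling, then push new_dsp above the shared parent."""
        finished = self.stack.pop()
        parent = self.stack[-1]
        self.dominates.add((parent, new_dsp))
        self.satisfaction_precedes.add((finished, new_dsp))
        self.stack.append(new_dsp)


# Initial configuration of Figure 10: X at the bottom, A on top, X dominates A.
state = DiscourseState(["X", "A"], {("X", "A")})
state.new_dominance("B")   # e.g. signalled by "for example"
after_push = list(state.stack)
state.next_sibling("C")    # e.g. signalled by "second"
```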

There are three other kinds of discourse segments that change the intentional structure with a resulting push of new focus spaces onto the stack: the true-interruption, where B is not dominated by A; the flashback, where B satisfaction-precedes A; and the digression, where B is not dominated by A, but some entity from the focus space containing A is carried over to the new focus space.

One would expect that there might be cue phrases that would distinguish among all four of these kinds of changes. That is indeed so. There are cue phrases that announce one and only one kind of change. The cue phrases mentioned above for new dominance are never used for the three kinds of discourse interruption pushes. The cue phrases for true-interruptions express the intention to interrupt (e.g., "Excuse me a minute," or "I must interrupt"), while the distinct cue phrase for flashbacks (e.g., "Oops, I forgot about ...") indicates that something is out of order. The typical opening cue phrases of the digression mention the entity that is being carried forward (e.g., "Speaking of John ..." or "Did you hear about John?").


[Figure 10 panels: FOCUS SPACE STACK; DOMINANCE HIERARCHY]

Figure 10. An initial discourse structure configuration.

[Figure 11 panel: ATTENTIONAL STATE CHANGE]

Figure 11. Attentional and intentional structures for a new subsegment.


Cue phrases can also exhibit the satisfaction of a DSP, and hence the completion of a discourse segment. The completion of a segment causes the current space to be popped from the stack. There are many means of linguistically marking completions. In texts, paragraph and chapter boundaries and explicit comments (e.g., "The End") are common. In conversations, completion can be indicated either with cue phrases such as "fine" or "OK"¹⁵ or with more explicit references to the satisfaction of the intention (e.g., "That's all for point 2," or "The ayes have it").

Most cue phrases that communicate changes to attentional state announce pops of the focus stack. However, at least one cue phrase can be construed to indicate a push, namely, "That reminds me." By itself, this phrase does not specify any particular change in intentional structure, but merely shows that there will be a new DSP. Since this is equivalent to indicating that a new focus space is to be pushed onto the stack, this cue phrase is best seen as conveying attentional information.

Cue phrases that indicate pops to some other space back in the stack include "but anyway," "anyway," "in any case," and "now back to ..." When the current focus space is popped from the stack, a space already on the stack becomes most salient. From the configuration in Figure 10, the space with A is popped from the stack, perhaps with others, and another space on the stack becomes the top of the stack. Popping back changes the stack without creating a new DSP or a dominance or satisfaction-precedence relationship. The pop entails a return to an old DSP; no change is effected in the intentional structure.
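A pop back to an earlier space can be sketched as follows. The helper `pop_to` is hypothetical; it returns a new stack and takes no intentional structure as input, which mirrors the point that such a pop adds no DSP and no dominance or satisfaction-precedence relationship.

```python
def pop_to(stack, target_dsp):
    """Return the stack after popping every space above target_dsp.

    stack lists DSP labels with the bottom of the stack first.
    """
    if target_dsp not in stack:
        raise ValueError(f"{target_dsp!r} is not on the focus stack")
    remaining = list(stack)
    while remaining[-1] != target_dsp:
        remaining.pop()
    return remaining


# From the Figure 10 configuration, "but anyway" pops the space with A.
figure_10_stack = ["X", "A"]
after_pop = pop_to(figure_10_stack, "X")
```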

There are cue phrases, such as "now" and "next," that signal a change of attentional state but do not distinguish between the creation of a new focus space and the return to an old one. These words can be used for either move. For example, in a task-oriented discourse during which some task has been mentioned but put aside to ask a question, the use of "now" indicates a change of focus. The utterance following "now," however, will either return the discussion to the deferred task or introduce some new task for consideration.

Note, finally, that a pop of the focus stack may be achieved without the use of cue phrases, as in the following fragment of a task-oriented dialogue (Grosz 1974):

A: One bolt is stuck. I'm trying to use both the pliers and the wrench to get it unstuck, but I haven't had much luck.

E: Don't use pliers. Show me what you are doing.

A: I'm pointing at the bolts.

E: Show me the 1/2" combination wrench, please.

A: OK.

E: Good, now show me the 1/2" box wrench.

A: I already got it loosened.

The last utterance in this fragment returns the discourse to the discussion of the unstuck bolt. The pop can be inferred only from the content of the main portion of the utterance. The pronoun (or, more accurately, the fact that it cannot be referring to the wrench) is a cue that a pop is needed, but only the reference to the loosening action allows the OCP to recognize to which discourse segment this utterance belongs, as discussed by Sidner (1979) and Robinson (1981). A summary of the uses of cue phrases is given in Figure 12.

Attentional change
  (push)                now, next, that reminds me, and, but
  (pop to)              anyway, but anyway, in any case, now back to
  (complete)            the end, ok, fine, (paragraph break)
True interruption       I must interrupt, excuse me
Flashback               Oops, I forgot.
Digression              By the way, incidentally, speaking of, Did you hear about ..., That reminds me
Satisfaction-precedes   in the first place, first, second, finally, moreover, furthermore
New dominance           for example, to wit, first, second, and, moreover, furthermore, therefore, finally

Figure 12. The uses of cue phrases.
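The summary in Figure 12 can be treated as a lookup table, which makes the ambiguity of individual cue phrases easy to see. The sketch below transcribes the figure; the dictionary name, the function name, and the lowercase normalization are assumptions, the row labels "flashback" and "digression" follow the discussion in the text, and the non-lexical "(paragraph break)" entry is omitted.

```python
# Cue phrases by the move they can signal, transcribed from Figure 12.
CUE_PHRASE_USES = {
    "attentional change (push)": ["now", "next", "that reminds me", "and", "but"],
    "attentional change (pop to)": ["anyway", "but anyway", "in any case", "now back to"],
    "attentional change (complete)": ["the end", "ok", "fine"],
    "true interruption": ["i must interrupt", "excuse me"],
    "flashback": ["oops, i forgot"],
    "digression": ["by the way", "incidentally", "speaking of",
                   "did you hear about", "that reminds me"],
    "satisfaction-precedes": ["in the first place", "first", "second", "finally",
                              "moreover", "furthermore"],
    "new dominance": ["for example", "to wit", "first", "second", "and",
                      "moreover", "furthermore", "therefore", "finally"],
}


def moves_signalled_by(phrase):
    """Return, in alphabetical order, every move that lists the given cue phrase."""
    wanted = phrase.strip().lower()
    return sorted(move for move, phrases in CUE_PHRASE_USES.items()
                  if wanted in phrases)


first_uses = moves_signalled_by("first")
reminds_uses = moves_signalled_by("That reminds me")
```

A phrase that maps to more than one move, such as "first," cannot by itself tell the OCP which change is under way; the content of the following utterance has to settle it.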


The cases listed here do not exhaust the changes in focus spaces and in the dominance hierarchy that can be represented, nor have we furnished a set of rules that specify when cue phrases are necessary. Additional cases, especially special subcases of these, may be possible. When discourse is viewed in terms of intentional structure and attentional state, it is clearer just what kinds of information linguistic expressions and intonation convey to the hearer about the discourse structure.

Furthermore, it is clear that linguistic expressions can function as cue phrases, as well as sentential connections; they can tell the hearer about changes in the discourse structure and be carriers of discourse, rather than sentence-level semantic, meaning.





The intentions that serve as DP/DSPs are natural extensions of the intentions Grice (1969) considers essential to developing a theory of utterer's meaning. There is a crucial difference, however, between our use of discourse-level intentions in this paper (and the theory, as developed so far) and Grice's use of utterance-level intentions. We are not yet addressing the issue of discourse meaning, but are concerned with the role of DP/DSPs in determining discourse structure and in specifying how these intentions can be recognized by an OCP.

Although the intentional structure of a discourse plays a role in determining discourse meaning, the DP/DSPs do not in and of themselves constitute discourse segment meaning. The connection between intentional structure and discourse meaning is similar to that between attentional and cognitive states; the attentional state plays a role in a hearer's understanding of what the speaker means by a given sequence of utterances in a discourse segment, but it is not the only aspect of cognitive state that contributes to this understanding.

We will draw upon some particulars of Grice's definition of utterer's meaning to explain DSPs more fully. His initial definition is as follows:

U meant something by uttering x is true iff [for some audience A]:

1. U intended, by uttering x, to induce a certain response in A

2. U intended A to recognize, at least in part from the utterance of x, that U intended to produce that response

3. U intended the fulfillment of the intention mentioned in (2) to be at least in part A's reason for fulfilling the intention mentioned in (1).

Grice refines this definition to address a number of counterexamples. The following portion of his final definition¹⁶ is relevant to this paper:

By uttering x U meant that $*_{\psi}p$ is true iff

($\exists A$)($\exists f$ [features of the utterance])($\exists c$ [ways of correlating $f$ with utterances¹⁷]):

(a) U uttered x intending

1. A to think x possesses $f$

2. A to think $f$ correlated in way $c$ with $\psi$-ing that $p$

3. A to think, on the basis of fulfillment of (1) and (2), that U intends A to think that U $\psi$'s that $p$

4. A, on the basis of fulfillment of (3), to think that U $\psi$'s that $p$

5. and (in some cases), A, on the basis of fulfillment of (4), himself to $\psi$ that $p$

Grice takes $*_{\psi}p$ to be the meaning of the utterance, where $*_{\psi}$ is a mood indicator associated with the propositional attitude $\psi$ (e.g., $*_{\psi}$ = assert and $\psi$ = believe). He considers attitudes like believing that ICP is a German soldier and intending to give the ICP a beer as examples of the kinds of $\psi$-ing that $p$ that utterance intentions can embed. For expository purposes, we use the following notation to represent these utterance-level intentions:

Intend(ICP, Believe(OCP, ICP is a German soldier))

Intend(ICP, Intend(OCP, OCP give ICP a beer))

To extend Grice's definition to discourses, we replace the utterance x with a discourse segment DS, the utterer U with the initiator of a discourse segment ICP, and the audience A with the OCP. To complete this extension, the following problems must be resolved:

1. specifying the discourse-level intentions and attitudes that correspond to the utterance-level intentions and $\psi$'s that $p$;

2. identifying the kinds of $f$'s that contribute to determining discourse-level intentions;

3. identifying the modes of correlation (the $c$'s) between features of the discourse segments and types of discourse-level intentions;

4. specifying how the discourse-level intentions can be recognized by an OCP.

Although each of these issues is an unresolved problem in discourse theory, this paper has provided partial answers. The examples presented illustrate the range of discourse-level intentions; these intentions appear to be similar to utterance-level intentions in kind, but differ in that they occur in a context in which several utterances may be required to ensure their comprehension and satisfaction. The features so far identified as conveying information about DSPs are: specific linguistic markers (e.g., cue phrases, intonation), utterance-level intentions, and propositional content of the utterances. We have not explored the problem of identifying modes of correlation in any detail, but it is clear that those modes that operate at the utterance level also function at the discourse level.

As discussed previously, the proper treatment of the recognition of discourse-level intentions is especially necessary for a computationally useful account of discourse. At the discourse level, just as at the utterance level, the intended recognition of intentions plays a central role. The DSPs are intended to be recognized: they achieve their effects, in part, because the OCP recognizes the ICP's intention for the OCP to $\psi$ that $p$. The OCP's recognition of this intention is crucial to its achieving the desired effect. In Section 4 we described certain constraints on the recognition process.


In extending Grice's analysis to the discourse level, we have to consider not only individual beliefs and intentions, but also the relationships among them that arise because of the relationships among various discourse segments (and utterances within a segment) and the purposes the segments serve with respect to the entire discourse. To clarify these relationships, consider an analogous situation with nonlinguistic actions.¹⁸ An action may divide into several subactions; for example, the planting of a rose bush divides into preparing the soil, digging a hole, placing the rose bush in the hole, filling the rest of the hole with soil, and watering the ground around the bush. The intention to perform the planting action includes several subsidiary intentions (one for each of the subactions, namely, to do it).

In discourse, in a manner that is analogous to nonlinguistic actions, the DP (and some DSPs) includes several subsidiary intentions related to the DSPs it dominates. For purposes of exposition, we will use the term primary intention to distinguish the overall intention of the DP from the subsidiary intentions of the DP. For example, in the movies argument of Section 3.1, the primary intention is for the reader to come to believe that parents and teachers should keep children from seeing too many movies; in the task dialogue of Section 3.2, the intention is that the apprentice remove the flywheel. Subsidiary intentions include, respectively, the intention that the reader believe that it is important to evaluate movies and the intention that the expert help the apprentice locate the second setscrew.

Because the beliefs and intentions of at least two different participants are involved in discourse, two properties of the general-action situation (assuming a single agent performs all actions) do not carry over. First, in a discourse, the ICP intends the OCP to recognize the ICP's beliefs about the connections among various propositions and actions. For example, in the movies argument, the reader (OCP) is intended to recognize that the author (ICP) believes some propositions provide support for others; in the task dialogue the expert (ICP) intends the apprentice (OCP) to recognize that the expert believes the performance of certain actions contributes to the performance of other actions. In contrast, in the general-action situation in which there is no communication, there is no need for recognition of another agent's beliefs about the interrelationship of various actions and intentions.

The second difference concerns the extent to which the subsidiary actions or intentions specify the overall action or intention. To perform some action, the agent must perform each of the subactions involved; by performing all of these subactions the agent performs the action. In contrast, in a discourse, the participants share the assumption of discourse sufficiency: it is a convention of the communicative situation that the ICP believes the discourse is sufficient to achieve the primary intention of the DP. Discourse sufficiency does not entail logical sufficiency or action completeness. It is not necessarily the case that satisfaction of all of the DSPs is sufficient in and of itself for satisfaction of the DP. Rather, there is an assumption that the information conveyed in the discourse will suffice in conjunction with other information the ICP believes the OCP has (or can obtain) to allow for satisfaction of the primary intention of the DP. Satisfaction of all of the DSPs, in conjunction with this additional information, is enough for satisfaction of the DP. Hence, in discourse the intentional structure (the analogue of the action hierarchy) need not be complete.

For example, the propositions expressed in the movies essay do not provide a logically sufficient proof of the claim. The author furnishes information he believes to be adequate for the reader to reach the desired conclusion and assumes the reader will supplement what is actually said with appropriate additional information and reasoning. Likewise, the task dialogue does not mention all the subtasks explicitly. Instead, the expert and apprentice discuss explicitly only those subtasks for which some instruction is needed or in connection with which some problem arises.

To be more concrete, we shall look at the extension of the Gricean analysis for two particular cases, one involving a belief, the other an intention to perform some action. We shall consider only the simplest situations, in which the primary intentions of the DP/DSPs are about either beliefs or actions, but not a mixture. Although the task dialogue obviously involves a mixture, this is an extremely complicated issue that demands additional research.


In the belief case, the primary intention of the DP is to get the OCP to believe some proposition, say $p$. Each of the discourse segments is also intended to get the OCP to believe a proposition, say $q_i$ for some $i = 1, \ldots, n$ (where there are $n$ discourse segments). In addition to the primary intention (i.e., that the OCP should come to believe $p$), the DP includes an intention that the OCP come to believe each of the $q_i$ and, in addition, an intention that the OCP come to believe the $q_i$ provide support for $p$. We can represent this schematically as:¹⁹

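$$
\forall i = 1, \ldots, n \quad \mathrm{Intend}(\mathrm{ICP},\ \mathrm{Believe}(\mathrm{OCP}, p) \wedge \mathrm{Believe}(\mathrm{OCP}, q_i) \wedge \mathrm{Believe}(\mathrm{OCP}, \mathrm{Supports}(p,\ q_1 \wedge \ldots \wedge q_n)))
$$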

There are several things to note here. To begin with, the first intention, $\mathrm{Intend}(\mathrm{ICP}, \mathrm{Believe}(\mathrm{OCP}, p))$, is the primary component of the DSP. Second, each of the intended beliefs in the second conjunct corresponds to the primary component of the DSP of some embedded discourse segment. Third, the supports relation is not implication. The OCP is not intended to believe that the $q_i$ imply $p$, but rather to believe that the $q_i$, in conjunction with other facts and rules that the ICP assumes the OCP has available or can obtain and thus come to believe, are sufficient for the OCP to conclude $p$. Fourth, the DP/DSP may only be completely determined at the end of the discourse (segment), as we discussed in Section 4.

Finally, to determine how the discourse segments corresponding to the $q_i$ are related to the one corresponding to $p$, the OCP only has to believe that the ICP believes a supports relationship holds. Hence, for the purpose of recognizing the discourse structure, it would be sufficient for the third clause to be

$$
\ldots\ \mathrm{Believe}(\mathrm{OCP}, \mathrm{Believe}(\mathrm{ICP}, \mathrm{Supports}(p,\ q_1 \wedge \ldots \wedge q_n)))
$$

However, the DP of a belief-case discourse is not merely to get the OCP to believe $p$, but to get the OCP to believe $p$ by virtue of believing the $q_i$. That this is so can be seen clearly by considering situations in which the OCP already believes $p$ and is known by the ICP to do so, but does not have a good reason for believing $p$. This last property of the belief case is not shared by the action case.

There is an important relationship between the supports relation and the dominance relation that can hold between DP/DSPs; it is captured in the following rule (using the same notation as above):

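$$
\forall i = 1, \ldots, n \quad \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Believe}(\mathrm{CP}_2, p)) \wedge \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Believe}(\mathrm{CP}_2, q_i)) \wedge \mathrm{Believe}(\mathrm{CP}_1, \mathrm{Supports}(p,\ q_1 \wedge \ldots \wedge q_n))
$$
$$
\Leftrightarrow \quad \mathrm{DOM}(\mathrm{Intend}(\mathrm{CP}_1, \mathrm{Believe}(\mathrm{CP}_2, p)),\ \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Believe}(\mathrm{CP}_2, q_i)))
$$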

The implication in the forward direction states that if a conversational participant ($\mathrm{CP}_1$) believes that the proposition $p$ is supported by the proposition $q_i$, and he intends another participant ($\mathrm{CP}_2$) to adopt these beliefs, then his intention that $\mathrm{CP}_2$ believe $p$ dominates his intention that $\mathrm{CP}_2$ believe $q_i$. Viewed intuitively, $\mathrm{CP}_1$'s belief that $q_i$ provides support for $p$ underlies his intention to get $\mathrm{CP}_2$ to believe $p$ by getting him to believe $q_i$. The satisfaction of $\mathrm{CP}_1$'s intention that $\mathrm{CP}_2$ should believe $q_i$ will help satisfy $\mathrm{CP}_1$'s intention that $\mathrm{CP}_2$ believe $p$. This relationship plays a role in the recognition of DSPs.
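The forward direction of this rule can be read as a procedure that derives dominance pairs from what one participant intends the other to believe and what he believes about support. The sketch below is illustrative only: the function name, the tuple encoding of intentions, and the dictionary encoding of the supports beliefs are assumptions, and the propositions are paraphrased from the movies argument.

```python
def dominance_from_supports(intended_beliefs, believed_supports):
    """Derive DOM pairs for the belief case (forward direction only).

    intended_beliefs: propositions CP1 intends CP2 to believe.
    believed_supports: maps a proposition p to the propositions q_i that
        CP1 believes jointly support p.
    Returns a set of (dominating intention, dominated intention) pairs.
    """
    def intention(proposition):
        return ("Intend", "CP1", ("Believe", "CP2", proposition))

    dom = set()
    for p, supporters in believed_supports.items():
        if p not in intended_beliefs:
            continue
        for q in supporters:
            if q in intended_beliefs:
                dom.add((intention(p), intention(q)))
    return dom


p = "children should be kept from seeing too many movies"
q = "it is important to evaluate movies"
pairs = dominance_from_supports({p, q}, {p: [q]})
```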


An analogous situation holds for a discourse segment comprising utterances intended to get the OCP to perform some set of actions directed at achieving some overall task (e.g., some segments in the task-oriented dialogue of Section 3.2). The full specification of the DP/DSP contains a generates relation that is derived from a relation defined by Goldman (1970). For this case, the DP/DSPs are of the following form:

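$$
\forall i = 1, \ldots, n \quad \mathrm{Intend}(\mathrm{ICP},\ \mathrm{Intend}(\mathrm{OCP}, \mathrm{Do}(A)) \wedge \mathrm{Intend}(\mathrm{OCP}, \mathrm{Do}(a_i)) \wedge \mathrm{Believe}(\mathrm{OCP}, \mathrm{Believe}(\mathrm{ICP}, \mathrm{Generates}(A,\ a_1 \wedge \ldots \wedge a_n))))
$$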

Each intention to act represented in the second conjunct corresponds to the primary intention of some discourse segment.

Like supports, the generates relation is partial (its partiality distinguishes it in part from Goldman's relation). Thus, the OCP is not intended to believe that the ICP believes that performance of $a_i$ alone is sufficient for performance of $A$, but rather that doing all of the $a_i$ and other actions that the OCP can be expected to know or figure out constitutes a performance of $A$. In the task dialogue of Section 3.2, many actions that are essential to the task (e.g., the apprentice picking up the Allen wrench and applying it correctly to the setscrews) are never even mentioned in the dialogue.

Note that it is unnecessary for the ICP or OCP to have a complete plan relating all of the $a_i$ to $A$ at the start of the discourse (or discourse segment). All that is required is that, for any given segment, the OCP be able to determine what intention to act the segment corresponds to and which other intentions dominate that intention.

Finally, unlike the belief case, the third conjunct here requires only that the OCP recognize that the ICP believes a generates relationship holds. The OCP can do $A$ by virtue of doing the $a_i$ without coming himself to believe anything about the relationships between $A$ and the $a_i$.

As in the belief case, there is an equivalence that links the generates relation among actions to the dominance relation between intentions. Schematically, it is as follows:

$$
\forall i = 1, \ldots, n \quad \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Intend}(\mathrm{CP}_2, \mathrm{Do}(A))) \wedge \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Intend}(\mathrm{CP}_2, \mathrm{Do}(a_i))) \wedge \mathrm{Believe}(\mathrm{CP}_1, \mathrm{Generates}(A,\ a_1 \wedge \ldots \wedge a_n))
$$
$$
\Leftrightarrow \quad \mathrm{DOM}(\mathrm{Intend}(\mathrm{CP}_1, \mathrm{Intend}(\mathrm{CP}_2, \mathrm{Do}(A))),\ \mathrm{Intend}(\mathrm{CP}_1, \mathrm{Intend}(\mathrm{CP}_2, \mathrm{Do}(a_i))))
$$

This equivalence states that, if an agent ($\mathrm{CP}_1$) believes that the performance of some action ($a_i$) contributes in part to the performance of another action ($A$), and if $\mathrm{CP}_1$ intends for $\mathrm{CP}_2$ to (intend to) do both of these actions, then his intention that $\mathrm{CP}_2$ (intend to) perform $a_i$ is dominated by his intention that $\mathrm{CP}_2$ (intend to) perform $A$. Viewed intuitively, $\mathrm{CP}_1$'s belief that doing $a_i$ will contribute to doing $A$ underlies his intention to get $\mathrm{CP}_2$ to do $A$ by getting $\mathrm{CP}_2$ to do $a_i$. The satisfaction of $\mathrm{CP}_1$'s intention for $\mathrm{CP}_2$ to do $a_i$ will help satisfy $\mathrm{CP}_1$'s intention for $\mathrm{CP}_2$ to do $A$.

So, for example, in the task-oriented dialogue of Section 3.2, the expert knows that using the wheelpuller is a necessary part of removing the flywheel. His intention that the apprentice intend to use the wheelpuller is thus dominated by his intention that the apprentice intend to take off the flywheel. Satisfaction of the intention to use the wheelpuller will contribute to satisfying the intention to remove the flywheel. In general, the action $a_i$ does not have to be a necessary action, though it is in this example (at least if the task is done correctly).
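As a worked instance of the action-case rule, the wheelpuller example can be written out in the same notation. The abbreviations $E$ (expert) and $P$ (apprentice) and the action labels are introduced here only for illustration; the ellipsis stands for the other subactions, which are not enumerated at this point in the text.

```latex
\begin{align*}
&\mathrm{Intend}(E, \mathrm{Intend}(P, \mathrm{Do}(\text{remove flywheel}))) \\
&\quad \wedge\ \mathrm{Intend}(E, \mathrm{Intend}(P, \mathrm{Do}(\text{use wheelpuller}))) \\
&\quad \wedge\ \mathrm{Believe}(E, \mathrm{Generates}(\text{remove flywheel},\ \text{use wheelpuller} \wedge \ldots)) \\
&\Rightarrow\ \mathrm{DOM}(\mathrm{Intend}(E, \mathrm{Intend}(P, \mathrm{Do}(\text{remove flywheel}))),\
  \mathrm{Intend}(E, \mathrm{Intend}(P, \mathrm{Do}(\text{use wheelpuller}))))
\end{align*}
```

Only the left-to-right reading is shown, since that is the direction the example relies on.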

A definitive statement characterizing primary and subsidiary intentions for task-oriented dialogues awaits further research not only in discourse theory, but also in the theory of intentions and actions. In particular, a clearer statement of the interactions among the intentions of the various discourse participants (with respect to both linguistic and nonlinguistic actions) awaits the formulation of a better theory of cooperation and multiagent activity.


We are now in a position to contrast the role of DP/DSPs, supports, generates, DOM, and SP in our theory with the rhetorical relations that, according to a number of alternative theories (e.g., Grimes 1975, Hobbs 1979, Mann and Thompson 1983, Reichman-Adar 1984, McKeown 1985), are claimed to underlie discourse structure. Among the various rhetorical relations that have been investigated are elaboration, summarization, enablement, justification, and challenge. Although the theories each identify different specific relations, they all use such relations as the basis for determining discourse structure.

These rhetorical relations apply specifically to linguistic behavior and most of them implicitly incorporate intentions (e.g., the intention to summarize, the intention to justify). The intentions that typically serve as DP/DSPs in our theory are more basic than those that underlie such rhetorical relations in that they are not specialized for linguistic behavior; in many cases, their satisfaction can be realized by nonlinguistic actions as well as linguistic ones.

The supports and generates relations that must sometimes be inferred to determine domination are also more basic than rhetorical relations; they are general relations that hold between propositions and actions. Hence, the inferring of relationships such as supports and generates is simpler than that of rhetorical relationships. The determination of whether a supports or generates relationship exists depends only on facts of how the world is, not on facts of the discourse. In contrast, the recognition of rhetorical relations requires the combined use of discourse and domain information.

For several reasons, rhetorical relationships do not have a privileged status in the account given here. Although they appear to provide a metalevel description of the discourse, their role in discourse interpretation remains unclear. As regards discourse processing, it seems obvious that the ICP and OCP have essentially different access to them. In particular, the ICP may well have such rhetorical relationships "in mind" as he produces utterances (as in McKeown's (1985) system), whereas it is much less clear when (if at all) the OCP infers them. A claim of the theory being developed in this paper is that a discourse can be understood at a basic level even if the OCP never does or can construct, let alone name, such rhetorical relationships. Furthermore, it appears that these relationships could be recast as a combination of domain-specific information, general relations between propositions and actions (e.g., supports and generates), and general relations between intentions (e.g., domination and satisfaction-precedence).[20] Even so, rhetorical relationships are, in all likelihood, useful to the theoretician as an analytical tool for certain aspects of discourse analysis.


The theory of discourse structure presented in this paper is a generalization of theories of task-oriented dialogues. It differs from previous generalizations in that it carefully distinguishes three components of discourse structure: one linguistic, one intentional, and one attentional. This distinction provides an essential basis for explaining interruptions, cue phrases, and referring expressions.

The particular intentional structure used also differs from the analogous aspect of previous generalizations. Although, like those generalizations, it supplies the principal framework for discourse segmentation and determines structural relationships for the focusing structure (part of the attentional state), unlike its predecessors it does not depend on the special details of any single domain or type of discourse.

Although admittedly still incomplete, the theory does provide a solid basis for investigating both the structure and meaning of discourse, as well as for constructing discourse-processing systems. Several difficult research problems remain to be explored. Of these, we take the following to be of primary importance:

1. Specification of the relationship between discourse-level (DP/DSP) and utterance-level intentions;

2. Identification of the information that discourse participants use to recognize these intentions, and the ways in which they utilize it;

3. Development of an adequate treatment of the interaction among intentions of multiple participants;

4. Investigation of the effect of multiple DSPs on the theory;

5. Investigation of alternative models of attentional state.

Finally, the theory suggests several important conjectures. First, a discourse is coherent only when its discourse purpose is shared by all the participants and when each utterance of the discourse contributes to achieving this purpose, either directly or indirectly, by contributing to the satisfaction of a discourse segment purpose. Second, general intuitions about "topic" correspond most closely to DP/DSPs, rather than to syntactic or attentional concepts. Third, the same intentional structure can give rise to different attentional structures through different discourses. The different attentional structures will be manifest in part because different referring expressions will be valid, and in part because different cue phrases and other indicators will be necessary, optional, or redundant.
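The last conjecture can be illustrated with a small sketch, offered as an informal aid rather than as part of the theory. It assumes a simplified focus stack in which a DSP name stands in for the focus space of its segment, and a dominance tree with two sibling DSPs that have no satisfaction-precedence constraint between them. The functions `linearize` and `focus_stack_trace` and the labels `DSP0`, `DSP1`, and `DSP2` are hypothetical.

```python
def focus_stack_trace(events):
    """Replay ('push', dsp) / ('pop', dsp) events; record the stack after each one."""
    stack, trace = [], []
    for op, dsp in events:
        if op == "push":
            stack.append(dsp)
        else:
            # A segment can close only when its focus space is on top of the stack.
            assert stack and stack[-1] == dsp, "segments must close innermost-first"
            stack.pop()
        trace.append(tuple(stack))
    return trace


def linearize(dom, root, order):
    """Walk a dominance tree depth-first; `order` chooses the sequence of siblings."""
    events = [("push", root)]
    for child in order(dom.get(root, [])):
        events += linearize(dom, child, order)
    events.append(("pop", root))
    return events


# One intentional structure: DSP0 dominates DSP1 and DSP2 (assumed example).
dom = {"DSP0": ["DSP1", "DSP2"]}

# Two discourses that realize the same structure in different segment orders.
trace_a = focus_stack_trace(linearize(dom, "DSP0", list))
trace_b = focus_stack_trace(linearize(dom, "DSP0", lambda kids: list(reversed(kids))))
```

The two traces share the dominance relationships but differ in which focus space is on top of the stack at a given point, and so in which entities would be salient enough to support a particular referring expression at that point.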


We have benefited greatly from discussions with Martha Pollack, Ray Perrault, and Scott Weinstein. The paper has benefited from the comments of Jon Barwise, Marcia Derr, Brad Goodman, David Israel, Amichai Kronfeld, Mitch Marcus, Martha Pollack, Ray Perrault, John Perry, Jane Robinson, Stuart Shieber, Ralph Weischedel, Scott Weinstein, and the anonymous reviewers for Computational Linguistics. Whatever errors remain are, of course, all ours.

This paper was made possible by a gift from the System Development Foundation. Support was also provided for the second author by the Advanced Research Projects Agency of the Department of Defense and was monitored by ONR under Contract No. N00014-85-C-0079. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.


Allen, J.F. 1979 A Plan-based Approach to Speech Act Recognition. Technical Report 131, Department of Computer Science, University of Toronto, Toronto, Canada.

Allen, J.F. 1983 Recognizing Intentions from Natural Language Utterances. In Brady, M. and Berwick, R.C., Eds., Computational Models of Discourse. MIT Press: 107-166.

Allen, J.F. and Perrault, C.R. 1980 Analyzing intention in dialogues. Artificial Intelligence 15(3): 143-178.

Appelt, D. 1985 Planning English Referring Expressions.


26: 1-33.


Butterworth, B. 1975 Hesitation and semantic planning in speech. Journal of Psycholinguistic Research (4): 75-87.

Chafe, Wallace L. 1979 The Flow of Thought and the Flow of Language. In Givon, T., Ed., Syntax and Semantics, Vol. 12, Discourse and Syntax. Academic Press, New York, New York: 159-


Chafe, W.L. 1980 The Deployment of Consciousness in the Production of a Narrative. In Chafe, W.L., Ed., The Pear Stories: Cognitive, Cultural and Linguistic Aspects of Narrative Production. Vol. 3, Advances in Discourse Processes. Ablex Publishing Corp, Norwood, New Jersey: 9-50.

Cohen, P.R. and Levesque, H.J. 1980 Speech Acts and the Recognition of Shared Plans. Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence. Victoria, British Columbia: 263-271.

Cohen, R. 1983 A Computational Model for the Analysis of Arguments. Technical Report CSRG-151, Computer Systems Research Group, University of Toronto, Toronto, Canada.

Cohen, P.R. and Levesque, H.J. 1985 Speech Acts and Rationality. Proceedings of 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, Illinois: 49-60.

Firbas, J. 1971 On the Concept of Communicative Dynamism in the Theory of Functional Sentence Perspective. Brno Studies in English, Vol. 7. Brno University, Brno, Czechoslovakia: 12-47.

Goldman, A.I. 1970 A Theory of Human Action. Princeton University Press, Princeton, New Jersey.

Grice, H.P. 1969 Utterer's Meaning and Intentions.


68(2): 147-177.


Grimes, J.E. 1975 The Thread of Discourse. Mouton Press, The Hague,


Grosz, Barbara [Deutsch] 1974 The Structure of Task Oriented Dialogs. In IEEE Symposium on Speech Recognition: Contributed


Carnegie Mellon University Computer Science Dept., Pittsburgh, Pennsylvania: 250-253.

Grosz, B.J. 1977 The representation and use of focus in dialogue understanding. Technical Report 151, Artificial Intelligence Center, SRI International, Menlo Park, California.

Grosz, B.J. 1978a Discourse Analysis. In Walker, D., Ed.: 235-268.

Grosz, B.J. 1978b Focusing in Dialog. Theoretical Issues in Natural Language Processing-2. University of Illinois at Urbana-Champaign, Champaign, Illinois: 96-103.

Grosz, B.J. 1981 Focusing and Description in Natural Language Dialogues. In Joshi, A.; Webber, B.; and Sag, I., Eds., Elements of Discourse Understanding. Cambridge University Press, New York, New York: 84-105.

Grosz, B.J.; Joshi, A.K.; and Weinstein, S. 1983 Providing a Unified Account of Definite Noun Phrases in Discourse. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics. Cambridge, Massachusetts: 44-50.

Hajičová, E. 1983 Topic and Focus.


Theoretical Linguistics


Hendrix, G.G. 1979 Encoding Knowledge in Partitioned Networks. In Findler, N.V., Ed., The Representation and Use of Knowledge in


Academic Press, New York, New York: 51-92.

Hirschberg, J. and Pierrehumbert, J. 1986 The Intonational Structuring of Discourse. Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics. New York, New York: 136-144.

Hobbs, J. 1979 Coherence and Co-reference.

Cognitive Science



Holmes, H.W. and Gallagher, O. 1917 Composition and Rhetoric. D. Appleton and Co., New York, New York.

Linde, C. and Goguen, J. 1978 Structure of Planning Discourse. J. Social Biol. Struct. 1: 219-251.

Linde, C. 1979 Focus of Attention and the Choice of Pronouns in Discourse. In Givon, T., Ed., Syntax and Semantics, Vol. 12, Discourse and Syntax. Academic Press, New York, New York: 337-


Litman, Diane. 1985 Plan Recognition and Discourse Analysis: An Integrated Approach for Understanding Dialogues. PhD dissertation, University of Rochester, Rochester, New York.

Mann, W.C.; Moore, M.A.; Levin, J.A.; and Carlisle, J.H. 1975 Observation Methods for Human Dialogue. Technical Report RR/75/33, Information Sciences Institute, Marina del Rey, CA.

Mann, W.C. and Thompson, S.A. 1983 Relational Propositions in Discourse. Technical Report RR-83-115, Information Sciences Institute, Marina del Rey, CA.

Marcus, M.P.; Hindle, D.; and Fleck, M.M. 1983 D-Theory: Talking about Talking about Trees. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics. Cambridge, MA: 129-


McKeown, Kathleen R. 1985 Text Generation. Cambridge University Press, New York, New York.

Polanyi, L. and Scha, R. 1983 On the Recursive Structure of Discourse. In Ehlich, K. and van Riemsdijk, H., Eds., Connectedness in Sentence, Discourse and Text. Tilburg University, Tilburg:


Polanyi, L. and Scha, R. 1984 A Syntactic Approach to Discourse


Proceedings of International Conference on Computational


Stanford University, Stanford, CA: 413-419.

Polanyi, L. and Scha, R.J.H. forthcoming Discourse Syntax and Semantics. In Polanyi, L., Ed., The Structure of Discourse.

Publishing Co., Norwood, New Jersey.


Pollack, Martha E. 1986 Inferring Domain Plans in Question-Answering. PhD dissertation, University of Pennsylvania.

Reichman, R. 1981 Plain-speaking: A theory and grammar of spontaneous discourse. PhD dissertation, Department of Computer Science, Harvard University. Also, BBN Report No. 4681, Bolt Beranek and Newman Inc., Cambridge, Massachusetts.


Reichman-Adar, R. 1984 Extended Person-Machine Interface. Artificial Intelligence 22(2): 157-218.

Reinhart, T. 1981 Pragmatics and Linguistics: An Analysis of Sentence Topics. Philosophica 27(1): 53-94.

Robinson, A. 1981 Determining Verb Phrase Referents in Dialogs. American Journal of Computational Linguistics 7(1): 1-16.

Schank, R.C.; Collins, G.C.; Davis, E.; Johnson, P.N.; Lytinen, S.; and Reiser, B.J. 1982 What's the Point? Cognitive Science



Sgall, P.; Hajičová, E.; and Benesova, E. 1973 Topic, Focus and Generative Semantics. Scripter Verlag GmbH and Co., Kronberg, Taunus, East Germany.

Sidner, C.L. 1979 Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse. Technical Report 537, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts.

Sidner, C.L. 1982 Protocols of Users Manipulating Visually Presented Information with Natural Language. Technical Report 5128, Bolt Beranek and Newman Inc., Cambridge, Massachusetts.

Sidner, C.L. 1983 What the Speaker Means: The Recognition of Speakers' Plans in Discourse. International Journal of Computers and Mathematics, Special Issue in Computational Linguistics 9(1):


Sidner, C.L. 1985 Plan Parsing for Intended Response Recognition in


Computational Intelligence 1(1): 1-10.

Sidner, C.L. and Israel, D.J. 1981 Recognizing Intended Meaning and Speaker's Plans. Proceedings of the Seventh International Joint Conference in Artificial Intelligence. University of British Columbia, British Columbia, Canada: 203-208.

Walker, D. 1978 Understanding Spoken Language. Elsevier North-Holland, New York, New York.

Wittgenstein, L. 1953 Philosophical Investigations. Oxford University Press, London, England.


1. The use of the phrase "linguistic structure" to refer to the structure of sequences of utterances is a natural extension of its use in traditional linguistic theories to refer to the syntactic structure of individual sentences. To avoid confusion, the phrase "linguistic structure" will be used in this paper only to refer to the structure of a sequence of utterances composing a discourse or discourse segment.

2. Mann has also reported that the subjects did not label segments nearly so consistently. We believe this fact is related to the kinds of relations the labels were dependent upon. As discussed in Section 7.4, there is a difference between the intentional structure we describe and the relations that others use.

3. Referring expressions can also be used to mark a discourse boundary. For example, novelists sometimes use pronouns to indicate a new scene in a story.

4. These two relations are similar to ones that play a role in parsing at the sentence level: immediate dominance and linear precedence. However, the dominance relation, like the one in Marcus and Hindle's D-theory (Marcus et al. 1983), is partial (i.e., nonimmediate).

5. Even in the task case the orderings may be partial. In fact, the systems built for task-oriented dialogues (Robinson 1981, Walker 1978) did not use a prebuilt tree, but constructed the tree - based on a partially-ordered model - only as a given discourse evolved.

6. The observant reader will note that this was written in the early days of the cinema, before the advent of sound; hence the quotation marks around "movies." Note also that utterance (7) contains a somewhat odd preposition, and utterance (16) somewhat odd definite noun phrases. We have quoted the text exactly as it was printed.

7. The segmentation omits some levels of detail. For example, utterances 19-24 are a segment within DSS. Rather than present this detail, we concentrate on the larger segments here so as to focus on the major issues with which this paper is concerned.

8. This modification "folds in" an informing action with the request. Such combining of two types of speech acts is similar to the action subsumption that Appelt (1985) discusses in regard to referring expressions.

9. Hirschberg and Pierrehumbert (1986) have shown recently that intonational features, most notably pitch range, can also be used to indicate discourse segment boundaries.

10. We assume here that the OCP must recognize intentions rather than actions. The argument that such is the case is beyond the scope of this paper. At a very general level, it centers on the possibility that the very same sequence of utterance actions will correspond to two different discourse structures with the difference statable only in terms of the ICP's intentions. The possibility of such sequences was suggested to us by Michael Bratman [personal communication]. The irony contained in such a clause as "you're a real sweetheart" illustrates the need to consider intentions.

11. This knowledge may be available prior to the discourse or from information supplied by previous utterances in the discourse.

12. This boundary is clearly atypical of stacks. It suggests that ultimately the stack model is not quite what is needed. What structure should replace the stack remains unclear to us.

13. Because this is so clearly the case on other grounds, the segment boundary is obvious even to a reader after the fact.

14. From just the fragment presented, all that can be determined is that the two dominates relationships are domination but not direct domination.

15. OK is many ways ambiguous. It may also mean (at least) "I heard what you said," "I heard and intend to do what you intend me to intend," "I am done what I undertook to do," or "I approve what you are about to do."


16. This portion is taken from Redefinition IVB: a further redefinition deals with abstracting about audience and would unnecessarily complicate our initial view of intentions and discourse.

17. Grice (1969) mentions iconic, conventional, and associative modes, giving examples of each.

18. This analogy is meant to help clarify and motivate the discussion. Although it also suggests some important problems in common between research on discourse and research on theories of action and intention, those issues are the subject of another paper.

19. Here again we use a notational shorthand rather than a formal language to make some of the relationships clearer.

20. This claim reflects a move analogous to the one made by Cohen and Levesque (1985) in showing that the definitions of various speech acts can be derived as lemmas within a general theory of rational behavior.

204 Computational Linguistics, Volume 12, Number 3, July-September 1986
