Centering and Pronominal Reference: In Dialogue, In Spanish
Department of Linguistics
Simon Fraser University
Burnaby, B.C. V5A 1S6
[email protected]
This paper describes an application of
Centering Theory (Grosz et al. 1995) to
dialogue in Spanish. Centering is a theory
of local focus and local discourse
coherence. It also relates focus in
discourse to choice of referring expression.
In this paper, I discuss the analysis of nine
task-oriented conversations. The main aim
of the paper is to establish links between
transition type and choice of pronoun,
studying correlations between those two,
and examining what other factors might be
at play when a general tendency (e.g., of
CONTINUE transitions to result in pro-drop)
is not followed. I explore whether some of
those factors could be related to
characteristics of dialogue, such as
signaling turn-taking.
Different theories attempt to account for how
anaphoric terms can be linked to their
referents, often with the goal of providing
anaphora resolution in computational systems.
Centering Theory (Grosz et al. 1995), a theory
of focus of attention in discourse, has also been
applied to the problem of anaphora resolution
(Brennan et al. 1987). Usually the perspective
is that of a language understanding system,
which needs to keep track of referents and
pronouns in discourse. In this paper, I proceed
from a different point of view, trying to
establish not what antecedent a pronoun has,
but instead what type of anaphoric term is used
in a particular context. Given the transitions
proposed by Centering, I examine the
anaphoric term (pronoun, clitic, stressed
pronoun, noun phrase, etc.) chosen for the
backward-looking center of an utterance.
To examine these issues, I studied a corpus of
dialogues in Spanish. The paper first presents a
brief introduction of Centering, and then the
application of the theory to a relatively new area,
dialogue, and to a new language, Spanish
(Sections 2 and 3). In Section 4, I present the
results of the corpus study.
1 Centering
Centering (Grosz et al. 1995) is a theory of focus
in discourse, and its relation to choice of referring
expressions. It provides rules and constraints to
explain how entities become focused as the
discourse proceeds, and how transitions from one
focus to the next make the discourse coherent. For
each utterance in the discourse, Centering
establishes a ranked list of entities mentioned or
evoked¹, the forward-looking list (Cf). The first
member in the Cf list is the preferred center (Cp).
Additionally, one of the members of the Cf list is
a backward-looking center (Cb), the highest-ranked
entity from the previous utterance that is
realized in the current utterance.
In Example (1), part of a conversation, the
speaker proceeds from having he as a focus of
attention in (1a), to we all in (1b), and back again
to he in (1c). The changes are indicated by using
explicit subjects in both (1b) and (1c), todos and
él, respectively. The centers for each utterance in
(1) are provided in (2)².
1 I discuss what “evoked” means in Section 2.
2 Abbreviations used in the examples: 1/2/3 - first/second/third
person; CL - clitic; DAT - dative; NOM - nominative; SG - singular;
PL - plural; IMP - imperative; INF - infinitive; FUT - future. Only those
examples that illustrate a grammatical point are
rendered in a word-by-word gloss. The rest are simply
translated.
Bos, Foster & Matheson (eds): Proceedings of the sixth workshop on the semantics and pragmatics of dialogue (EDILOG 2002),
4-6 September 2002, Edinburgh, UK, Pages 177-184.
(1) CREA, uam480.per001
a. Entonces si le surge un problema,
then if he-CL-DAT arise-3SG a problem
‘So if he has any problem come up,’
b. pues todos le vamos a echar una mano,
then all he-CL-DAT go-1PL to give
a hand
‘we’ll all give him a hand,’
c. pero él es la persona responsable.
but he-NOM is the person responsible
‘but he’s the one in charge.’
(2) a. Cf: HE, PROBLEM; Cb: HE; Cp: HE
b. Cf: WE ALL, HE; Cb: HE; Cp: WE ALL
c. Cf: HE; Cb: HE; Cp: HE
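The centers in (2) can be derived mechanically once the Cf lists are given. The following is a minimal sketch, assuming that each Cf list is already ranked and that an entity counts as realized when it appears in the current Cf list; the function names are illustrative and not part of the theory. Since (1a) has no preceding utterance in the excerpt, only the Cbs of (1b) and (1c) are computed.

```python
from typing import List, Optional


def preferred_center(cf: List[str]) -> Optional[str]:
    """Cp: the first (highest-ranked) member of the Cf list."""
    return cf[0] if cf else None


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    """Cb: the highest-ranked entity of the previous Cf list that is
    realized in the current utterance; None if there is no such entity."""
    realized = set(cur_cf)
    for entity in prev_cf:  # prev_cf is assumed to be in rank order
        if entity in realized:
            return entity
    return None


# Cf lists as given in (2)
cf_1a = ["HE", "PROBLEM"]
cf_1b = ["WE ALL", "HE"]
cf_1c = ["HE"]

print(backward_center(cf_1a, cf_1b), "|", preferred_center(cf_1b))  # HE | WE ALL
print(backward_center(cf_1b, cf_1c), "|", preferred_center(cf_1c))  # HE | HE
```

In (1c) the highest-ranked entity of (1b), WE ALL, is not realized, so the search falls through to HE.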
In addition to the three types of centers,
Centering proposes different types of
transitions, based on the relationship between
the backward-looking centers of any given pair
of utterances, and the relationship of the Cb
and Cp of each utterance in the pair.
Transitions, shown in Table 1, capture the
introduction and continuation of new topics.
Cb(Ui) and Cp(Ui) refer to the centers in the
current utterance. Cb(Ui-1) refers to the
backward-looking center of the previous
utterance. Thus, a CONTINUE occurs when the
Cb and Cp of the current utterance are the
same and, in addition, the Cb of the current
utterance is the same as the Cb of the previous
utterance. Transitions are ranked according to
the demands they pose on the reader. The
ranking is CONTINUE > RETAIN > SMOOTH
SHIFT > ROUGH SHIFT. In (1) above, the
transition from (1a) to (1b) is a SMOOTH SHIFT,
whereas the transition from (1b) to (1c) is a
CONTINUE.
                  Cb(Ui)=Cb(Ui-1)      Cb(Ui)≠Cb(Ui-1)
                  or Cb(Ui-1)=0
Cb(Ui)=Cp(Ui)     CONTINUE             SMOOTH SHIFT
Cb(Ui)≠Cp(Ui)     RETAIN               ROUGH SHIFT
Table 1. Transition types.
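The two comparisons that define the transitions can be written as a small classifier. This is a sketch, assuming the standard Centering assignment of labels to the four cells (Grosz et al. 1995) and treating an empty Cb in the current utterance as a separate ZERO case, as is done in Section 4; the function name and the use of None for an empty center are illustrative choices.

```python
from typing import Optional


def transition(cb_prev: Optional[str], cb_cur: Optional[str],
               cp_cur: Optional[str]) -> str:
    """Classify the transition into the current utterance Ui.

    cb_prev: Cb(Ui-1), or None if it is empty
    cb_cur:  Cb(Ui), or None if it is empty
    cp_cur:  Cp(Ui)
    """
    if cb_cur is None:
        return "ZERO"  # no backward-looking center in Ui
    # First column of the table: same Cb as before, or no previous Cb
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "CONTINUE" if same_cb else "SMOOTH SHIFT"
    return "RETAIN" if same_cb else "ROUGH SHIFT"


# (1b) -> (1c): Cb(1b) = HE, Cb(1c) = HE, Cp(1c) = HE
print(transition("HE", "HE", "HE"))  # CONTINUE
# An utterance with an empty Cb, whatever the previous Cb was
print(transition(None, None, "US"))  # ZERO
```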
With these constructs as a starting point, the
main goal of the paper is to show how the choice
of anaphoric terms in spoken Spanish can be
explained according to the transition type between
one utterance and the next. For this purpose, I
have carried out a corpus analysis of task-oriented
dialogues in Spanish. The speakers communicate
in order to complete a task, in this case, finding a
date when they can both meet. Before analyzing
spoken Spanish, I will describe the application of
Centering Theory to dialogue, and to Spanish.
2 Centering in Dialogue
Centering has been applied mostly to monologic
discourse. A few studies show how it can be
extended to explain the coherence of dialogue.
Brennan (1998) studied a corpus of spontaneous
dialogues, and observed problems in segmentation
and speaker change. Byron and Stent (1998)
suggest the following as the main issues regarding
the adaptation of Centering to two-party dialogue
(multi-party dialogue is yet to be addressed):
1. Utterance segmentation. Utterance boundaries
determine which entities are available for the
following utterance’s Cf list. A related issue is
how to segment complex sentences.
2. Speech phenomena. We also need to consider
the utterance status of segments before and
after pauses and hesitations, and how to
account for side sequences or self-repairs (as
studied in Conversation Analysis).
3. First and second person pronouns. In dialogue,
participants often refer to themselves and to
each other through a pronoun in subject
position. Since in the ranking of the Cf list,
subjects take precedence (at least in English),
these subjects will be ranked higher in the Cf
list. However, it is not clear whether they are
the centers of the utterance.
4. Linearity. Centering depends on the previous
utterance to determine the entities and the
focus of the current utterance. Since current
and previous might have been uttered by
different speakers, we should consider whether
to use “previous by same speaker” rather than
simply “previous”.
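The linearity issue in (4) amounts to a choice between two lookup strategies for the utterance that supplies the Cf list to be searched. The sketch below illustrates both, assuming a dialogue is stored as a list of (speaker, Cf list) pairs; the speaker labels and entities are hypothetical and not taken from the corpus.

```python
from typing import List, Optional, Tuple

Utterance = Tuple[str, List[str]]  # (speaker, ranked Cf list)


def previous_utterance(utts: List[Utterance], i: int,
                       same_speaker_only: bool = False) -> Optional[Utterance]:
    """Return the utterance whose Cf list is searched for Cb of utts[i].

    same_speaker_only=False: the immediately preceding utterance.
    same_speaker_only=True:  the closest earlier utterance by the same speaker.
    """
    speaker = utts[i][0]
    for j in range(i - 1, -1, -1):
        if not same_speaker_only or utts[j][0] == speaker:
            return utts[j]
    return None


# Hypothetical three-utterance exchange between speakers A and B
dialogue = [
    ("A", ["A", "MONDAY"]),
    ("B", ["B", "TUESDAY"]),
    ("A", ["TUESDAY"]),
]

print(previous_utterance(dialogue, 2))                          # ('B', ['B', 'TUESDAY'])
print(previous_utterance(dialogue, 2, same_speaker_only=True))  # ('A', ['A', 'MONDAY'])
```

In this toy exchange only the first strategy links TUESDAY in the third utterance to the preceding turn.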
The model for dialogue adopted here is Byron
and Stent’s (1998) Model 1, that is, a model where
both first and second person pronouns are
included in the Cf list. In addition, utterances are
consecutive: in the search for Cb(Un), only Cf(Un-1) is
searched, whether it was produced by the same
speaker or not. Byron and Stent (1998) found
that this model performed better than models
that discarded first and second person
pronouns, and models that considered previous
or current speaker’s previous utterance³.
The decision to use Model 1 settles issues
(3) and (4) above, and seems justified given
the data. I found that reference to first and
second person was important in the
conversations, and often ‘I’ is mentioned in
contrast to ‘you’ (singular) and to ‘both of us’,
because the speakers discuss what dates one of
them is available, versus what dates the other
speaker has free, and which dates suit both of
them as a unit.
Another general issue in Centering is what
to include in the Cf list, and how to interpret
the need that entities in that list be realized in
the current utterance. The definition of
‘realize’ depends, according to Walker et al.
(1998: 4), on the semantic theory one chooses.
This is of particular importance in dialogue
because it relies more than monologue on the
context outside the text proper. Particularly
difficult were decisions having to do with dates
and times, and how those are related to each
other. In general, I considered only “include”
relations (Hurewitz 1998), such that a date was
deemed to be part of the previous utterance’s
focus if it was part of a date range mentioned
there. However, when the date was not within
the time frame established, it is plausible to
think that the hearer had to construct a new
model for it. In Example (3), speaker famm⁴
proposes the week of the fourth, after having
discussed the previous week. However, speaker
mjnm returns to the previous week, and mentions
Friday, October 1st, i.e. a date not in the week of
the fourth. I believe this is a new entity, and
cannot be related to the immediately preceding
utterance. As it happens, this results in a zero Cb.
3 Their performance measures were based on (i)
number of zero Cbs, (ii) whether the Cb that
Centering found corresponded with a loose notion
of sentence topic, and (iii) number of cheap vs.
expensive transitions. Cheap and expensive refer to
inference load on the hearer (Strube and Hahn
1999), based on whether Cp(Un-1), the expected Cb(Un), is
actually realized as Cb(Un).
4 Speakers are referred to using their initials, plus
an ‘f’ or ‘m’ in front, to identify their gender. These
conventions are also used to name dialogues (e.g.,
in Table 2). ‘famm_06_12’ refers to speaker
famm’s turn 12 in conversation number 6. In the
transcripts, slashes indicate backchannels (/uh
huh/), and angle brackets false starts (<de> de).
(3) famm_06_12: … quieres tratar la semana de
‘... do you want to try for the week of the 4th?’
mjnm_06_13: qué te parece el viernes primero
de octubre, luego de las once de la mañana?
‘what do you think of Friday October 1st, after
eleven in the morning?’
There are still two dialogue-related issues I
have not discussed, namely (1) and (2) above. As
for utterance segmentation, I separated compound
sentences into their component coordinate clauses,
following Kameyama (1998). For subordinates, I
separated finite subordinate clauses that have a
clear final intonation (4), but not non-finite
clauses (5) or those where there is continuing
intonation.
(4) a. realmente tengo una reunión <de> desde las
diez hasta las doce.
‘I actually have a meeting <from> from 10
till 12.’
b.por tanto no creo que sea muy conveniente
ese día.
‘So I don’t think that day is very convenient.’
(5) por favor no te olvides de traer todos los
legajos. <para poder este> para tener toda la
información a mano.
‘please don’t forget to bring all the papers. <to
be able to uh> to have all the information
at hand.’
The issue of speech disfluencies is left open at
this point. For the present analysis, I have
discarded backchannels, repairs, and utterances
with no entities in them. On the other hand, I did
include side sequences and clarifications. These
often result in zero Cbs, at either the beginning or
the end boundary of the clarification.
Intonation is represented by orthographic signs (. , ?).
3 Centering in Spanish
Centering is proposed as a cross-linguistic
universal that relates participants’ focus of
attention and choice of referring expression.
The transitions, constraints and rules are
believed to be common to all languages,
because they reflect human processing
constraints⁵. The only adjustment necessary
from language to language is the ordering of
the Cf list, what Cote (1998) calls the Cf
template for a language.
In English, the Cf template is considered to
correspond closely to grammatical function
and to linear order. Thus, subjects are ranked
higher than objects, and these higher than
adverbials. Walker, Iida and Cote (1994)
proposed a different ranking for Japanese,
which includes topic markers and markers of
empathy in verbs, ranked higher than subjects
or objects. Di Eugenio (1998) also ranks
empathy the highest in her template for Italian,
following Turan’s (1995) for Turkish.
Spanish is a pro-drop language; subjects do
not need to be realized as pronouns if they are
known in context. Additionally, it has direct
and indirect object clitics (unstressed
pronouns). Corresponding stressed object
pronouns are possible for animate entities only.
Animacy is a relevant feature in Spanish, in
my view. Clitics and reflexive pronouns that
refer to participants in the discourse have two
characteristics that would make them
candidates for a higher ranking: (i) they
convey empathy and (ii) they are often placed
before the verb, linearly before non-animate
direct objects. In (6b), the most salient entities
are ‘you’ and ‘me’ (captured in the clitic me);
‘the time’ should rank lower in the list.
(6) a. /mm/ de todas formas <el> el martes
estaré listo.
/mm/ of all ways <the> the Tuesday be-FUT-1SG ready
‘Anyway I’ll be ready on Tuesday.’
b. fíjame tú la hora
set-IMP-2SG-me-CL you the time
‘You set the time (for me)’
5 Walker (1998) discusses Centering as a model of
human memory.
I also ranked experiencers in psychological
and perception verbs higher in the scale,
following previous research (Turan 1995;
Brennan 1998). These are the experiencers in
verbs such as ‘interest’, ‘seem’ and ‘feel’. The
experiencer in such verbs is often expressed
through a pre-verbal clitic pronoun in Spanish (me
parece, ‘it seems to me’). Ranking animacy
higher unifies empathy in psychological and
perception verbs, and it also captures the fact that
pronouns that refer to participants are placed first
in the discourse, thus resulting in linear ordering
of the Cf list.
A proposal for a Cf template for Spanish is
presented in (7). This is a working version, and it
still needs to address other issues: subject-verb
constructions), possessives⁶, etc.
(7) Cf template for Spanish
empathy/animacy > subject > animate indirect
object > direct object > other
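The template in (7) can be applied as a sort key over the entities of an utterance. The sketch below is only an illustration under stated assumptions: each entity is hand-labelled with one template category, ties within a category are resolved by linear order, and the labelling of (6b), with the participant clitic me in the empathy/animacy slot and tú as plain subject, is a hypothetical coding rather than one given in the text.

```python
from typing import List, Tuple

# Categories of the Cf template in (7), highest rank first
TEMPLATE = [
    "empathy/animacy",
    "subject",
    "animate indirect object",
    "direct object",
    "other",
]


def rank_cf(entities: List[Tuple[str, str]]) -> List[str]:
    """Order (entity, category) pairs by template rank.

    sorted() is stable, so entities in the same category keep the
    order in which they were passed in (assumed to be linear order).
    """
    ranked = sorted(entities, key=lambda e: TEMPLATE.index(e[1]))
    return [name for name, _ in ranked]


# (6b) fíjame tú la hora, with hypothetical category labels
entities_6b = [
    ("ME", "empathy/animacy"),
    ("YOU", "subject"),
    ("TIME", "direct object"),
]

print(rank_cf(entities_6b))  # ['ME', 'YOU', 'TIME']
```

Under this coding ‘the time’ ends up below both participants, as suggested for (6b).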
One case that I found especially difficult was
the frequent occurrence of para mí (‘for me’) as a
benefactive. The speakers often say ‘X date is not
good for me’ or ‘For me that’s not good’.
Following the template above, we could include it
as an animate indirect object. However, the
emphasis seemed to be different according to
whether para mí appeared at the beginning or the
end of the clause, possibly making this a case for
considering word order as well as grammatical
function.
4 Centering and Reference in Spoken Spanish
This section presents the results of the corpus
analysis, an analysis of nine two-party
conversations from the Interactive Systems Lab
scheduling corpus⁷. These are conversations
between two speakers, with the goal of finding a
suitable time to meet. They are grouped by gender
of participants, three conversations being between
two females, three between two males, and
three mixed. The conversations had in total
2,858 words and 271 utterances. They were
divided into utterances according to the
methodology explained in Section 2, and
centers were coded for each utterance. Overall
numbers of utterances and transitions are
presented in Table 2.
6 See Di Eugenio (1998) for a discussion of
possessives.
7 The conversations were recorded by ISL at Carnegie
Mellon University. Thanks to ISL and Alex Waibel, its
director, for permission to use the data.
Table 2. Transition types per conversation. Transition categories: Cb=0, CONTINUE, RETAIN, SMOOTH SHIFT, ROUGH SHIFT; percentages are of total transitions (excluding 0, n=181).
Table 3. Choice of referring expression according to transition. Categories of referring expression: Pro, participant; Stressed pronoun, participant; Demonstrative pronoun; Stressed IO (para mí); Unstressed IO (clitic, me); Pro, non-participant; NP, non-participant.
In general for the entire corpus, and in
particular for each conversation, CONTINUE is
the preferred transition by far (66.85% of all
non-zero transitions in the corpus). RETAIN
and SMOOTH SHIFT are similar in overall
percentage (14.92% and 12.15%, respectively).
Finally, ROUGH SHIFT is the least frequent of all
transitions, occurring 6.08% of the time.
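The percentages above can be reproduced from raw counts. The counts in this sketch are not stated in the text; they are inferred here from the reported percentages and n=181, and should be read as an assumption. The final check further assumes that the first utterance of each of the nine conversations has no incoming transition.

```python
# Raw counts inferred from the reported percentages (assumption, not source data)
counts = {
    "CONTINUE": 121,
    "RETAIN": 27,
    "SMOOTH SHIFT": 22,
    "ROUGH SHIFT": 11,
}


def shares(counts):
    """Percentage of each transition type over all non-zero transitions."""
    total = sum(counts.values())
    return {name: f"{100 * n / total:.2f}" for name, n in counts.items()}


for name, pct in shares(counts).items():
    print(f"{name}: {pct}%")
# CONTINUE: 66.85%
# RETAIN: 14.92%
# SMOOTH SHIFT: 12.15%
# ROUGH SHIFT: 6.08%

# Consistency check against the totals reported in the text:
# 181 non-zero plus 81 zero transitions, 271 utterances, 9 conversations
non_zero = sum(counts.values())
print(non_zero + 81 == 271 - 9)  # True
```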
A number of backward-looking centers were
empty, which resulted in 81 zero transitions. A
backward-looking center is empty when none of
the entities in the previous utterance, Un-1, is
repeated in the current utterance, Un. In (8) the
entities of (8a) are not repeated in (8b), because
the speaker proceeds from the mention of ‘I’ and
‘you’ to ‘us’. This was considered to be a new
entity in the discourse. Given that Cb for (8b) is
empty, the transition is zero. Instances of empty
Cb were very common towards the end of the
conversations, which consist mainly of good-byes
and repetitions of the dates agreed upon. From a
structural point of view, a series of empty Cbs
could then indicate that the conversation is
nearing its end.
(8) a.fmgl_01_10: sí. okay. /eh/ te llamo por
teléfono antes cuando yo salga de mi
oficina. /mm/?
‘yes. okay. /uh/ I’ll call you on the phone
before, when I leave my office /mm/?’
Cf: FMGL [pro], FMCS [te], PHONE [por
teléfono], FMGL [yo], OFFICE [mi oficina]
b. así <[n]> combinamos bien
‘That way <[n]> we can coordinate’
Cf: US [pro]; Cb: 0; Transition: ZERO
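The suggestion that a series of empty Cbs may signal the approaching end of a conversation could be operationalized by measuring the run of empty Cbs at the end of a sequence. The following is a rough sketch of that idea only; the Cb sequence is invented for illustration, None stands for an empty Cb, and no threshold for what counts as a closing sequence is proposed.

```python
from typing import List, Optional


def trailing_zero_run(cbs: List[Optional[str]]) -> int:
    """Number of consecutive empty Cbs (None) at the end of the sequence."""
    run = 0
    for cb in reversed(cbs):
        if cb is not None:
            break
        run += 1
    return run


# Hypothetical sequence of Cbs for one conversation
cbs = ["FMGL", "FMGL", None, "THURSDAY", None, None, None]

print(trailing_zero_run(cbs))  # 3
```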
The focus of this paper is the relation of
transition type to type of pronoun chosen for
the backward-looking center. For each
transition pair, I looked at the linguistic
realization in the backward-looking center in
the second utterance of the pair. That
realization was then characterized according to
the categories presented in Table 3. The first
column for each transition presents raw
frequencies, and the second column the
percentage of that transition type that was
realized in each of the categories. (Although
the focus is on choice of pronoun, all the
realizations of the Cb were coded.)
The first two categories include reference to
participants as subjects, divided into stressed
pronoun or pro. In (9), the speaker refers to
herself with a stressed pronoun in the first
sentence of the turn, but continues the
reference with a zero pronoun (9b).
(9) a. a ver yo estoy de viaje del treinta y uno
hasta el miércoles junio dos, el dos de
junio.
‘Let’s see I am away from the 31st until
Wednesday the 2nd, June 2nd.’
b. o sea que [pro] no voy a poder
so [pro] no go-1SG to be-able-to-INF
‘So (I) won’t be able to make it.’
Participants can also refer to themselves
through indirect objects realized as clitics, as
in (10) and (11), or with stressed pronouns
preceded by a preposition: para mí (‘for me’),
a mí (‘to me’), a ti (‘to you’), as in (12).
Pronouns can also be in first person plural (13).
(10) me parece lo mejor dejarlo para la otra
semana.
me seem-3SG the best leave-INF-it-CL for the
other week
‘I think it’s better to leave it till the next
week.’
(11) /eh/ a qué hora te viene mejor?
/uh/ at what time you come-3SG better?
‘What time is it better for you?’
(12) para mí esto está ideal Cati.
for me that be-3SG ideal Cati.
‘For me that’s ideal, Cati.’
(13) <pero> pero no nos alcanza el tiempo porque
/eh/ tenemos hasta las cinco de la tarde.
<but> but not us reach-3SG the time because
/uh/ have-1PL until the five of the afternoon.
‘But that’s not enough time, because we
(only) have until 5pm.’
The next few categories include entities other
than the participants, circumstances such as dates
and places. These can also be realized as zero
pronouns, as in (14), where el jueves (Thursday),
the subject of parece, is implicit.
(14) a. me viene mejor el jueves, <pero> por
ejemplo empezar a las dos de la tarde.
me come-3SG better the Thursday <but> for
instance start-INF at the two of the afternoon.
‘Thursday is better for me, <but> for instance
to start at 2pm.’
b.qué te parece [pro=el jueves]?
what you seem-3SG
‘What do you think (of Thursday)?’
In Spanish, it is not common to refer to a non-animate
entity with a stressed personal pronoun
(ello, ‘it’). In the corpus, the pronoun of choice
for these entities is a demonstrative (15). Finally,
reference can also be made through NPs (16), or
PPs (17).
(15) y así podemos hacer eso, y ya.
and so can-1PL do-INF that, and already.
‘And then we can do that, and that’s it.’
(16) bueno. el dieciséis está bien.
okay. the sixteen is good.
‘Okay. The 16th is fine.’
(17) qué tal *pause* está tu horario en esta
siguiente semana, del ocho al doce.
how *pause* is-3SG your schedule in this
following week, from the eight to the twelve.
‘How’s... your schedule in this coming
week, from the 8th to the 12th.’
As for the relation of transition and
anaphoric term, and as was expected,
CONTINUE transitions are frequently realized
through pro-drop. I divided the use of pro-drop
according to whether the dropped pronoun
referred to a participant in the conversation or
to some other entity (dates and places). In
74.38% of all CONTINUE, the Cb referred to a
participant, and did so without an explicit
pronoun. A few Cbs, 4.13% of CONTINUE,
referred to another entity in the discourse. The
rest of the categories are small in number,
perhaps with the exception of a stressed
pronoun to refer to a participant (5.79%). The
use of a stressed pronoun in some of these
cases might have a role in turn-taking (see
below).
In the RETAIN transitions, again the most
frequent realization was a pro, but two other
categories are interesting. First of all, dative
clitics (me, te, ‘me’, ‘you’) appear often. These
appear sometimes in the first utterance of a
new turn. In (18b) below, speaker fmcs
addresses speaker fmgl for the second time. In
(18c), the other speaker takes the turn,
addressing fmcs, but making reference to
herself (‘you didn’t tell me what Monday…’).
Since the reference to fmgl is the only
connection between the two turns, that is the
Cb for (18c).
(18) a. fmcs_01_05: ... qué te parece?
‘...what do you think?’
b. fíjate tu horario, a ver qué tal te viene.
‘Check your schedule, and see whether
that’s good for you.’
c. ... primer lugar, no me dijiste qué lunes o
martes o miércoles.
‘Okay. First of all, you didn’t tell me what
Monday, or Tuesday, or Wednesday.’
SMOOTH SHIFT transitions very frequently
result in a pro for the Cb of the utterance. Because
of the situated nature of the conversations, it is not
necessary to use a stressed pronoun to clarify
referents, even when a shift in focus of attention
takes place.
It is interesting to note that some of the
stressed pronouns that referred to participants
occurred right after a change of turn. There were
in total four of these (one in CONTINUE, one in
...). For instance, in (19), a CONTINUE, speaker fcba
signals turn-yielding by asking a question of the
other speaker. Speaker fnba then starts a turn by
making reference to herself with a stressed subject
pronoun, yo.
(19) fcba_04_01: ... puedes reunirte conmigo en
mayo?
‘... can you meet with me in May?’
fnba_04_02: a ver yo estoy de viaje del
treinta y uno hasta el miércoles junio dos, el
dos de junio,...
‘Let’s see, I’m away from the 31st to
Wednesday June 2, June 2nd,...’
This phenomenon opens up the possibility of
establishing points of contact between Centering
and Conversation Analysis (e.g., Sacks et al.
1974). There might also be a connection between
transition type and turn-taking. Example (20)
shows the last two utterances of a long turn in
which the speaker presents his available dates, all
resulting in CONTINUE transitions. The very last
utterance in the turn introduces the other speaker
with an imperative⁸, but the speaker retains a
reference to himself in the clitic me. This is the
first RETAIN in the sequence, preparing for the
turn change.
8 Subjects of imperatives (i.e., the addressee) are
considered entities in the discourse.
(20) viernes puedo todo el día.
‘Friday I can all day.’
entonces mientras para no hacértelo más
difícil, dime si puedes uno de estos días.
‘So in the meantime so that this doesn’t get
more difficult, let me know if you can one
of these days.’
Other phenomena that cannot be discussed
here include null objects and their relationship
to evoked entities, and the status of clausal
subjects. I would also like to explore the
relationship of stressed indirect objects (para
mí) to their unstressed counterparts (clitics).
The stressed forms are
grammatically optional, and they can appear
towards the beginning or the end of the clause.
This paper has presented an application of
Centering Theory to dialogue in Spanish. A
corpus analysis of nine task-oriented
conversations shows certain correlations
between transition type and choice of referring
expression. I have also discussed some of the
difficulties of applying Centering to dialogue,
and how to establish the Cf template for
Spanish. Future directions of this work include
analysis of other corpora, especially written
data, in order to provide a general
characterization of certain phenomena in
Spanish such as zero pronoun, clitic doubling,
and choice of definite/indefinite article.
Another extension will compare these results
to both spoken and written English.
Further research is also needed to establish
the adequacy of the proposed Cf template for
Spanish. A measure would be its success in
resolving anaphora.
The final aim of the study is to formalize
the relationship between anaphoric terms and
transitions, so that the formalization can be
used in a computational application.
