Modality and Negation in Natural Language Processing
Roser Morante
CLiPS - University of Antwerp
November 8, 2011
IJCNLP 2011 Tutorial, Chiang Mai, Thailand
Negation
Somalia has not had an effective central government
since 1991, when the former government was toppled by
clan militias that later turned on each other. For
decades, generals, warlords and warrior types have
reduced this once languid coastal country in Eastern
Africa to rubble. Somalia remains a raging battle zone
today, with jihadists pouring in from overseas, intent on
toppling the transitional government.
No amount of outside firepower has brought the country
to heel. Not thousands of American Marines in the early
1990s. Not the enormous United Nations mission that
followed. Not the Ethiopian Army storming into Somalia
in 2006. Not the current African Union peacekeepers,
who are steadily wearing out their welcome.
Source: http://topics.nytimes.com/top/news/international/
countriesandterritories/somalia/index.html
Modality
Environmentalists may regard such schemes with mixed
feelings. Carbon-neutral extraction would do nothing to
cut the bulk of oil-related emissions that come from
combustion. Eco-friendlier tar sands could also
encourage unconventional development elsewhere:
Jordan, Madagascar, Congo and Venezuela, where the
government claims a reserve of bitumen even greater
than Alberta’s, may be less open to environmental
scrutiny. Kill Alberta’s tar sands, say some, and rising
crude prices would choke oil consumption and force an
era of clean energy into being.
Source: http://www.economist.com/node/17959688?story_id=17959688
Uncertainty about exoplanets
Statements about exoplanets
Same proposition, different meanings
Other types of life have taken root in planet HD85512b
Other types of life could conceivably take root in planet HD85512b
Have other types of life taken root in planet HD85512b?
Other types of life will never take root in planet HD85512b
Other types of life might have taken root in planet HD85512b
If 60% of the planet is covered in cloud, other types of life will probably
take root in planet HD85512b
It is expected that other types of life have taken root in planet
HD85512b
It has been denied that other types of life have taken root in planet
HD85512b
Opinions about exoplanets
just think we could have come from those planets in the first place. Maybe
we screwed up those first and had to get away from them. And possibly we
lost the technology due to war of our people from along time ago. maybe the
(otherside) won to the point of almost human annilation here on earth and
we are just survivors from the war. once you colonize a planet or lands you
kill the competion. you win. them stupidity follows suit again history repeates
it self. we need to realize we are from the universe understandin ourselves.
Source:
http://www.space.com/
12918-habitable-alien-planet-hd-85512b-super-earth-infographic.html
Last consulted 14 October 2011
Outline
Part 1: Introduction: Modality and Negation
Part 2: Categorising and Annotating Modality and Negation
Part 3: Tasks Related to Processing Modality and Negation
Part 4: Modality and Negation in Applications
Part I
Introduction: Modality and Negation
Outline
1. Defining modality
   Related concepts
2. Defining negation
3. Why is it interesting to process modality and negation?
4. References
Defining modality
Modality (von Fintel 2006)
“Modality is a category of linguistic meaning having to do with the
expression of possibility and necessity. A modalized sentence locates an
underlying or prejacent proposition in the space of possibilities
Sandy might be home
says that there is a possibility that Sandy is home.
Sandy must be home
says that in all possibilities, Sandy is home.”
Defining modality
Modality as displacement (von Fintel 2006)
“The counterpart of modality in the temporal domain should be called
“temporality”, but it is more common to talk of tense and aspect, the
prototypical verbal expressions of temporality.
Together, modality and temporality are at the heart of the property of
“displacement” (...) that enables natural language to talk about affairs
beyond the actual here and now.”
Defining modality
Modality, tense, aspect (Palmer 2001)
Tense is concerned with the time of the event
Aspect is concerned with the nature of the event, in terms of the internal temporal constituency
Modality is concerned with the status of the proposition that describes the event
All three are categories of the clause
All three are concerned with the event or situation that is reported by
the utterance
Defining modality
Expressions with modal meanings (von Fintel 2006)
1. Modal auxiliaries: Sandy must/should/might/may/could be home
2. Semimodal verbs: Sandy has to/ought to/needs to be home
3. Adverbs: Perhaps, Sandy is home
4. Nouns: There is a slight possibility that Sandy is home
5. Adjectives: It is far from necessary that Sandy is home
6. Conditionals: If the light is on, Sandy is home
Defining modality
Distribution of modal cues in different text types
(Table from Thompson et al. (2008) Categorising modality in biomedical texts.
Proceedings of LREC 2008, page 27.)
Defining modality
Modality categories (Palmer 2001)
Propositional modality: speaker’s judgement of the truth value or factual status of the proposition
Epistemic: speakers express judgement about the factual status of the proposition
  - Speculative: express uncertainty (John may be in his office)
  - Deductive: indicate an inference from observable evidence (John must be in his office, the lights are on)
  - Assumptive: indicate inference from what is generally known (John’ll be in his office, he is always there at this time)
Evidential: speakers indicate the evidence they have about the factual status of the proposition
  - Reported
  - Sensory
Defining modality
Modality categories (Palmer 2001)
Event modality: speaker’s attitude towards a potential future event that has not taken place
Deontic: relates to obligation or permission
  - Permissive: John can come in now
  - Obligative: John must come in now
  - Commissive: John promises to come back
Dynamic: relates to ability or willingness
  - Abilitive: John can speak French
  - Volitive: John will do it for you
Defining modality
Types of modal meaning (von Fintel 2006)
Epistemic modality concerns what is possible or necessary given what
is known and what the available evidence is.
Deontic modality concerns what is possible, necessary, permissible, or
obligatory, given a body of law or a set of moral principles or the like.
Bouletic modality concerns what is possible or necessary, given a
person’s desires.
Circumstantial modality concerns what is possible or necessary, given a
particular set of circumstances.
Teleological modality concerns what means are possible or necessary
for achieving a particular goal.
Defining modality
’Have’ - ambiguity of modality triggers (von Fintel 2006)
1. It has to be raining. [after observing people coming inside with wet umbrellas; epistemic modality]
2. Visitors have to leave by six pm. [hospital regulations; deontic]
3. You have to go to bed in ten minutes. [stern father; bouletic]
4. I have to sneeze. [given the current state of one’s nose; circumstantial]
5. To get home in time, you have to take a taxi. [teleological]
Defining modality
Another classification (Portner 2009), depending on the level at which the modal meaning is expressed:
Sentential modality: at the level of the sentence
Modal auxiliaries, sentential adverbs
Sub-sentential modality: at the level of constituents smaller than a full
clause
Within the predicate, modifying a noun phrase, verbal mood
Discourse modality: any contribution of modality to meaning in
discourse
Any modal meaning that is not part of sentential truth conditions
Defining modality
Sentential modality (Portner 2009)
Modal auxiliaries and modal verbs: must, can, might, should, ...
Modal adverbs: ought, need (to)
Generics, habituals and individual level predicates:
  - G: A dog is a wonderful animal
  - H: Ben drinks chocolate milk
  - ILP: Noah is smart
Tense and aspect: future, use of past to express “unreality”,
progressive, perfect
Even if Mary stayed until tomorrow, I’d be sad
Conditionals: if ... then constructions
Covert modality: no overt material in the sentence expresses the modal meaning
  - Ben knows how to solve the problem = ‘Ben knows how he can solve the problem’
Defining modality
Sub-sentential modality (Portner 2009)
Modal adjectives and nouns: possible, necessary, certain, possibility, ...
Propositional attitude verbs and adjectives: believe, hope, know,
remember, certain, pleased, ...
Verbal mood: indicative, subjunctive
Infinitives
Dependent modals
I’d be surprised if David should win
Negative polarity items: words and phrases that must be licensed by
another element
David will *(not) ever leave
Defining modality
Discourse modality (Portner 2009)
Evidentiality: a speaker’s assessment of her grounds for saying
something
Clause types: declarative, interrogative, and imperative sentences
Performativity of sentential modals
Modality in discourse semantics, modal subordination: pragmatic
phenomenon in which one sentence involving (sentential) modality
affects the interpretations of subsequent modal sentences
John might go to the store. He should buy some fruit
Meaning of second sentence: ‘If he goes to the store, he should buy some
fruit’.
Defining modality
Source: J. van der Auwera and V. Plungian (2008) Epistemic possibility. In: Haspelmath, M., Dryer, M.
S., Gil, D. and Comrie, B. (eds.) The World Atlas of Language Structures Online. Munich: Max Planck
Digital Library, chapter 39. Available online at
http://wals.info/refdb/record/van-der-Auwera-and-Plungian-1998. Accessed on 21 Jan 2011
Defining modality
(Baker et al. 2010)
“Modality might be construed broadly to include several types of attitudes that a speaker might have toward an event or state.” Modality might indicate:
  - Factivity: related to whether an event, state, or proposition happened or didn’t happen; it distinguishes things that happened from things that are desired, planned, or probable.
  - Evidentiality: deals with the scope of information and may provide clues to the reliability of the information. Did the speaker have first-hand knowledge of what he or she is reporting, or was it inferred from indirect evidence?
  - Sentiment: deals with a speaker’s positive or negative feelings toward an event, state, or proposition.
Related concepts: Evidentiality
Evidentiality (Aikhenvald 2003)
“In a number of languages, the nature of the evidence on which a statement
is based must be specified for every statement - whether the speaker saw it,
or heard it, or inferred from indirect evidence, or learnt it from someone else.
This grammatical category, referring to an information source, is called
‘evidentiality’.”
Related concepts: Evidentiality
Evidentiality (von Fintel 2006)
“Various languages regularly add markers, inflectional or otherwise, to
sentences that indicate the nature of the evidence that the speaker has for
the prejacent proposition.
A typical evidential system might centrally distinguish between direct
evidence and indirect evidence.”
“The standard European languages do not have elaborate evidential systems
but find other ways of expressing evidentiality when needed”
Kim has apparently been offered a new job
Related concepts: Hedging
Lakoff’s (1972) hedges
“Words whose job it is to make things more or less fuzzy”.
Hyland’s (1998) hedging
“Linguistic devices used to qualify a speaker’s confidence in the truth of a proposition, the kind of caveats like I think, perhaps, might and maybe which we routinely add to our statements to avoid commitment to categorical assertions.”
“Any linguistic means used to indicate either a) a lack of complete commitment to the truth value of an accompanying proposition, or b) a desire not to express that commitment categorically.”
“Hedging is one part of epistemic modality; it indicates an unwillingness
to make an explicit and complete commitment to the truth of
propositions”.
Related concepts: Hedging
Hyland categories of surface realizations of hedging in scientific
articles
Lexical
Modal auxiliaries: may, might, could, would, should
Epistemic judgment verbs: suggest, indicate, speculate, believe, assume
Epistemic evidential verbs: appear, seem
Epistemic deductive verbs: conclude, infer, deduce
Epistemic adjectives: likely, probable, possible
Epistemic adverbs: probably, possibly, perhaps, generally
Epistemic nouns: possibility, suggestion
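As a rough illustration of how cue lists like these can be operationalised, the sketch below flags lexical hedge cues in a sentence. It is a minimal illustration only, not Hyland's analysis nor any published hedge classifier, and the cue lists are small samples drawn from the categories above.

```python
# Minimal sketch: flag lexical hedge cues in a sentence.
# The cue lists sample the categories above; real systems
# (e.g. Medlock and Briscoe 2007) learn such cues from annotated data.
import re

HEDGE_CUES = {
    "modal auxiliary": {"may", "might", "could", "would", "should"},
    "epistemic judgment verb": {"suggest", "indicate", "speculate", "believe", "assume"},
    "epistemic evidential verb": {"appear", "seem"},
    "epistemic deductive verb": {"conclude", "infer", "deduce"},
    "epistemic adjective": {"likely", "probable", "possible"},
    "epistemic adverb": {"probably", "possibly", "perhaps", "generally"},
    "epistemic noun": {"possibility", "suggestion"},
}

def find_hedge_cues(sentence):
    """Return (category, cue) pairs for cues occurring as words in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [(category, cue)
            for category, cues in HEDGE_CUES.items()
            for cue in sorted(cues) if cue in tokens]

print(find_hedge_cues("Two distinct E3 ubiquitin ligases have been shown to regulate Dl signaling."))
print(find_hedge_cues("Results suggest that the inhibition of prostate cancer may be via inhibition of Akt."))
```

Exact-token matching of this kind misses inflected forms ("suggests", "suggesting") and ignores context, which is one reason hedge detection is usually treated as a classification task rather than lexicon lookup.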
Related concepts: Hedging
Hyland categories of surface realizations of hedging in scientific
articles
Non-lexical features
Reference to limiting experimental conditions, reference to a model or
theory or admission to a lack of knowledge.
Their surface realizations typically go beyond words and even phrases.
Whereas much attention has focused on elucidating basic mechanisms
governing axon development, relatively little is known about the genetic
programs required for the establishment of dendrite arborization patterns
that are hallmarks of distinct neuronal types.
Related concepts: Hedging
Hedge examples from Medlock and Briscoe (2007)
Dl and Ser have been proposed to act redundantly in the sensory bristle
lineage
How endocytosis of Dl leads to the activation of N remains to be
elucidated
A second important question is whether the roX genes have the same,
overlapping or complementing functions
To test whether the reported sea urchin sequences represent a true
RAG1-like match, we repeated the BLASTP search against all GenBank
proteins
This hypothesis is supported by our finding that both pupariation rate
and survival are affected by EL9
Related concepts: Hedging
Hedge instances as defined in Medlock and Briscoe (2007)
Speculative question
A second important question is whether the roX genes have the same,
overlapping or complementing functions
Statement of speculative hypothesis
To test whether the reported sea urchin sequences represent a true
RAG1-like match, we repeated the BLASTP search against all GenBank
proteins
Anaphoric hedge reference
This hypothesis is supported by our finding that both pupariation rate
and survival are affected by EL9
Related concepts: Hedging
What is not hedging? (Medlock and Briscoe 2007)
Indication of experimentally observed nonuniversal behaviour
Proteins with single BIR domains can also have functions in cell cycle
regulation and cytokinesis
Confident assertion based on external work
Two distinct E3 ubiquitin ligases have been shown to regulate Dl
signaling in Drosophila melanogaster
Statement of existence of proposed alternatives
Different models have been proposed to explain how endocytosis of the
ligand, which removes the ligand from the cell surface, results in N
receptor activation
Related concepts: Hedging
Experimentally-supported confirmation of previous speculation
Here we show that the hemocytes are the main regulator of adenosine in
the Drosophila larva, as was speculated previously for mammals
Negation of previous hedge
Although the adgf-a mutation leads to larval or pupal death, we have
shown that this is not due to the adenosine or deoxyadenosine simply
blocking cellular proliferation or survival, as the experiments in vitro
would suggest
Related concepts: Factuality
Factuality (Saurı́ and Pustejovsky 2009)
“Information conveying whether events mentioned in text correspond to real
situations in the world or, instead, to situations of uncertain status.”
“The level of information expressing the commitment of relevant sources
towards the factual nature of events mentioned in discourse”
Related concepts: Factuality
(Saurı́ and Pustejovsky 2009)
Events in language are couched in terms of a continuum that ranges
from truly factual to counterfactual
Depending on the polarity, events are then depicted as either facts or
counterfacts
Five U.N. inspection teams visited a total of nine other sites
The size of the contingent was not disclosed
Depending on the level of uncertainty combined with polarity, events will
be presented as possibly factual or possibly counterfactual
United States may extend its naval quarantine to Jordan’s Red Sea port
of Aqaba
They may not have enthused him for their particular brand of political
idealism
Related concepts: Factuality
Linguistic means of expressing factuality (Saurı́ and Pustejovsky 2009)
Polarity particles express the positive or negative factuality of events
mentioned in text (no, not)
Modality particles contribute different degrees of certainty to a given
event
Event-selecting predicates: predicates that select for an argument
denoting an event of some sort
They project factuality information on the event denoted by its argument
through syntactic means: claim, suggest, promise, offer, avoid, try, delay,
think
The Human Rights Committee regretted that discrimination against
women persisted in practice
Related concepts: Factuality
Linguistic means of expressing factuality (Saurı́ and Pustejovsky 2009)
Syntactic constructions: some syntactic constructions involving
subordination introduce factuality information of some sort.
The embedded event is presupposed as holding as fact
Rice, who became secretary of state two months ago today, took stock of
a period of tumultuous change
The embedded event is presented as underspecified with respect to its
factuality status
The environmental commission has adopted regulations to ensure that
people are not exposed to radioactive waste
Related concepts: Factuality
Linguistic means of expressing factuality (Saurı́ and Pustejovsky 2009)
Discourse structure: Some events may first have their factual status
characterized in one way, but then be presented differently in a
subsequent sentence
Yesterday, the police denied that [drug dealers were tipped off before the
operation]. However, it emerged last night that [a reporter from London
Weekend Television unwittingly tipped off residents about the raid] when he
phoned contacts on the estate to ask if there had been a raid–before it had
actually happened
Defining modality: Subjectivity
Term introduced by Banfield (1982)
Work on subjectivity in computational linguistics is initially due to
Wiebe, Wilson, and collaborators (Wiebe 1994, Wiebe et al 2001, 2005,
Wilson et al 2006, Wilson 2008, ...) and focuses on learning subjectivity
from corpora
Wiebe et al 2004
“Subjectivity is language used to express private states in the context of a
text or conversation. Private state is a general covering term for opinions,
evaluations, emotions, and speculations.”
Defining modality: Subjectivity
(Wiebe et al. 2001, 2004)
Main types of subjectivity
Evaluation: emotions, evaluations, judgements, opinions
Speculation: “anything that removes the presuppositions of events
occurring or states holding, such as speculation and uncertainty”.
Many expressions are not subjective in all contexts. A subjective element
is an instance of a potential subjective element, in a particular context,
that is subjective in that context
A subjective element expresses the subjectivity of a source
There can be multiple subjective elements in a sentence, of different
types and attributed to different sources and targets
Subjective elements might be complex expressions
Syntactic or morphological devices may also be subjective elements
Defining modality: Certainty
Rubin et al. 2005
“Certainty is viewed as a type of subjective information available in texts and
a form of epistemic modality expressed through explicitly-coded linguistic
means
Such devices as subjectivity expressions, epistemic comments,
evidentials, reporting verbs, attitudinal adverbials, hedges, shields,
approximators, understatements, tentatives, intensifiers, emphatics,
boosters, and assertives, often overlap in their definitions, classifications,
and lexical representations in English
They explicitly signal presence of certainty information that covers a full
continuum of writer’s confidence, ranging from uncertain possibility and
withholding full commitment to statements”
Defining negation
Lawler (2007)
“Negation is a linguistic, cognitive, and intellectual phenomenon. Ubiquitous
and richly diverse in its manifestations, it is fundamentally important to all
human thought. As Horn and Kato 2000 put it:
“Negative utterances are a core feature of every system of human
communication and of no system of animal communication. Negation and
its correlates – truth-values, false messages, contradiction, and irony – can
thus be seen as defining characteristics of the human species.” (p.1)”
Defining negation
In natural language, negation functions as an operator along with
quantifiers and modals
Operators have a scope: the elements to which negatives, modals and quantifiers refer are in the scope of the operator
Negation interacts with other operators (modals, quantifiers) in complex ways
  - Ambiguous cases: Every boy didn’t leave
  - Idiosyncratic combinations with modals:
      Deontic may not: You may not go (‘not possible’)
      Epistemic may not: This may not be the place (‘possibly not’)
Defining negation: Philosophical tradition
Negation has been studied from a philosophical perspective since
Aristotle (2500 years ago!)
It has been studied in terms of truth values: how does the truth value of
a sentence change if we add a negative element?
Aristotle’s distinctions:
Defining negation: Philosophical tradition
Contradictory negation
Found in pairs such as:
  Socrates is sitting
  Socrates is not sitting
If one is true, the other is necessarily false.
  *Socrates is neither sitting nor not sitting
The following truth values apply:
  p   ¬p
  T   F
  F   T

Contrary negation
Found in pairs such as:
  Socrates is a good man
  Socrates is a bad man
Only one sentence can be true at any point in time, and both sentences can be false at the same time.
  Socrates is neither a good man nor a bad man
Defining negation: Philosophical tradition
Law of Contradiction
A statement cannot be true and false at the same time
¬ ∃ x (Px ∧ ¬ Px)
Applies to contrary and contradictory negation
Law of the Excluded Middle
A statement must be either true or false
∀ x (Px ∨ ¬ Px)
Applies to contradictory negation
Defining negation: Philosophical tradition
Aristotle also studies negation combined with quantifiers:
1. Every man is white
2. Some men are white
3. No man is white
4. Not every man is white
The pairs (2, 3) and (1, 4) are contradictory. LC and LEM apply
The pair (1, 3) is a contrary pair to which the LC applies
These oppositions are represented in the Square of Oppositions
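For reference, the four sentence types and the oppositions just noted can be written out in standard predicate-logic notation; this rendering is a conventional addition, not part of the original slide.

```latex
% A/I/E/O readings of sentences (1)-(4), with M = man and W = white
\begin{align*}
(1)\ \text{A:}\quad & \forall x\,(Mx \rightarrow Wx)     && \text{Every man is white}\\
(2)\ \text{I:}\quad & \exists x\,(Mx \land Wx)           && \text{Some men are white}\\
(3)\ \text{E:}\quad & \neg\exists x\,(Mx \land Wx)       && \text{No man is white}\\
(4)\ \text{O:}\quad & \neg\forall x\,(Mx \rightarrow Wx) && \text{Not every man is white}
\end{align*}
% (2,3) and (1,4) are contradictories: exactly one of each pair is true (LC and LEM).
% (1,3) are contraries: not both true, but possibly both false (LC only).
```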
Defining negation: Philosophical tradition
Square of Oppositions by Apuleius and Boethius
AI vs. EO: affirmative vs. negative opposition
AE vs. IO: universal vs. particular
More information: Stanford Encyclopedia of Philosophy; http://plato.stanford.edu/entries/square/
Defining negation: Logics
Partee (2007)
Negation ¬ is a unary (or monadic) sentential operator.
Monadic means it has just one argument, unlike ∨ and ∧, which are
binary
Sentential means that its argument must be a formula, an expression of
type t.
Semantically it is a function of type t → t: It maps 1 onto 0 and 0 onto
1.
In predicate logic and propositional logic, it is assumed that every
formula gets a truth value (relative to a model and an assignment)
Defining negation: Grammar
R.L. Trask (1993) A dictionary of grammatical terms in linguistics.
Routledge.
Negation
The presence of a negative in a sentence or constituent, or the addition of
such an element, or the effect of such an element when present.
Negative
1. A grammatical element which, when added to a sentence expressing a
proposition, reverses the truth value of that proposition. [...] A negative
element is an operator which takes some part of its sentence as its scope;
that scope may be the entire proposition [...] or only some part of it [...]
Defining negation: Grammar
R.L. Trask (1993) A dictionary of grammatical terms in linguistics.
Routledge.
Negative concord
The phenomenon by which the presence of an overt negative requires other
elements in the sentence to be marked as negative.
Sp. No he visto nada
Eng. I didn’t see anything
Defining negation: Grammar
Negative polarity item
Any of various items which can only occur within the scope of a negative and
possibly also in certain other specified grammatical circumstances, notably in
questions.
We don’t have any wine
Do we have any wine?
any, anyone, anything, anywhere, ever, at all; give a damn, lift a finger, move
a muscle, pay the slightest attention
Defining negation: Traditional negation problems
Truth value and presuppositions
1. The King of France is not bald
If France is a monarchy, the proposition is either true or false
Presupposition: there is a King of France
Internal negation
If France is a republic, does the proposition have a truth value? It can continue as:
1. The King of France is not bald, because there is no King of France
The presupposition that there is a King of France is cancelled
External negation
According to Horn (1985) this is a case of pragmatic ambiguity and metalinguistic negation
Defining negation: Traditional negation problems
Scope of negation and quantifiers
1. All the boys did not leave
Interpretation 1: not all the boys left, but some did
Interpretation 2: all the boys stayed
Use of negative polarity items: use of some versus any
Neg-raising
1. I don’t think he is here
2. I think that he is not here
Defining negation: Types
Clausal negation (Tottie 1991)
Denials
The audio system on this television is not very good, but the picture is
amazing.
Rejections: one participant rejects an offer or suggestion of another.
Appear in expository text where a writer explicitly rejects a previous
supposition or expectation
Given the poor reputation of the manufacturer, I expected to be
disappointed with the device. This was not the case.
Imperatives: directing an audience away from a particular action
Do not neglect to order their delicious garlic bread
Questions
Why couldn’t they include a decent speaker in this phone?
Supports and repetitions: express agreement and add emphasis or
clarity
Defining negation: Types
Sentential versus inter-sentential negation
Intersentential negation
The language used in one sentence may explicitly negate a proposition or
implication found in another sentence:
Rejections and supports
Sentential negation
Negations within the scope of a single sentence:
Sentential denials, imperatives, and questions
Defining negation: Types
Clausal versus constituent negation (Payne 1997)
Clausal negations negate an entire proposition
I don’t have books
Constituent negation is associated with particular constituents or clauses
I have no books
The effect of clausal and constituent negation can be very similar or identical, but constituent negation is less common as a grammatical device
Most languages possess more than one type of clausal negation. The functional difference has to do with:
  - Negation of existence
  - Negation of fact
  - Negation of different aspects, modes or speech acts
Defining negation: Types of clausal negation (Payne 1997)
Lexical negation
“Describes a situation in which the concept of negation is part and parcel of
the lexical semantics of a particular verb.”
Lack as the lexical negative of have
Morphological negation
Morphemes that express clausal negation are associated with the verb
Analytic negation
Negative particles are normally associated with the main verb of the clause
Negative particles: n’t, not, never
Finite negative verbs (not in English)
Defining negation: Types of clausal negation (Payne 1997)
Derivational negation
“Languages will allow a stem to change into its “opposite” by use of some
derivational morphology.
Prefixes: unhappy, non-smoker
Suffixes: motionless
Negative quantifiers
“Many languages employ quantifiers that are either inherently negative (none,
nothing) or are negated independently of clausal negation (not many).”
Defining negation: Negation atlas
http://wals.info
Chapter 112: negative morphemes by Matthew S. Dryer
Shows the nature of morphemes signalling clausal negation in declarative
sentences
All of the ways of indicating negation involve negative morphemes
There are no known instances of languages in which negation is realized
by a change in word order or by intonation
All languages have negative morphemes
Both negative particles and negative affixes are widely distributed
throughout the world
Defining negation: Negation atlas
neg. affix, neg. particle, neg. aux. verb, neg. word, neg. wordaffix, double neg.
http://wals.info/feature/112
Defining negation: Negation vs negative polarity
Negation: grammatical phenomenon used to state that some event,
situation, or state of affairs does not hold.
Polarity: a relation between semantic opposites.
“polarity encompasses not just the logical relation between negative and affirmative propositions, but also the conceptual relations defining contrary pairs like hot–cold, long–short, and good–bad” (Israel 2004).
The relation between negation and polarity lies in the fact that negation
can reverse the polarity of an expression.
In the context of sentiment analysis positive and negative polarity is used
in the sense of positive and negative opinions, emotions, and evaluations.
Why is it interesting to process modality and negation?
Some NLP applications aim at extracting factual information from texts.
As Prabhakaran et al. (2010) put it: “There is more to meaning than just propositional content”:
1. GM will lay off workers
2. A spokesman for GM said GM will lay off workers
3. GM may lay off workers
4. The politician claimed that GM will lay off workers
5. Some wish GM would lay off workers
6. Will GM lay off workers?
7. Many wonder if GM will lay off workers
Examples from Prabhakaran et al. (2010)
Why is it interesting to process modality?
(Saurı́ and Pustejovsky 2009)
Opinion Mining: the same situation can be presented as a fact in the
world, a mere possibility, or a counterfact according to different sources.
Textual Entailment:
Factuality-related information has been taken as a basic feature in some
systems using the data from PASCAL RTE challenges (Tatu and
Moldovan 2005, de Marneffe et al. 2006, and Snow and Vanderwende
2006).
The system that obtained the best absolute result in the three RTE
challenges, scoring an 80% accuracy (Hickl and Bensley 2007), is based
on identifying the set of publicly-expressed beliefs of the author
Why is it interesting to process modality and negation?
Textual Entailment: Dagan et al. include negation and modality as an aspect of the logical structure:
Slide borrowed from the Tutorial on Textual Entailment - ACL 2007, by Ido Dagan, DanRoth and Fabio
Massimo Zanzotto. www.cs.biu.ac.il/~dagan/TE-Tutorial-ACL07.ppt
Why is it interesting to process modality and negation?
Summarization
Fiszman et al. (2006) report that the majority of the system errors were due to two phenomena: missed negation and complicated sentence structure.
  - Example of missed negation: Selegiline was found unable to inhibit deamination of beta-PEA.
  - System output: Selegiline INTERACTS WITH Phenethylamine
M. Fiszman, Th. C. Rindflesch, and H. Kilicoglu (2006) Summarizing Drug Information in
Medline Citations. AMIA Annu Symp Proc. 2006; 2006: 254–258.
Why is it interesting to process modality and negation?
Information Extraction
The atovaquone/proguanil combination has not been widely used
yet in West Africa so it is unlikely that the patient was initially
infected with an atovaquone-resistant strain.
Extracted information that falls under the scope of a negation signal
cannot be presented as factual information (Vincze et al. 2008)
More than 13 % of the sentences in the BioScope corpus contain
negation signals (Szarvas et al. 2008)
Why is it interesting to process modality and negation?
Information Extraction
Not being able to recognize negation can hinder automated indexing systems (Mutalik et al. 2001)
Approximately half of the conditions indexed in dictated reports are
negated (Chapman et al. 2001)
Negation status was the most important feature for classifying patients
based on whether they had an acute lower respiratory syndrome;
including negation status contributed significantly to classification
accuracy (Chu et al. 2006)
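To make concrete how simple such clinical negation handling can be, here is a minimal NegEx-style sketch in the spirit of Chapman et al. (2001): a condition mention counts as negated if a negation cue occurs within a short window before it. The cue list and window size are illustrative placeholders, not the published algorithm's resources.

```python
# Minimal NegEx-style sketch: a term is treated as negated when a negation
# cue appears at most WINDOW tokens before it. Cue list and window size are
# illustrative; the published algorithm uses larger, curated resources.
NEGATION_CUES = {"no", "not", "without", "denies", "denied", "absence"}
WINDOW = 5

def is_negated(tokens, term_index):
    """True if a negation cue precedes tokens[term_index] within WINDOW tokens."""
    start = max(0, term_index - WINDOW)
    return any(token.lower() in NEGATION_CUES for token in tokens[start:term_index])

report = "the patient denies chest pain on exertion".split()
print(is_negated(report, report.index("pain")))   # True: "denies" falls inside the window

report = "chest pain radiating to the left arm".split()
print(is_negated(report, report.index("pain")))   # False: no preceding negation cue
```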
Why is it interesting to process modality and negation?
Medlock 2008
“it is clear that interactive bioinformation systems that take account of
hedging can render a significantly more effective service to curators and
researchers alike”
30% of sentences in the results and discussion sections of biomedical papers contain speculative assertions, and this figure increases to around 40% for the conclusions section (Mercer et al. 2004; Medlock 2008)
A significant part of the gene names mentioned (638 occurrences out of a total of 1968) appears in a speculative sentence. This means that approximately 1 in every 3 genes should be excluded from the interaction detection process (Szarvas 2008)
Why is it interesting to process modality and negation?
Biomedical information extraction
Light 2004
“The scientific process involves making hypotheses, gathering evidence, using inductive reasoning to reach a conclusion based on the data, and then making new hypotheses. Scientists are often not completely certain of a conclusion. This lack of definite belief is often reflected in the way scientists discuss their work.”
(Light 2004)
11% of sentences in MEDLINE contain speculative language
Extracting tables of protein-protein interactions would benefit from
knowing which interactions were speculative and which were definite
In the context of knowledge discovery, current speculative statements about a topic of interest can be used as a seed for the automated knowledge discovery process.
Why is it interesting to process modality and negation?
Examples from Light (2004)
1. Pdcd4 may thus constitute a useful molecular target for cancer prevention. (1131400)
2. On the basis of these complementary results, it has been concluded that curcumin shows very high binding to BSA, probably at the hydrophobic cavities inside the protein. (12870844)
3. Removal of the carboxy terminus enables ERP to interact with a variety of ets-binding sites including the E74 site, the IgH enhancer pisite, and the lck promoter ets site, suggesting a carboxy-terminal negative regulatory domain. (7909357)
4. Results suggest that one of the mechanisms of curcumin inhibition of prostate cancer may be via inhibition of Akt. (12682902)
5. To date, we find that the signaling pathway triggered by each type of insult is distinct. (10556169)
Why is it interesting to process modality and negation?
Biomedical information extraction
Biomedical information extraction focuses on identifying biomedical
entities and their relations
Biomedical information retrieval focuses on finding documents that are
relevant for specific database curation tasks
“However, the fact that a gene is mentioned, and even information about it is
provided, does not necessarily imply that the information is reliable or useful
in satisfying the scientist’s information need (Shatkay et al. 2008).”
“We believe that an important first step towards more accurate text-mining
lies in the ability to identify and characterize text that satisfies various types
of information needs.” (Wilbur et al. 2006)
References: Modality
Bybee, J. L., R. Perkins, and W. Pagliuca (1994) The Evolution of Grammar. Tense,
Aspect, and Modality in the Languages of the World. Chicago: University of Chicago
Press
de Haan, F. (1997) The Interaction of Modality and Negation. A typological study.
New York: Garland Publishing, Inc.
Frawley, W. (2005) (ed.) The expression of modality. Berlin: Mouton de Gruyter
Palmer, F. R. (1979) Modality and the English modals. London: Longman
Palmer, F. R. (2001) Mood and Modality. Cambridge: CUP
Portner, P. (2009) Modality. Oxford University Press, Oxford, UK
Rizomilioti, V. (2006). Exploring Epistemic Modality in Academic Discourse Using
Corpora. Information Technology in Languages for Specific Purposes (7), pp. 53-71
Salkie, R., P. Busuttil and J. van der Auwera (2009) (eds.). Modality in English.
Theory and Description. Berlin: Mouton de Gruyter
von Fintel, K. (2006) “Modality and Language”. In Encyclopedia of Philosophy Second Edition, edited by D. M. Borchert. Detroit: MacMillan Reference USA. Most
recent version online at http://mit.edu/fintel/www/modality.pdf
References: Evidentiality
Aikhenvald, A. (2004) Evidentiality. Oxford: Oxford University Press
Cornillie, B. (2009) Evidentiality and epistemic modality. Functions of Language 16:1,
44-62
Chafe, W. and J. Nichols (1986) (eds.) Evidentiality: The Linguistic Coding of Epistemology. Norwood, New Jersey: Ablex
References: Certainty
Rubin, V.L., E. D. Liddy, and N. Kando (2005) Computing Attitude and Affect in
Text: Theory and Applications, chapter Certainty identification in texts:
Categorization model and manual tagging results. Springer-Verlag, New York
Rubin, V.L. (2006) Identifying certainty in texts. Ph.D. thesis, Syracuse, NY, USA
Rubin, V.L. (2007) Stating with certainty or stating with doubt: intercoder reliability
results for manual annotation of epistemically modalized statements. In Proceedings
of NAACL’07: Human Language Technologies 2007. Companion Volume, pages
141–144, Morristown, NJ, USA
Rubin, V.L. (2010) Epistemic modality: from uncertainty to certainty in the context
of information seeking as interactions with texts. Information processing and
management 46:533–540
References: Hedging
Hyland, K. (1994) Hedging in academic writing and EAP textbooks. English for Specific Purposes, 13:239-256
Hyland, K. (1998) Hedging in scientific research articles. Amsterdam, The
Netherlands: John Benjamins
Lakoff, G. (1973). Hedges: A study in meaning criteria and the logic of fuzzy
concepts. Journal of Philosophical Logic, 2:458-508
Medlock, B. and T. Briscoe (2007) Weakly supervised learning for hedge classification in scientific literature. In Proceedings of ACL 2007, pages 992–999
Vincze, V., G. Szarvas, R. Farkas, G. Móra, and J. Csirik (2008) The BioScope
corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC
Bioinformatics, 9((Suppl 11)):S9.
References: Factuality
Pustejovsky J., P. Hanks, R. Saurı́, A. See , R. Gaizauskas, A. Setzer, D. Radev, B.
Sundheim, D. Day, L. Ferro, and M. Lazo (2003) The TimeBank corpus. Proceedings
of Corpus Linguistics 2003:647-656
Saurı́ R., M. Verhagen and J. Pustejovsky (2006) SlinkET: a partial modal parser for
events. Proceedings of 5th International Conference on Language Resources and
Evaluation 2006
Saurı́, R. and J. Pustejovsky (2009). FactBank: a corpus annotated with event
factuality. Language Resources and Evaluation 43: 227–268
References: Subjectivity
Banfield, A. (1982) Unspeakable sentences. Routledge and Kegan Paul, Boston
Prabhakaran, V., O. Rambow, and M. Diab (2010) Automatic committed belief
tagging. In Proceedings of COLING 2010, pages 1014–1022
Wiebe, J., R. Bruce, M. Bell, M. Martin and Th. Wilson (2001) A corpus study of
evaluative and speculative language. Proceeding SIGDIAL ’01 Proceedings of the
Second SIGdial Workshop on Discourse and Dialogue
Wiebe, J., Th. Wilson, R. Bruce, M. Bell, and M. Martin (2004) Learning subjective
language. Computational Linguistics, 30(3):277–308
Wilson, Th, J. Wiebe, and R. Hwa (2006) Recognizing strong and weak opinion
clauses. Computational Intelligence, 22(2):73–99
Wilson, Th. (2008) Fine-grained subjectivity and sentiment analysis: recognizing the
intensity, polarity, and attitudes of private states. Ph.D. thesis, University of
Pittsburgh, Pittsburgh, PA, USA
References: Negation
Dryer, M. S. (2008) Negative Morphemes. In: M. Haspelmath, M. S. Dryer, D. Gil, and B. Comrie (eds.) The World Atlas of Language Structures Online. Munich: Max Planck Digital Library, chapter 112. Available online at http://wals.info/feature/112. Accessed on 20 Jan 2011
de Haan, F. (1997) The Interaction of Modality and Negation. A typological study. New York:
Garland Publishing, Inc.
Hoeksema, F., H. Rullmann, V. Sánchez-Valencia and T. van der Wouden (2001) (eds.)
Perspectives on Negation and Polarity Items. Amsterdam: John Benjamins
Horn, L.R., and Y. Kato (2000) (eds) Studies in Negation and Polarity. Oxford University Press
Jespersen, O. (1924) The Philosophy of Grammar. Chicago: The University of Chicago Press,
edition 1992
Klima, E. S. (1964) Negation in English. Readings in the Philosophy of Language. Ed. J. A.
Fodor and J. J. Katz. Prentice Hall, Englewood Cliffs, NJ: 246-323
Lawler, J. (2007) Negation and negative polarity. Cambridge Encyclopedia of the Language
Sciences http://www.umich.edu/~jlawler/CELS-Negation.pdf
Tottie, G. (1991) Negation in English Speech and Writing: A Study in Variation Academic, San
Diego, CA
van der Wouden, T. (1997) Negative Contexts. Collocation, polarity, and multiple negation.
London and New York: Routledge
Additional References Part I
Chapman, W. W., W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform, 34:301–310
Chu D, J. N. Dowling, and W. W. Chapman (2006) Evaluating the effectiveness of four
contextual features in classifying annotated clinical conditions in emergency department reports.
AMIA Annu Symp Proc 2006:141–5
Light, M., X. Y. Qiu, and P. Srinivasan (2004) The language of bioscience: facts, speculations,
and statements in between. In Proceedings of BioLINK 2004, pages 17–24
Medlock, B. (2008) Exploring hedge identification in biomedical literature. JBI, 41:636–654
Mercer, R. E., Ch. DiMarco, and F. W. Kroon (2004) The frequency of hedging cues in citation
contexts in scientific writing. In Proceedings of the Canadian Conference on AI, pages 75–88
Mutalik, P. G., A. Deshpande, and P. M. Nadkarni (2001) Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS. J Am Med Inform Assoc, 8(6):598–609.
Shatkay, H., F. Pan, A. Rzhetsky, and W. J. Wilbur (2008) Multi-dimensional classification of biomedical texts: toward automated practical provision of high-utility text to diverse users. Bioinformatics, 24(18):2086–2093
Wilbur, W. J., A. Rzhetsky, and H. Shatkay (2006) New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics, 7:356
Part II
Categorising and Annotating Modality and Negation
Outline
5. Annotation schemes
6. Existing resources
7. Future directions
8. References
Annotation schemes: Modality in OntoSem
Nirenburg, S. and M. McShane (2008) Annotating modality. OntoSem final project
report. March.
Framework: OntoSem project
Text processing environment that takes as input unrestricted raw text
and carries out several levels of linguistic analysis, including modality at
the semantic level
The output of the semantic analysis is represented as formal
text-meaning representations (TMRs)
OntoSem
Overall architecture of the OntoSem semantic analyzer
From Nirenburg et al (2004) Evaluating the performance of the OntoSem semantic analyzer. Proc. of the ACL Workshop on Text Meaning
Representation.
OntoSem
Modality information is encoded as part of the semantic module in the
lexical entries of the modality cues.
Four modality attributes are encoded:
  - Modality type
  - Scalar value: ranges from zero to one
  - Scope attribute: the predicate that is affected by the modality
  - Attributed-to attribute: indicates to whom the modality is assigned (default value = speaker)
OntoSem
Modality type
Polarity, whether a proposition is positive or negated;
Volition, the extent to which someone wants or does not want the
event/state to occur;
Obligation, the extent to which someone considers the event/state to
be necessary;
Belief, the extent to which someone believes the content of the
proposition;
Potential, the extent to which someone believes that the event/state is
possible;
Permission, the extent to which someone believes that the event/state
is permitted;
Evaluative, the extent to which someone believes the event/state is a
good thing.
OntoSem
Examples
Polarity: Reed refused to back down demanding the Republican led
intelligence committee finish a long awaited report on whether the Bush
administration twisted intelligence
Volition: he’s trying to get Hamas to co-exist with Israel
Obligation: For payment, we have to forecast the money two days out
Belief: This week, the government arrested Jose Abello Silva, said to be
the fourth-ranking cartel leader
Potential: If I can get all of the information today, I can tell you this
afternoon
Permission: Flights are not permitted into Iraq
Evaluative: Ditches: They are better than road bumps because they are
harder to see
OntoSem
Values
Volition: do not want 0 – really want 1
Obligation: need not 0 – must 1
Belief: do not believe it 0 – strongly believe it 1
Potential: can’t achieve/be achieved 0 – can 1
Permission: may not 0 – may 1
Evaluative: evaluation really poor 0 – evaluation really highly 1
“Assigning values of modality to lexical items is a judgment call, not a science. No
value that anyone assigns is set in stone. While some might argue that something
so inherently inexact will not be helpful in text processing, we fervently disagree.
The reason for using scalar values for modalities lies not in their absolute values but
in their relative values. Whether disfavor is given the value (.3) or (<> .2 .3) or
(<.4) on the scale of evaluative modality is less important than the fact that it has
a much lower value than adore.” (Nirenburg and McShane 2008)
OntoSem
In the sentence
Entrance to the tower should be totally camouflage
should is identified as a modality cue and characterized with:
Type obligative
Value 0.8
Scope camouflage
and is attributed to the speaker
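Seen as a data structure, the four attributes amount to a small record per modality cue. The sketch below encodes the 'should' example from this slide; the class and field names are illustrative and do not reproduce OntoSem's actual TMR notation.

```python
# Illustrative container for OntoSem-style modality attributes;
# field names are ours, not OntoSem's TMR notation.
from dataclasses import dataclass

@dataclass
class ModalityFrame:
    type: str                        # e.g. epistemic, obligative, volitive, ...
    value: float                     # scalar value in [0, 1]
    scope: str                       # predicate affected by the modality
    attributed_to: str = "speaker"   # default source, as in the scheme

# "Entrance to the tower should be totally camouflage":
should_frame = ModalityFrame(type="obligative", value=0.8, scope="camouflage")
print(should_frame)
```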
OntoSem
Modality entry in OntoSem
From Nirenburg and McShane (2008)
Annotation schemes: Private States
Wiebe, J., Th. Wilson, and C. Cardie (2005) Annotating expressions of opinions and
emotions in language. Language Resources and Evaluation, 38:165–210.
Context: sentiment analysis, opinion mining
Annotation scheme that identifies key components and properties of
opinions, emotions, sentiments, speculations, evaluations, and other
private states
Private states: internal states that cannot be directly observed by others
Goal: identifying private state expressions in context
Private States
Two types of frames to distinguish between opinion-oriented material and
factual material
Objective speech event frames that represent “material that is attributed to some source, but is presented as an objective fact”:
  - The source is the speaker or writer;
  - The target, what the private state is about;
Private state frames for every expression of a private state:
  - The source of the private state, whose private state is being expressed;
  - The target, what the private state is about;
  - Properties like intensity, significance, and type of attitude (positive, negative, other, none).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
97 / 448
Private States
Three types of private state expressions are considered for the
annotation:
Explicit mentions
  “The U.S. fears a spill-over,” said Xirao-Nima
Speech events
  Sargeant O’Leary said the incident took place at 2:00pm
Expressive subjective elements
  “The report is full of absurdities,” Xirao-Nima said
Private states are expressed by the words and the style of language that
is used
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
98 / 448
Private States
Nested sources: attribution of private states
“private states are often filtered through the “eyes” of another source, and
private states are often directed toward the private states of others” (Wiebe
et al. 2005)
1. “The U.S. fears a spill-over,” said Xirao-Nima.
   According to the writer, according to Xirao-Nima, the U.S. fears a
   spill-over.
   Nested source of private state fears: [writer, Xirao-Nima, U.S.]
The concept of source is very relevant for the annotation of modalities
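For illustration only, a nested source chain like the one above can be stored as an ordered list, outermost source first; the field names below are mine and do not reflect the MPQA file format.

```python
# Illustrative sketch of an MPQA-style private state frame with a
# nested source chain (simplified fields, not the official format).
frame = {
    "expression": "fears",
    "nested_source": ["writer", "Xirao-Nima", "U.S."],  # outermost first
    "target": "a spill-over",
    "attitude_type": "negative",
}

# The immediate experiencer of the private state is the innermost source.
print(frame["nested_source"][-1])  # -> U.S.
```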
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
99 / 448
Private States
Annotation at word/phrase level
Annotators were not limited to marking a type or list of words
Large variety of words appearing in subjective expressions
Many sentences are mixtures of subjectivity and objectivity:
44% of the sentences analysed are mixtures of two or more subjectivity
intensity ratings or mixtures of subjectivity and objectivity
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
100 / 448
Private States
MPQA Opinion Corpus
http://www.cs.pitt.edu/mpqa/
10,657 sentences
535 documents of English newswire
Annotated with information about private states at the word and phrase
level.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
101 / 448
Annotation schemes: Attribution in PDTB
Prasad, R. , N. Dinesh, A. Lee, A. Joshi, and B. Webber. 2006. Annotating
attribution in the Penn Discourse TreeBank. In SST ’06: Proceedings of the
Workshop on Sentiment and Subjectivity in Text, pages 31-38, Morristown, NJ, USA.
ACL.
Discourse connectives and their arguments are assigned
attribution-related features
Goal: to capture the source and degrees of factuality of abstract objects
SOURCE: writer, other, arbitrary
TYPE: reflects the nature of the relation between the agent and the
abstract object
SCOPAL POLARITY of attribution: identifies cases when verbs of
attribution (say, think, ...) are negated syntactically (didn’t say) or
lexically (denied).
DETERMINACY: indicates the presence of contexts canceling the
entailment of attribution
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
102 / 448
Attribution in PDTB
TYPE: nature of the relation between the agent and the abstract object
Propositions: “attribution to an agent of his/her (varying degrees of)
commitment towards the truth of a proposition”
Assertions: identified by assertive predicates or verbs of communication
Beliefs: identified by “propositional attitude verbs” (believe, think,
expect, suppose, imagine, etc.)
Facts: “attribution to an agent of an evaluation towards or knowledge
of a proposition whose truth is taken for granted (i.e., a presupposed
proposition)”
Eventualities: “attribution to an agent of an intention/attitude towards
an eventuality”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
103 / 448
Attribution in PDTB
SCOPAL POLARITY: the negation reverses the polarity of the attributed
relation or argument content
Null: the neg-lowered interpretations are not present
Neg: the interpretation of the connective requires the surface negation
to take semantic scope over the lower argument.
  Example (Prasad et al. 2006): “Having the dividend increases is a
  supportive element in the market outlook, but I don’t think it’s a main
  consideration,” he says.
  The scopal polarity of Arg2 is Neg because the negation in I don’t think
  takes semantic scope over the embedded clause it’s a main consideration.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
104 / 448
Annotation schemes: ACE 2008
Linguistic Data Consortium. 2008. ACE (Automatic Content Extraction) English
annotation guidelines for relations. Technical Report Version 6.2 2008.04.28, LDC.
Automatic Content Extraction (ACE) 2008 corpus
Goal: relation detection and recognition
English and Arabic texts from several sources
Relations are ordered pairs of entities annotated with modality and tense
attributes
Modality attributes
  Asserted: relations pertain to situations in the real world
  Other: relations pertain to situations in “some other world defined by
  counterfactual constraints elsewhere in the context”
Example: We are afraid Al-Qaeda terrorists will be in Baghdad
  ORG-Aff.Membership relation between terrorists and Al-Qaeda: asserted
  Physical.Located relation between terrorists and Baghdad: other
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
105 / 448
Annotation schemes: Certainty
Rubin, V.L., E. D. Liddy, and N. Kando (2005) Computing Attitude and Affect in
Text: Theory and Applications, chapter Certainty identification in texts:
Categorization model and manual tagging results. Springer-Verlag, New York.
Rubin, V.L. (2010) Epistemic modality: from uncertainty to certainty in the context
of information seeking as interactions with texts. Information processing and
management 46:533-540.
Uncertainty is understood as the speculative type of subjectivity
Subjectivity: aspects of language used to express opinions and
evaluations (Wiebe 1994)
Certainty can also be seen as a variety of epistemic modality expressed
through epistemic comments (probably, perhaps)
Certainty is a pragmatic position rather than a grammatical feature
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
106 / 448
Certainty
View on certainty (Rubin et al 2005)
“Certainty is viewed as a type of subjective information available in texts and
a form of epistemic modality expressed through explicitly-coded linguistic
means”
Explicit markers of certainty
“explicitly signal presence of certainty information that covers a full
continuum of writer’s confidence”
Devices
Subjectivity expressions, epistemic comments, evidentials, reporting verbs,
attitudinal adverbials, hedges, shields, approximators, understatements,
tentatives, intensifiers, emphatics, boosters, and assertives
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
107 / 448
Certainty
Certainty identification (Rubin et al. 2005)
“Certainty identification is defined as an automated process of extracting
information from certainty-qualified texts or individual statements along four
hypothesized dimensions of certainty”
Level: degree of certainty
Perspective: whose certainty is involved
Focus: what the object of certainty is
Time: what time the certainty is expressed
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
108 / 448
Certainty
Uncertainty model (Rubin 2006)
(Image from Rubin 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
109 / 448
Certainty
Examples: certainty level
1. Certain: An enduring lesson of the Reagan years, of course, is that it
   really does take smoke and mirrors to produce tax cuts, spending
   initiatives and a balanced budget at the same time.
2. Less certain: So far the presidential candidates are more interested in
   talking about what a surplus might buy than in the painful choices that
   lie ahead.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
110 / 448
Certainty
Examples: perspective
1. Writer: More evenhanded coverage of the presidential race would help
   enhance the legitimacy of the eventual winner, which now appears likely
   to be Putin.
2. Reported: The Dutch recruited settlers with an advertisement that
   promised to provide them with slaves who “would accomplish more work
   for their masters, ...”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
111 / 448
Certainty
Examples: focus
1. Abstract information: statements that reflect an idea that does not
   represent an external reality, but rather a hypothesized world
   In Iraq, the first steps must be taken to put a hard-won new security
   council resolution on arms inspections into effect.
2. Factual information: based on facts that have an actual existence in
   the world of events
   The settlement may not fully compensate survivors for the delay in
   justice, ...
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
112 / 448
Certainty
Data
32 articles from The New York Times
685 sentences, excluding headlines
Sentence-level
Average of 0.53 explicit certainty markers per sentence
The distinction of focus into factual and abstract information presented
the most difficulties for annotation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
113 / 448
Annotation schemes: Factuality
Saurí, R. and J. Pustejovsky (2009). FactBank: a corpus annotated with event
factuality. Language Resources and Evaluation 43: 227–268.
Factuality (Saurí and Pustejovsky 2009)
“Information conveying whether events mentioned in text correspond to real
situations in the world or, instead, to situations of uncertain status.”
“The level of information expressing the commitment of relevant sources
towards the factual nature of events mentioned in discourse”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
114 / 448
Factuality
FactBank A corpus of events annotated with factuality information
Define a discrete set of factuality values and a battery of criteria that
allow annotators to differentiate among these values
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
115 / 448
Factuality
Difficulties in annotating factuality:
“To find an expressive enough set of discrete factuality values that is
grounded on linguistic intuitions but also supported by commonsense
reasoning”
“Factuality is expressed through a complex interaction of many different
aspects of the overall linguistic expression”:
  Polarity, epistemic modality, evidentiality, mood
  A component in the semantics of specific syntactic structures with
  presuppositional effects
  A component in certain types of predicates (e.g. factive and implicative
  predicates)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
116 / 448
Factuality
Challenges (Saurí and Pustejovsky 2009)
Distinguishing among factuality degrees
Interaction between factuality markers
  1. The Royal Family will continue to allow detailed fire brigade inspections
     of their private quarters
  2. The Royal Family will continue to refuse to allow detailed fire brigade
     inspections of their private quarters
  3. The Royal Family may refuse to allow detailed fire brigade inspections of
     their private quarters
Relevant sources: different discourse participants may present divergent
views about the factual nature of the very same event
  1. Slobodan Milosevic’s son said Tuesday that the former Yugoslav president
     had been murdered at the detention center of the UN war crimes tribunal
     in The Hague
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
117 / 448
Factuality
Factuality values in Saurí and Pustejovsky (2009)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
118 / 448
Factuality
Specified Values:
CT+ According to the source, it is certainly the case that X.
PR+ According to the source, it is probably the case that X.
PS+ According to the source, it is possibly the case that X.
CT- According to the source, it is certainly not the case that X.
PR- According to the source, it is probably not the case that X.
PS- According to the source, it is possibly not the case that X.
Underspecified Values:
CTu The source knows whether it is the case that X or not the case that X.
Uu The source does not know what the factual status of the event is, or
does not commit to it.
Discriminatory tests are used to distinguish between the values
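As a concrete illustration, the specified values can be read as a small lookup table keyed by certainty and polarity; this is only a sketch of the value inventory, not the FactBank distribution format.

```python
# Sketch of the FactBank value inventory as a lookup table keyed by
# (certainty, polarity); the underspecified values are kept separately.
FACTUALITY = {
    ("certain",  "+"): "CT+",
    ("probable", "+"): "PR+",
    ("possible", "+"): "PS+",
    ("certain",  "-"): "CT-",
    ("probable", "-"): "PR-",
    ("possible", "-"): "PS-",
}
UNDERSPECIFIED = {"CTu", "Uu"}

def factuality_value(certainty: str, polarity: str) -> str:
    """Return the FactBank label for a fully specified assessment."""
    return FACTUALITY[(certainty, polarity)]

print(factuality_value("probable", "-"))  # -> PR-
```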
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
119 / 448
Factuality
FactBank Data:
208 documents
9,488 manually annotated events
0.81 Cohen's kappa agreement
FactBank as a second layer on top of TimeBank
Example
Newspaper reports have said Amir was infatuated with
Har-Shefi and may have been trying to impress her by killing
the prime minister
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
120 / 448
Factuality
TimeBank annotation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
121 / 448
Factuality
FactBank annotation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
122 / 448
Factuality
Annotation tasks by 2 annotators
Identifying source-introducing predicates (SIPs) (reporting, knowledge
and opinion) [0.88 Cohen's kappa]
  SIPs contribute new sources to the discourse
Identifying sources [0.95 Cohen's kappa]
  In mid-2001, Colin Powell [source] and Condoleezza Rice [source] both
  publicly denied [SIP] that Iraq had [event] weapons of mass destruction
Assigning factuality values [0.81 Cohen's kappa]
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
123 / 448
Annotation schemes: Committed belief
Diab, M. T., L. Levin, T. Mitamura, O. Rambow, V. Prabhakaran, and W. Guo
(2009) Committed belief annotation and tagging. In ACL-IJCNLP ’09: Proceedings
of the Third Linguistic Annotation Workshop, pages 68-73.
Goal
Recognize what the writer of the text intends the reader to believe about
various people’s beliefs about the world (including the writer’s own)
Assumption
Discourse participants model each other’s cognitive state during
discourse
  They model cognitive states as beliefs, desires, and intentions
Language provides cues for the discourse participants to do the modeling
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
124 / 448
Committed belief
Annotated categories (Diab et al. 2009)
Each verbal proposition is annotated with one of the tags:
Committed belief (CB): the writer indicates in this utterance that he or
she believes the proposition
  We know that GM has laid off workers
Non-committed belief (NCB): the writer identifies the proposition as
something which he or she could believe, but he or she happens not to
have a strong belief in
  GM may lay off workers
Not applicable (NA): for the writer, the proposition is not of the type in
which he or she is expressing a belief, or could express a belief
  Expressions of desire: Some wish GM would lay off workers
  Questions: Will GM lay off workers?
  Expressions of requirements: GM is required to lay off workers
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
125 / 448
Committed belief
Corpus (Diab et al. 2009)
10,000 words annotated for speaker belief of stated propositions.
They annotate the writer’s beliefs
Nested beliefs are excluded
Annotation at proposition level
Different domains and genres
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
126 / 448
Annotation schemes: Categorising modality
Thompson P, Venturi G, McNaught J, Montemagni S, Ananiadou S: Categorising
modality in biomedical texts. Proceedings of the LREC 2008 Workshop on Building
and Evaluating Resources for Biomedical Text Mining 2008.
Focus: epistemic modality in biomedical text
  The expression of the author’s level of confidence towards a proposition
  The type of knowledge, assumptions or evidence on which the proposition
  is based
Corpus: 113 abstracts from MEDLINE on E. coli, annotated with gene
regulation events
Goal: annotate a corpus with modality categories when the modality
information falls within the scope of a gene regulation event
Results: 202 MEDLINE abstracts annotated, 1,469 gene regulation
events
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
127 / 448
Categorising modality
Dimensions of the categorisation scheme
Knowledge Type
“The type of “knowledge” that underlies a statement, encapsulating both
whether the statement is a speculation or based on evidence and how the
evidence is to be interpreted”
Speculative: predict, hypothesis, view, in theory
Deductive: interpret, indication, infer, imply
Sensory: observation, see, appear
Demonstrative: show, confirm, demonstrate
(Speculative, deductive and demonstrative based on Palmer’s model)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
128 / 448
Categorising modality
Dimensions of the categorisation scheme
Level of certainty
“Indicating how certain the author (or cited author) is about the statement”
Absolute: certainly, known
High: likely, probably, generally
Moderate: possibly, perhaps, may, could
Low: unlikely, unknown
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
129 / 448
Categorising modality
Dimensions of the categorisation scheme
Point of View
“Indicating whether the statement is based on the author’s own or a cited
point of view or experimental findings.”
Writer: we, our results
Other: citations
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
130 / 448
Categorising modality
(Figure from Thompson et al. (2008)
www.nactem.ac.uk/workshops/lrec08_ws/slides/Thompson_et_al.pdf)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
131 / 448
Categorising modality
Distribution per category in the annotated corpus
(Figure from Thompson et al. (2008)
www.nactem.ac.uk/workshops/lrec08_ws/slides/Thompson_et_al.pdf)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
132 / 448
Categorising modality
Distribution per subcategory in the annotated corpus
(Based on Thompson et al. (2008)
www.nactem.ac.uk/workshops/lrec08_ws/slides/Thompson_et_al.pdf)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
133 / 448
Annotation schemes: A modality lexicon
K. Baker, M. Bloodgood, B. J. Dorr , N. W. Filardo , L. Levin, and Christine Piatko.
A modality lexicon and its use in automatic tagging. Proceedings of LREC 2010,
pages 1402–1407.
Baker, K. et al. (2009) SIMT SCALE 2009 - Modality Annotation Guidelines.
Technical Report 4, Human Language Technology Center of Excellence, Johns
Hopkins University, 2010.
Goal: Exploring whether structured annotations of entities and
modalities can improve translation output in the face of sparse training
data
Focus on modal words that are related to factivity (H = holder;
P = proposition):
Requirement: does H require P?
Permissive: does H allow P?
Success: does H succeed in P?
Effort: does H try to do P?
Intention: does H intend P?
Ability: can H do P?
Want: does H want P?
Belief: with what strength does H believe P?
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
134 / 448
A modality lexicon
Annotation scheme
Three components for sentences that express modality
Trigger: word or string that expresses modality
Target: event, state or relation that the modality scopes over
Holder: experiencer or cognizer of the modality
Modality can be expressed without a lexical trigger
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
135 / 448
A modality lexicon
Simplifications
Scope of modality and negation. Same annotation for:
I do not believe that he left
I believe he didn’t leave
Duality of meaning of require and permit (a toy sketch of this rewriting
follows below)
  not require P to be true = permit P to be false
  not permit P to be true = require P to be false
Entailment between modalities. Annotators were provided with a
specificity-ordered modality list
  requires → permits
  succeeds → tries → intends → is able → wants
Sentences without an overt trigger word are tagged as Firmly Believes
Nested modalities are not marked, only one modality is marked
The holder is not marked
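The require/permit duality can be stated as a tiny rewriting rule. The sketch below is my own illustration of that simplification, not part of the SIMT annotation tooling.

```python
# Toy sketch of the require/permit duality: negating a requirement on P
# is treated as permitting not-P, and vice versa.
def normalize(modality: str, negated: bool):
    """Map a possibly negated Require/Permit modality to an equivalent
    positive modality over a possibly negated proposition."""
    if not negated:
        return modality, False          # nothing to rewrite
    if modality == "Require":
        return "Permit", True           # not require P == permit not-P
    if modality == "Permit":
        return "Require", True          # not permit P == require not-P
    return modality, True               # other modalities: keep the negation

print(normalize("Require", negated=True))  # -> ('Permit', True)
```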
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
136 / 448
A modality lexicon
Entry definition
1. A string of one or more words: for example, should or have need of
2. A part of speech for each word
3. A modality: one of the thirteen modalities
4. A head word (or trigger): the primary phrasal constituent, to cover cases
   where an entry is a multiword unit, e.g., the word hope in hope for
5. One or more subcategorization codes
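A minimal sketch of such an entry as a data structure, assuming nothing about the lexicon's on-disk format beyond the five components listed above (the field names are mine).

```python
# Illustrative data structure for a modality lexicon entry.
from dataclasses import dataclass
from typing import List

@dataclass
class LexiconEntry:
    words: List[str]         # one or more words, e.g. ["have", "need", "of"]
    pos: List[str]           # a part of speech for each word
    modality: str            # one of the thirteen modalities, e.g. "Able"
    head: str                # trigger/head word, e.g. "hope" in "hope for"
    subcat_codes: List[str]  # one or more subcategorization codes

entry = LexiconEntry(words=["able", "to"], pos=["JJ", "TO"],
                     modality="Able", head="able",
                     subcat_codes=["JJ-infinitive"])
print(entry.modality, entry.head)  # -> Able able
```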
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
137 / 448
A modality lexicon
Example entries
# Able
capable “JJ of “IN$Able & capable, JJ-of-basic, JJ-of-VBG
able “JJ to “’TO$Able & able, JJ-infinitive
can “MD$Able & can, modal-auxiliary-basic
could“MD$Able & could, modal-auxiliary-basic
ready“JJ$Able & ready, JJ-infinitive
# NotAble
powerless“JJ$NotAble & powerless, JJ-infinitive
unable“JJ$NotAble & unable, JJ-infinitive
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
138 / 448
A modality lexicon
Modality tagging example
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
139 / 448
A modality lexicon
Modality tagger
A modality tagger produces text or structured text in which modality
triggers and/or targets are identified
Tagger 1: string-based
  Input: text with PoS
  Marks spans of words/phrases that exactly match modality trigger words
  in the modality lexicon
  It identifies the target by tagging the next non-auxiliary verb to the right
  of the trigger
Tagger 2: structure-based
  Input: parsed text
  TSurgeon patterns are automatically generated from the verb class codes
  in the modality lexicon along with a set of templates
  The patterns are matched against part of a parse tree
A sketch of the string-based approach is given below.
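Below is a rough sketch of the string-based tagger's core heuristic, assuming pre-tokenised, PoS-tagged input and a toy trigger dictionary; the real tagger is considerably richer than this.

```python
# Simplified string-based modality tagging: match lexicon triggers, then
# take the next non-auxiliary verb to the right as the target.
AUXILIARIES = {"be", "been", "is", "are", "was", "were",
               "have", "has", "had", "do", "does", "did"}

def tag_modality(tokens, pos_tags, trigger_lexicon):
    """tokens/pos_tags: parallel lists; trigger_lexicon maps trigger -> modality."""
    annotations = []
    for i, token in enumerate(tokens):
        modality = trigger_lexicon.get(token.lower())
        if modality is None:
            continue
        # target = next non-auxiliary verb to the right of the trigger
        target = next((t for t, p in zip(tokens[i + 1:], pos_tags[i + 1:])
                       if p.startswith("VB") and t.lower() not in AUXILIARIES),
                      None)
        annotations.append({"trigger": token, "modality": modality, "target": target})
    return annotations

tokens = ["They", "may", "have", "left", "early"]
pos    = ["PRP", "MD", "VB", "VBN", "RB"]
# toy lexicon entry for illustration only
print(tag_modality(tokens, pos, {"may": "Permissive"}))
# -> [{'trigger': 'may', 'modality': 'Permissive', 'target': 'left'}]
```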
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
140 / 448
A modality lexicon
Output of modality tagger
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
141 / 448
Annotation schemes: Speculated sentences
Medlock, B. and T. Briscoe (2007) Weakly Supervised Learning for Hedge
Classification in Scientific Literature. In Proceedings of the 45th Annual Meeting of
the Association of Computational Linguistics.
Sentence-level annotation of speculation
  6 papers from the functional genomics literature
They define what is and what is not hedging
  Guidelines to be found at
  http://www.cl.cam.ac.uk/research/nl/nl-download/hedging.html
Sentences are classified into speculative or non-speculative
  Spec: This unusual substrate specificity may explain why Dronc is
  resistant to inhibition by the pan-caspase inhibitor p35.
380 out of 1,157 sentences are speculative
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
142 / 448
Annotation schemes: Focus of Negation
Blanco, E. and D. Moldovan (2011) Semantic representation of negation using
focus detection. Proceedings of ACL 2011, pages 581-589. ACL.
Based on distinction between scope and focus of negation (Huddleston
and Pullum 2002)
Scope is the part of the meaning that is negated
Focus is that part of the scope that is most prominently or explicitly
negated
Focus of negation annotated on 3,993 verbal negations signaled as
MNEG in PropBank
For each instance, annotators decide the focus given the full syntactic
tree, as well as the previous and next sentence
Inter-annotator agreement was 0.72
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
143 / 448
Annotation schemes: Focus of Negation
Annotation examples (Table from Blanco and Moldovan 2011)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
144 / 448
Annotation schemes: Scopes in BioScope
http://www.inf.u-szeged.hu/rgai/bioscope
Corpus annotated with negation and speculation cues and their scopes in
English biomedical texts
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
145 / 448
Scopes in BioScope
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
146 / 448
Scopes in BioScope
Example
When U937 cells were infected with HIV-1, <xcope id="X1.6.3"><cue
type="negation" ref="X1.6.3">no</cue> induction of NF-KB factor was
detected</xcope>, whereas high level of progeny virions was produced,
<xcope id="X1.6.2"><cue type="speculation"
ref="X1.6.2">suggesting</cue> that this factor was <xcope
id="X1.6.1"><cue type="negation" ref="X1.6.1">not</cue> required
for viral replication</xcope></xcope>.
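Since the markup is plain inline XML, it can be inspected with the standard library. The sketch below wraps the example in a dummy root element so that it parses; it is an illustration only, not the official BioScope tooling.

```python
# Reading BioScope-style cue/scope markup with xml.etree.ElementTree.
import xml.etree.ElementTree as ET

xml = ('<sentence>When U937 cells were infected with HIV-1, '
       '<xcope id="X1.6.3"><cue type="negation" ref="X1.6.3">no</cue> '
       'induction of NF-KB factor was detected</xcope>, whereas high level '
       'of progeny virions was produced, <xcope id="X1.6.2">'
       '<cue type="speculation" ref="X1.6.2">suggesting</cue> that this '
       'factor was <xcope id="X1.6.1"><cue type="negation" '
       'ref="X1.6.1">not</cue> required for viral replication'
       '</xcope></xcope>.</sentence>')

root = ET.fromstring(xml)
for cue in root.iter("cue"):
    print(cue.get("type"), cue.text, "-> scope", cue.get("ref"))
# negation no -> scope X1.6.3
# speculation suggesting -> scope X1.6.2
# negation not -> scope X1.6.1
```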
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
147 / 448
Scopes in BioScope
Hedges modify the factuality of a statement or reflect the author’s
attitude towards the content of the text.
Categories:
  Auxiliaries: may, might, can, would, should, could, etc.
  Verbs of hedging or verbs with speculative content: suggest, question,
  presume, suspect, indicate, suppose, seem, appear, favor, etc.
  Adjectives or adverbs: probable, likely, possible, unsure, etc.
  Conjunctions: or, and/or, either ... or, etc.
  Complex keywords:
  Mild bladder wall thickening raises the question of cystitis.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
148 / 448
Scopes in BioScope
Annotation strategy
Marking the keywords: the minimal unit that expresses hedging and
determines the actual strength of hedging was marked as a keyword.
Marking scope: all constituents that fell within the uncertain
interpretation were included in the scope
  Motivation: disregarding the marked text span, the rest of the sentence
  can be used for extracting factual information. In:
  Mild bladder wall thickening raises the question of cystitis.
  Mild bladder wall thickening is a fact
  Cystitis is an uncertain fact
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
149 / 448
Scopes in BioScope
Scope and syntax: the scope of a speculative element can be determined on
the basis of syntax.
For verbs, auxiliaries, adjectives and adverbs the scope usually starts right
with the keyword.
  Verbal elements (verbs and auxiliaries): the scope ends at the end of the
  clause or sentence; all complements and adjuncts are included.
  Attributive adjectives: the scope generally extends to the following noun
  phrase
  Predicative adjectives: the scope includes the whole sentence.
Sentential adverbs have scope over the entire sentence
The scope of other adverbs usually ends at the end of the clause or
sentence.
Conjunctions generally have scope over the syntactic unit whose
members they coordinate.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
150 / 448
Annotation schemes: Scopes in ConanDoyle-neg
Same corpus as SemEval Task on Linking Events and
Their Participants in Discourse (Ruppenhofer et al.
2010)
Already annotated with semantic roles,
coreference and null instantiations of semantic
roles
Different domain than BioScope
Not subject to copyright
Linear narrative
But older variety of English
HB (The Hound of the Baskervilles): 14 chapters (2,700 sentences)
WL (The Adventure of Wisteria Lodge): 2 chapters (600 sentences)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
151 / 448
Scopes in ConanDoyle-neg
R. Morante, S. Schrauwen and W. Daelemans. Annotation of negation cues
and their scope. CLiPS Technical Report 3. University of Antwerp.
Negation cues: words that express negation
Scope of negation cues: tokens in the sentence that are affected by
the negation
Negated event or property
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
152 / 448
Scopes in ConanDoyle-neg
The most of them would by no means advance , but three of them , the
boldest , or it may be the most drunken , rode forward down the goyal . [HB
2-59]
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
153 / 448
Scopes in ConanDoyle-neg
The annotation format is based on the BioScope format, but there are
differences:
Different scope model
  All participants of the event that is negated fall under the scope of the
  negation cue
  Discontinuous scope is allowed
Affixal negation is annotated
Negated events are annotated
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
154 / 448
Scopes in ConanDoyle-neg
Not all negation cues negate a fact
” Do you not find it interesting ? ” [HB 2.74]
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
155 / 448
Scopes in ConanDoyle-neg
Not all negation cues negate a fact
Had the prosaic finding of the coroner not finally put an end to the romantic
stories which have been whispered in connection with the affair , it might
have been difficult to find a tenant for Baskerville Hall . [HB 2.113]
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
156 / 448
Scopes in ConanDoyle-neg
Not all negation cues negate a fact
For both these reasons I thought that I was justified in telling rather less than
I knew , since no practical good could result from it , but with you there is no
reason why I should not be perfectly frank . [HB 2.127]
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
157 / 448
Scopes in ConanDoyle-neg
The annotation guidelines will be published soon as a CLiPS Technical Report:
http://www.clips.ua.ac.be/
annotation-of-negation-cues-and-their-scope-guidelines-v10
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
158 / 448
Annotation schemes: High utility text
W John Wilbur, Andrey Rzhetsky, and Hagit Shatkay (2006) New
directions in biomedical text annotation: definitions, guidelines and
corpus construction. BMC Bioinformatics 2006, 7:356
Hagit Shatkay, Fengxia Pan, Andrey Rzhetsky, and W. John Wilbur
(2008) Multi-dimensional classification of biomedical text: Toward
automated, practical provision of high-utility text to diverse users.
Bioinformatics 24(18), pages 2086–2093
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
159 / 448
High utility text
The contents of scientific statements can be characterized along certain
general dimensions
In turn, the characteristics of each phrase, sentence or paragraph along
these dimensions can help to determine whether the text is useful to a
particular user with specific information needs
Different users have different information needs
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
160 / 448
High utility text
A database curator who is looking for experimental evidence that the gene
was expressed under certain conditions would only be satisfied with
sentences discussing experimental evidence and stating with high confidence
that the gene was indeed expressed under the reported conditions.
A scientist looking for all the information published about a certain gene
may be satisfied by obtaining all the papers or all the sentences mentioning
this gene.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
161 / 448
High utility text
Goal
Enabling the creation of well-focused subsets of biomedical text that
have certain properties
Identifying information-bearing fragments within scientific text
  Reducing the document search space for a specific domain in order to
  improve retrieval and extraction
    Identifying regions that are rich in experimental evidence and
    methodological details
    Focusing extraction efforts on these regions
  Providing users with candidate sentences that:
    Describe the desired phenomenon
    Bear the evidence for the phenomenon or describe the methods by which
    the phenomenon was identified
Differentiating informative fragments from non-informative ones
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
162 / 448
High utility text
Annotation scheme
10,000 sentences selected at random from both full-text articles and
abstracts
Each statement in the corpus is characterized and marked-up along 5
dimensions
A statement may be a sentence or just a fragment of a sentence
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
163 / 448
High utility text
Dimensions
Focus
The type of the information conveyed by the statement
Scientific (S): discussing findings and discovery
Generic (G): general state of knowledge and science outside the scope of
the paper, the structure of the paper itself or the state of the world
Methodology (M): describing a procedure or a method
Methodology is annotated when the sentence under annotation contains an
indication that methodology is being discussed
  Not every sentence appearing in a Methodology section discusses
  methodology, and not every sentence discussing methodology appears in
  the Methodology section
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
164 / 448
High utility text
Dimensions
Polarity
A fragment with any focus can be stated either positively (P) or negatively
(N)
Certainty
Each fragment conveys a degree of certainty about the validity of the
assertion it makes.
(0) represents complete uncertainty, that is, the fragment explicitly states
that there is uncertainty or a lack of knowledge about a particular
phenomenon ("it is unknown..." or "it is unclear whether..." etc.).
(1) represents a low certainty
(2) is assigned to high-likelihood expressions that are still short of
complete certainty.
(3) represents complete certainty, reflecting an accepted, known and/or
proven fact.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
165 / 448
High utility text
Dimensions
Evidence
This dimension indicates for any fragment if its assertion is supported by
evidence.
E0: No indication of evidence in the fragment whatsoever, or an explicit
statement in the text indicates lack of evidence.
E1: A claim of evidence, but no verifying information is explicitly given.
”Previous experiments show that...”, followed by the fragment,
”therefore, it is likely that ...”.
E2: Evidence is not given within the sentence/fragment, but explicit
reference is made to other papers (citations) to support the assertion.
E3: Evidence is provided, within the fragment, in one of the following
forms:
  A reference to experiments previously reported within the body of the
  paper (Our results show that ...)
  A verb within the statement indicates an observation or an experimental
  finding (We found that ...)
  A reference to an experimental figure or a table of data given within the
  paper.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
166 / 448
High utility text
Dimensions
Direction-trend
The signs + or - indicate respectively whether the assertion reports a
qualitatively high or low level or an increase/decrease in a specific
phenomenon, finding or activity.
”In fact, as demonstrated using several SOD assays including
pulse radiolysis, 2-ME does not inhibit SOD”
Negative polarity, negative trend (inhibit)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
167 / 448
High utility text
Examples
The binding of both forms of β-catenin to CBP is completely
inhibited by ICG-001 (Fig. 3B Top, lane 4). **1SP3E3-
We demonstrate that ICG-001 binds specifically to CBP **1SP2E3
but not the related transcriptional coactivator p300, **2SN2E3
A statement may be supported by several types of evidence:
...the overexpression of phospho-H2Av did not induce G2/M arrest or
affect DSB-dependent G2/M arrest (fig. S10) (14,21), **1SN3E23+
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
168 / 448
High utility text
Data
10,000 sentences annotated by 3 annotators
For the experiments, only sentences on which the three annotators agree are used
Each dimension is examined separately
(From Shatkay et al. 2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
169 / 448
Modality and negation for MR
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
170 / 448
Annotation schemes: Modality and negation for MR
Goal: evaluating whether machine reading systems understand
extra-propositional aspects of meaning beyond propositional content,
focusing mostly on phenomena related to modality and negation.
Background collections: same as the main QA4MRE task
Test sets: 12 texts from The Economist, 4 per topic (climate change,
AIDS, music and society)
  Two pilot test documents were released first
Questions: for each document there are ten multiple-choice questions
  5 candidate answers
  1 clearly correct answer
Evaluation: same as the main task
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
171 / 448
Modality and negation for MR
Table: Test documents from The Economist

Topic              Number  Title                                      # of words
AIDS               1       All colours of the brainbow                915
AIDS               2       DARC continent                             817
AIDS               3       Double, not quits                          779
AIDS               4       Win some, lose some                        1919
Climate change     1       A record-making effort                     2841
Climate change     2       Are economists erring on climate change?   1412
Climate change     3       Climate change and evolution               1256
Climate change     4       Climate change in black and white          2850
Music and society  1       The politics of hip-hop                    1004
Music and society  2       How to sink pirates                        773
Music and society  3       Singing a different tune                   1042
Music and society  4       Turn that noise off                        677

R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
172 / 448
Modality and negation for MR
Event Description
Given a multiple choice question, systems have to choose the answer that
best characterises an event along five aspects of meaning:
Negation
Perspective
Certainty
Modality
Condition for another event or conditioned by another event
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
173 / 448
Modality and negation for MR
Negative
1. A grammatical element which, when added to a sentence expressing a
proposition, reverses the truth value of that proposition. [...] A negative
element is an operator which takes some part of its sentence as its scope;
(R.L. Trask (1993) A dictionary of grammatical terms in linguistics. Routledge.)
1. But these new types of climate action do not REPLACE the need to
   reduce carbon emissions.
2. In the face of an international inability to PUT the sort of price on
   carbon use that would drive its emission down, an increasing number of
   policy wonks, and the politicians they advise, are taking a more serious
   look at these other factors as possible ways of controlling climate change.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
174 / 448
Modality and negation for MR
Perspective
A statement is presented from the point of view of someone. By default the
statement is presented from the perspective of the author of the text, but the
author might be mentioning the view from someone else.
1. The European Union has named a dozen prefectures that need radiation
   tests, yet traders in these places report a LACK of testing equipment.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
175 / 448
Modality and negation for MR
Certainty
Events can be presented with a range of certainty values, including
underspecified certainty. Here we include all not certain events under the
category of uncertain events, without distinguishing degrees.
1. . . . Even though external radiation has since returned to near-harmless
   levels, Mr Sakurai fears many of Minamisoma’s evacuees may never
   COME BACK.
2. As well as having charms that efforts to reduce carbon-dioxide emissions
   lack, these alternatives could also IMPROVE the content and prospects
   of other climate action.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
176 / 448
Modality and negation for MR
Modality
Five options:
Non-modal event. This is the default category for events that do not
fall under the modal categories below and do not have other modal
meanings. In the questions we refer to it as event.
Purpose event. Purpose, aim or goal.
  Neighbouring South Korea expressed concern that it was not warned
  about TEPCO’s decision to dump low-level radioactive waste into the sea
  to MAKE room to store more toxic stuff on land.
Need event. Need or requirement.
  The plan requires a lot of INVESTMENT in power generation and
  smarter grids, best done in the context of –at long last– reformed and
  competitive energy market.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
177 / 448
Modality and negation for MR
Obligation.
  Believing that global greenhouse-gas emissions must FALL by half to
  limit climate change, and that rich countries should CUT the most,
  Europe has set a goal of reducing emissions by 80-95% by 2050.
Desire. Desires, intentions and plans.
  Neighbouring South Korea expressed concern that it was not warned
  about TEPCO’s decision to DUMP low-level radioactive waste into the
  sea to make room to store more toxic stuff on land.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
178 / 448
Modality and negation for MR
Condition-conditioned
An event can be presented as a condition for another event or as conditioned
by another event.
1. If you are highly motivated to minimise your taxes, you can HUNT for
   every possible deduction for which you’re eligible.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
179 / 448
Modality and negation for MR
Event description
An event description consists of at most one value per aspect of meaning
An event description consists of at least one modality value
  Event, purpose event, need event, obligation event, desire event
If applicable, events can additionally be described with the following
aspects of meaning that systems have to identify:
  Negated
  Perspective of someone other than the author
  Uncertain
  Condition for another event, conditioned by another event
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
180 / 448
Modality and negation for MR
Example question
Text
Controlling black carbon by giving poor people cleaner ways to burn various
fuels could not only forestall a decade or two of global warming, it would also
save hundreds of thousands of lives currently blighted by smoke and disease.
Question
<q_str>Event - -controlling black carbon by giving poor people cleaner ways
to burn various fuels <predicate>forestall</predicate> a decade or two of
global warming- - is presented in the text as:</q_str>
<answer a_id="1">MOD-NEED</answer>
<answer a_id="2">MOD-WANT</answer>
<answer a_id="3">COND-BY MOD-NON</answer>
<answer a_id="4" correct="Yes">UNCERT MOD-NON</answer>
<answer a_id="5">NEG UNCERT MOD-NON</answer>
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
181 / 448
Modality and negation for MR
The combinations of codes that make up the answers to the questions
can be summarized with the following regular expression:
[COND|COND-BY]? NEG? PERS? UNCERT?
MOD[-NEED|-NON|-PURP|-MUST|-WANT]
In total there are 120 combinations, although not all of them will be
represented in the test set of 12 documents because not all of them are
equally frequent.
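Written as a standard regular expression, the same pattern can be used to check that a candidate answer code is well formed; the grouping below is my translation of the bracket notation above, not part of the task's released tooling.

```python
# Checking QA4MRE-style answer codes against the documented pattern.
import re

ANSWER_CODE = re.compile(
    r"^((COND|COND-BY) )?(NEG )?(PERS )?(UNCERT )?"
    r"MOD(-NEED|-NON|-PURP|-MUST|-WANT)$"
)

for code in ["UNCERT MOD-NON", "COND-BY MOD-NON",
             "NEG UNCERT MOD-NON", "MOD-WANT"]:
    print(code, bool(ANSWER_CODE.match(code)))   # all True
```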
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
182 / 448
Outline
5 Annotation schemes
6 Existing resources
7 Future directions
8 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
183 / 448
Why do we need to have annotated resources?
To have a better insight into the surface realization of negation and
modality and their role in NLP
To train systems that:
  Detect non-factual information
  Detect statements with negative polarity
  Detect contrastive information
This can be useful for several NLP applications:
  Information extraction
  Opinion mining, sentiment analysis
  Paraphrasing
  Recognizing textual entailment
  Machine translation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
184 / 448
Existing resources
Negated biomedical events
BioInfer http://mars.cs.utu.fi/BioInfer/
Genia Event http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
home/wiki.cgi?page=Event+Annotation
BioNLP Shared Task 2010 data
http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/
GREC http://www.nactem.ac.uk/GREC/
Scopes
BioScope http://www.inf.u-szeged.hu/rgai/bioscope
CoNLL Shared Task 2010 data
http://www.inf.u-szeged.hu/rgai/conll2010st/
ConanDoyle-neg http://www.clips.ua.ac.be/BiographTA/
corpora-files/ConanDoyle-neg-v1.zip
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
185 / 448
Existing resources
Meta-knowledge annotation including modality and negation
Meta-Knowledge Genia Corpus
http://www.nactem.ac.uk/meta-knowledge/
Statement Map corpus of Japanese
http://www.cl.ecei.tohoku.ac.jp/stmap/sem_corpus.html
Lexicons
Modality lexicon described in Baker et al. (2010)
http://www.umiacs.umd.edu/~bonnie/ModalityLexicon.txt
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
186 / 448
Existing resources
Negation and modality for machine reading
Test set QA4MRE pilot task 2011
http://www.clips.ua.ac.be/BiographTA/qa4mre.html
Factuality
FactBank http://www.ldc.upenn.edu/Catalog/CatalogEntry.
jsp?catalogId=LDC2009T23
Subjectivity
MPQA Opinion Corpus
http://www.cs.pitt.edu/mpqa/
Discourse
PDTB http://www.seas.upenn.edu/~pdtb/
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
187 / 448
Outline
5 Annotation schemes
6 Existing resources
7 Future directions
8 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
188 / 448
Future directions
Creating a unified annotation scheme for modality?
Merging existing annotations?
Defining guidelines?
Annotating fine-grained modality types?
Annotating larger corpora, different genres
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
189 / 448
Outline
5 Annotation schemes
6 Existing resources
7 Future directions
8 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
190 / 448
References
Baker K., M. Bloodgood, B. J. Dorr , N. W. Filardo , L. Levin, and Ch. Piatko (2010) A modality
lexicon and its use in automatic tagging. Proceedings of LREC 2010, pages 1402–1407.
Baker, K. et al. (2009) SIMT SCALE 2009 - Modality Annotation Guidelines. Technical Report 4,
Human Language Technology Center of Excellence, Johns Hopkins University, 2010.
Dalianis, H. and S. Velupillai. 2010. How certain are clinical assessments? Annotating Swedish
clinical text for (un)certainties, speculations and negations. In Proceedings of the Seventh
conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta.
Diab, M. T., L. Levin, T. Mitamura, O. Rambow, V. Prabhakaran, and W. Guo (2009)
Committed belief annotation and tagging. In ACL-IJCNLP ’09: Proceedings of the Third
Linguistic Annotation Workshop, pages 68-73.
Linguistic Data Consortium. 2008. ACE (Automatic Content Extraction) English annotation
guidelines for relations. Technical Report Version 6.2 2008.04.28, LDC.
Medlock, B. and T. Briscoe (2007) Weakly Supervised Learning for Hedge Classification in
Scientific Literature. In Proceedings of the 45th Annual Meeting of the Association of
Computational Linguistics.
Medlock, B. (2006) Guidelines for Speculative Sentence Annotation.
http://www.benmedlock.co.uk/annotation.pdf
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
191 / 448
References
Morante, R., S. Schrauwen and W. Daelemans. Annotation of negation cues and their scope.
CLiPS Technical Report 3. University of Antwerp.
Nirenburg, S. and M. McShane (2008) Annotating modality. OntoSem final project report.
March.
Prasad, R. , N. Dinesh, A. Lee, A. Joshi, and B. Webber. 2006. Annotating attribution in the
Penn Discourse TreeBank. In SST ’06: Proceedings of the Workshop on Sentiment and
Subjectivity in Text, pages 31-38, Morristown, NJ, USA. Association for Computational
Linguistics.
Rubin, V.L., E. D. Liddy, and N. Kando (2005) Computing Attitude and Affect in Text: Theory
and Applications, chapter Certainty identification in texts: Categorization model and manual
tagging results. Springer-Verlag, New York.
Rubin, V.L. (2010) Epistemic modality: from uncertainty to certainty in the context of information
seeking as interactions with texts. Information processing and management 46:533-540.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
192 / 448
References
Saurí, R. and J. Pustejovsky (2009). FactBank: a corpus annotated with event factuality.
Language Resources and Evaluation 43: 227–268.
Thompson P., Venturi G., McNaught J., Montemagni S., Ananiadou S. (2008) Categorising
modality in biomedical texts. Proceedings of the LREC 2008 Workshop on Building and
Evaluating Resources for Biomedical Text Mining 2008.
Thompson P., Nawaz R., McNaught J., Ananiadou S. (2011) Enriching a biomedical event corpus
with meta-knowledge annotation. BMC Bioinformatics 2011, 12:393.
Vincze V., G. Szarvas, R. Farkas, G. Móra, and J. Csirik (2008) The BioScope corpus: biomedical
texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9,
2008.
Vincze, V., G. Szarvas, G. Móra, T. Ohta, and R. Farkas (2011) Linguistic scope-based and
biological event based speculation and negation annotations in the BioScope and Genia Event
corpora. Journal of Biomedical Semantics 2011, 2(Suppl 5):S8
Wiebe, J., Th. Wilson, and C. Cardie (2005) Annotating expressions of opinions and emotions in
language. Language Resources and Evaluation, 38:165–210.
Wilbur W.J., Rzhetsky A., Shatkay H. (2006) New directions in biomedical text annotations:
definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
193 / 448
Part III
Tasks Related to Processing Modality and Negation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
194 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
195 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
196 / 448
Detecting speculated sentences
Defining the task
Medlock and Briscoe (2007)
Given a collection of sentences, S, the task is to label each sentence as either
speculative or nonspeculative.
Specifically, S is to be partitioned into two disjoint sets, one representing
sentences that contain some form of hedging, and the other representing
those that do not.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
197 / 448
Detecting speculated sentences
Light et al. (2004) used a handcrafted list of hedge cues to identify
speculative sentences in MEDLINE abstracts
Medlock and Briscoe (2007) used single words as input features in order
to classify sentences from biological articles (FlyBase) as speculative or
non-speculative based on semi-automatically collected training examples
Szarvas (2008) extended the methodology of Medlock and Briscoe
(2007) to use n-gram features and a semi-supervised selection of the
keyword features.
Kilicoglu and Bergler (2008) proposed a linguistically motivated
approach based on syntactic information to semi-automatically refine a
list of hedge cues
Ganter and Strube (2009) proposed an approach for the automatic
detection of sentences containing uncertainty based on Wikipedia weasel
tags and syntactic patterns
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
198 / 448
Detecting speculated sentences
M. Light, X. Ying Qiu, and P. Srinivasan. 2004. The Language of Bioscience: Facts,
Speculations, and Statements in between. In Proceedings of the HLT BioLINK.
Corpus: sentences marked as highly speculative, low speculative, or
definite
  1,456 sentences
  173 speculative
Bag-of-words representation of the sentences occurring in MEDLINE
abstracts
Algorithm: SVMlight
Baseline: checking whether any cue from a hand-crafted list is present in
the sentence (a sketch follows below)
  suggest, potential, likely, may, at least, in part, possible, potential, further
  investigation, unlikely, putative, insights, point toward, promise, propose
Accuracy results for SVM = 92% vs. 89% baseline
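The baseline amounts to a lookup against the cue list. A minimal sketch is shown below; a real implementation would match on word boundaries rather than raw substrings, and the cue list is simply the one quoted above.

```python
# Minimal cue-list baseline: a sentence counts as speculative if it
# contains any cue from the hand-crafted list.
CUES = ["suggest", "potential", "likely", "may", "at least", "in part",
        "possible", "further investigation", "unlikely", "putative",
        "insights", "point toward", "promise", "propose"]

def is_speculative(sentence: str) -> bool:
    lowered = sentence.lower()
    # substring matching is a simplification; it would also fire on
    # words like "dismay" that merely contain a cue
    return any(cue in lowered for cue in CUES)

print(is_speculative("These results suggest a role for NF-KB."))  # True
print(is_speculative("The protein binds DNA."))                   # False
```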
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
199 / 448
Detecting speculated sentences
B. Medlock, and T. Briscoe. 2007. Weakly Supervised Learning for Hedge
Classification in Scientific Literature. In Proceedings of the 45th Annual Meeting of
the Association of Computational Linguistics.
System
Bag-of-words approach
Semi-supervised learning: labelled training data is generated automatically,
from which a supervised classifier can subsequently be learned
Test corpus: manually annotated, 380 spec sentences and 1,157 nspec
sentences
  Step 1: a weakly supervised Bayesian learning model is used to derive
  the probability of each word to represent a hedge cue
  Step 2: feature selection based on these probabilities; only the most
  indicative features of the spec class are retained
  Step 3: a classifier is trained on a given number of selected features
A sketch of steps 1-2 is given below.
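A much-simplified sketch of steps 1-2: estimating, from the two seed sets, how strongly each word indicates the spec class, and keeping the highest-scoring words as features. The smoothing and selection details below are my own simplifications, not the authors' exact Bayesian model.

```python
# Toy estimate of how indicative each word is of the spec class,
# followed by selection of the highest-scoring words.
from collections import Counter

def spec_probabilities(spec_seeds, nspec_seeds, smoothing=1.0):
    """spec_seeds / nspec_seeds: lists of tokenised sentences."""
    spec_counts = Counter(w for s in spec_seeds for w in s)
    nspec_counts = Counter(w for s in nspec_seeds for w in s)
    vocab = set(spec_counts) | set(nspec_counts)
    return {w: (spec_counts[w] + smoothing) /
               (spec_counts[w] + nspec_counts[w] + 2 * smoothing)
            for w in vocab}

def select_features(probs, n=100):
    """Keep the n words most indicative of the spec class."""
    return [w for w, _ in sorted(probs.items(), key=lambda kv: -kv[1])[:n]]

spec  = [["results", "suggest", "a", "role"], ["this", "may", "indicate"]]
nspec = [["we", "measured", "binding"], ["the", "gene", "was", "cloned"]]
print(select_features(spec_probabilities(spec, nspec), n=3))
```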
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
200 / 448
Detecting speculated sentences
Seed generation
Seeds for the spec class: all sentences from U containing either (or both)
of the terms suggest or likely. 6423 spec seeds
Nspec seeds: 7541 sentences
Results
Baseline: 0.60 recall/precision break-even point
System results: 0.76 recall/precision break-even point
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
201 / 448
Detecting speculated sentences
Error analysis
The model is unsuccessful in identifying assertive statements of
knowledge paucity, which are generally marked syntactically rather than
lexically
There is no clear evidence for cytochrome c release during apoptosis in C
elegans or Drosophila
Distinguishing between a speculative assertion and one relating to a
pattern of observed non-universal behaviour is often difficult
Sentence chosen as spec:
Each component consists of a set of subcomponents that can be
localized within a larger distributed neural system
The sentence does not, in fact, contain a hedge but rather a statement
of observed non-universal behaviour.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
202 / 448
Detecting speculated sentences
Ben Medlock. 2008. Exploring hedge identification in biomedical literature. Journal
of Biomedical Informatics, 41:636-54
Same dataset as Medlock and Briscoe (2007)
Experiments with additional features:
  Part-of-speech tags
  Stems
  Bigrams: in some instances combinations of terms represent more reliable
  hedge cues than just single terms
  SPEC: In addition several studies indicate that in mammals the Rel
  proteins could probably be involved in CNS processes such as neuronal
  development and synaptic plasticity
  NSPEC: In the row marked dgqa the stippled exons indicate regions that
  are not found in the dgqa cDNAs identified by us
Conclusions:
  Adding PoS features and stems to a bag-of-words input representation
  can slightly improve the accuracy
  Adding bigrams brings a statistically significant improvement over a
  simple bag-of-words representation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
203 / 448
Detecting speculated sentences
Learning curves stemming
Figure from Medlock (2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
204 / 448
Detecting speculated sentences
Learning curves bigrams
Figure from Medlock (2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
205 / 448
Detecting speculated sentences
Informative features
Figure from Medlock (2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
206 / 448
Detecting speculated sentences
Error analysis
20% Statements of knowledge paucity
This brings us to the largest of all mysteries, namely how the DCC is
spread along the X chromosome
Cases where speculativity is indicated by a particular term, while the
general construction of the sentence does not fit the usual spec mold
We then tested the putative RNA-binding property of MOF directly
using electromobility shift assays
Genuine hedge cues were not induced with enough certainty
In vertebrates in vivo RAG-mediated transpositions are strongly
suppressed, probably to minimize potential harm to genome function
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
207 / 448
Detecting speculated sentences
G. Szarvas. 2008. Hedge classification in biomedical texts with a weakly supervised
selection of keywords. In Proceedings of the ACL-08: HLT
Hedge detection in radiology records (newly annotated) and biomedical
texts (dataset from Medlock and Briscoe 2007)
Complex feature selection
Maximum Entropy Model
Weakly supervised machine learning
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
208 / 448
Detecting speculated sentences
Feature selection
Ranking the features x by frequency and their class-conditional
probability P(spec|x)
  - Select features with P(spec|x) > 0.94 that appeared in the training
    dataset with reasonable frequency
  - Result: 2407 candidates
For trigrams, bigrams and unigrams, calculate a new class-conditional
probability for each feature x, discarding those observations of x in
speculative instances where x was not among the two highest ranked
candidates
  - Separately for the uni-, bi- and trigrams
  - Result: filtered out 85% of all the keyword candidates and kept 362
    uni-, bi-, and trigrams altogether
Re-evaluate all 362 candidates together and filter out all phrases that
have a shorter substring of themselves among the features, with a similar
class-conditional probability on the speculative class
  - Result: discarded 30% of the candidates and kept 253 uni-, bi-, and
    trigrams altogether
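A rough sketch of the first and last filtering steps under assumed data structures (only the P(spec|x) > 0.94 threshold comes from the slide; the minimum frequency and the similarity tolerance are hypothetical):

```python
# Step 1: keep n-grams with high class-conditional probability and
# reasonable frequency.
def select_candidates(ngram_stats, p_min=0.94, min_freq=5):
    # ngram_stats: {ngram: (count_in_spec_sentences, total_count)}
    return {g: spec / total
            for g, (spec, total) in ngram_stats.items()
            if total >= min_freq and spec / total > p_min}

# Final step: drop candidates that have a shorter substring with a similar
# class-conditional probability already among the kept features.
def drop_redundant(candidates, tolerance=0.05):
    kept = {}
    for g, p in sorted(candidates.items(), key=lambda kv: len(kv[0])):
        covered = any(sub in g and abs(p - kept[sub]) < tolerance
                      for sub in kept)
        if not covered:
            kept[g] = p
    return kept

stats = {"suggest": (40, 41), "all possible combinations": (2, 30),
         "likely": (25, 26)}
print(drop_redundant(select_candidates(stats)))
```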
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
209 / 448
Detecting speculated sentences
Evaluation settings
Automatic feature selection
Manual feature selection
  - A phrase was judged irrelevant if the annotators could consider no
    situation in which the phrase could be used to express hedging
  - 63 out of the 253 keywords were found to be potentially relevant for
    hedge classification
Adding external dictionaries
  - Keywords used in Light et al. (2004) and those gathered for the author's
    ICD-9-CM hedge detection module
  - Only keywords found to be reliable enough by the maxent model trained
    on the training dataset were added
  - From 63 to 71 features
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
210 / 448
Detecting speculated sentences
Results
Baseline 1: substring matching of Light et al. (2004)
Baseline 2: Medlock and Briscoe (2007) system
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
211 / 448
Detecting speculated sentences
Conclusions
The radiology reports had mainly unambiguous single-term hedge cues
It proved to be useful to consider bi- and trigrams as hedge cues in
scientific texts
The hedge classification task reduces to a lookup for informative single
keywords or phrases. Removing uninformative features did not produce
any difference in the scores
The analysis of errors indicates that more complex features like
dependency structure and clausal phrase information could only help in
allocating the scope of hedge cues detected in a sentence, not the
detection of the cues themselves
Worse results on biomedical scientific papers from a different source
showed that the portability of hedge classifiers is limited
  - The keywords possible and likely are apparently always used as
    speculative terms in the FlyBase articles, while the articles from BMC
    Bioinformatics frequently use cliché phrases such as all possible
    combinations or less likely / more likely ...
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
212 / 448
Detecting speculated sentences
H. Kilicoglu and S. Bergler. 2008. Recognizing Speculative Language in Biomedical
Research Articles: A Linguistically Motivated Perspective. In Proceedings of Current
Trends in Biomedical Natural Language Processing (BioNLP), Columbus, Ohio, USA
Linguistically motivated approach
Lexical resources and syntactic patterns
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
213 / 448
Detecting speculated sentences
Data
Fruit fly dataset by Medlock and Briscoe (2007)
  - Semi-automatically annotated
  - Noisy and biased towards the hedging cues used as seed terms (suggest,
    likely)
Manually annotated data from the fruit fly dataset
  - 523 training sentences, 213 speculative
  - Balanced distribution of surface realization features: epistemic verbs
    (30%), adverbs (20%), adjectives (16%), modal verbs (23%)
Manually annotated data from Szarvas (2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
214 / 448
Detecting speculated sentences
Methodology
Expansion of lexical hedging cues (190 entries)
  - Hyland cues
  - Synonyms from WordNet
  - Nominalizations from UMLS
Quantification of hedging strength
  - Semi-automatic weighting depending on the type of cue and how it was
    obtained (SA)
  - Information gain (IG) weighting scheme: hedging cues that occur
    frequently in speculative sentences but never in non-speculative
    sentences get a higher IG weight
  - Accumulate the weights of the hedging cues found in a sentence to
    assign an overall hedging score to each sentence
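A minimal sketch of the weight-accumulation idea; the cue weights and the decision threshold below are purely hypothetical, not the SA or IG weights from the paper:

```python
# Sum the weights of all hedging cues found in a sentence and compare the
# resulting hedging score against a threshold.
CUE_WEIGHTS = {"suggest": 3.0, "may": 2.0, "possible": 2.0,
               "probably": 2.5, "putative": 1.5}      # hypothetical values

def hedging_score(sentence, weights=CUE_WEIGHTS):
    return sum(weights.get(tok, 0.0) for tok in sentence.lower().split())

def is_speculative(sentence, threshold=2.0):          # hypothetical threshold
    return hedging_score(sentence) >= threshold

print(is_speculative("The Rel proteins may probably be involved in CNS processes"))
```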
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
215 / 448
Detecting speculated sentences
Methodology: syntax
Identification of the most salient syntactic patterns in the training corpus
that play a role in hedging and their contribution to hedging strength
(Table from Kilicoglu and Bergler 2008)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
216 / 448
Detecting speculated sentences
Results
(Table from Kilicoglu and Bergler 2008)
Baseline 1: substring matching from Light et al. (2004)
Baseline 2: substring matching with the top 15 ranked term features
reported in Medlock and Briscoe (2007)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
217 / 448
Detecting speculated sentences
Conclusions
The SA weighting scheme gives better results: “a weighting scheme
relying on the particular semantic properties of the indicators is likely to
capture the hedging strengths more accurately”
SA weighting provides relatively stable results across datasets
A larger training set will yield a more accurate weighting scheme based
on IG measure
The IG weighting scheme is less portable
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
218 / 448
Detecting speculated sentences
Error analysis: false negatives
Syntactic patterns not addressed by the method
  - Negation of “unhedgers” was used as a syntactic pattern; while this
    pattern correctly identified know as an “unhedger”, it did not recognize
    little as a negative quantifier
    Little was known however about the specific role of the roX RNAs during
    the formation of the DCC
Certain derivational forms of epistemic words
  - The adjective suggestive is not recognized as a hedging cue, even though
    its base form suggest is an epistemic verb
    Phenotypic differences are suggestive of distinct functions for some of
    these genes in regulating dendrite arborization
Incorrect dependency relations
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
219 / 448
Detecting speculated sentences
Error analysis: false positives
Word sense ambiguity of hedging cues
Also we could not find any RAG-like sequences in the recently sequenced
sea urchin, lancelet, hydra and sea anemone genomes, which encode
RAG-like sequences
“Weak” hedging cues, such as epistemic deductive verbs (conclude,
estimate) as well as some adverbs (essentially, usually) and
nominalizations (implication, assumption)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
220 / 448
Detecting speculated sentences
V. Ganter and M. Strube: Finding hedges by chasing weasels: Hedge detection using
wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP
2009 Conference Short Papers, pages 173-176, Suntec, Singapore, August 2009.
Association for Computational Linguistics
Detecting speculative language in Wikipedia
Wikipedia as a source of training data for hedge classification
Adopt Wikipedia’s notion of weasel words: “Some people say”, “I
think”, “Clearly”, “is widely regarded as”, “it has been
said/suggested/noticed”, “It may be that”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
221 / 448
Detecting speculated sentences
http://en.wikipedia.org/wiki/Weasel_word
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
222 / 448
Detecting speculated sentences
Data
Several Wikipedia dumps from the years 2006 to 2008
Only those articles that contained the string “{{weasel”.
168,923 unique sentences containing 437 weasel tags
Datasets
Development: dump completed on July 14, 2008
Test: dump completed on March 6, 2009
  - Created a balanced test set by choosing one random, non-tagged sentence
    per tagged sentence
  - 246 manually annotated sentences for evaluation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
223 / 448
Detecting speculated sentences
Features
Words preceding the weasel tags. Each word within these 5-grams
receives an individual score, based on
  - The relative frequency of this word in weasel contexts vs. the corpus
    in general
  - The average distance of the word to a weasel tag, if found in a weasel
    context
Shallow linguistic features: three types of syntactic patterns:
  - Numerically underspecified subjects (“Some people”, “Experts”, “Many”)
  - Passive constructions (“It is believed”, “It is considered”)
  - Adverbs (“Often”, “Probably”)
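An illustrative sketch of the word-scoring idea under assumptions; how the paper actually combines relative frequency and average distance is not shown here, so the scoring formula below is hypothetical:

```python
# Score each word seen in the 5-grams preceding weasel tags by its relative
# frequency in weasel contexts and its average distance to the tag.
from collections import defaultdict

def score_words(weasel_contexts, corpus_freq):
    # weasel_contexts: 5-gram token lists preceding a weasel tag
    # corpus_freq:     overall corpus frequency of each word
    freq_in_weasel, distance_sum = defaultdict(int), defaultdict(int)
    for ngram in weasel_contexts:
        for dist, word in enumerate(reversed(ngram), start=1):
            freq_in_weasel[word] += 1
            distance_sum[word] += dist
    scores = {}
    for word, f in freq_in_weasel.items():
        avg_dist = distance_sum[word] / f
        rel_freq = f / corpus_freq.get(word, f)
        scores[word] = rel_freq / avg_dist      # hypothetical combination
    return scores

print(score_words([["some", "people", "say", "that", "it"]],
                  {"some": 100, "people": 80, "say": 50, "that": 500, "it": 400}))
```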
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
224 / 448
Detecting speculated sentences
Results
The syntactic patterns do not contribute to the regeneration of weasel
tags.
The decreasing precision of both approaches when trained on more
tagged sentences might be caused by the great number of unannotated
weasel words
The difference between wpw and asp becomes more distinct when the
manually annotated data form the test set
  - The added syntactic patterns indeed manage to detect weasels that have
    not yet been tagged
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
225 / 448
Detecting speculated sentences. CoNLL ST’10
Task 1 Learning to detect sentences containing uncertainty: identify
sentences in texts which contain unreliable or uncertain
information
Task1B: Biological abstracts and full articles
Task1W: Wikipedia paragraphs
Task 2 Learning to resolve the in-sentence scope of hedge cues:
in–sentence scope resolvers have to be developed
Biological abstracts and full articles
Information source: R. Farkas, V. Vincze, G. Móra, J. Csirik, and G. Szarvas. The CoNLL-2010 Shared
Task: Learning to Detect Hedges and their Scope in Natural Language Text. Proceedings of the
Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1-12
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
226 / 448
Detecting speculated sentences. CoNLL ST’10
Results Task 1
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
227 / 448
Detecting speculated sentences. CoNLL ST’10
Classifying Wikipedia sentences as uncertain - best system
M. Georgescul. A Hedgehop over a Max-Margin Framework Using Hedge Cues.
Proceedings of the Fourteenth Conference on Computational Natural Language
Learning: Shared Task, pages 26-31
Motivation: test whether a list of cues suffices for automatic hedge
detection
System based on SVM parameter tuning
Features: lexical information, i.e. features extracted from the list of
hedge cues provided with the training corpus.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
228 / 448
Detecting speculated sentences. CoNLL ST’10
Baseline: classifying as “uncertain” any sentence that contains any of the
multi-word expressions labeled as hedge cues in the training corpus.
Small percentage of false negatives on the BioScope test data: only a
small percentage of “uncertain” sentences in the reference test dataset
do not contain a hedge cue that occurs in the training dataset.
Precision of baseline algorithm has values under 0.5 on all four datasets:
ambiguous hedge cues are frequently used in “certain” sentences.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
229 / 448
Detecting speculated sentences. CoNLL ST’10
System characteristics (Wikipedia)
Features
  - Frequency in each sentence of each hedge cue provided with the
    training corpus
  - 2-grams and 3-grams extracted from the list of hedge cues provided
    with the training corpus
SVM
  - Gaussian Radial Basis Function (RBF) kernel
  - Width of the RBF kernel γ = 0.0625
  - Regularization parameter C = 10
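A sketch of such a classifier with the reported kernel, γ and C values, using scikit-learn rather than the original toolkit; the toy feature matrix stands in for the cue-frequency and cue n-gram features:

```python
# RBF-kernel SVM with gamma = 0.0625 and C = 10 on toy cue-count features.
import numpy as np
from sklearn.svm import SVC

X_train = np.array([[2, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]])  # toy features
y_train = np.array([1, 0, 1, 0])                                  # 1 = uncertain

clf = SVC(kernel="rbf", gamma=0.0625, C=10)
clf.fit(X_train, y_train)
print(clf.predict(np.array([[1, 1, 0]])))
```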
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
230 / 448
Detecting speculated sentences. CoNLL ST’10
Results with best parameters
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
231 / 448
Detecting speculated sentences. CoNLL ST’10
Optimized results for the biomedical training corpus
Learning curves show that the system is more efficient on abstracts than
on full articles
On test data the results are lower
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
232 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
233 / 448
Processing negation in biomedical texts
Related work
NegEx (Chapman et al. 2001) uses a regular expression algorithm based on
phrases indicating negation in discharge summaries
NegFinder (Mutalik et al. 2001) uses rules to recognise negated patterns
occurring in medical narrative
Elkin et al. (2005) apply a grammar that assigns to each concept an
attribute (positive/negative/uncertain assertion)
Boytcheva et al. (2005) use negation rules based on regular expressions
to mark negated phrases (Bulgarian)
Sanchez-Graillet and Poesio (2007) develop heuristics for extracting
negative protein interactions
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
234 / 448
Processing negation in biomedical texts
Determining whether a finding, disease or concept is negated
Goldin and Chapman (2003)
207 sentences from hospital reports
Naïve Bayes, Decision Trees: 90 F1
Averbuch et al. (2004)
Algorithm that uses information gain to learn negative context patterns
7 medical terms
97.47 F1
Huang and Lowe (2007) develop a hybrid system that combines regular
expression matching with parsing in order to locate negated concepts
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
235 / 448
Processing negation in biomedical texts: Negfinder
Negfinder is a rule-based system that recognizes a large set of negated
patterns occurring in medical narrative
Described in Mutalik et al. (2001), Negation Detection to Augment Concept
Indexing, JAMIA 8:598-609
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
236 / 448
Processing negation in biomedical texts: Negfinder
Motivation
To increase the utility of concept indexing of medical documents, it is
necessary to record whether the concepts have been negated or not.
Medical personnel are trained to include pertinent negatives in their reports
Databases need to be searched to find relevant information for clinical
and research purposes
Documents pertaining to a specific domain may also be concept indexed
Phrases in the document are identified and matched to concepts in a
domain-specific thesaurus
For a medical document, however, the presence of a concept does not
necessarily make the document relevant for that concept
  - The concept may refer to a finding that was looked for but found to be
    absent, or one that occurred in the remote past
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
237 / 448
Processing negation in biomedical texts: Negfinder
Motivation
In medical narrative, negations are direct and straightforward, since
clinicians are trained to convey the salient features of a case concisely
and unambiguously
Hypothesis: negations in dictated medical narrative are unlikely to cross
sentence boundaries and are also likely to be simple in structure.
  - Simple syntactic methods to identify negations might therefore be
    reasonably successful
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
238 / 448
Processing negation in biomedical texts: Negfinder
Components
1. Concept-finding: identifies UMLS concepts
2. Input transformation: replace every instance of a concept or
   compound concept in the original document with the UMLS ID
3. Lexing/parsing step:
   - Lexer: identifies a very large number of negation signals and classifies
     them on the basis of properties such as whether they generally precede or
     succeed the concept they negate and whether they can negate multiple
     concepts
   - Parser: applies its grammar rules to associate the negation signal with a
     single concept or with multiple concepts preceding or succeeding it
4. Verification step: marks up the original document by color-coding the
   text to assist human validation of the program’s output
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
239 / 448
Processing negation in biomedical texts: Negfinder
(From Mutalik et al. (2001) Negation Detection to Augment Concept Indexing, JAMIA
2001 8: 598-609. )
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
240 / 448
Processing negation in biomedical texts: Negfinder
(From Mutalik et al. (2001) Negation Detection to Augment Concept Indexing, JAMIA
2001 8: 598-609.)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
241 / 448
Processing negation in biomedical texts: Negfinder
Negation complexities
The negation signals were quite heterogeneous, from single words (“no”,
“without”, “negative”) to simple phrases (“no evidence of”) and
complex verb phrases (“could not be currently identified”)
There is a large set of verbs that, when preceded by the word “not”,
negate their subject or object concept (“X is not seen”, “does not show
X”); but there are also a large number of verbs that do not do so (“X
did not decrease”, “does not affect X”). These need to be correctly
distinguished.
The negation signals may precede or succeed the concepts they have
scope over, and there may be several words between the two (“there was
absence of this type of X”, “X, in this instance, is absent”).
A single negation signal may serve to negate a whole list of concepts
either preceding or following it (“A, B, C, and D are absent” “without
evidence of A, B, C, or D”); or it may scope over some but not all of
them (“there is no A, B and C, and D seemed normal”).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
242 / 448
Processing negation in biomedical texts: Negfinder
Negation recognition
The Lexer
It recognizes 60 distinct words or patterns that express negation
It passes a specific token to the parser to represent the exact way in
which the NegP is used for negation.
A token is a combination of characteristics:
  - Does the NegP precede or follow the concepts it negates? (“No” vs. “not
    present”)
  - Can the NegP negate multiple concepts? (“No” vs. “non”)
  - Is the terminal conjunction an “or” or an “and”?
    “no murmurs, rubs or gallops”
    “murmurs, rubs, and gallops are absent”
It outputs a “negation-termination” token
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
243 / 448
Processing negation in biomedical texts: Negfinder
The Parser
It assembles contiguous concepts into a list,
It associates a concept or a list of concepts with a negative phrase that
either precedes or follows it to form a negation
It accurately determines where the negation starts and ends.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
244 / 448
Processing negation in biomedical texts: Negfinder
Evaluation
Specificity: 91.8% Sensitivity: 95.7%
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
245 / 448
Processing negation in biomedical texts: Negfinder
Error analysis
no seizure activity throughout his detoxification: also marked
“detoxification” as being negated, because the word “throughout” was
not on its list of negation terminators
several blood cultures, six in all, had been negative: could not identify
the “blood cultures” as the concept that was being negated by the word
“negative” because it was too far away
Correctly parses some double negatives, such as
X-rays were negative except for...
but fails on others such as
The patient was unable to walk for long periods without dyspnea,
where it identified dyspnea as being negated
non-distended: does not recognize single words with contained negatives
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
246 / 448
Processing negation in biomedical texts: Context
ConText determines whether clinical conditions mentioned in clinical reports
are negated, hypothetical, historical, or experienced by someone other than
the patient
ConText can be integrated with any application that indexes clinical
conditions from text
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
247 / 448
Processing negation in biomedical texts: Context
Motivation
Clinical documents: source of information for detection and
characterization of outbreaks, decision support, recruiting patients for
clinical trials, and translational research
Improving precision of information retrieval and extraction from clinical
records by reducing false positives:
  - ruled out pneumonia
  - family history of pneumonia
  - past history of pneumonia
Most medical language processing applications index or extract
individual clinical conditions but do not model much information found
in the context of the condition
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
248 / 448
Processing negation in biomedical texts: Context
Algorithm
Assumption
A clinical condition in text is affirmed by default; a departure from the
default value (e.g., the condition is absent) can be inferred from simple
lexical clues occurring in the context of the condition.
ConText is a regular-expression based algorithm that searches for trigger
terms preceding or following the indexed clinical conditions
If a condition falls within the scope of the trigger term, ConText changes
the default value to the value indicated by that trigger term
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
249 / 448
Processing negation in biomedical texts: Context
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
250 / 448
Processing negation in biomedical texts: Context
Trigger term
Trigger terms prompt ConText to change the default value of a contextual
property for a condition, provided the condition falls within the scope of the
trigger term
143 for negated, 10 for historical, 11 for hypothetical, and 26 for other
Pseudo-trigger terms are terms that contain trigger terms but do not act as
contextual property triggers
17 pseudo-triggers for negated (e.g., “no increase”, “not cause”)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
251 / 448
Processing negation in biomedical texts: Context
Pseudo trigger terms
http://code.google.com/p/negex/wiki/NegExTerms
no increase
no suspicious change
no significant change
no change
no interval change
no definite change
no significant interval change
not extend
not cause
not drain
not certain if
not certain whether
gram negative
without difficulty
not necessarily
not only
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
252 / 448
Processing negation in biomedical texts: Context
Trigger terms http://code.google.com/p/negex/wiki/NegExTerms
absence of
cannot
cannot see
checked for
declined
declines
denied
denies
denying
evaluate for
fails to reveal
free of
negative for
never developed
never had
no
no abnormal
no cause of
no complaints of
no evidence
no new evidence
no other evidence
no evidence to suggest
no findings of
no findings to indicate
no sign of
no significant
no signs of
no suggestion of
no suspicious
not
not appear
not appreciate
not associated with
not complain of
not demonstrate
not exhibit
not know of
not known to have
not reveal
not see
not to be
patient was not
rather than
resolved
test for
to exclude
unremarkable for
with no
without
without any evidence of
without evidence
without indication of
without sign of
...
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
253 / 448
Processing negation in biomedical texts: Context
Termination terms http://code.google.com/p/negex/wiki/NegExTerms
but
however
nevertheless
yet
though
although
still
aside from
except
apart from
secondary to
as the cause of
as the source of
as the reason of
as the etiology of
as the origin of
as the cause for
as the source for
as the reason for
as the etiology for
as the origin for
as the secondary cause of
as the secondary source of
as the secondary reason of
as the secondary etiology of
as the secondary origin of
as the secondary cause for
as the secondary source for
as the secondary reason for
as the secondary etiology for
as the secondary origin for
a cause of
a source of
a reason of
cause of
cause for
causes of
causes for
source of
source for
sources of
sources for
reason of
reason for
reasons of
reasons for
etiology of
etiology for
trigger event for
origin of
origin for
origins of
origins for
...
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
254 / 448
Processing negation in biomedical texts: Context
Scope of trigger terms
The default scope of a trigger term includes all clinical conditions following
the trigger term until the end of the sentence or a termination term, but this
scope can be overridden
History of COPD, presenting [termination term] with shortness of breath
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
255 / 448
Processing negation in biomedical texts: Context
Definition of scope for negation
There is a set of 14 “left-looking” trigger terms or post-triggers. The scope
of these trigger terms runs from the trigger term leftward to the beginning of
the sentence, and can be terminated by any regular, intervening termination
term.
E.g.: “is ruled out”, “are not seen”, “negative”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
256 / 448
Processing negation in biomedical texts: Context
Algorithm
1. Mark up all trigger terms, pseudo-trigger terms, and termination terms
   in the sentence.
2. Iterate through the trigger terms in the sentence from left to right:
   - If the trigger term is a pseudo-trigger term, skip to the next trigger term.
   - Otherwise, determine the scope of the trigger term and assign the
     appropriate contextual property value to all indexed clinical conditions
     within the scope of the trigger term.
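A much simplified sketch of this control flow (plain token matching instead of the actual regular expressions, and tiny stand-in term lists; only the negation property is handled):

```python
# Trigger terms negate the conditions that follow them, up to a
# termination term or the end of the sentence.
TRIGGERS = {"no", "without", "denies"}
PSEUDO_TRIGGERS = {"no increase", "not cause"}
TERMINATORS = {"but", "however", "except"}

def negated_conditions(tokens, conditions):
    # conditions: set of token indices that hold indexed clinical conditions
    negated = set()
    for i, tok in enumerate(tokens):
        if " ".join(tokens[i:i + 2]) in PSEUDO_TRIGGERS:
            continue                       # pseudo-trigger: skip
        if tok in TRIGGERS:
            for j in range(i + 1, len(tokens)):
                if tokens[j] in TERMINATORS:
                    break                  # scope ends at a termination term
                if j in conditions:
                    negated.add(j)
    return negated

toks = "no signs of pneumonia but cough persists".split()
print(negated_conditions(toks, {3, 5}))    # {3}: pneumonia negated, cough not
```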
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
257 / 448
Processing negation in biomedical texts: Context
Evaluation
Performs comparably well on all report types, apart from discharge
summaries
FPs in discharge summaries are due to missing terms, e.g. the
pseudo-trigger “with/without”
Access to linguistic knowledge will improve performance by making the
determination of the scope of a trigger term more precise
Lexical clues or trigger words for negation, when they occur in multiple
report types, have the same interpretation across report types
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
258 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
259 / 448
Scope resolution: Negation
Task definition
Finding the scope of a negation signal means determining at a sentence level
which words in the sentence are affected by the negation(s)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
260 / 448
Scope resolution: Negation
R. Morante and W. Daelemans (2009) A metalearning approach to
processing the scope of negation. Proceedings of the Thirteenth
Conference on Computational Natural Language Learning (CoNLL),
pages 21-29, Boulder, Colorado, June 2009. Association for
Computational Linguistics.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
261 / 448
Scope resolution: Negation
Modelling the task
We model the scope finding task as two consecutive classification tasks:
1. Finding negation signals: a token is classified as being at the beginning
   of a negation signal, inside or outside
2. Finding the scope: a token is classified as being the first element or the
   last element of a scope sequence
Supervised machine learning approach
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
262 / 448
Scope resolution: Negation
Corpora: BioScope
Abstracts corpus: 10-fold cross-validation experiments
Clinical and full papers corpora: robustness test
  - Training on abstracts
  - Testing on clinical and full papers
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
263 / 448
Scope resolution: Negation
System architecture
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
264 / 448
Scope resolution: Negation
Preprocessing
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
265 / 448
Scope resolution: Negation
Finding negation cues
We filter out negation signals that are unambiguous in the training corpus
(17 out of 30)
For the rest, a classifier predicts whether a token is the first token of a
negation signal, inside one, or outside
  - Algorithm: IGTREE as implemented in TiMBL (Daelemans et al. 2007)
  - Instances represent all tokens in a sentence
  - Features about the token:
    Lemma, word, POS and IOB chunk tag
  - Features about the token context:
    Word, POS and IOB chunk tag of 3 tokens to the right and 3 to the left
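A toy sketch of this windowed instance representation, assuming simple padding at sentence edges (lemma features omitted; not the original TiMBL setup):

```python
# One instance per token, with word/POS/chunk-tag features for the token
# and for the 3 neighbours on each side.
def token_instances(tokens, window=3, pad=("<PAD>", "<PAD>", "<PAD>")):
    # tokens: list of (word, pos, chunk_iob) triples for one sentence
    padded = [pad] * window + tokens + [pad] * window
    instances = []
    for i in range(window, window + len(tokens)):
        feats = []
        for j in range(i - window, i + window + 1):
            feats.extend(padded[j])
        instances.append(feats)
    return instances

sent = [("The", "DT", "B-NP"), ("drug", "NN", "I-NP"), ("did", "VBD", "B-VP"),
        ("not", "RB", "I-VP"), ("work", "VB", "I-VP")]
print(len(token_instances(sent)), len(token_instances(sent)[0]))  # 5 instances, 21 features each
```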
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
266 / 448
Scope resolution: Negation
Finding negation cues: results
Baseline: tagging as negation signals those tokens that are negation signals
in at least 50% of their occurrences in the training corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
267 / 448
Scope resolution: Negation
Finding negation cues: system versus baseline
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
268 / 448
Scope resolution: Negation
Finding negation cues: results in 3 corpora
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
269 / 448
Scope resolution: Negation
Discussion
Cause of lower recall on papers corpus:
Errors: not is classified as negation signal
However, programs for tRNA identification [...] do not necessarily
perform well on unknown ones
The evaluation of this ratio is difficult because not all true interactions
are known
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
270 / 448
Scope resolution: Negation
Scope finding
The features used by the object classifiers and the metalearner are
different
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
271 / 448
Scope resolution: Negation
Scope finding
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
272 / 448
Scope resolution: Negation
Scope finding: features classifiers
Of the negation signal: Chain of words
Of the paired token: Lemma, POS, chunk IOB tag, type of chunk;
lemma of the second and third tokens to the left; lemma, POS, chunk
IOB tag, and type of chunk of the first token to the left and three
tokens to the right; first word, last word, chain of words, and chain of
POSs of the chunk of the paired token and of two chunks to the left and
two chunks to the right.
Of the tokens between the negation signal and the token in focus:
Chain of POS types, distance in number of tokens, and chain of chunk
IOB tags.
Others: A feature indicating the location of the token relative to the
negation signal (pre, post, same).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
273 / 448
Scope resolution: Negation
Scope finding: features metalearner
Of the negation signal: Chain of words, chain of POS, word of the two
tokens to the right and two tokens to the left, token number divided by
the total number of tokens in the sentence.
Of the paired token: Lemma, POS, word of two tokens to the right
and two tokens to the left, token number divided by the total number of
tokens in the sentence.
Of the tokens between the negation signal and the token in focus:
Binary features indicating if there are commas, colons, semicolons,
verbal phrases or one of the following words between the negation signal
and the token in focus: Whereas, but, although, nevertheless,
notwithstanding, however, consequently, hence, therefore, thus, instead,
otherwise, alternatively, furthermore, moreover.
About the predictions of the three classifiers: prediction, previous
and next predictions of each of the classifiers, full sequence of previous
and full sequence of next predictions of each of the classifiers.
Others: A feature indicating the location of the token relative to the
negation signal (pre, post, same).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
274 / 448
Scope resolution: Negation
Scope finding: postprocessing
Scope is always a consecutive block of scope tokens, including the
negation signal
The classifiers predict the first and last token of the scope sequence:
sometimes none, or more than one, FIRST and LAST elements are predicted
In the post-processing we apply some rules to select one FIRST and one
LAST token
  - Example: if more than one token has been predicted as FIRST, take as
    FIRST the first token of the negation signal
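A minimal sketch of this kind of post-processing; only the multiple-FIRST rule comes from the slide, the other fallbacks are assumptions:

```python
# Force exactly one FIRST and one LAST prediction per negation signal.
def select_scope(predictions, cue_index):
    # predictions: per-token labels in {"FIRST", "LAST", "NONE"}
    firsts = [i for i, p in enumerate(predictions) if p == "FIRST"]
    lasts = [i for i, p in enumerate(predictions) if p == "LAST"]
    first = firsts[0] if len(firsts) == 1 else cue_index    # fall back to the cue
    last = lasts[-1] if lasts else len(predictions) - 1     # assumed fallback: sentence end
    return first, last

print(select_scope(["NONE", "FIRST", "NONE", "FIRST", "LAST", "NONE"], 1))  # (1, 4)
```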
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
275 / 448
Scope resolution: Negation
Scope finding: baseline
Baseline: calculating the average length of the scope to the right of the
negation signal and tagging that number of tokens as scope tokens
Motivation: 85.70 % of scopes to the right
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
276 / 448
Scope resolution: Negation
The system performs clearly better than baseline
There is a higher upper bound, calculated with gold-standard negation
signals
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
277 / 448
Scope resolution: Negation
Scope finding: results
The system is portable
Lower results in the papers corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
278 / 448
Scope resolution: Negation
Scope finding: discussion
Clinical reports are easier to process than abstracts and papers
Negation signal no is very frequent (76.65 %) and has a high PCS
(73.10 %)
No findings to account for symptoms
No signs of tuberculosis
Sentences are shorter in clinical reports than in abstracts and papers:
  - Average length in clinical reports is 7.8 tokens vs. 26.43 in abstracts
    and 26.24 in full papers
  - 75.85 % of the sentences have 10 or fewer tokens
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
279 / 448
Scope resolution: Negation
Scope finding: discussion
Papers are more difficult to process than abstracts
  - Negation signal not is frequent (53.22%) and has a low PCS (39.50) in
    papers. Why?
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
280 / 448
Scope resolution: Negation
Scope finding: discussion
The metalearner performs better than the three object classifiers (except
SVMs on the clinical corpus)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
281 / 448
Scope resolution: Negation
Resolving the scope of negation for sentiment analysis
I. Councill, R. McDonald, and L. Velikovich (2010) What’s great and
what’s not: learning to classify the scope of negation for improved
sentiment analysis. Proceedings of the Workshop on Negation and
Speculation in NLP. Uppsala.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
282 / 448
Scope resolution: Negation
Goal
To construct a negation system that can correctly identify the presence or
absence of negation in spans of text that are expressions of sentiment
Focus on explicit negation mentions
Conditional Random Fields (Lafferty, McCallum and Pereira 2001)
  - Structured prediction learning framework
Features from dependency syntax
Evaluation on corpus of product reviews and BioScope corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
283 / 448
Scope resolution: Negation
Datasets
BioScope corpus
Product Reviews corpus (by Google, not publicly available)
  - 268 product reviews sampled from Google Product Search
  - 2111 sentences, 679 sentences with negation
  - 91% inter-annotator agreement, strict exact-span matching
  - Robustness: ungrammatical sentences, misspellings
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
284 / 448
Scope resolution: Negation
Lexicon of negation cues
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
285 / 448
Scope resolution: Negation
System description
1. Negation cues are detected using a lexicon
2. Scopes are processed by the negation annotator:
   - Input: sentence boundary + dependency (MaltParser) annotations
   - Algorithm: CRF++
     Label set of size two indicating whether a token is within or outside
     of a negation span
   - Features: (next slide)
     Only unigram features are employed, but each unigram feature vector is
     expanded to include bigram and trigram representations derived from the
     current token in conjunction with the prior and subsequent tokens
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
286 / 448
Scope resolution: Negation
Features
Lowercased token string
POS of a token
Linear token-wise distance to the nearest explicit negation cue to the
right of a token
Linear token-wise distance to the nearest explicit negation cue to the left
of a token
PoS of the first-order dependency of a token
Minimum number of dependency relations that must be traversed from
the first-order dependency head of a token to an explicit negation cue
PoS of the second-order dependency of a token
The minimum number of dependency relations that must be traversed from
the second-order dependency head of a token to an explicit negation cue
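A sketch (assumed helper, not the authors' code) of the two linear distance features, with a hypothetical cap for tokens that have no cue on one side:

```python
# Token-wise distance to the nearest explicit negation cue on the left and
# on the right of each token; max_dist is used when no cue exists on a side.
def distance_features(tokens, cues, max_dist=10):
    cue_positions = [i for i, t in enumerate(tokens) if t.lower() in cues]
    feats = []
    for i, _ in enumerate(tokens):
        left = [i - p for p in cue_positions if p <= i]
        right = [p - i for p in cue_positions if p >= i]
        feats.append((min(left) if left else max_dist,
                      min(right) if right else max_dist))
    return feats

print(distance_features("this is not a good camera".split(), {"not", "no", "never"}))
```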
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
287 / 448
Scope resolution: Negation
Evaluation
Punctuation tokens are not counted
BioScope: 5-fold cross-validation; Reviews: 7-fold cross-validation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
288 / 448
Scope resolution: Negation
Negation system built into a sentiment analysis pipeline
1. Sentence boundary detection
2. Sentiment detection: finds and scores mentions of n-grams found in a
   large lexicon of sentiment terms and phrases
3. Negation scope detection
4. Sentence sentiment scoring:
   - Determines whether any scored sentiment terms fall within the scope of
     a negation, and flips the sign of the sentiment score for all negated
     sentiment terms
   - Sums all sentiment scores within each sentence and computes overall
     sentence sentiment scores
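A toy sketch of step 4, assuming sentiment scores and negation scopes are given as token-index structures (illustrative only, not the pipeline's actual scoring code):

```python
# Flip the sign of sentiment scores for terms inside a negation scope,
# then sum per sentence.
def sentence_sentiment(term_scores, negation_scopes):
    # term_scores: list of (token_index, score); negation_scopes: list of (start, end)
    total = 0.0
    for idx, score in term_scores:
        negated = any(start <= idx <= end for start, end in negation_scopes)
        total += -score if negated else score
    return total

# "this is not a good camera": 'good' (+1.0) at index 4, negation scope (2, 5)
print(sentence_sentiment([(4, 1.0)], [(2, 5)]))   # -1.0
```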
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
289 / 448
Scope resolution: Negation
Effect on sentiment classification
1135 sentences
Human raters were asked to classify each sentence as expressing one of
the following types of sentiment:
  - positive
  - negative
  - neutral
  - mixed positive and negative
216 sentences (19%) contained negations:
  - positive: 73
  - negative: 114
  - neutral: 12
  - mixed positive and negative: 17
The effect of the negation system on sentiment classification was
evaluated on the smaller subset of 216 sentences
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
290 / 448
Scope resolution: Negation
Effect on sentiment classification
A significant improvement is apparent at all recall levels
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
291 / 448
Scope resolution: Negation
Effect on sentiment classification
Performance is improved by introducing negation scope detection
The precision of positive sentiment predictions sees the largest
improvement, largely due to the inherent bias in the sentiment scoring
algorithm
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
292 / 448
Scope resolution: Hedges
Task definition
Finding the scope of a hedge cue means determining at a sentence level
which words in the sentence are affected by the hedge(s)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
293 / 448
Scope resolution: Hedges
Related work
Machine learning systems
  - Morante and Daelemans (2009a,b)
  - Agarwal and Yu (2010)
  - Zhu et al. (2010)
Rule-based systems: use syntactic information
  - Jia et al. (2009)
  - Özgür and Radev (2009)
  - Øvrelid et al. (2010)
CoNLL Shared Task 2010
CoNLL Shared Task 2010
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
294 / 448
Scope resolution: Hedges
R. Morante and W. Daelemans: Learning the scope of hedge cues in biomedical texts.
In Proceedings of the BioNLP 2009 Workshop, pages 28-36, Boulder, Colorado, June
2009. Association for Computational Linguistics.
System based on the system that processed negation cues
Goal: investigate whether the same system can process hedge cues
We model the scope finding task as two consecutive classification tasks:
1. Finding hedge cues
2. Finding the scope
BioScope corpus
  - Abstracts corpus: 10-fold cross-validation experiments
  - Clinical and papers corpora: robustness test
    Training on abstracts - Testing on clinical and papers
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
295 / 448
Scope resolution: Hedges
Data preprocessing
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
296 / 448
Scope resolution: Hedges
System architecture
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
297 / 448
Scope resolution: Hedges
Results cue finding
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
298 / 448
Scope resolution: Hedges
Results cue finding across corpora
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
299 / 448
Scope resolution: Hedges
Discussion
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
300 / 448
Scope resolution: Hedges
Results scope resolution across corpora
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
301 / 448
Scope resolution: Hedges
Results scope resolution
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
302 / 448
Scope resolution: Hedges
Discussion
Why are the results on papers lower?
  - 41 cues (47.00%) in papers are not in abstracts
  - Some cues that occur in abstracts and are frequent in papers get low
    scores; they are used differently
    (e.g. suggest: 92.33 PCS in abstracts vs. 62.85 PCS in papers)
Why are the results on clinical reports lower?
  - 68 cues (35.45%) in clinical reports are not in abstracts
  - Frequent hedge cues in clinical reports are not represented in abstracts
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
303 / 448
Scope resolution: Hedges
Comparison negation - hedge processing systems
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
304 / 448
Scope resolution: Hedges
Comparison negation - hedge processing systems
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
305 / 448
Scope resolution: Hedges
A. Özgür and D.R. Radev. 2009. Detecting speculations and their scopes in scientific
text. In Proc. of EMNLP 2009, pages 1398-1407, Singapore.
Supervised hedge detection
  - Algorithm: SVM
Scope finding based on syntactic information
  - Data parsed with the Stanford dependency parser (de Marneffe et al. 2006)
Data: BioScope corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
306 / 448
Scope resolution: Hedges
Features for hedge detection
Dependency syntax
  - Clausal Complement: set to 1 if the keyword has a child which is
    connected to it with a clausal complement or infinitival clause
    dependency type
  - Negation: set to 1 if (1) the keyword has a child which is connected to
    it with a negation dependency type, or (2) the determiner “no” is a
    child of the keyword
  - Auxiliary: set to 1 if the keyword has a child which is connected to it
    with an auxiliary dependency type
Positional features
Motivation: different parts of a text might have different characteristics
in terms of the usage of speculative language
  - For abstracts: title, first sentence, last sentence
  - For full papers: title, first sentence, last sentence, background,
    results, methods, conclusion, legend
Co-occurring keywords
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
307 / 448
Scope resolution: Hedges
Results hedge detection in abstracts
Baseline 1: 14 keywords Light et al. (2004)
Baseline 2: keywords from train corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
308 / 448
Scope resolution: Hedges
Results hedge detection in papers
Baseline 1: 14 keywords Light et al. (2004)
Baseline 2: keywords from train corpus
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
309 / 448
Scope resolution: Hedges
Resolving the scopes
Assumption: the scope of a keyword can be characterized by its
part-of-speech and the syntactic structure of the sentence in which it
occurs
Rule-based approach
  - The scope of a conjunction or a determiner is the syntactic phrase to
    which it is attached
  - The scope of a modal verb is the “VP” to which it is attached
  - The scope of an adjective or an adverb starts with the keyword and ends
    with the last token of the highest-level “NP” which dominates the
    adjective or the adverb
  - The scope of a verb followed by an infinitival clause extends to the
    whole sentence
  - The scope of a verb in passive voice extends to the whole sentence
  - If none of the above rules apply, the scope of a keyword starts with the
    keyword and ends at the end of the sentence
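A schematic sketch of this rule cascade, assuming phrase spans have already been extracted from a parse; the helper arguments are hypothetical stand-ins for the parse-based information, not the authors' implementation:

```python
# Return a (start, end) token span for the scope of a hedge keyword,
# following the rule order described above.
def hedge_scope(keyword_index, pos, sentence_len, attached_phrase=None,
                highest_np=None, followed_by_inf_clause=False, passive=False):
    # attached_phrase / highest_np: (start, end) token spans from the parse
    if pos in {"CC", "DT"} and attached_phrase:          # conjunction / determiner
        return attached_phrase
    if pos == "MD" and attached_phrase:                  # modal verb: its VP
        return attached_phrase
    if pos.startswith(("JJ", "RB")) and highest_np:      # adjective / adverb
        return (keyword_index, highest_np[1])
    if pos.startswith("VB") and (followed_by_inf_clause or passive):
        return (0, sentence_len - 1)                     # whole sentence
    return (keyword_index, sentence_len - 1)             # default rule

print(hedge_scope(3, "MD", 10, attached_phrase=(3, 9)))  # (3, 9)
```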
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
310 / 448
Scope resolution: Hedges
Results scope resolution
Baseline 1: assign scope to the whole sentence
Baseline 2: assign scope from keyword to the end of the sentence
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
311 / 448
Scope resolution: Hedges
Agarwal, Sh. and H. Yu (2010) Detecting hedge cues and their scope in biomedical
text with conditional random fields. J Biomed Inform. 2010 Dec;43(6):953-61
Supervised system using CRF as implemented in the ABNER library
Pipeline system: cue identification + scope resolution
Task modelled as in Morante and Daelemans (2009) and Özgür and
Radev (2009)
The corpus partitions and the evaluation measures are different. Systems
are not comparable
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
312 / 448
Scope resolution: Hedges
Systems (From Agarwal et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
313 / 448
Scope resolution: Hedges
Results (From Agarwal et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
314 / 448
Scope resolution: Hedges
http://snake.ims.uwm.edu/hedgescope/index.php
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
315 / 448
Scope resolution: Hedges
Task 1 Learning to detect sentences containing uncertainty: identify
sentences in texts which contain unreliable or uncertain
information
Task1B: Biological abstracts and full articles
Task1W: Wikipedia paragraphs
Task 2 Learning to resolve the in-sentence scope of hedge cues:
in–sentence scope resolvers have to be developed
Biological abstracts and full articles
Information source: R. Farkas, V. Vincze, G. Móra, J. Csirik, and G. Szarvas. The CoNLL-2010 Shared
Task: Learning to Detect Hedges and their Scope in Natural Language Text. Proceedings of the
Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1-12
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
316 / 448
Scope resolution: Hedges
Approaches (Table from Farkas et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
317 / 448
Scope resolution: Hedges
Evaluation
Task 1: sentence-level F1 of the uncertain class
Task 2: scope-level F1 measure
  - True positives were scopes which exactly matched the gold standard cue
    phrases and gold standard scope boundaries assigned to the cue word
  - Strict match: including or excluding punctuation, citations or some
    bracketed expressions
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
318 / 448
Scope resolution: Hedges
Datasets
Biological data
  - Train: BioScope corpus (abstracts from the Genia corpus, 5 full articles
    from the functional genomics literature, 4 articles from BMC
    Bioinformatics)
  - Test: 15 biomedical articles from PubMed Central
Wikipedia data
  - Train: 2186 paragraphs (11111 sentences)
  - Test: 2346 paragraphs (9634 sentences total, of which 2234 uncertain)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
319 / 448
Scope resolution: Hedges
Datasets
(Tables from B. Tang, X. Wang, X. Wang, B. Yuan, and Sh. Fan. 2010. A Cascade Method for Detecting Hedges and their Scope in Natural
Language Text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010): Shared Task, pages
25-29.)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
320 / 448
Scope resolution: Hedges
Results Task 2 cues (Table from Farkas et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
321 / 448
Scope resolution: Hedges
Results Task 2 scopes (Table from Farkas et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
322 / 448
Scope resolution: Hedges
Approaches (Table from Farkas et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
323 / 448
Scope resolution: Hedges
Finding hedge cues in biomedical texts - best system
B. Tang, X. Wang, X. Wang, B. Yuan, and Sh. Fan. 2010. A Cascade Method for
Detecting Hedges and their Scope in Natural Language Text. In Proceedings of the
Fourteenth Conference on Computational Natural Language Learning
(CoNLL-2010): Shared Task, pages 25-29.
CRF-based system
Cascaded system: hedge detection → scope detection
First-Last classification for scope
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
324 / 448
Scope resolution: Hedges
(Figure from Tang et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
325 / 448
Scope resolution: Hedges
Features hedge detection - first layer
Word and Word Shape of the lemma
Prefix and Suffix with length 3-5.
Context of the lemma, POS and the chunk in the window [-2,2].
Combined features lemma-chunk, lemma-POS of focus token and
previous and next token
The type of a chunk; the lemma and POS sequences of it
Whether a token is a part of the pairs ”neither ... nor” and ”either ... or”
From dictionary (training corpus): whether a token can possibly be
classified into B cue, I cue or O cue; its lemma, POS and chunk tag for
each possible case:
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
326 / 448
Scope resolution: Hedges
Features hedge detection - second layer
Same as first layer
The lemma and POS sequences of the hedge predicted by each classifier
The number of times a token is classified into B cue, I cue and O cue by
the first two classifiers
Whether a token is the last token of the hedge predicted by each
classifier
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
327 / 448
Scope resolution: Hedges
Features scope detection
Same as first layer
Word
Context of the lemma, POS, the chunk, the hedge and the dependency
relation in the window [-2,2].
Combined features including L0C0, L0H0, L0D0, LiP0, PiC0, PiH0, CiH0,
PiD0, CiD0, where -1 ≤ i ≤ 1; L denotes the lemma of a word, P denotes a
POS, C denotes a chunk tag, H denotes a hedge tag and D denotes a
dependency relation tag.
The type of a chunk; the lemma and POS sequences of it
The type of a hedge; the lemma, POS and chunk sequences of it
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
328 / 448
Scope resolution: Hedges
Features scope detection
The lemma, POS, chunk, hedge and dependency relation sequences of 1st and
2nd dependency relation edges; the lemma, POS, chunk, hedge and
dependency relation sequences of the path from a token to the root
Whether there are hedges in the 1st, 2nd dependency relation edges or path
from a token to the root
The location of a token relative to the hedge cue: before the first hedge,
in the first hedge, between two hedge cues, in the last hedge, after the
last hedge
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
329 / 448
Scope resolution: Hedges
Postprocessing
If a hedge is bracketed by an F scope and an L scope, its scope is formed
by the tokens between them
If a hedge is only bracketed by an F scope, and there is no L scope in the
sentence, search for the first possible word from the end of the sentence
according to a dictionary extracted from the training corpus, and assign
it as L scope.
The scope of the hedge is formed by the tokens between them.
If a hedge is only bracketed by an F scope, and there is at least one L
scope in the sentence, the last L scope is the L scope of the hedge, and
its scope is formed by the tokens between them.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
330 / 448
Scope resolution: Hedges
Postprocessing
If a hedge is only bracketed by an L scope and there is no F scope in the sentence, search from the beginning of the sentence to the hedge for the first possible word according to the dictionary, and assign it as F scope. The scope of the hedge is formed by the tokens between them.
If a hedge is only bracketed by an L scope and there is at least one F scope in the sentence, search from the hedge back to the beginning of the sentence for the first possible word according to the dictionary, and take it as the F scope of the hedge. The scope of the hedge is formed by the tokens between them.
If a hedge is bracketed by neither an F nor an L scope, remove it.
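A rough sketch of the flavour of this postprocessing (hypothetical data structures; the dictionary lookup and the full set of cases are simplified away):

    # Simplified F/L-scope postprocessing: pair a predicted FIRST token with a
    # predicted LAST token to obtain the scope of a hedge cue.

    def resolve_scope(tokens, labels, cue_index):
        """labels[i] is 'F', 'L' or 'O' for the given cue; returns (start, end) or None."""
        firsts = [i for i, l in enumerate(labels) if l == "F"]
        lasts = [i for i, l in enumerate(labels) if l == "L"]
        if firsts and lasts:                  # bracketed by an F and an L
            return firsts[0], lasts[-1]
        if firsts and not lasts:              # no L predicted: fall back to sentence end
            return firsts[0], len(tokens) - 1 # (the real system consults a dictionary here)
        if lasts and not firsts:              # no F predicted: fall back to the cue itself
            return cue_index, lasts[-1]
        return None                           # neither predicted: discard the cue

    tokens = "This result suggests a role for MIB in TNFa signaling .".split()
    labels = ["O", "O", "O", "F", "O", "O", "O", "O", "O", "L", "O"]
    print(resolve_scope(tokens, labels, cue_index=2))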
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
331 / 448
Scope resolution: Hedges
Results hedge detection (Table from Tang et al. 2010)
Algorithms: CRF++ and SVMlight
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
332 / 448
Scope resolution: Hedges
Results hedge detection (Table from Tang et al. 2010)
Algorithm: CRF++
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
333 / 448
Scope resolution: Hedges
Finding the scopes of hedge cues in biomedical texts - best system
Roser Morante, Vincent Van Asch, and Walter Daelemans. 2010. Memory-based Resolution of In-sentence Scopes of Hedge Cues. In Proceedings of the Fourteenth
Conference on Computational Natural Language Learning (CoNLL-2010): Shared
Task, pages 48–55.
Memory-based learning
Features from dependency trees
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
334 / 448
Scope resolution: Hedges
A sentence where scopes should be found
<The conservation from Drosophila to mammals of these two
structurally distinct but functionally similar E3 ubiquitin ligases is
likely to reflect a combination of evolutionary advantages associated
with: (i) specialized expression pattern, as evidenced by the
cell-specific expression of the neur gene in sensory organ precursor
cells [52]; (ii) specialized function, as <suggested by the role of
murine MIB in TNFa signaling> [32]; (iii) regulation of protein
stability, localization, and/or activity>.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
335 / 448
Scope resolution: Hedges
A sentence where scopes should not be found
For example, the word may in sentence 1 indicates that there is some
uncertainty about the truth of the event, whilst the phrase Our results
show that in 2) indicates that there is experimental evidence to back
up the event described by encodes
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
336 / 448
Scope resolution: Hedges
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
337 / 448
Scope resolution: Hedges
Different version of system in Morante and Daelemans (2009)
One classifier per task, instead of a metalearner combining three classifiers
Features from the dependency tree instead of shallow features only
Better treatment of multiword cues
Postprocessing of references
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
338 / 448
Scope resolution: Hedges
Data are converted into the CoNLL format
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
339 / 448
Scope resolution: Hedges
Evaluation of the conversion of the corpus into CoNLL format
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
340 / 448
Scope resolution: Hedges
Classification 1: cues
Instances represent tokens
BIO classification of tokens
IGTree as implemented in TiMBL
Features
Token
Token context in string of words and dependency tree
Lexicon of cues from training data
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
341 / 448
Scope resolution: Hedges
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
342 / 448
Scope resolution: Hedges
Classification 2: scope
An instance represents a pair of a predicted cue and a token
Tokens are classified as being FIRST, LAST or none in scope sequence
for as many cues as there are in the sentence
IB1 as implemented in TiMBL
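A schematic illustration (assumed data format, not the actual TiMBL pipeline) of how cue-token pair instances for FIRST/LAST classification could be generated:

    # Build one instance per (predicted cue, token) pair; the class to predict is
    # FIRST, LAST or NONE with respect to that cue's scope.

    def pair_instances(tokens, cues, gold_scopes):
        """cues: list of cue token indices; gold_scopes: {cue_index: (first, last)}."""
        instances = []
        for c in cues:
            first, last = gold_scopes[c]
            for i, tok in enumerate(tokens):
                label = "FIRST" if i == first else "LAST" if i == last else "NONE"
                instances.append({"cue": tokens[c], "token": tok,
                                  "distance": i - c, "class": label})
        return instances

    tokens = "The role of MIB may be more general .".split()
    for inst in pair_instances(tokens, cues=[4], gold_scopes={4: (4, 7)}):
        print(inst)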
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
343 / 448
Scope resolution: Hedges
Features classification scope
Features about cue, token, and their context in the string of words and
in the dependency tree
Features indicating whether token is candidate to be the FIRST and to
be LAST
Values are assigned by a heuristic that takes into account detailed information from the dependency tree (voice of clause, PoS of cue, lemma of cue, etc.)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
344 / 448
Scope resolution: Hedges
Postprocessing steps
P-SCOPE builds a sequence of scope tokens based on 7 rules
Classifier predicts only FIRST and LAST element in the scope
P-REF eliminates references from the scope at the end of clause and
sentence
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
345 / 448
Scope resolution: Hedges
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
346 / 448
Scope resolution: Hedges
Error analysis
Task 1: system fails to treat Or
BIO papers: 3 TP, 8 FP, 49 FN
Task 2:
error propagation from Task 1
errors derived from incorrect dependency trees
errors derived from wrong encoding of features with dependency
information
subordinate clauses are kept within the scope of cues in the main clause
The test corpus contained a full paper with metalanguage
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
347 / 448
Scope resolution: Hedges
BiographTA: NeSp scope labeler
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
348 / 448
Scope resolution: Hedges
BiographTA: NeSp scope labeler
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
349 / 448
Scope resolution: Hedges
Øvrelid, L., E. Velldal, and S. Oepen (2010) Syntactic Scope Resolution in Uncertainty Analysis. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, 2010
Hybrid, two-level approach for hedge resolution,
A statistical classifier (MaxEnt) detects cue words
A small set of manually crafted rules operating over syntactic structures
resolve scope
Syntactic information contributes to the resolution of in-sentence scope
of hedge cues
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
350 / 448
Scope resolution: Hedges
Rules for scope resolution
Input for rules: a parsed sentence which has been further tagged with
hedge cues.
Rules operate over the dependency structures and additional features
provided by the parser (MaltParser)
Evaluation (Table from Ovrelid et al. 2010)
Default: scope from cue to end of sentence. BSE: evaluation on CoNLL test set
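The actual rules are hand-crafted over MaltParser output; as a hedged illustration of the general idea only, the default baseline (scope from the cue to the end of the sentence) plus one made-up syntactic refinement might look like this:

    # Default scope: from the hedge cue to the end of the sentence.
    # Illustrative refinement: stop at a clause boundary supplied by the parse.

    def default_scope(tokens, cue_index):
        return list(range(cue_index, len(tokens)))

    def refine_with_syntax(scope, clause_end):
        """clause_end would come from the dependency analysis of the cue's head."""
        return [i for i in scope if i <= clause_end]

    tokens = "This may indicate a role , which remains to be shown .".split()
    scope = default_scope(tokens, cue_index=1)
    print([tokens[i] for i in refine_with_syntax(scope, clause_end=4)])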
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
351 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
352 / 448
Finding negated and speculated events: BioNLP ST 2009
http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/
J-D. Kim, T. Ohta, S. Pyysalo, Y. Kano, and J. Tsujii (2009) Overview of BioNLP’09
Shared Task on Event Extraction Proceedings of the Workshop on BioNLP: Shared
Task, pages 1-9, Boulder, Colorado, ACL.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
353 / 448
Finding negated and speculated events: BioNLP ST 2009
Results (From Kim et al. 2009)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
354 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
355 / 448
Modality tagging
K. Baker, M. Bloodgood, B. J. Dorr , N. W. Filardo , L. Levin, and Christine Piatko.
A modality lexicon and its use in automatic tagging. Proceedings of LREC 2010,
pages 1402–1407.
Modality tagger: produces text or structured text in which modality
triggers and/or targets are identified
Two modality taggers:
String-based English tagger
Structure-based English tagger
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
356 / 448
Modality tagging
String-based modality tagger
Input: text with POS tags from a Collins-style statistical parser
Marks spans of words/phrases that exactly match modality trigger words
in the modality lexicon
Identifies the target of each modality using the heuristic of tagging the
next non-auxiliary verb to the right of the trigger
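A minimal sketch of the string-based heuristic described above (the lexicon, auxiliary list and POS tags here are toy placeholders; the real tagger uses the full modality lexicon and tags from a Collins-style parser):

    # String-based modality tagging: mark lexicon triggers and take the next
    # non-auxiliary verb to the right of the trigger as the target.

    MODALITY_LEXICON = {"must": "Require", "should": "Require", "may": "Permit"}
    AUXILIARIES = {"be", "been", "being", "have", "has", "had", "do", "does", "did"}

    def tag_modality(tagged_sentence):
        """tagged_sentence: list of (word, pos) pairs; returns (modality, trigger, target)."""
        results = []
        for i, (word, pos) in enumerate(tagged_sentence):
            modality = MODALITY_LEXICON.get(word.lower())
            if modality is None:
                continue
            for j in range(i + 1, len(tagged_sentence)):
                w, p = tagged_sentence[j]
                if p.startswith("VB") and w.lower() not in AUXILIARIES:
                    results.append((modality, word, w))
                    break
        return results

    sent = [("The", "DT"), ("decision", "NN"), ("should", "MD"),
            ("be", "VB"), ("taken", "VBN"), ("quickly", "RB")]
    print(tag_modality(sent))   # [('Require', 'should', 'taken')]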
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
357 / 448
Modality tagging
Structure-based modality tagger
Input: text that has been parsed
The parsed sentences are processed by TSurgeon rules
TSurgeon rules:
Pattern: matches part of a parse tree
Finds a modality trigger word and its target
Action: alters the parse tree
Inserts tags such as TrigRequire and TargRequire for triggers and targets
for the modality Require
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
358 / 448
Modality tagging
Output from structure-based modality tagger (Figure from Baker et al.
2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
359 / 448
Modality tagging
Evaluation
Agreement between taggers (Kappa)
0.82 for triggers
0.76 for targets
Precision of structure-based tagger on 249 sentences: 86.3 %
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
360 / 448
Modality tagging
Errors
Light verbs tagged as semantic target
The decision should be taken on delayed cases on the basis of merit
“Decision” should have been marked
Wrong word sense
Sikhs attacked a train
Attack is not used in the sense of ‘try’ (e.g. attack the problem)
Coordinate structures
Non-heads of compound nouns tagged as target, instead of head
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
361 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
362 / 448
Belief categorisation: Committed belief tagging
Prabhakaran, V., O. Rambow and M. Diab (2010) Automatic committed belief
tagging. Proceedings of COLING 2010, pages 1014–1022.
“We need to abandon a simple view of text as a repository of
propositions about the world”
“the result of text processing is not a list of facts about the world, but a
list of facts about different people’s cognitive states”
Goal: to recognize what the writer of the text intends the reader to
believe about various people’s beliefs about the world (including the
writer’s own)
To determine which propositions he or she intends us to believe he or
she holds as beliefs, and with what strength
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
363 / 448
Belief categorisation: Committed belief tagging
Corpus (Diab et al. 2009)
10,000 words annotated for speaker belief of stated propositions. Each
verbal proposition is annotated with the tags:
Committed belief (CB): the writer indicates in this utterance that he or
she believes the proposition
We know that GM has laid off workers
Non-committed belief (NCB): the writer identifies the proposition as
something which he or she could believe, but he or she happens not to
have a strong belief in
GM may lay off workers
Not applicable (NA): for the writer, the proposition is not of the type in
which he or she is expressing a belief, or could express a belief
Expressions of desire: Some wish GM would lay off workers
Questions: Will GM lay off workers?
Expressions of requirements: GM is required to lay off workers
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
364 / 448
Belief categorisation: Committed belief tagging
Experiments
Algorithms:
SVM, YAMCHA (Kudo and Matsumoto, 2000) sequence labeling system
CRF implementation of the MALLET toolkit (McCallum, 2002)
Features: syntactic and lexical
Models
Joint: a four-way classification task where each token is tagged as one of four classes – CB, NCB, NA, or O
Pipeline:
1. Identifying the propositions
2. Classifying each proposition as CB, NCB, or NA
Evaluation: 4-fold cv
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
365 / 448
Belief categorisation: Committed belief tagging
Results
(Table from Prabhakaran et al. 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
366 / 448
Belief categorisation: Committed belief tagging
Features that were useful
R. Morante (CLiPS - University of Antwerp )
(Table from Prabhakaran et al. 2010)
Modality and Negation in NLP
November 8, 2011
367 / 448
Belief categorisation: Committed belief tagging
Features that were not useful
R. Morante (CLiPS - University of Antwerp )
(Table from Prabhakaran et al. 2010)
Modality and Negation in NLP
November 8, 2011
368 / 448
Belief categorisation: Committed belief tagging
Some conclusions YAMCHA
Syntactic features improve the classifier performance
Syntactic features with no context improve Recall by 4.8 % over only
lexical features with context
Adding back context to lexical features further improves Precision by
4.9 %
Adding context of syntactic features improves both Precision and Recall
NCB performs much worse than the other two categories
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
369 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
370 / 448
Processing contradiction and contrast
S. Harabagiu, A. Hickl and F. Lacatusu (2006) Negation, contrast and
contradiction in text processing. Proceedings of the 21st national
conference on Artificial intelligence - Volume 1, pages 755-762
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
371 / 448
Processing contradiction and contrast
Contradictions occur whenever information that is communicated in two
different texts is incompatible
Framework for recognizing contradictions between multiple text sources
by relying on three forms of linguistic information:
negation
antonymy
semantic and pragmatic information associated with the discourse
relations
Contradictions need to be recognized by QA systems or by
Multi-Document Summarization (MDS) systems
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
372 / 448
Processing contradiction and contrast
(From Harabagiu et al. 2006)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
373 / 448
Processing contradiction and contrast
The recognition of contradictions is useful to fusion operators, that
consider information originating in different texts
When contradictory information is discovered, the answer selects
information from only one of the texts, discarding its contradiction
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
374 / 448
Processing contradiction and contrast
Two views for contradiction detection:
View 1 Contradictions are recognized by identifying and removing
negations of propositions and then testing for textual
entailment
View 2 Contradictions are recognized by deriving linguistic information
from the text inputs, including information that identifies
negations, contrasts, or oppositions and by training a classifier
based on examples
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
375 / 448
Processing contradiction and contrast
System architecture (From Harabagiu et al. 2006)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
376 / 448
Processing contradiction and contrast
Types of negation detected
Overt negation
  the morpheme n't and not
  negative quantifiers like no (also "no one" and "nothing")
  strong negative adverbs like "never"
Indirectly licensed negation
  verbs ("deny", "fail", "refuse", "keep from")
  prepositions ("without", "except")
  weak quantifiers ("few", "any", "some")
  traditional negative polarity items such as "a red cent" or "any more"
Types of negated constituents: events, states and entities
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
377 / 448
Processing contradiction and contrast
Negation detection steps
1. Preprocessing: negation markers are flagged
2. Detect negated events: filter out events whose predicates are not marked as negated. A predicate is negated if it falls within the scope of a negative marker.
3. Detect negated entities: any noun phrase that falls within the scope of an overt negative quantifier ("no") or a non-veridical quantifier ("few", "some", "many")
4. Detect negated states: detect states based on WordNet; a state is negated if it falls within the scope of a negative marker
The system eliminates negations and reverses the polarity of negated events, entities and states by using antonyms and paraphrases
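As a rough, simplified illustration of steps 1-2 (the real system computes proper negation scopes; the fixed window used here is a stand-in, and the marker list and example are toys):

    # Toy version of negated-event detection: flag overt negation markers and
    # treat a predicate as negated if a marker occurs shortly before it.

    OVERT_NEGATION = {"not", "n't", "no", "never", "nothing"}

    def negated_predicates(tokens, predicate_indices, window=3):
        negated = []
        for p in predicate_indices:
            context = tokens[max(0, p - window):p]
            if any(t.lower() in OVERT_NEGATION for t in context):
                negated.append(tokens[p])
        return negated

    tokens = "The rebels did not sign the ceasefire agreement".split()
    print(negated_predicates(tokens, predicate_indices=[4]))   # ['sign']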
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
378 / 448
Processing contradiction and contrast
Jung-jae Kim, Zhuo Zhang, Jong C. Park and See-Kiong Ng (2006)
BioContrasts: extracting and exploiting protein-protein contrastive
relations from biomedical literature. Bioinformatics 22 (5): 597-605.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
379 / 448
Processing contradiction and contrast
Protein and interaction databases have been compiled from experimental
data and published literature
However, the information captured in these resources are typically
individual positive facts of the kind such as ‘protein A binds to protein
B’.
Kim et al. (2006) extract contrastive information between proteins from
the biomedical literature to augment the information in current protein
databases.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
380 / 448
Processing contradiction and contrast
http://biocontrasts.biopathway.org/index.php
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
381 / 448
Processing contradiction and contrast
With the BioContrast database users can
Search for contrasts of proteins of interest with their Swiss-Prot IDs or
names
Browse and navigate networks of protein–protein contrasts graphically
Search for contrasts that are associated with KEGG pathways, InterPro
domain entries, and Gene Ontology concepts, which may be useful for
enhancement of KEGG pathway, inference over contrasts between
protein domains, and subcategorization of Gene Ontology concepts.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
382 / 448
Processing contradiction and contrast
NAT1 binds eIF4A but not eIF4E and inhibits both cap-dependent
and cap-independent translation (PMID: 90306851).
Truncated N-terminal mutant huntingtin repressed transcription,
whereas the corresponding wild-type fragment did not repress
transcription (PMID:11739372).
Parts:
1. Focused objects: a contrastive pair of two or more objects that are so contrasted (e.g. eIF4A, eIF4E, wild-type huntingtin, mutant huntingtin)
2. Presupposed property: a biological property or process that the contrast is based on (e.g. binding to NAT1, transcription repression)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
383 / 448
Processing contradiction and contrast
PPI Contrast
A protein-protein contrast is a contrast between two proteins A and B, called the "focused proteins", which indicates that A but not B is involved in a biological property C, called the "presupposed property", or vice versa.
Contrast information is often encoded by contrastive negation patterns
such as “A but not B” in the biomedical literature.
Such a contrast:
explicitly describes a difference between the focused proteins in terms of the presupposed property
implicitly indicates that the focused proteins are semantically similar
This combination of difference and similarity between proteins is useful
for augmenting proteomics databases and also for discovering novel
knowledge.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
384 / 448
Processing contradiction and contrast
Extracting contrastive relations
Given a MEDLINE abstract:
1. The system first locates sentences that contain the negative 'not'.
2. It then identifies contrastive expressions from these sentences using either subclausal coordination or clause level parallelism.
3. If the contrastive expressions are, or can be reduced to, protein names, the system produces a contrast between the two proteins.
4. It then cross-links (i.e. grounds) the contrastive protein names with entries of a standard protein database (namely, Swiss-Prot).
The net result is a database of useful biological contrastive relations between
actual Swiss-Prot entries.
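A sketch of the kind of 'A but not B' pattern the extraction relies on, using a regular expression over surface strings (everything here, including the crude protein-name pattern, is illustrative; the real system uses entity recognition and grounding to Swiss-Prot):

    import re

    # Toy extraction of "A but not B" contrasts between protein names.
    PROTEIN = r"[A-Za-z0-9-]+"
    BUT_NOT = re.compile(rf"({PROTEIN})\s+but\s+not\s+({PROTEIN})")

    def extract_contrasts(sentence):
        return [(m.group(1), m.group(2)) for m in BUT_NOT.finditer(sentence)]

    sentence = ("NAT1 binds eIF4A but not eIF4E and inhibits both cap-dependent "
                "and cap-independent translation")
    print(extract_contrasts(sentence))   # [('eIF4A', 'eIF4E')]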
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
385 / 448
Processing contradiction and contrast
Extracting contrastive relations
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
386 / 448
Processing contradiction and contrast
PMID: 10762618
Year: 2000
Sent: Immunohistochemical studies revealed that HAI-1 but not
HAI-2 was detected more strongly in regenerative epithelium
than in normal epithelium , although both proteins were
detected throughout the human gastrointestinal tract .
Positive: HAI-1
PosSprot: SPIT1 HUMAN:“HAI-1”
Negative: HAI-2
NegSprot: SPIT2 HUMAN:“HAI-2”
Property: CONTRAST OBJ was detected more strongly in
regenerative epithelium than in normal epithelium
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
387 / 448
Processing contradiction and contrast
Identifying subclausal coordinations
1. Given a sentence that contains a 'not', the system first tries to identify contrastive expressions using subclausal coordination patterns
2. The system analyzes the word-level similarity by checking whether the variable-matching phrases are semantically identical or at least in a subsumption relation
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
388 / 448
Processing contradiction and contrast
In contrast, IFN-gamma priming did not affect the expression
of p105 transcripts but enhanced the expression of p65 mRNA
Matching pattern: ‘not V NP but V NP’
1. Match the V variables to the verbs 'affect' and 'enhanced'
2. Match the NP variables to the noun phrases 'the expression of p105 transcripts' and 'the expression of p65 mRNA'
3. Analyze the similarity between the verbs and the similarity between the noun phrases
   Synonymy and hypernymy relations in WordNet for verbs and adjectives
   Biomedical databases, WordNet and own resource
4. Determine the presupposed property for the focused proteins by extracting the subject phrase and the verb whose object phrases correspond to the focused proteins
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
389 / 448
Processing contradiction and contrast
Identifying clause-level parallelisms
The system checks whether
1. the linguistic expressions that match the variables with the same subscript (e.g. {V1, V'1}) are either semantically identical (e.g. {'repress', 'repressed'}) or are in a subsumption relation (e.g. {'affect', 'activate'})
2. the variables with the subscript 'C' (e.g. {SubjC, SubjC'}), which indicate focused objects of the pattern, are matched to semantically similar expressions (e.g. {'eIF4A', 'eIF4E'}).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
390 / 448
Processing contradiction and contrast
Truncated N-terminal mutant huntingtin repressed transcription, whereas the
corresponding wild-type fragment did not repress transcription
1. Locate the verb 'repress' in the subordinate clause, which is negated by 'not'.
2. Locate the positive verb 'repressed' of the main clause.
3. Identify the corresponding subject phrases and object phrases in the two clauses.
   Subject phrase of the main clause = 'Truncated N-terminal mutant huntingtin'; object phrase = 'transcription'
   Subject phrase of the subordinate clause = 'the corresponding wild-type fragment'; object phrase = 'transcription'
4. Check that the two verb phrases and the two object phrases are all semantically identical.
The contrastive relation extracted here is one between the two protein names at the corresponding subject positions, with respect to the presupposed biological property 'CONTRAST OBJ repressed transcription'.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
391 / 448
Processing contradiction and contrast
Evaluation
Processed data:
Corpus of 2.5 million MEDLINE abstracts processed
799,169 pairs of contrastive expressions
11,284 pairs of contrastive protein names
41,471 contrasts between Swiss-Prot entries (a protein may be grounded with multiple Swiss-Prot entries)
Test data:
100 pairs of contrastive proteins examined
97% precision
61.5% recall relative to a previous system
91 contrastive patterns 'A but not B'
5 parallelism patterns (40% precision)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
392 / 448
Processing contradiction and contrast
Refining pathway roles of similar proteins
In the pathway for the well-studied Huntington's disease (HD), a key node was labeled generically as 'caspase'
'Caspase' can be resolved as caspase-3 and/or caspase-6
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
393 / 448
Processing contradiction and contrast
KEGG Huntington’s disease pathway
http://www.genome.jp/kegg/pathway/hsa/hsa05016.html
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
394 / 448
Processing contradiction and contrast
Refining pathway roles of similar proteins
A contrast between caspase-3 and caspase-6 is extracted by
BioContrasts:
Importantly, Mch2, but not Yama or LAP3, is capable of cleaving
lamin A to its signature apoptotic fragment, indicating that Mch2 is
an apoptotic laminase (PMID:8663580).
It suggests that the two proteins may not function identically
An article from MEDLINE explains the difference between the two
proteins in terms of the cleavage sites at Htt:
We have previously shown that Htt is cleaved in vitro by caspase-3 at
amino acids 513 and 552, and by caspase-6 at amino-acid position 586
(PMID:10770929).
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
395 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
396 / 448
Visualising negation features
D. Oelke, P. Bak, D. A. Keim, M. Last, G. Danon, Visual Evaluation of Text Features
for Document Summarization and Analysis, Proceedings of the IEEE Symposium on
Visual Analytics Science and Technology 2008 (IEEE VAST 2008), Columbus, OH,
USA, 2008.
“The major challenge in computational text analysis is the gap between
automatically computable text features and the users’ ability to control and
evaluate these features.”
Application of document fingerprinting for visualizing text features as part of an interactive feedback loop between evaluation and feature engineering
Based on Literature Fingerprint (Keim and Oelke 2007)
Documents are represented by a pixel-based visualization in which each
pixel represents one unit of text
The color of each pixel is mapped to its feature value
The visualization takes the document structure into account
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
397 / 448
Visualising negation features
Pipeline for visual evaluation of text features applied for document summarization and
analysis
Figure from Oelke et al. 2008:76
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
398 / 448
Visualising negation features
Opinion mining experiments
Classifying reviews of digital cameras as positive or negative
Lexical approach: dictionary of negative and positive polarity words
To get values at the sentence level, for each sentence the number of negative words is subtracted from the number of positive words
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
399 / 448
Visualising negation features
The visualization has been annotated with comments on some of the wrongly
classified statements. Figure from Oelke et al. (2008:78)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
400 / 448
Visualising negation features
Error analysis
Errors: negation is not taken into account, and nouns are not included in
the list of opinion words
Improvements:
Negation is taken into account by inverting the value of a word if one of the three preceding words is a negation signal word (see the sketch below)
Nouns with negative or positive connotations are added to the list of opinion words
Evaluate whether the extensions result in improvement
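A compact sketch of the lexicon-based scoring with this negation extension (the word lists are toy placeholders, not the system's dictionary):

    # Sentence score = positive words minus negative words; a word's contribution
    # is inverted if a negation word occurs among the three preceding tokens.

    POSITIVE = {"great", "sharp", "love"}
    NEGATIVE = {"blurry", "poor", "hate"}
    NEGATION = {"not", "never", "no", "n't"}

    def sentence_score(tokens):
        score = 0
        for i, tok in enumerate(tokens):
            value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
            if value and any(t in NEGATION for t in tokens[max(0, i - 3):i]):
                value = -value
            score += value
        return score

    print(sentence_score("the pictures are not sharp at all".split()))   # -1
    print(sentence_score("i love the zoom".split()))                     #  1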
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
401 / 448
Visualising negation features
Visualising the effect of the extensions. Figure from Oelke et al. (2008:78)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
402 / 448
Outline
9 Detecting speculated sentences
10 Processing negation in biomedical texts
11 Scope resolution
12 Finding negated and speculated events
13 Modality tagging
14 Belief categorisation
15 Processing contradiction and contrast
16 Visualising negation features
17 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
403 / 448
References
Detecting Speculative Language
Di Marco, Ch. and R. E. Mercer (2004) Hedging in Scientific Articles as a Means of Classifying
Citations. In Proceedings of Working Notes of AAAI Spring Symposium on Exploring Attitude
and Affect in Text: Theories and Applications, Stanford University.
Ganter, V. and M. Strube (2009) Finding hedges by chasing weasels: Hedge detection using
wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009
Conference Short Papers, pages 173-176.
Kilicoglu, H. and S. Bergler (2008) Recognizing Speculative Language in Biomedical Research
Articles: A Linguistically Motivated Perspective. In Proceedings of Current Trends in Biomedical
Natural Language Processing (BioNLP), Columbus, Ohio, USA.
Light, M., X. Ying Qiu, and P. Srinivasan (2004) The Language of Bioscience: Facts,
Speculations, and Statements in between. In Proceedings of the HLT BioLINK.
Medlock, B. (2008) Exploring hedge identification in biomedical literature. Journal of Biomedical
Informatics, 41:636-54.
Medlock, B. and T. Briscoe (2007) Weakly Supervised Learning for Hedge Classification in
Scientific Literature. In Proceedings of the 45th Annual Meeting of the Association of
Computational Linguistics.
Szarvas, G. (2008) Hedge classification in biomedical texts with a weakly supervised selection of
keywords. In Proceedings of the ACL-08: HLT.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
404 / 448
References
Velldal, E. (2011) Predicting Speculation: A Simple Disambiguation Approach to Hedge
Detection in Biomedical Literature Journal of Biomedical Semantics, Vol. 2, Suppl. 5, BioMed
Central, October 2011.
Verbeke, M., P. Frasconi, V. Van Asch, R. Morante, W. Daelemans, and L. De Raedt. Kernel-based Logical and Relational Learning with kLog for Hedge Cue Detection. ILP 2011.
Processing Negation in Biomedical Texts
S. Boytcheva, A. Strupchanska, E. Paskaleva, and D. Tcharaktchiev (2005) Some aspects of
negation processing in electronic health records. In Proc. of International Workshop Language
and Speech Infrastructure for Information Access in the Balkan Countries, pages 1-8, Borovets,
Bulgaria.
W.W. Chapman,W. Bridewell, P. Hanbury, G. F. Cooper, and B.G. Buchanan (2001) A simple
algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform,
34:301-310.
P. L. Elkin, S. H. Brown, B. A. Bauer, C.S. Husser, W. Carruth, L.R. Bergstrom, and D. L.
Wahner-Roedler (2005) A controlled trial of automated classification of negation from clinical
notes. BMC Medical Informatics and Decision Making, 5(13).
I. M. Goldin and W.W. Chapman (2003) Learning to detect negation with 'Not' in medical texts.
In Proceedings of ACM-SIGIR 2003.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
405 / 448
References
S. Goryachev, M. Sordo, Q.T. Zeng, and L. Ngo (2006) Implementation and evaluation of four
different methods of negation detection. Technical report, DSG.
Y. Huang and H.J. Lowe (2007) A novel hybrid approach to automated negation detection in
clinical radiology reports. J Am Med Inform Assoc, 14(3):304-311
A.G. Mutalik, A. Deshpande, and P.M. Nadkarni (2001) Use of general-purpose negation
detection to augment concept indexing of medical documents. a quantitative study using the
UMLS. J Am Med Inform Assoc, 8(6):598-609.
L. Rokach, R.Romano, and O. Maimon (2008) Negation recognition in medical narrative reports.
Information Retrieval Online.
O. Sanchez-Graillet and M. Poesio (2007) Negation of protein-protein interactions: analysis and
extraction. Bioinformatics, 23(13):424-432
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
406 / 448
References
Resolving the Scope of Negation
Agarwal, Sh. and H. Yu (2010) Biomedical negation scope detection with conditional random
fields. J Am Med Inform Assoc. 2010 Nov 1;17(6):696-701
Blanco, E. and D. Moldovan (2011) Semantic Representation of Negation Using Focus Detection.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages
581-589, Portland, Oregon, ACL.
Councill, I., R. McDonald, and L. Velikovich (2010) What’s great and what’s not: learning to
classify the scope of negation for improved sentiment analysis. Proceedings of the Workshop on
Negation and Speculation in NLP. Uppsala.
Morante, R., A. Liekens, and W. Daelemans (2008) Learning the Scope of Negation in Biomedical
Texts. In Proceedings of EMNLP
Morante, R. and W. Daelemans (2009) A Metalearning Approach to Processing the Scope of
Negation. In Proceedings of CoNLL
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
407 / 448
References
Resolving the Scope of Hedges
Agarwal, Sh. and H. Yu (2010) Detecting hedge cues and their scope in biomedical text with
conditional random fields. J Biomed Inform. 2010 Dec;43(6):953-61
Farkas, R., V. Vincze, G. Móra, J. Csirik, and G. Szarvas (2010) The CoNLL-2010 Shared Task:
Learning to Detect Hedges and their Scope in Natural Language Text. Proceedings of the
Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1-12
Morante, R. and W. Daelemans (2009) Learning the scope of hedge cues in biomedical texts. In
Proceedings of the BioNLP 2009 Workshop, pages 28-36.
Morante, R., V. Van Asch, and W. Daelemans (2010) Memory-based Resolution of In-sentence
Scopes of Hedge Cues. In Proceedings of the Fourteenth Conference on Computational Natural
Language Learning (CoNLL-2010): Shared Task, pages 48–55.
Özgür, A. and D.R. Radev (2009) Detecting speculations and their scopes in scientific text. In
Proceedings of EMNLP 2009, pages 1398-1407.
Øvrelid, L., E. Velldal, and S. Oepen (2010) Syntactic Scope Resolution in Uncertainty Analysis.
Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
Beijing, China, 2010
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
408 / 448
References
Tang, B., X. Wang, X. Wang, B. Yuan, and Sh. Fan. 2010. A Cascade Method for Detecting
Hedges and their Scope in Natural Language Text. In Proceedings of the Fourteenth Conference
on Computational Natural Language Learning (CoNLL-2010): Shared Task, pages 25-29.
All papers from the Proceedings of the CoNLL-2010 Shared Task: Learning to Detect Hedges and
their Scope in Natural Language Text
Committed Belief Tagging
Prabhakaran, V., O. Rambow and M. Diab (2010) Automatic committed belief tagging.
Proceedings of COLING 2010, pages 1014–1022.
Diab, M. T., L. Levin, T. Mitamura, O. Rambow, V. Prabhakaran, and W. Guo (2009)
Committed belief annotation and tagging. In ACL-IJCNLP ’09: Proceedings of the Third
Linguistic Annotation Workshop, pages 68-73.
Contradiction
Harabagiu, S., A. Hickl and F. Lacatusu (2006) Negation, contrast and contradiction in text
processing. Proceedings of the 21st national conference on Artificial intelligence - Volume 1,
pages 755-762
Kim, J., Zh. Zhang, J. C. Park and See-Kiong Ng (2006) BioContrasts: extracting and exploiting
protein-protein contrastive relations from biomedical literature. Bioinformatics 22 (5): 597-605.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
409 / 448
Part IV
Modality and Negation in Applications
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
410 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
411 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
412 / 448
Information sources
B. Pang and L. Lee (2008) Opinion mining and sentiment analysis.
Foundations and Trends in Information Retrieval, Vol. 2, Nos 1-2, 1-135.
M. Wiegand, A. Balahur, B. Roth, D. Klakow and A. Montoyo (2010) A
survey of the role of negation in sentiment analysis. Proc. of the
Workshop on Negation and Speculation in Natural Language Processing,
Uppsala, pages 60-68.
B. Liu. Opinion Mining and Sentiment Analysis: NLP Meets Social Sciences. http://www.cs.uic.edu/~liub/FBS/Liu-Opinion-Mining-STSC.ppt
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
413 / 448
Sentiment analysis
Types of textual information
Facts
Opinions
Most current information processing systems work with factual information
In 2001 a new research area emerged: sentiment analysis
Why then?
Word-of-mouth on the web: the web contains huge amounts of
opinionated text
User-generated media: one can express opinions on anything in forums,
discussion groups, blogs, social networks, ...
(Slide adapted from B. Liu, Opinion Mining and Sentiment Analysis: NLP Meets Social Sciences. Workshop on Social Theory and Social
Computing 2010)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
414 / 448
Sentiment analysis
Sentiment analysis (Pang and Lee 2008)
“A sizeable number of papers mentioning “sentiment analysis” focus on the
specific application of classifying reviews as to their polarity (either positive
or negative), a fact that appears to have caused some authors to suggest that
the phrase refers specifically to this narrowly defined task. However,
nowadays many construe the term more broadly to mean the computational
treatment of opinion, sentiment, and subjectivity in text.”
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
415 / 448
Sentiment classification
Document level (Pang et al. 2002, Turney 2002): classify a document as
positive or negative based on the overall sentiment expressed by opinion
holder
Sentence level (Wiebe et al. 2004): classify a sentence as
Objective or subjective
Having positive or negative polarity
Feature level (Hu and Liu 2004): finding opinions related to features of
objects
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
416 / 448
Sentiment classification feature based
“Sentiment analysis is not simply the problem of determining whether a
document, a paragraph or even a sentence expresses a positive or negative
sentiment or opinion. It is also about entities. Without such information, any
sentiment is of little practical use. So one should not only talk about
sentiment analysis of documents, paragraphs or sentences, but also about the
entities that sentiments have been expressed upon. Here an entity can be a
product, service, person, organisation, event or topic”
(Liu 2009, An Interview on Sentiment Analysis and Opinion Mining by
textAnalyticsNews.com, April 20, 2009)
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
417 / 448
Negation in sentiment analysis
(Wiegand et al. 2010)
Negation words can change the polarity of an expression:
I like+ this new Nokia model – I do [not like+ ]− this new Nokia model
Not all negation words change the polarity
Not only is this phone expensive but it is also heavy and difficult to use
The presence of an actual negation word in a sentence does not mean
that all its polar opinions are inverted
[I do [not like+ ]− the design of new Nokia model] but [it contains some
intriguing+ new functions]
Surface realization of negation is variable
Diminishers/valence shifters:
I find the functionality of the new phone less practical
Connectives:
Perhaps it is a great phone, but I fail to see why
Modals:
In theory, the phone should have worked even under water
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
418 / 448
Negation in sentiment analysis: computational models
Model: contextual valence shifting (Polanyi and Zaenen, 2004)
The model assigns scores to polar expressions
If a polar expression is negated, its polarity score is simply inverted
clever (+2) ← not clever (-2)
For diminishers, the score is only reduced rather than shifted to the
other polarity type
efficient (+2) ← rather efficient (+1)
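A minimal sketch of this score-shifting idea (the prior scores, modifier sets and the exact reduction step are illustrative choices, not Polanyi and Zaenen's definitions):

    # Contextual valence shifting: negation inverts a polar score,
    # a diminisher only reduces it.

    PRIOR = {"clever": 2, "efficient": 2, "dull": -2}

    def shifted_score(word, modifier=None):
        score = PRIOR.get(word, 0)
        if modifier in {"not", "never"}:
            return -score                                   # inversion
        if modifier in {"rather", "somewhat"}:
            return score - 1 if score > 0 else score + 1    # move towards neutral
        return score

    print(shifted_score("clever", "not"))        # -2
    print(shifted_score("efficient", "rather"))  #  1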
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
419 / 448
Negation in sentiment analysis: ML approaches
Bag of words approach (Pang et al., 2002)
Fairly effective
The supervised classifier has to figure out by itself which words in the
dataset are polar and which are not
It does not contain any explicit knowledge of polar expressions
Negation modeling: adding artificial words
I do not NOT like NOT this NOT new NOT Nokia NOT model
increases the feature space with more sparse features
The scope of negation cannot be properly modeled with this
representation
The impact of negation modeling on this level of representation is limited
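As a sketch of the artificial-word trick mentioned above (prefixing words after a negation is one common variant; stopping at the next punctuation mark is an assumption, not the only option in the literature):

    import string

    # Mark words following a negation with an artificial NOT_ prefix,
    # here until the next punctuation mark.
    NEGATION = {"not", "n't", "no", "never"}

    def mark_negation(tokens):
        out, negating = [], False
        for tok in tokens:
            if tok in NEGATION:
                negating = True
                out.append(tok)
            elif tok in string.punctuation:
                negating = False
                out.append(tok)
            else:
                out.append("NOT_" + tok if negating else tok)
        return out

    print(mark_negation("i do not like this new nokia model .".split()))
    # ['i', 'do', 'not', 'NOT_like', 'NOT_this', 'NOT_new', 'NOT_nokia', 'NOT_model', '.']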
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
420 / 448
Negation in sentiment analysis: benchmark
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
421 / 448
Negation in sentiment analysis: ML approaches
Expression-level polarity classification (Wilson et al. 2005, 2009)
Supervised machine learning where negation modeling is mostly encoded
as features using polar expressions
Three feature types (next slide)
Adding these three feature groups to a feature set comprising bag of
words and features counting polar expressions results in a significant
improvement
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
422 / 448
Negation in sentiment analysis: ML approaches
Features
Negation features
Check whether a negation expression occurs in a fixed window of four words preceding the polar expression
Does the polar predicate have a negated subject?
[No politically prudent Israeli]subject could [support]polar-pred either of them
Negation expressions are additionally disambiguated
Shifter features: binary features checking the presence of different
types of polarity shifters (e.g. little)
Polarity modification features: describe polar expressions of a
particular type modifying or being modified by other polar expressions
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
423 / 448
Negation in sentiment analysis: benchmark
http://www.cs.pitt.edu/mpqa/
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
424 / 448
Negation in sentiment analysis: benchmark
http://www.cs.pitt.edu/opinionfinderrelease/
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
425 / 448
Negation in sentiment analysis: shallow semantic
composition
Compositional semantics (Choi and Cardie 2008)
The polarity of a phrase can be computed in two steps:
The assessment of polarity of the constituents
The subsequent application of a set of previously defined inference rules.
Example of a rule:
Polarity([NP1]− [IN] [NP2]−) = +
[lack]−NP1 [of]IN [crime]−NP2 in rural areas
They define syntactic contexts of the polar expressions
From each context a direct polarity for the entire expression can be
derived
Advantage: they restrict the scope of negation to specific constituents
rather than using the scope of the entire target expression
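A small sketch of compositional polarity rules of this kind (the rule table is a toy fragment, not Choi and Cardie's actual rule set):

    # Compose phrase polarity from constituent polarities with hand-written rules,
    # e.g. a negative NP1 followed by "of" and a negative NP2 yields positive polarity.

    def compose_np_of_np(np1_polarity, np2_polarity):
        if np1_polarity == "-" and np2_polarity == "-":
            return "+"              # lack(-) of crime(-) -> positive
        if np1_polarity == "-" and np2_polarity == "+":
            return "-"              # lack(-) of support(+) -> negative
        return np2_polarity         # otherwise inherit from NP2

    print(compose_np_of_np("-", "-"))   # '+'
    print(compose_np_of_np("-", "+"))   # '-'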
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
426 / 448
Negation in sentiment analysis: bad vs. not good
Polarity as a continuum (Liu and Seneff 2009)
Not bad and good may have the same polarity but they differ in their
respective polar strength, i.e. not bad is less positive than good
Unifying account for intensifiers (e.g. very), diminishers, polarity shifters
and negation words
Compositional rules for polar phrases, such as adverb-adjective or
negation-adverb-adjective are defined exclusively using the scores of the
individual words
Adverbs function like universal quantifiers scaling either up or down the
polar strength of the specific polar adjectives they modify
Polarity is treated compositionally and is interpreted as a continuum
rather than a binary classification
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
427 / 448
Negation in sentiment analysis: using negation in lexicon
induction
Lexicon induction
The process of acquiring lexical resources that compile knowledge of which
natural language expressions are polar
The observation that negations co-occur with polar expressions has been
used for inducing polarity lexicons on Chinese in an unsupervised manner
(Zagibalov and Carroll, 2008)
The model relies on the observation that a polar expression can be
negated but it occurs more frequently without the negation.
The distributional behaviour of an expression (co-occurring significantly often with a negation word, but significantly more often occurring without one) is taken as a property of a polar expression.
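As a rough sketch of that distributional criterion (the counts, threshold and ratio are invented for illustration, not taken from Zagibalov and Carroll):

    # A candidate word is kept as polar if it co-occurs with negation often enough
    # to be "negatable", yet occurs clearly more often without negation.

    def looks_polar(count_with_negation, count_without_negation,
                    min_negated=5, ratio=2.0):
        return (count_with_negation >= min_negated
                and count_without_negation >= ratio * count_with_negation)

    print(looks_polar(count_with_negation=40, count_without_negation=300))  # True
    print(looks_polar(count_with_negation=1, count_without_negation=500))   # False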
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
428 / 448
Negation in sentiment analysis: limits of negation modeling
Many polar expressions, such as disease, are ambiguous
He is a disease to every team he has gone to
Early symptoms of the disease are headaches, fevers, cold chills and body
pain
Some polar opinions are not lexicalized. World knowledge is needed
The next time I hear this song on the radio, I’ll throw my radio out of the
window
The use of irony can reflect an implicit negation of what is conveyed
through the literal use of the words (Carvalho et al. 2009)
A polarity classifier should also be able to decompose words and carry
out negation modeling within words
not-so-nice, anti-war or offensiveless
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
429 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
430 / 448
Recognizing textual entailment
de Marneffe, M. C., B. MacCartney, T. Grenager, D. Cer, A. Rafferty, and Ch. D.
Manning (2006) Learning to distinguish valid textual entailments. In Proceedings of
the Second PASCAL Challenges Workshop on Recognising Textual Entailment.
Machine learning system. Alignment is followed by a classification step.
The system uses features from polarity and modality.
Polarity features "capture the presence (or absence) of linguistic markers of negative polarity contexts in both the text and the hypothesis, such as simple negation (not), downward-monotone quantifiers (no, few), restricting prepositions (without, except) and superlatives (tallest)".
Modality features "capture simple patterns of modal reasoning". The text and the hypothesis are mapped to one of six modalities: possible, not possible, actual, not actual, necessary, and not necessary.
Factuality features: a list of factive, implicative and non-factive verbs, clustered according to the kinds of entailments they create.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
431 / 448
Recognizing textual entailment
Snow, R., L. Vanderwende, and A. Menezes (2006) Effectively using syntax for
recognizing false entailment. In Proceedings of HLT-NAACL, pages 33-40,
Morristown, NJ, USA. ACL.
Snow et al. (2006) present a RTE system that incorporates negation and
modality in order to recognize false entailment.
The system checks whether nodes that are aligned in the hypothesis and text sentence have a negation or modality mismatch.
If the mismatch exists, it is predicted that the entailment is false.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
432 / 448
Recognizing textual entailment
Hickl, A. and J. Bensley (2007) A discourse commitment-based framework for
recognizing textual entailment. In Proceedings of the ACL-PASCAL Workshop on
Textual Entailment and Paraphrasing, pages 171-176, Stroudsburg, PA, USA. ACL
Hickl and Bensley 2007: system that obtained the best absolute result in
the RTE-3 challenge (80% accuracy)
Based on identifying the set of publicly-expressed beliefs of the author
(discourse commitments)
A set of commitments are extracted from a text-hypothesis pair, so that
the RTE task can be reduced to the identification of the commitments
from a text that support the inference of the hypothesis.
A discourse commitment represents any of the set of propositions that
can be inferred to be true, given a conventional reading of the passage.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
433 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
434 / 448
Machine translation
Baker K., M. Bloodgood, B. J. Dorr , N. W. Filardo , L. Levin, and Ch. Piatko
(2010) A modality lexicon and its use in automatic tagging. Proceedings of LREC
2010, pages 1402–1407.
Baker, K., M. Bloodgood, Ch. Callison-Burch, B. J. Dorr, N. W. Filardo, L. Levin, S.
Miller, Ch. Piatko (2010) Semantically-Informed Syntactic Machine Translation: A
Tree-Grafting Approach. Proceedings of AMTA 2010.
Measure the effect of modality tagging on the quality of machine
translation output in Urdu-English MT.
Modality annotation: BLEU score from 26.4 to 26.7
Modality + NE: from 26.4 to 26.9
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
435 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
436 / 448
Negfinder
Negfinder is a rule-based system that recognizes a large set of negated
patterns occurring in medical narrative
Described in: Mutalik, A.G., A. Deshpande, and P.M. Nadkarni (2001) Use of general-purpose negation detection to augment concept indexing of medical documents. J Am Med Inform Assoc, 8(6):598-609.
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
437 / 448
ConText
ConText determines whether clinical conditions mentioned in clinical reports
are negated, hypothetical, historical, or experienced by someone other than
the patient
ConText can be integrated with any application that indexes clinical
conditions from text
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
438 / 448
Text mining: BioCaster
Conway, M., S. Doan, and N. Collier (2009) Using hedges to enhance a disease
outbreak report text mining system. In Proceedings of the BioNLP 2009 workshop
2009, pages 142-143, Boulder, Colorado. ACL.
A disease outbreak report text mining system
The system scans online news reports for stories about infectious disease outbreaks and sends e-mail alerts to registered users.
Additionally, a topic classifier filters data which are used to populate the Global Health Monitor.
www.biocaster.org
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
439 / 448
Text mining: BioCaster
The BioCaster corpus consists of 1,000 news articles classified as being a
disease outbreak report or not.
Conway et al. 2009 find that the frequency of hedge cues differs in the
two categories of the BioCaster corpus, being more frequent in the
documents classified as reports.
The classifier is augmented with a binary hedge feature that is true if
one of the 105 hedge cues occurs in the text within 5 words of a disease
named entity.
The accuracy of this classifier is 0.8% better than the accuracy of a classifier that uses only unigrams, but it does not outperform the best classifier that incorporates feature selection.
Hedge information is also used to assign a speculative metric to the input documents of the BioCaster system, based on the frequency of hedge cues in 10,000 Reuters documents.
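A sketch of the binary hedge feature described above (the cue list and the toy "disease entity" index are placeholders; the real system uses 105 cues and proper named-entity recognition):

    # Binary feature: does a hedge cue occur within 5 tokens of a disease named entity?

    HEDGE_CUES = {"may", "might", "suspected", "possible", "unconfirmed"}

    def hedge_near_disease(tokens, disease_indices, window=5):
        for d in disease_indices:
            context = tokens[max(0, d - window):d + window + 1]
            if any(t.lower() in HEDGE_CUES for t in context):
                return True
        return False

    tokens = "officials report a suspected outbreak of cholera in the region".split()
    print(hedge_near_disease(tokens, disease_indices=[6]))   # True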
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
440 / 448
Outline
18 Sentiment analysis
19 Recognizing textual entailment
20 Machine translation
21 Text mining
22 Identifying the structure of scientific articles
23 Trustworthiness detection
24 References
R. Morante (CLiPS - University of Antwerp )
Modality and Negation in NLP
November 8, 2011
441 / 448
Identifying the structure of scientific articles
Grabar, N. and Th. Hamon (2009) Exploitation of speculation markers to identify the
structure of biomedical scientific writing. In AMIA 2009 Symposium Proceedings.
The task is to automatically categorize article sections (abstract, introduction,
materials and methods, results, discussion) based on the speculation cues that the
sections contain.
The features are 363 speculation cues collected from biomedical articles, which are
classified into groups according to their strength.
When using all features, the abstract, results, and materials and methods sections
can be classified with high accuracy.
Strong cues are specific to the results, discussion, and abstract sections, and
non-strong cues to materials and methods.
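One possible way to turn cues grouped by strength into section-level features, as a minimal sketch; the cue groups below are invented stand-ins for the 363 cues of Grabar and Hamon.

```python
# Sketch: represent an article section by the frequency of speculation cues
# per strength group, as a feature vector for a section classifier.
# The cue groups are toy stand-ins for the 363 cues of Grabar & Hamon (2009).
CUE_GROUPS = {
    "strong": ["might", "may", "possibly", "suggest"],
    "weak": ["appear", "seem", "likely"],
}

def cue_strength_features(section_text):
    """Return speculation-cue counts per strength group, normalised by section length."""
    tokens = section_text.lower().split()
    n = max(len(tokens), 1)
    return {group: sum(tokens.count(cue) for cue in cues) / n
            for group, cues in CUE_GROUPS.items()}

abstract = "These results suggest that the protein may possibly regulate growth"
print(cue_strength_features(abstract))
# -> {'strong': 0.3, 'weak': 0.0} for this toy sentence
```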
Trustworthiness detection
Su, Q., Ch-R. Huang, and H. Kai-yun Chen (2010) Evidentiality for text
trustworthiness detection. In Proceedings of the 2010 Workshop on NLP and
Linguistics: Finding the Common Ground, pages 10-17, Uppsala, Sweden. ACL.
Su et al. incorporate evidentiality information to predict the trustworthiness of
text in the context of collaborative question answering.
In this context, trustworthiness is useful for finding the best answers in the
system.
Hypothesis: evidentials will be used in less reliable answers.
Evidentiality is incorporated as lexical features of a classifier that
distinguishes best answers from non-best answers.
Results show a 14.85% increase in performance over the baseline bag-of-words
classifier when evidentiality information is added.
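One way to add evidential lexical features on top of a bag-of-words representation, as a hedged sketch; the cue list is a toy stand-in for the evidential markers used by Su et al., not their feature set.

```python
# Sketch: augment bag-of-words features with an evidential cue count for an
# answer-quality classifier. The cue list is a toy stand-in for the
# evidential markers used by Su et al. (2010).
from collections import Counter

EVIDENTIAL_CUES = {"i think", "i guess", "apparently", "reportedly", "they say"}

def answer_features(answer_text):
    """Bag-of-words counts plus a count of evidential cue occurrences."""
    text = answer_text.lower()
    features = Counter(text.split())
    features["EVIDENTIAL_COUNT"] = sum(text.count(cue) for cue in EVIDENTIAL_CUES)
    return features

print(answer_features("I think the answer is apparently 42"))
# Counter with unigram counts plus 'EVIDENTIAL_COUNT': 2
```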
References: sentiment analysis
Carvalho, P., L. Sarmento, M. J. Silva, and E. de Oliveira (2009) Clues for Detecting Irony in
User-Generated Contents: Oh...!! It’s “so easy” ;-). In Proceedings of CIKM-Workshop TSA
Choi, Y. and C. Cardie (2008) Learning with Compositional Semantics as Structural Inference for
Subsentential Sentiment Analysis. In Proceedings of EMNLP
Hu, M. and B. Liu (2004) Mining and summarizing customer reviews. In KDD ’04:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and
data mining, pages 168-177, New York, NY, USA. ACM.
Liu, J. and S. Seneff (2009) Review Sentiment Scoring via a Parse-and-Paraphrase Paradigm. In
Proceedings of EMNLP
Pang B., L. Lee, and S. Vaithyanathan (2002) Thumbs up? Sentiment Classification Using
Machine Learning Techniques. In Proceedings of EMNLP
Pang B. and L. Lee (2008) Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval, Vol. 2, Nos 1-2, 1-135
Polanyi L. and A. Zaenen (2004) Context Valence Shifters. In Proceedings of the AAAI Spring
Symposium on Exploring Attitude and Affect in Text
References: sentiment analysis
Turney, P. D. (2002) Thumbs Up or Thumbs Down? Semantic Orientation Applied to
Unsupervised Classification of Reviews. ACL 2002: 417-424
Wiebe, J., Th. Wilson, R. Bruce, M. Bell, and M. Martin (2004) Learning subjective language.
Computational Linguistics, 30(3):277-308
Wiegand, M., A. Balahur, B. Roth, D. Klakow and A. Montoyo (2010) A survey of the role of
negation in sentiment analysis. Proc. of the Workshop on Negation and Speculation in Natural
Language Processing, Uppsala, pages 60-68
Wilson, T., J. Wiebe, and P. Hoffmann (2005) Recognizing Contextual Polarity in Phrase-level
Sentiment Analysis. In Proceedings of HLT/EMNLP
Wilson, T., J. Wiebe, and P. Hoffmann (2009) Recognizing Contextual Polarity: An Exploration
for Phrase-level Analysis. Computational Linguistics, 35:3
Zagibalov, T. and J. Carroll (2008) Automatic Seed Word Selection for Unsupervised Sentiment
Classification of Chinese Text. In Proceedings of COLING
References
Machine Translation
Baker, K., M. Bloodgood, B. J. Dorr, N. W. Filardo, L. Levin, and Ch. Piatko (2010) A modality
lexicon and its use in automatic tagging. Proceedings of LREC 2010, pages 1402–1407.
Baker, K., M. Bloodgood, Ch. Callison-Burch, B. J. Dorr, N. W. Filardo, L. Levin, S. Miller, Ch.
Piatko (2010) Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach.
Proceedings of AMTA 2010.
Textual Entailment
de Marneffe, M. C., B. MacCartney, T. Grenager, D. Cer, A. Rafferty, and Ch. D. Manning (2006)
Learning to distinguish valid textual entailments. In Proceedings of the Second PASCAL
Challenges Workshop on Recognising Textual Entailment.
Hickl, A. and J. Bensley (2007) A discourse commitment-based framework for recognizing textual
entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and
Paraphrasing, pages 171-176, Stroudsburg, PA, USA. ACL
Snow, R., L. Vanderwende, and A. Menezes (2006) Effectively using syntax for recognizing false
entailment. In Proceedings of HLT-NAACL, pages 33-40, Morristown, NJ, USA. ACL.
Classifying citations
Di Marco, Ch., F. Kroon, and R. Mercer (2006) Using Hedges to Classify Citations in Scientific
Articles. Computing Attitude and Affect in Text: Theory and Applications, pages 247-263