Tutorial on Abstractive Text
Summarization
Advaith Siddharthan
NLG Summer School, Aberdeen, 22 July 2015
Outline
Introduction
Sentence Compression
Sentence Fusion
Templates and NLG
GRE
Tasks in text summarization
Extractive Summarization (previous tutorial)
Sentence Selection, etc
Abstractive Summarization
Mimicking what human summarizers do
Sentence Compression and Fusion
Regenerating Referring Expressions
Template Based Summarization
Perform information extraction, then use NLG Templates
Cut and Paste in Professional Summarization
Humans also reuse the input text to produce summaries
But they don’t just extract sentences; they do a lot of cutting and pasting
Corpus analysis (Barzilay et al., 1999):
300 summaries, 1,642 sentences
81% of sentences were constructed by cutting and pasting
Major Cut and Paste Operations
Sentence Compression
ABACDCDFDSGFGDA −→ ABADFDSDA
Summarizing a sentence, e.g. for headline generation
Removes peripheral information from a sentence to shorten
summary
Sentence Fusion
ABACDCDFDSGFG + CDCGFDGFGDA −→ ABAGFDDFDS
Merge information from multiple (similar) sentences.
Reduces redundancy in summary
Syntactic Reorganization
ABADFGS −→ DFGSABA
Often done to make the summary coherent (preserve focus,
etc)
Lexical Paraphrase
ABACDFGDSFD −→ ABAGHYGDSFD
Use simpler words that are easier to understand in the new
context.
Sentence Compression
A research topic in itself; too many approaches to discuss here in depth
Typically viewed as producing a summary of a single sentence
Should be shorter
Should remain grammatical
Should keep the most important information
Sentence Compression
(Grefenstette, 1998; Jing et al., 1998; Knight & Marcu, 2000;
Riezler et al., 2003)...
Former Democratic National Committee finance director
Richard Sullivan faced more pointed questioning from
Republicans during his second day on the witness stand
in the Senate’s fund-raising investigation.
Richard Sullivan faced pointed questioning.
Richard Sullivan faced pointed questioning from Republicans
during day on stand in Senate fund-raising investigation.
Example: Reluctant Trimmer
Developed by Nomoto (Angrosh et al., 2014) for Text
Simplification (Siddharthan & Angrosh, 2014), rather than
summarization.
Considers the text as a whole and optimises subject to global constraints on:
lexical density
ratio of difficult words
text length
Reluctant Trimmer is based on reluctant paraphrasing (Dras, 1999): “make as little change as possible to the text to satisfy a set of constraints”
Reluctant Trimmer - Architecture
Reluctant Trimmer - Graphical View
Reluctant Trimmer
Decoded using ILP
Constraints can be specified at the level of a text, not an
individual sentence.
lexical density
ratio of difficult words
text length
While developed for text simplification, it can be adapted to
summarisation tasks by changing the constraints, for example
to take into account
some notion of topic
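To make the ILP view concrete, here is a minimal sketch in Python using PuLP. It is not the Reluctant Trimmer implementation: the candidate deletions, word counts and length budget are invented for illustration, and the real system optimises richer global constraints (lexical density, difficult-word ratio) over typed-dependency operations.

```python
# A minimal sketch of "reluctant" trimming cast as an ILP: apply as few deletion
# operations as possible while satisfying a global length budget over the text.
import pulp

# Hypothetical candidate deletions: (operation id, words removed if applied)
candidates = [("drop_relclause_s1", 12), ("drop_appositive_s2", 7),
              ("drop_pp_s3", 5), ("drop_adverbial_s3", 3)]
original_length = 85          # words in the whole text (illustrative)
length_budget = 70            # global constraint on text length (illustrative)

prob = pulp.LpProblem("reluctant_trimming", pulp.LpMinimize)
apply_op = {op: pulp.LpVariable(op, cat="Binary") for op, _ in candidates}

# Objective: make as few changes as possible (reluctance).
prob += pulp.lpSum(apply_op[op] for op, _ in candidates)

# Global constraint: the trimmed text must fit the length budget.
prob += original_length - pulp.lpSum(n * apply_op[op] for op, n in candidates) <= length_budget

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([op for op, _ in candidates if apply_op[op].value() == 1])
```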
Sentence Fusion
1. IDF Spokeswoman did not confirm this, but said the Palestinians fired an antitank missile at a bulldozer.
2. The clash erupted when Palestinian militants fired machine guns and antitank missiles at a bulldozer that was building an embankment in the area to better protect Israeli forces.
3. The army expressed regret at the loss of innocent lives but a senior commander said troops had shot in self-defense after being fired at while using bulldozers to build a new embankment at an army base in the area.
(Barzilay & McKeown, 2005; Marsi & Krahmer, 2005; Filippova &
Strube, 2008; Thadani & McKeown, 2013)
Graph Intersection
Palestinian militants fired antitank missile at bulldozer
(Barzilay & McKeown, 2005)
Merge sentences by aligning nodes
Identify intersection
Linearise graph to construct sentence
Some hand-coded rules on what cannot be cut (subject of verb, etc)
Use language model to pick between options
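A toy sketch of the intersection idea, assuming the sentences are already parsed into dependency edges. The edges and the crude lemmatiser below are illustrative; the actual system aligns full dependency trees with paraphrase rules and linearises with a language model.

```python
# Represent each sentence as a set of dependency edges (head, relation, dependent);
# fusion by intersection keeps the edges that align across both sentences.
s1 = {("fired", "nsubj", "Palestinians"), ("fired", "dobj", "missile"),
      ("missile", "amod", "antitank"), ("fired", "prep_at", "bulldozer")}
s2 = {("fired", "nsubj", "militants"), ("militants", "amod", "Palestinian"),
      ("fired", "dobj", "missiles"), ("missiles", "amod", "antitank"),
      ("fired", "prep_at", "bulldozer")}

def normalise(word):
    # crude lemmatisation stand-in so "missile"/"missiles" can align
    return word.lower().rstrip("s")

def edges_norm(edges):
    return {(normalise(h), rel, normalise(d)) for h, rel, d in edges}

intersection = edges_norm(s1) & edges_norm(s2)
print(sorted(intersection))
# [('fired', 'dobj', 'missile'), ('fired', 'prep_at', 'bulldozer'),
#  ('missile', 'amod', 'antitank')]
```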
Extensions to this approach
Marsi & Krahmer (2005) allow union as well as intersection
1. Posttraumatic stress disorder (PTSD) is a psychological disorder which is classified as an anxiety disorder in the DSM-IV.
2. Posttraumatic stress disorder (abbrev. PTSD) is a psychological disorder caused by a mental trauma (also called psychotrauma) that can develop after exposure to a terrifying event.
Intersection: Posttraumatic stress disorder (PTSD) is a psychological disorder.
Union: Posttraumatic stress disorder (PTSD) is a psychological disorder, which is classified as an anxiety disorder in the DSM-IV, caused by a mental trauma (also called psychotrauma) that can develop after exposure to a terrifying event.
Extensions to this approach
(Filippova & Strube, 2008)
Include topic model for deciding which nodes to keep
Encode semantic constraints for union through coordination:
Coordinated concepts have to be related, but not synonyms or
hyponyms, etc.
(Thadani & McKeown, 2013)
Supervised approach based on corpus of fused sentences
Computational Approaches to Summarization
Bottom-Up
What is in these texts? Give me the gist.
User needs: anything that is important
System needs: generic importance metrics
Techniques: Extractive summarization, sentence compression
and fusion, etc.
Top-Down
I know what I want – Find it for me.
User needs: only certain types of information
System needs: particular criteria of interest, used to focus
search
Techniques: Information Extraction and Template-based
generation
Top-Down Summaries
Information Extraction (IE)
Create Template for a particular type of story
Fields and values
Instantiate Fields from documents
Use Natural Language Generation to generate sentences from
Template
IE Summarisation Strategy
Instantiate Template by finding evidence – Pattern matching
on text
Thousands of people are feared dead following a powerful
earthquake that hit Afghanistan today. The quake registered
6.9 on the Richter scale.
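A minimal sketch of slot filling by pattern matching. The regular expressions and slot names are made up for illustration; real IE systems (e.g. RIPTIDES) use far richer extraction machinery.

```python
# Instantiate template slots by pattern matching on the text (toy patterns).
import re

text = ("Thousands of people are feared dead following a powerful earthquake "
        "that hit Afghanistan today. The quake registered 6.9 on the Richter scale.")

template = {"disaster_type": None, "location": None, "magnitude": None,
            "human_effect": None}

if m := re.search(r"\b(earthquake|flood|hurricane)\b", text):
    template["disaster_type"] = m.group(1)
if m := re.search(r"hit ([A-Z]\w+)", text):
    template["location"] = m.group(1)
if m := re.search(r"registered ([\d.]+) on the Richter scale", text):
    template["magnitude"] = float(m.group(1))
if m := re.search(r"(Thousands of people) are (feared) (dead)", text):
    # "feared" is the confidence marker, so the outcome is only medium-confidence
    template["human_effect"] = {"number": m.group(1), "outcome": m.group(3),
                                "confidence": "medium", "confidence_marker": m.group(2)}
print(template)
```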
Template for Natural Disasters
Disaster Type: earthquake
location: Afghanistan
magnitude: 6.9
epicenter: a remote part of the country
Damage:
  human-effect:
    number: Thousands of people
    outcome: dead
    confidence: medium
    confidence-marker: feared
  physical-effect:
    object: entire villages
    outcome: damaged
    confidence: medium
    confidence-marker: reports say
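And a correspondingly minimal sketch of template-based realisation from such a filled template. The slot names and the sentence pattern are assumptions for illustration, not the RIPTIDES realiser.

```python
# Map the filled template onto a fixed sentence pattern, folding the confidence
# marker back into the wording.
filled = {"disaster_type": "earthquake", "location": "Afghanistan", "magnitude": 6.9,
          "human_effect": {"number": "Thousands of people", "outcome": "dead",
                           "confidence": "medium", "confidence_marker": "feared"}}

def realise(t):
    h = t["human_effect"]
    hedge = f"are {h['confidence_marker']}" if h["confidence"] == "medium" else "are"
    return (f"A magnitude {t['magnitude']} {t['disaster_type']} hit {t['location']}. "
            f"{h['number']} {hedge} {h['outcome']}.")

print(realise(filled))
# -> "A magnitude 6.9 earthquake hit Afghanistan. Thousands of people are feared dead."
```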
RIPTIDES (White et al., 2001)
Problems with Template approach
Templates are domain dependent
Manual effort in creating a template
Manual effort in designing a system that can generate
sentences from a template
Cannot create a template for every possible news story this way
Recent work attempts to learn such templates
Template Bank from historical texts (Schilder et al., 2013)
Templates, Generation and Reference
Error correction for Multilingual Summarization
Extractive approaches are limited in how they can address noisy input (output of machine translation)
Replace sentences with similar ones from extraneous English
Documents (Evans et al., 2004)
Improves Readability
Exact Matches hard to find, so can change meaning/emphasis
Siddharthan & McKeown (2005); Siddharthan & Evans
(2005):
Apply a template approach to clean up referring expressions
References to People
Distribution of premodifying words in initial references to people in DUC human summaries (monolingual task, 2001–2004), Siddharthan et al. (2004):
71%: Role (Prime Minister, Physicist) or Time (former, designate)
22%: Country, State, Location or Organization
Our task is to:
1. Collect all references to the person in different translations of each document in the set
2. Identify the above attributes, filtering any noise
3. Generate a reference
Automatic semantic tagging
organization, location, person name: BBN’s IdentiFinder
country, state: CIA factsheet; includes adjectival forms, e.g. United Kingdom/U.K./British/Briton
role: WordNet hyponyms of person; 2371 entries including multiword expressions, e.g. chancellor of the exchequer, brother in law, etc.; sequences of roles are conflated
temporal modifier: also from WordNet, e.g. former, designate
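A toy illustration of the tagging step using dictionary lookup. The tiny lexicons below stand in for IdentiFinder output, the CIA factsheet lists and the WordNet role/temporal lists; multiword roles would need phrase matching rather than token-by-token lookup.

```python
# Dictionary-based tagging of roles, countries and temporal modifiers (toy lexicons).
ROLE = {"representative", "president", "prime minister"}
COUNTRY = {"iraq", "iraqi", "algeria", "algerian", "united kingdom", "british"}
TIME = {"former", "designate"}

def tag_tokens(tokens):
    tagged = []
    for tok in tokens:
        t = tok.lower()
        if t in ROLE:
            tagged.append((tok, "ROLE"))
        elif t in COUNTRY:
            tagged.append((tok, "COUNTRY"))
        elif t in TIME:
            tagged.append((tok, "TIME"))
        else:
            tagged.append((tok, "O"))   # names/organizations come from the NE tagger
    return tagged

print(tag_tokens("former Iraqi representative Nizar Hamdoon".split()))
# [('former','TIME'), ('Iraqi','COUNTRY'), ('representative','ROLE'),
#  ('Nizar','O'), ('Hamdoon','O')]
```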
Example of Analysis
...<NP> <ROLE> representative </ROLE> of <COUNTRY>
Iraq </COUNTRY> of the <ORG> United Nations </ORG>
<PERSON> Nizar Hamdoon </PERSON> </NP> that <NP>
thousands of people </NP> killed or wounded in <NP> the
<TIME> next </TIME> few days four of the aerial
bombardment of <COUNTRY> Iraq </COUNTRY> </NP>...


name: Nizar Hamdoon
role: representative
country: Iraq (arg1)
organization: United Nations (arg2)
Identifying redundancy
Coreference by comparing AVMs

name: Nizar Hamdoon (2)
role: representative (2)
country: Iraq (2) (arg1)
organization: United Nations (2) (arg2)
Numbers in brackets represent the counts of this value across all
references
The arg values now represent the most frequent ordering in the
input references
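A minimal sketch of building such a counted AVM by tallying attribute values over all extracted references (illustrative attribute names):

```python
# Merge per-mention attribute sets into one AVM with value counts.
from collections import Counter, defaultdict

references = [  # attributes extracted from individual mentions (illustrative)
    {"name": "Nizar Hamdoon", "role": "representative", "country": "Iraq",
     "organization": "United Nations"},
    {"name": "Nizar Hamdoon", "role": "representative", "country": "Iraq",
     "organization": "United Nations"},
]

avm = defaultdict(Counter)
for ref in references:
    for attribute, value in ref.items():
        avm[attribute][value] += 1

for attribute, counts in avm.items():
    print(attribute, dict(counts))
# name {'Nizar Hamdoon': 2}, role {'representative': 2}, ...
```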
Another Example
name: Zeroual (24), Liamine Zeroual (20)
role: president (23), leader (2)
country: Algeria (18) (arg1)
organization: Renovation Party (2) (arg1), AFP (1) (arg1)
time: former (1)
Common issues:
Multiple roles and affiliations
Noise due to errors from tokenization, chunking, NE tools, etc.
Removing Noise
1. Select the most frequent name with more than one word (this is the most likely full name).
2. Select the most frequent role.
3. Prune the AVM of values that occur with a frequency below an empirically determined threshold.

name: Zeroual (24), Liamine Zeroual (20)
role: president (23), leader (2)
country: Algeria (18) (arg1)
organization: Renovation Party (2) (arg1), AFP (1) (arg1)
time: former (1)
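A minimal sketch of these three pruning rules; the threshold value here is an assumption (the slides only say it was determined empirically).

```python
# Pick the most likely full name and role, and drop low-frequency (noisy) values.
from collections import Counter

avm = {"name": Counter({"Zeroual": 24, "Liamine Zeroual": 20}),
       "role": Counter({"president": 23, "leader": 2}),
       "country": Counter({"Algeria": 18}),
       "organization": Counter({"Renovation Party": 2, "AFP": 1}),
       "time": Counter({"former": 1})}
THRESHOLD = 3  # assumed value for illustration

full_names = [(n, c) for n, c in avm["name"].items() if len(n.split()) > 1]
name = max(full_names, key=lambda x: x[1])[0]          # most frequent full name
role = avm["role"].most_common(1)[0][0]                # most frequent role
pruned = {attr: {v: c for v, c in counts.items() if c >= THRESHOLD}
          for attr, counts in avm.items() if attr not in ("name", "role")}

print(name, role, pruned)
# Liamine Zeroual president {'country': {'Algeria': 18}, 'organization': {}, 'time': {}}
```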
Generating references
Input Semantics:

name: Nizar Hamdoon
role: representative
country: Iraq (arg1)
organization: United Nations (arg2)

name: Liamine Zeroual
role: president
country: Algeria (arg1)

To generate, we need knowledge of syntax, determined by the syntactic frames of the role.
Acquiring frames
Acquire Frames for each role from semantic analysis of the Reuters
News corpus
ROLE=ambassador
(prob=.35) COUNTRY ambassador PERSON
(.18) ambassador PERSON
(.12) COUNTRY ORG ambassador PERSON
(.12) COUNTRY ambassador to COUNTRY PERSON
(.06) ORG ambassador PERSON
(.06) COUNTRY ambassador to LOCATION PERSON
(.06) COUNTRY ambassador to ORG PERSON
(.03) COUNTRY ambassador in LOCATION PERSON
(.03) ambassador to COUNTRY PERSON
Frames provide us with the required syntactic information
Word Order, Preposition Choice
Use most probable frame that matches
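A minimal sketch of frame selection and filling; the probabilities and the AVM below are illustrative, but the output matches the generated reference shown on the Example slide that follows.

```python
# Pick the most probable frame whose slots are all available in the AVM,
# then fill it left to right (lowercase tokens in a frame are literal words).
frames = [  # (probability, pattern) for ROLE=representative -- illustrative values
    (0.30, ["COUNTRY", "ORG", "ROLE", "PERSON"]),
    (0.25, ["ROLE", "of", "COUNTRY", "PERSON"]),
    (0.20, ["ROLE", "PERSON"]),
]
avm = {"PERSON": "Nizar Hamdoon", "ROLE": "representative",
       "COUNTRY": "Iraqi",        # adjectival form from the CIA factsheet list
       "ORG": "United Nations"}

def realise(frames, avm):
    for _, pattern in sorted(frames, reverse=True):           # most probable first
        if all(slot in avm or slot.islower() for slot in pattern):
            return " ".join(avm.get(slot, slot) for slot in pattern)
    return avm["PERSON"]                                       # fall back to bare name

print(realise(frames, avm))   # -> "Iraqi United Nations representative Nizar Hamdoon"
```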
Example
the representative of Iraq in the United Nations Nizar Hamdoon
+ representative of Iraq of the United Nations Nizar HAMDOON
↓
name: Nizar Hamdoon
role: representative
country: Iraq (arg1)
organization: United Nations (arg2)
↓
Iraqi United Nations representative Nizar Hamdoon
Automatic Evaluation
Compared with Model References:
First References to same person in Human translation
Data: DUC 2004 multilingual task
24 sets
6 used for development
18 used for evaluation
Baselines
Base1: most frequent initial reference to the person
Base2: randomly selected initial reference to the person
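For concreteness, a minimal sketch of n-gram precision/recall/F-measure against a single model reference; the exact matching and averaging behind the Pav/Rav/Fav figures on the next slide may differ.

```python
# n-gram precision, recall and F-measure of a generated reference against a model.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prf(candidate, model, n=1):
    c, m = ngrams(candidate.lower().split(), n), ngrams(model.lower().split(), n)
    overlap = sum((c & m).values())
    p = overlap / max(sum(c.values()), 1)
    r = overlap / max(sum(m.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical generated reference vs. a model reference from a human translation
print(prf("Iraqi United Nations representative Nizar Hamdoon",
          "Iraq's U.N. representative Nizar Hamdoon", n=1))
```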
Results
             Pav       Rav     Fav
1-GRAMS
Generated    0.847*@   0.786   0.799*@
Base1        0.753*    0.805   0.746*
Base2        0.681     0.767   0.688
2-GRAMS
Generated    0.684*@   0.591   0.615*
Base1        0.598*    0.612   0.562*
Base2        0.492     0.550   0.475
3-GRAMS
Generated    0.514*@   0.417   0.443*
Base1        0.424*    0.432   0.393*
Base2        0.338     0.359   0.315
@ Significantly better than Base1
* Significantly better than Base2
(unpaired t-test at 95% confidence)
Redundancy and Error Correction
[plot comparing Generated, Base1 and Base2]
Back to Monolingual Multi-Doc Summarization
Nenkova et al. (2005); Siddharthan et al. (2011)
Task definition
In the Document Understanding Conference context:
Input: Cluster of 10 news reports on same event(s)
Output: 100 Word (or 665 byte) Summary
Data compression of around 50:1
Scope for post-editing extractive summaries
News Reports
Av. Sentence Length: 21.4 words
Human Summaries
Av. Sentence Length: 17.4 words
Machine Summaries
Av. Sentence Length: 28.8 words
Data source: Document Understanding Conference (DUC)
2001–2004
Sentence Compression
Grefenstette (1998), Knight & Marcu (2000), Riezler et al.
(2003)...
Former Democratic National Committee finance director Richard
Sullivan faced more pointed questioning from Republicans during
his second day on the witness stand in the Senate’s fund-raising
investigation.
Richard Sullivan faced pointed questioning.
Richard Sullivan faced pointed questioning from Republicans
during day on stand in Senate fund-raising investigation.
But...
Lin (2003) showed that statistical sentence-shortening approaches
like Knight & Marcu (2000) do not improve content selection in
summaries.
Shortening approaches appear to remove the wrong words from a
summary...
Q: What are the right words to remove?
Syntactic Simplification
PAL, which has been unable to make payments on dlrs 2.1 billion
in debt, was devastated by a pilots’ strike in June and by the
region’s currency crisis, which reduced passenger numbers and
inflated costs.
PAL has been unable to make payments on dlrs 2.1 billion in debt
PAL was devastated by a pilots’ strike in June and by the region’s
currency crisis.
The crisis reduced passenger numbers and inflated costs.
Does Syntactic Simplification help?
The Summary Genre
News Reports
One appositive or relative clause every 3.9 sentences
Human Summaries
One appositive or relative clause every 8.9 sentences
Machine Summaries
One appositive or relative clause every 3.6 sentences
Data source: Document Understanding Conference (DUC)
2001–2004
Results (Siddharthan et al., 2004)
Removing Parentheticals improves content selection
PAL, which has been unable to make payments on dlrs 2.1 billion
in debt, was devastated by a pilots’ strike in June and by the
region’s currency crisis, which reduced passenger numbers and
inflated costs.
Shorter Sentences −→ Tighter Clusters:
1. PAL was devastated by a pilots’ strike in June and by the region’s currency crisis.
2. In June, PAL was embroiled in a crippling three-week pilots’ strike.
3. The majority of PAL’s pilots staged a devastating strike in June.
4. In June, PAL was embroiled in a crippling three-week pilots’ strike.
Description and Content
Do machine summarizers err on the side of too much description?
Removing Relative Clauses and Apposition from Input:
Siddharthan et al. (2004) and Conroy & Schlesinger (2004) report
significant improvement.
Removing Parentheticals improves content selection - Possibly at
expense of Coherence
Referring expressions require a formal treatment
inclusion of parentheticals just one aspect...
Referring Expressions in Summaries
A Machine Summary
Turkey has been trying to form a new government since a
coalition government led by Yilmaz collapsed last month over
allegations that he rigged the sale of a bank. Ecevit refused
even to consult with Kutan during his efforts to form a government. Demirel consulted Turkey’s party leaders immediately
after Ecevit gave up.
Familiarity?
Minimal Description?
Multi-Doc Summarization
In 100 words
Important events need to be summarized
Protagonists need to be described
There is therefore a tradeoff
Too little description −→ Incoherence
Too much description −→ Compromised content
What is the ideal level of description?
How much reference shortening can we get away with?
Information Status
Inferring Information Status for Referring Expression Generation
a. Federal Reserve Chairman Alan Greenspan suggested that the
Senate make the tax-cut permanent.
b. Greenspan suggested that the Senate make the tax-cut
permanent.
c. The Federal Reserve Chairman suggested that the Senate make
the tax-cut permanent.
Discourse new / Discourse old
Hearer new / Hearer old
Major / Minor
In 100 word summaries, you don’t want to waste space describing
entities that are hearer old
Or naming minor characters
Can information status be learnt?
The Experiment
Assumptions
Writers of news reports have some idea of who the intended
readership is familiar with
This is reflected in how they describe people in the story
Information status can be learnt
Methodology
Label data with Information Status (this is the clever bit)
Perform lexical and syntactic analysis of references in news reports
Learn information status using features derived from above
Acquiring Labeled Data
120 document sets (10 news reports each) and manual summaries
from DUC 2001–2004
In manual summaries:
Hearer Old/New
Marked entities as hearer old if first mention was title+last name or
only name.
Marked the rest as hearer new
Major/Minor Character
Marked entities as major if mentioned by name in at least one
summary
Marked as minor if not mentioned by name in any summary
118 examples of hearer-old, 140 of hearer-new.
258 examples of major characters, 3926 of minor.
Syntactic Analysis
[IR] Nobel laureate Andrei D. Sakharov ; [CO] Sakharov ; [CO]
Sakharov ; [CO] Sakharov ; [CO] Sakharov ; [PR] his ; [CO] Sakharov
; [PR] his ; [CO] Sakharov ; [RC] who acted as an unofficial Kremlin
envoy to the troubled Transcaucasian region last month ; [PR] he ;
[PR] He ; [CO] Sakharov ;
[IR] Andrei Sakharov ; [AP] , 68 , a Nobel Peace Prize winner and a
human rights activist , ; [CO] Sakharov ; [IS] a physicist ; [PR] his ;
[CO] Sakharov ;
Information collected for Andrei Sakharov from two news reports.
IR = initial reference
CO = subsequent noun co-reference
PR = pronoun reference
AP = apposition
RC = relative clause
IS = copula
Lexical Analysis
Unigram and Bigram models of Premodifiers
Obtained from 2 months worth of news articles from the web
Independent of DUC data - from Newsblaster logs
Formed list of 20 most frequent premodifying unigrams and
bigrams
Intuition:
Presidents more likely to be hearer old than judges...
Americans more likely to be hearer old than Turks...
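A minimal sketch of the classification setup, using scikit-learn as a stand-in for the Weka J48 / SMO tools named on the next slide; the feature names and the tiny training set are invented for illustration.

```python
# Learn hearer old/new from features of how a person is described across reports:
# e.g. frequent premodifier unigrams/bigrams and simple syntactic facts.
from sklearn.svm import SVC
from sklearn.feature_extraction import DictVectorizer

train = [({"premod=president": 1, "bare_name_mention": 1}, "old"),
         ({"premod=judge": 1, "has_apposition": 1}, "new"),
         ({"premod=american": 1, "bare_name_mention": 1}, "old"),
         ({"has_relative_clause": 1, "premod=local": 1}, "new")]

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in train])
y = [label for _, label in train]
clf = SVC(kernel="linear").fit(X, y)   # a linear SVM (the slides used Weka's SMO)

test = vec.transform([{"premod=president": 1, "bare_name_mention": 1}])
print(clf.predict(test))   # expected: ['old']
```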
Classification Results

Major or Minor?
Algorithm                                        Accuracy
Weka (J48)                                       0.96
Majority class prediction                        0.94

Familiarity (Hearer old or new?)
Algorithm                                        Accuracy
SVM (SMO Algorithm)                              0.76
Majority class prediction (always hearer new)    0.54
The Generation Task
Two aspects to deciding initial references:
What (if any) premodifiers to use
What (if any) postmodifiers to use
Analysis of Premodifiers in DUC Human summaries
72% of words were:
Role or Title (e.g. Prime Minister, Physicist or Dr)
Or reference-modifying adjectives such as former that have to be included with the role.
DUC summarisers tended to follow journalistic convention and include these words for everyone.
But for greater compression, the role or title can be omitted for hearer-old persons; e.g. Margaret Thatcher instead of Former Prime Minister Margaret Thatcher.
The Generation Algorithm
1. If Minor Character Then:
   1. Exclude name from reference
   2. Only include role, temporal modification and affiliation
2. Else If Major Character Then:
   1. Include name
   2. Include role and any temporal modifier, to follow journalistic conventions
   3. If Hearer-old Then:
      1. Exclude other modifiers, including affiliation
      2. Exclude any post-modification such as apposition or relative clauses
   4. Else If Hearer-new Then:
      1. If the person’s affiliation has already been mentioned And is the most salient organization in the discourse at the point where the reference needs to be generated, Then exclude affiliation, Else include affiliation
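A minimal sketch of these decisions as code; the AVM field names and the ordering of modifiers are assumptions (the real system also consults the acquired syntactic frames for word order).

```python
# Decide the content of an initial reference from the AVM and information status.
def initial_reference(avm, is_major, hearer_old, affiliation_salient):
    if not is_major:                       # minor character: describe, don't name
        parts = [avm.get("time"), avm.get("affiliation"), avm.get("role")]
        return " ".join(p for p in parts if p)
    # major character: name plus role/temporal modifier (journalistic convention)
    parts = [avm.get("time"), avm.get("role")]
    if not hearer_old and not affiliation_salient:
        parts.insert(0, avm.get("affiliation"))   # hearer-new: add affiliation
    parts.append(avm["name"])
    return " ".join(p for p in parts if p)

avm = {"name": "Fernando Henrique Cardoso", "role": "President", "affiliation": "Brazilian"}
print(initial_reference(avm, is_major=True, hearer_old=False, affiliation_salient=False))
# -> "Brazilian President Fernando Henrique Cardoso"
print(initial_reference(avm, is_major=True, hearer_old=False, affiliation_salient=True))
# -> "President Fernando Henrique Cardoso"
```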
Predictive Power
Successfully modelled variations in the initial references used by
different human summarizers for the same document set
1. Brazilian President Fernando Henrique Cardoso was re-elected in the...
   [hearer new and Brazil not in context]
2. Brazil’s economic woes dominated the political scene as President Cardoso...
   [hearer new and Brazil most salient country in context]
Predictive Power
Successfully models variations in the initial reference to the same
person across summaries of different document sets
1. It appeared that Iraq’s President Saddam Hussein was determined to solve his country’s financial problems and territorial ambitions...
   [hearer new for this document set and Iraq not in context]
2. ...A United States aircraft battle group moved into the Arabian Sea. Saddam Hussein warned the Iraqi populace that United States might attack...
   [hearer old for this document set]
Predictive Power
For predicted hearer-old people, there was no postmodification in
any gold standard summary.
Reference Accuracy

Generation Decision               Prediction Accuracy
Discourse-new references
Include Name                      .74 (rising to .92 when there is unanimity among human summarizers)
Include Role & temporal mods      .79
Include Affiliation               .79
Include Post-Modification         .72 (rising to 1.00 when there is unanimity among human summarizers)
Discourse-old references
Include Only Surname              .70
Impact on Summaries
Preference of experimental participants for one summary-type over the other (140 comparisons):

                   Extractive   Rewritten   No difference
More informative   46           23          71
More coherent      22           79          39
More preferred     37           69          34

Rewriting References:
Shortened Summaries by 11 words on average
Led to more coherent summaries (p<0.01)
Led to more preferred summaries (p<0.01)
Led to less informative summaries - but correlated with length of summary (rho=0.8; p<0.001)
References
Angrosh, M., T. Nomoto, & A. Siddharthan. 2014. Lexico-syntactic text simplification and compression with typed
dependencies. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics:
Technical Papers.
Barzilay, R., & K. McKeown. 2005. Sentence fusion for multidocument news summarization. Computational
Linguistics 31(3): 297–328.
Barzilay, R., K. R. McKeown, & M. Elhadad. 1999. Information fusion in the context of multi-document
summarization. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on
Computational Linguistics.
Conroy, J. M., & J. D. Schlesinger. 2004. Left-Brain/Right-Brain Multi-Document Summarization. 4th Document
Understanding Conference (DUC 2004) at HLT/NAACL 2004, Boston, MA.
Dras, M. 1999. Tree adjoining grammar and the reluctant paraphrasing of text. Ph.D. thesis, Macquarie University
NSW 2109 Australia.
Filippova, K., & M. Strube. 2008. Sentence fusion via dependency graph compression. Proceedings of the
Conference on Empirical Methods in Natural Language Processing .
Grefenstette, G. 1998. Producing Intelligent Telegraphic Text Reduction to Provide an Audio Scanning Service for
the Blind. Intelligent Text Summarization, AAAI Spring Symposium Series.
Jing, H., R. Barzilay, K. McKeown, & M. Elhadad. 1998. Summarization Evaluation Methods: Experiments and
Analysis. AAAI Symposium on Intelligent Summarization.
Knight, K., & D. Marcu. 2000. Statistics-Based Summarization — Step One: Sentence Compression. Proceedings of the American Association for Artificial Intelligence Conference (AAAI-2000).
Lin, C.-Y. 2003. Improving Summarization Performance by Sentence Compression - A Pilot Study. In Proceedings
of the Sixth International Workshop on Information Retrieval with Asian Languages (IRAL 2003).
Marsi, E., & E. Krahmer. 2005. Explorations in sentence fusion. Proceedings of the European Workshop on
Natural Language Generation.
Nenkova, A., A. Siddharthan, & K. McKeown. 2005. Automatically learning cognitive status for multi-document
summarization of newswire. Proceedings of HLT/EMNLP 2005 .
Riezler, S., T. H. King, R. Crouch, & A. Zaenen. 2003. Statistical Sentence Condensation using Ambiguity Packing
and Stochastic Disambiguation Methods for Lexical-Functional Grammar. Proceedings of the Human Language
Technology Conference and the 3rd Meeting of the North American Chapter of the Association for
Computational Linguistics (HLT-NAACL’03).
Schilder, F., B. Howald, & R. Kondadadi. 2013. Gennext: A consolidated domain adaptable nlg system.
Proceedings of the 14th European Workshop on Natural Language Generation.
Siddharthan, A., A. Nenkova, & K. McKeown. 2004. Syntactic Simplification for Improving Content Selection in
Multi-Document Summarization. Proceedings of the 20th International Conference on Computational
Linguistics (COLING 2004).
Siddharthan, A., & M. Angrosh. 2014. Hybrid Text Simplification using Synchronous Dependency Grammars with
Hand-written and Automatically Harvested Rules. Proceedings of the 14th Conference of the European
Chapter of the Association for Computational Linguistics (EACL’14).
Siddharthan, A., & D. Evans. 2005. Columbia University at MSE2005. 2005 Multilingual Summarization
Evaluation Workshop, Ann Arbor, MI, June 29th 2005 .
Siddharthan, A., & K. McKeown. 2005. Improving Multilingual Summarization: Using Redundancy in the Input to
Correct MT errors. Proceedings of HLT/EMNLP 2005 .
Siddharthan, A., A. Nenkova, & K. McKeown. 2011. Information status distinctions and referring expressions: An
empirical study of references to people in news summaries. Computational Linguistics 37(4): 811–842.
Thadani, K., & K. McKeown. 2013. Supervised sentence fusion with single-stage inference. Proceedings of the
Sixth International Joint Conference on Natural Language Processing .
White, M., T. Korelsky, C. Cardie, V. Ng, D. Pierce, & K. Wagstaff. 2001. Multidocument summarization via
information extraction. Proceedings of the first international conference on Human language technology
research.