Text Summarization
22.11.2016 Matej Gallo
What?
An automatic summary is a text generated by software that is
coherent and contains a significant amount of relevant information
from the source text. Its compression rate τ (summary length divided by
source length) is less than one third.
• Produced from one or more documents
• Preserve important information
• Short
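The compression rate mentioned in the definition can be checked with a few lines of code. This is a minimal sketch, assuming that length is measured in whitespace-separated words (characters or sentences would work equally well); the function name `compression_rate` is made up for illustration.

```python
def compression_rate(summary: str, source: str) -> float:
    """Return tau = |summary| / |source|, with length counted in words (an assumption)."""
    source_len = len(source.split())
    if source_len == 0:
        raise ValueError("source text is empty")
    return len(summary.split()) / source_len


source = "one two three four five six seven eight nine ten eleven twelve"
summary = "one five nine"
tau = compression_rate(summary, source)
print(tau)          # 0.25
print(tau < 1 / 3)  # True: short enough to count as a summary under the definition
```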
Why?
“too much information kills information”
• Professional summarizers
• Expensive
• Lack expertise
• Reduce reading time
• Easier selection of documents
• Improves effectiveness of indexing
• Less biased
• Personalized summaries for QA systems
Summary Categorization
• Extractive
• Abstractive
• Single-document
• Multi-document
• Indicative
• Informative
• Headline summarization
• Ultra-summarization
• Keyword summarization
• Generic
• Query-focused
• Update
Summary Categorization
• Monolingual
• Multi-lingual
• Cross-lingual
• News
• Specialized
• Literary
• Encyclopedic…
• Author
• Expert
• Professional
• Multimedia
Abstractive Summarization
• Understands the text, then generates a summary (NLG)
• Abstract
• Very difficult
• Compression
• Fusion
• Information Extraction
Extractive Summarization
• Selects sentences from source document
• Extract
• Cohesion
• Coherence
• Unresolved co-references
• Discourse relations
Extractive Summarization
• Intermediate representation
• Scoring sentences
• Selecting summary
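The three steps above can be sketched end to end. This is only an illustration under simple assumptions: the intermediate representation is a plain word-frequency table, a sentence is scored by the average frequency of its words, and selection takes the best n sentences. The helper names (`split_sentences`, `tokenize`, `summarize`) are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, n=2):
    sentences = split_sentences(text)
    # Step 1, intermediate representation: word frequencies over the document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Step 2, scoring: average document frequency of the words in a sentence.
    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    scores = [score(s) for s in sentences]
    # Step 3, selection: keep the n best sentences, restored to source order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


text = "Cats chase mice. Mice fear cats. The weather is nice today."
print(summarize(text, n=2))
```

Restoring the source order in the last step is a cheap way to keep some coherence in the extract.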
Intermediate Representation
• Topic representation
• VSM, lexical chains, LSA, Bayesian topic models
• Indicator representation
• sentence length, sentence location, proper nouns, numerical data…
• Graph representation
• directed forward (backward), undirected
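As one possible topic representation, the sketch below builds a vector space model (VSM) in which every sentence becomes a sparse TF-IDF vector, and compares sentences by cosine similarity. The weighting scheme (raw term frequency times log(N / df)) is an assumption; the slides do not prescribe one. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict word -> weight) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(w for d in docs for w in d)
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()} for d in docs]


def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = ["cats chase mice", "mice fear cats", "rain falls"]
v0, v1, v2 = tfidf_vectors(sentences)
print(cosine(v0, v1))  # positive: the sentences share "cats" and "mice"
print(cosine(v0, v2))  # 0.0: no shared words
```

The same pairwise similarities can serve as edge weights when the sentences are placed in a graph representation.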
Scoring Methods
• Topic representation
• ability of a sentence to express topic
• Indicator representation
• machine learning
• Graph representation
• stochastic methods
Examples [http://www.sciencedirect.com/science/article/pii/S0957417413002601]
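For the graph representation, one well-known family of stochastic scoring methods is the PageRank-style random walk used by TextRank and LexRank. The sketch below is a simplified variant of that idea, not the method of any particular paper: edges are weighted by Jaccard word overlap (an assumption), and the damping factor and iteration count are illustrative defaults.

```python
import re


def tokenize(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by a random walk on an undirected similarity graph."""
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    weights = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Mass flowing into i from every neighbour j, split by edge weight.
            # Isolated sentences pass nothing on, so scores need not sum to 1.
            incoming = sum(scores[j] * weights[j][i] / out_sum[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


sentences = ["cats chase mice", "mice fear cats", "cats and mice play", "rain falls"]
scores = rank_sentences(sentences)
for sentence, score in zip(sentences, scores):
    print(f"{score:.3f}  {sentence}")
```

The sentence that shares no words with the others receives only the baseline (1 - damping) / n and ranks last.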
Selecting a summary
• Length constraint
• best n approach
• Maximal marginal relevance
• Global selection
• Maximize importance, maximize coherence, minimize redundancy
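Maximal marginal relevance (MMR) can be sketched as a greedy loop: at each step pick the sentence that maximizes λ · relevance − (1 − λ) · redundancy with respect to the sentences already chosen. The assumptions here are that relevance is the Jaccard overlap with the vocabulary of the whole document, that redundancy is the highest Jaccard overlap with any selected sentence, and that λ = 0.7; all names are hypothetical.

```python
import re


def tokenize(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(sentences, n=2, lam=0.7):
    tokens = [tokenize(s) for s in sentences]
    document = set().union(*tokens)
    relevance = [jaccard(t, document) for t in tokens]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < n:
        def mmr(i):
            # Redundancy: similarity to the closest already selected sentence.
            redundancy = max((jaccard(tokens[i], tokens[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sentences = ["cats chase mice daily", "cats chase mice", "dogs bark"]
print(mmr_select(sentences, n=2))
```

A plain best-n selection on relevance alone would return the first two sentences here, which are near duplicates; the redundancy penalty makes MMR pick "dogs bark" as the second sentence instead.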
Evaluation
• Manual
• Semi-automatic
• ROUGE-n
• Automatic
• ROUGE-n
• Lexical level
• Abbreviations (BEwT-E, PYRAMID)
$$\mathrm{ROUGE}\text{-}n = \frac{\sum n\text{-grams} \in \{\mathrm{Sum}_{can} \cap \mathrm{Sum}_{ref}\}}{\sum n\text{-grams} \in \mathrm{Sum}_{ref}}$$
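The ROUGE-n ratio (n-grams shared by the candidate and reference summaries, divided by the n-grams of the reference) can be computed as in the sketch below. It assumes a single reference summary, whitespace tokenization, and clipped counts (a repeated n-gram is matched at most as often as it occurs in the reference); the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipping).
    matched = sum((cand & ref).values())
    return matched / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n(candidate, reference, n=2))  # 3 of 5 reference bigrams matched
```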
Frequent Patterns
• Single-document
• Monolingual
• Graph representation
• Dynamic graph – mimicking reading
• DGRMiner
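One way to picture a dynamic graph that mimics reading is as a sequence of time steps, one per sentence, each recording the edges that appear when that sentence is read. The sketch below is purely illustrative: it assumes word nodes and directed edges between adjacent words, which the slides do not specify, and it does not reproduce DGRMiner's input format or API.

```python
import re


def dynamic_graph(text):
    """Return a list of (time_step, new_edges), one entry per sentence read."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    seen = set()
    snapshots = []
    for t, sentence in enumerate(sentences):
        words = re.findall(r"[a-z]+", sentence.lower())
        added = []
        # Hypothetical edge definition: a directed edge between adjacent words.
        for edge in zip(words, words[1:]):
            if edge not in seen:
                seen.add(edge)
                added.append(edge)
        snapshots.append((t, added))
    return snapshots


text = "Cats chase mice. Mice fear cats. Cats chase mice again."
for t, edges in dynamic_graph(text):
    print(t, edges)
```

In this toy run the third sentence adds only the edge ("mice", "again"), because the other edges were already created by the first sentence.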