====TextRank and LexRank====
The unsupervised approach to summarization is similar in spirit to unsupervised keyphrase extraction and avoids the need for costly training data. Some unsupervised summarization approaches are based on finding a "[[centroid]]" sentence, which is the mean word vector of all the sentences in the document. The sentences can then be ranked by their similarity to this centroid sentence. A more principled way to estimate sentence importance is to use random walks and eigenvector centrality. LexRank<ref>Güneş Erkan and Dragomir R. Radev: ''LexRank: Graph-based Lexical Centrality as Salience in Text Summarization'' [https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html]</ref> is an algorithm essentially identical to TextRank, and both use this approach for document summarization. The two methods were developed by different groups at the same time; LexRank focused on summarization, but could just as easily be used for keyphrase extraction or any other NLP ranking task.

In both LexRank and TextRank, a graph is constructed by creating a vertex for each sentence in the document. The edges between sentences are based on some form of semantic similarity or content overlap. While LexRank uses [[cosine similarity]] of [[TF-IDF]] vectors, TextRank uses a very similar measure based on the number of words two sentences have in common ([[Quantile normalization|normalized]] by the sentences' lengths). The LexRank paper explored using unweighted edges after applying a threshold to the cosine values, but also experimented with edges weighted by the similarity score. TextRank uses continuous [[similarity score]]s as weights. In both algorithms, the sentences are ranked by applying PageRank to the resulting graph. A summary is formed by combining the top-ranking sentences, using a threshold or length cutoff to limit the size of the summary.

TextRank was applied to summarization exactly as described here, while LexRank was used as part of a larger summarization system ([[MEAD]]) that combines the LexRank score (stationary probability) with other features such as sentence position and length, using a [[linear combination]] with either user-specified or automatically tuned weights. In this case, some training documents may be needed, though the TextRank results show that the additional features are not strictly necessary. Unlike TextRank, LexRank has been applied to multi-document summarization.
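The graph-based ranking described above can be sketched in a few dozen lines of Python. The sketch below is an illustrative approximation, not the reference LexRank or TextRank implementation: it builds TF-IDF vectors by hand, connects sentences whose cosine similarity exceeds a threshold (the unweighted-edge variant explored in the LexRank paper), scores sentences with a PageRank-style power iteration, and returns the top-ranked sentences in document order. All function names and parameter values (<code>threshold</code>, <code>damping</code>, <code>iters</code>) are chosen for the example.

<syntaxhighlight lang="python">
# Minimal LexRank-style sketch (illustrative only): sentences are vertices,
# edges come from thresholded cosine similarity of TF-IDF vectors, and
# importance is estimated with a PageRank-style power iteration.
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    """Build a sparse TF-IDF vector (word -> weight) for each sentence."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Document frequency: number of sentences containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(sentences)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse word -> weight vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Score sentences by PageRank over the similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    # Unweighted edges after thresholding the cosine values.
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) or 1.0 for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):  # power iteration
        scores = [(1 - damping) / n +
                  damping * sum(adj[j][i] / degree[j] * scores[j]
                                for j in range(n))
                  for i in range(n)]
    return scores


def summarize(sentences, k=2):
    """Return the k top-ranked sentences in their original order."""
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
</syntaxhighlight>

Calling <code>summarize(sentences, k=3)</code> on a list of sentence strings would return the three highest-scoring sentences in their original document order, which corresponds to the length-cutoff step described above.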