===Document summarization===
Like keyphrase extraction, document summarization aims to identify the essence of a text. The only real difference is that we are now dealing with larger text units: whole sentences instead of words and phrases.

====Supervised learning approaches====
Supervised text summarization is very much like supervised keyphrase extraction: given a collection of documents and human-generated summaries for them, one can learn features of sentences that make them good candidates for inclusion in the summary. Features might include the position in the document (for instance, the first few sentences are probably important), the number of words in the sentence, and so on.

The main difficulty in supervised extractive summarization is that the known summaries must be created manually by extracting sentences, so that each sentence in an original training document can be labeled "in summary" or "not in summary". This is not typically how people write summaries, so simply using journal abstracts or existing summaries is usually not sufficient: the sentences in those summaries do not necessarily match up with sentences in the original text, which makes it difficult to assign labels to training examples. Note, however, that these natural summaries can still be used for evaluation purposes, since ROUGE-1 evaluation only considers unigrams.
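A minimal sketch of this supervised setup (illustrative only, not taken from any cited system) trains a classifier on the position and length features mentioned above; it assumes scikit-learn is available and that the training sentences are already labeled, and all function names are placeholders:

<syntaxhighlight lang="python">
# Illustrative sketch only: a supervised extractive summarizer using the position and
# length features mentioned above. The choice of classifier and all names are assumptions.
from sklearn.linear_model import LogisticRegression

def sentence_features(sentences):
    """Relative position in the document and sentence length (in words)."""
    n = len(sentences)
    return [[i / n, len(s.split())] for i, s in enumerate(sentences)]

def train(labeled_docs):
    """labeled_docs: list of (sentences, labels) pairs, labels being 1 ('in summary') or 0."""
    X, y = [], []
    for sentences, labels in labeled_docs:
        X.extend(sentence_features(sentences))
        y.extend(labels)
    return LogisticRegression().fit(X, y)

def summarize(model, sentences, k=3):
    """Keep the k sentences the classifier scores as most summary-like, in document order."""
    scores = model.predict_proba(sentence_features(sentences))[:, 1]
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
</syntaxhighlight>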
====Maximum entropy-based summarization====
During the DUC 2001 and 2002 evaluation workshops, [[Netherlands Organisation for Applied Scientific Research|TNO]] developed a sentence extraction system for multi-document summarization in the news domain. The system was a hybrid that combined a [[Naive Bayes classifier]] with statistical language models for modeling salience. Although the system exhibited good results, the researchers wanted to explore the effectiveness of a [[maximum entropy classifier|maximum entropy]] (ME) classifier for the meeting summarization task, as ME is known to be robust against feature dependencies. Maximum entropy has also been applied successfully for summarization in the broadcast news domain.

====Adaptive summarization====
A promising approach is adaptive document/text summarization.<ref>{{Cite journal |last1=Yatsko |first1=V. A. |last2=Starikov |first2=M. S. |last3=Butakov |first3=A. V. |year=2010 |title=Automatic genre recognition and adaptive text summarization |journal=Automatic Documentation and Mathematical Linguistics |volume=44 |issue=3 |pages=111–120 |doi=10.3103/S0005105510030027 |s2cid=1586931}}</ref> It involves first recognizing the genre of a text and then applying summarization algorithms optimized for that genre. Such software has been created.<ref>[http://yatsko.zohosites.com/universal-summarizer-unis.html UNIS (Universal Summarizer)]</ref>

====TextRank and LexRank====
The unsupervised approach to summarization is quite similar in spirit to unsupervised keyphrase extraction and gets around the issue of costly training data. Some unsupervised summarization approaches are based on finding a "[[centroid]]" sentence, the mean word vector of all the sentences in the document; the sentences can then be ranked by their similarity to this centroid.

A more principled way to estimate sentence importance uses random walks and eigenvector centrality. LexRank<ref>Güneş Erkan and Dragomir R. Radev: "[https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html LexRank: Graph-based Lexical Centrality as Salience in Text Summarization]".</ref> is an algorithm essentially identical to TextRank, and both use this approach for document summarization. The two methods were developed by different groups at the same time; LexRank simply focused on summarization, but could just as easily be used for keyphrase extraction or any other NLP ranking task.

In both LexRank and TextRank, a graph is constructed with a vertex for each sentence in the document. The edges between sentences are based on some form of semantic similarity or content overlap: LexRank uses [[cosine similarity]] of [[TF-IDF]] vectors, while TextRank uses a very similar measure based on the number of words two sentences have in common (normalized by the sentences' lengths). The LexRank paper explored unweighted edges obtained by applying a threshold to the cosine values, but also experimented with edges weighted by the similarity score. TextRank uses continuous [[similarity score]]s as weights. In both algorithms, the sentences are ranked by applying PageRank to the resulting graph. A summary is formed from the top-ranking sentences, using a threshold or length cutoff to limit its size.

It is worth noting that TextRank was applied to summarization exactly as described here, while LexRank was used as part of a larger summarization system ([[MEAD]]) that combines the LexRank score (stationary probability) with other features such as sentence position and length, using a [[linear combination]] with either user-specified or automatically tuned weights. In this case, some training documents might be needed, though the TextRank results show the additional features are not absolutely necessary. Unlike TextRank, LexRank has been applied to multi-document summarization.
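An illustrative sketch of this graph-based ranking (not the reference implementations from the LexRank or TextRank papers) builds a sentence graph from thresholded TF-IDF cosine similarities and ranks the sentences with PageRank; the libraries, the threshold value and the function names below are assumptions:

<syntaxhighlight lang="python">
# Illustrative LexRank-style ranking: TF-IDF cosine similarity between sentences,
# thresholded into an unweighted graph, then scored with PageRank. The libraries
# (scikit-learn, networkx), the 0.1 threshold and the function names are assumptions.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, threshold=0.1):
    tfidf = TfidfVectorizer().fit_transform(sentences)  # one TF-IDF vector per sentence
    sim = cosine_similarity(tfidf)                       # pairwise cosine similarities
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sim[i, j] > threshold:                    # unweighted edge after thresholding
                graph.add_edge(i, j)
    scores = nx.pagerank(graph)                          # stationary probabilities
    return sorted(range(len(sentences)), key=lambda i: -scores[i])

def summarize(sentences, k=3):
    """Combine the k top-ranked sentences, presented in document order."""
    return [sentences[i] for i in sorted(rank_sentences(sentences)[:k])]
</syntaxhighlight>

Replacing the thresholded, unweighted edges with similarity-weighted ones, or with a word-overlap measure normalized by sentence length, would give the weighted LexRank and TextRank variants described above.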
====Multi-document summarization====
{{Main|Multi-document summarization}}
'''Multi-document summarization''' is an automatic procedure aimed at extracting information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with information contained in a large cluster of documents. In this way, multi-document summarization systems complement [[news aggregators]], performing the next step down the road of coping with [[information overload]]. Multi-document summarization may also be done in response to a question.<ref>"[https://www.academia.edu/2475776/Versatile_question_answering_systems_seeing_in_synthesis Versatile question answering systems: seeing in synthesis]", International Journal of Intelligent Information Database Systems, 5(2), 119–142, 2011.</ref><ref name="Afzal_et_al">Afzal M, Alam F, Malik KM, Malik GM, "[https://www.jmir.org/2020/10/e19810/ Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation]", J Med Internet Res 2020;22(10):e19810, DOI: 10.2196/19810, PMID 33095174.</ref>

Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions put together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify information search and cut the time by pointing to the most relevant source documents, a comprehensive multi-document summary should itself contain the required information, limiting the need to access original files to cases when refinement is required. Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, thus making them completely unbiased.{{dubious|date=June 2018}}

=====Diversity=====
Multi-document extractive summarization faces the problem of redundancy. Ideally, we want to extract sentences that are both "central" (i.e., they contain the main ideas) and "diverse" (i.e., they differ from one another). In a set of news articles about some event, for example, each article is likely to have many similar sentences. To address this issue, LexRank applies a heuristic post-processing step that adds sentences in rank order but discards any sentence that is too similar to ones already placed in the summary. This method is called Cross-Sentence Information Subsumption (CSIS).

These methods work based on the idea that sentences "recommend" other similar sentences to the reader. Thus, if one sentence is very similar to many others, it will likely be a sentence of great importance. Its importance also stems from the importance of the sentences "recommending" it. Thus, to get ranked highly and placed in a summary, a sentence must be similar to many sentences that are in turn also similar to many other sentences. This makes intuitive sense and allows the algorithms to be applied to an arbitrary new text. The methods are domain-independent and easily portable: one could imagine that the features indicating important sentences in the news domain vary considerably from those in the biomedical domain, but the unsupervised "recommendation"-based approach applies to any domain.

A related method is Maximal Marginal Relevance (MMR),<ref>Carbonell, Jaime, and Jade Goldstein. "[https://www.cs.cmu.edu/afs/.cs.cmu.edu/Web/People/jgc/publication/MMR_DiversityBased_Reranking_SIGIR_1998.pdf The use of MMR, diversity-based reranking for reordering documents and producing summaries]." Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1998.</ref> which greedily selects each next sentence by trading off its relevance against its redundancy with the sentences already chosen. A general-purpose graph-based ranking algorithm in the spirit of Page/Lex/TextRank, called GRASSHOPPER,<ref>Zhu, Xiaojin, et al. "[http://www.aclweb.org/anthology/N07-1013 Improving Diversity in Ranking using Absorbing Random Walks]." HLT-NAACL. 2007.</ref> handles both "centrality" and "diversity" in a unified mathematical framework based on [[absorbing Markov chain]] random walks (a random walk where certain states end the walk). In addition to explicitly promoting diversity during the ranking process, GRASSHOPPER incorporates a prior ranking (based on sentence position in the case of summarization).
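A minimal sketch of MMR-style greedy selection follows: each newly added sentence must be relevant but not too similar to the sentences already chosen. Approximating relevance by similarity to the whole document and the value of the trade-off parameter are illustrative assumptions, not details from the cited paper:

<syntaxhighlight lang="python">
# Illustrative MMR-style greedy selection. Relevance is approximated by similarity to the
# whole document, and the trade-off parameter lambda_ = 0.7 is an arbitrary choice; neither
# detail is taken from the cited paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(sentences, k=3, lambda_=0.7):
    vectorizer = TfidfVectorizer().fit(sentences)
    S = vectorizer.transform(sentences)                 # sentence vectors
    doc = vectorizer.transform([" ".join(sentences)])   # whole document as the "query"
    relevance = cosine_similarity(S, doc).ravel()       # centrality of each sentence
    pairwise = cosine_similarity(S)                     # redundancy between sentences
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((pairwise[i, j] for j in selected), default=0.0)
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]     # document order
</syntaxhighlight>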
"[https://arxiv.org/abs/1210.4871 Learning mixtures of submodular shells with application to document summarization]</ref> Similar results were achieved with the use of determinantal point processes (which are a special case of submodular functions) for DUC-04.<ref>Alex Kulesza and Ben Taskar, [http://www.nowpublishers.com/article/DownloadSummary/MAL-044 Determinantal point processes for machine learning]. Foundations and Trends in Machine Learning, December 2012.</ref> A new method for multi-lingual multi-document summarization that avoids redundancy generates ideograms to represent the meaning of each sentence in each document, then evaluates similarity by comparing ideogram shape and position. It does not use word frequency, training or preprocessing. It uses two user-supplied parameters: equivalence (when are two sentences to be considered equivalent?) and relevance (how long is the desired summary?).