====Multi-document summarization====
{{Main|Multi-document summarization}}
'''Multi-document summarization''' is an automatic procedure for extracting information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with the information contained in a large cluster of documents. In this way, multi-document summarization systems complement [[news aggregators]], taking the next step in coping with [[information overload]]. Multi-document summarization may also be done in response to a question.<ref>"[https://www.academia.edu/2475776/Versatile_question_answering_systems_seeing_in_synthesis Versatile question answering systems: seeing in synthesis]", International Journal of Intelligent Information and Database Systems, 5(2), 119-142, 2011.</ref><ref name="Afzal_et_al">Afzal M, Alam F, Malik KM, Malik GM, [https://www.jmir.org/2020/10/e19810/ Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation], J Med Internet Res 2020;22(10):e19810, DOI: 10.2196/19810, PMID 33095174</ref>

Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions put together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify information search and save time by pointing to the most relevant source documents, a comprehensive multi-document summary should itself contain the required information, limiting the need to access the original files to cases where refinement is required. Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, thus making them completely unbiased.{{dubious|date=June 2018}}

=====Diversity=====
Multi-document extractive summarization faces a problem of redundancy. Ideally, the extracted sentences should be both "central" (i.e., contain the main ideas) and "diverse" (i.e., differ from one another). For example, in a set of news articles about some event, each article is likely to have many similar sentences. To address this issue, LexRank applies a heuristic post-processing step that adds sentences in rank order but discards sentences that are too similar to ones already in the summary. This method is called Cross-Sentence Information Subsumption (CSIS).

Graph-based ranking methods such as LexRank work on the idea that sentences "recommend" other similar sentences to the reader. Thus, if one sentence is very similar to many others, it is likely to be a sentence of great importance. Its importance also stems from the importance of the sentences "recommending" it. To be ranked highly and placed in a summary, a sentence must therefore be similar to many sentences that are in turn similar to many other sentences. This makes intuitive sense and allows the algorithms to be applied to an arbitrary new text. The methods are domain-independent and easily portable. The features indicating important sentences in the news domain might vary considerably from those in the biomedical domain; the unsupervised "recommendation"-based approach, however, applies to any domain.
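
The two steps described above, centrality ranking followed by a redundancy filter, can be illustrated with a minimal Python sketch. It is not the published LexRank implementation: the bag-of-words cosine similarity, the damping factor, the threshold <code>max_sim</code> and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def centrality(sim, damping=0.85, iterations=50):
    """PageRank-style power iteration on the row-normalized similarity graph.

    A sentence scores highly when it is "recommended" by (similar to)
    sentences that themselves score highly.
    """
    n = len(sim)
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(scores[j] * sim[j][i] / row_sums[j] for j in range(n) if row_sums[j])
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2, max_sim=0.5):
    """Add sentences in rank order, skipping ones too similar to those chosen."""
    bows = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(a, b) for b in bows] for a in bows]
    scores = centrality(sim)
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        # Redundancy filter: keep i only if it is not a near-duplicate.
        if all(sim[i][j] <= max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order


if __name__ == "__main__":
    docs = [
        "the storm hit the coast on monday",
        "on monday the storm hit the coast",
        "thousands of homes lost power",
        "the storm left thousands of homes without power",
    ]
    print(summarize(docs, k=2))
```

In this toy input the first two sentences contain the same words, so the filter admits at most one of them regardless of how highly both are ranked.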

A related method is Maximal Marginal Relevance (MMR),<ref>Carbonell, Jaime, and Jade Goldstein. "[https://www.cs.cmu.edu/afs/.cs.cmu.edu/Web/People/jgc/publication/MMR_DiversityBased_Reranking_SIGIR_1998.pdf The use of MMR, diversity-based reranking for reordering documents and producing summaries]." Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1998.</ref> which greedily selects items that are relevant to the query while being dissimilar to the items already selected. Another approach, GRASSHOPPER,<ref>Zhu, Xiaojin, et al. "[http://www.aclweb.org/anthology/N07-1013 Improving Diversity in Ranking using Absorbing Random Walks]." HLT-NAACL. 2007.</ref> is a general-purpose graph-based ranking algorithm like Page/Lex/TextRank that handles both "centrality" and "diversity" in a unified mathematical framework based on [[absorbing Markov chain]] random walks (random walks in which certain states end the walk). In addition to explicitly promoting diversity during the ranking process, GRASSHOPPER incorporates a prior ranking (based on sentence position in the case of summarization).
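
The greedy selection used by MMR can be sketched as follows. Each step picks the candidate that maximizes <code>lam * relevance − (1 − lam) * (maximum similarity to anything already selected)</code>. The function name, the parameter names and the toy scores are illustrative assumptions; in practice the relevance and similarity values would come from a retrieval or sentence-similarity model.

```python
def mmr(relevance, sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    relevance[i] -- relevance of item i to the query or topic
    sim[i][j]    -- pairwise similarity between items i and j
    lam          -- trade-off: 1.0 ranks purely by relevance, lower values
                    penalize redundancy more strongly
    Returns the indices of the selected items, in selection order.
    """
    selected = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the similarity to the closest selected item.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Items 0 and 1 are near-duplicates; item 2 is less relevant but novel.
    relevance = [0.9, 0.85, 0.5]
    sim = [
        [1.0, 0.95, 0.1],
        [0.95, 1.0, 0.1],
        [0.1, 0.1, 1.0],
    ]
    print(mmr(relevance, sim, k=2, lam=0.5))  # [0, 2]
    print(mmr(relevance, sim, k=2, lam=1.0))  # [0, 1]
```

With <code>lam=0.5</code> the near-duplicate item 1 is passed over in favor of the novel item 2; with <code>lam=1.0</code> the redundancy term vanishes and the selection is by relevance alone.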

State-of-the-art results for multi-document summarization have been obtained using mixtures of submodular functions, including on the Document Understanding Conference (DUC) corpora DUC-04 through DUC-07.<ref>Hui Lin, Jeff Bilmes. "[https://arxiv.org/abs/1210.4871 Learning mixtures of submodular shells with application to document summarization]"</ref> Similar results were achieved with the use of determinantal point processes (which are a special case of submodular functions) for DUC-04.<ref>Alex Kulesza and Ben Taskar, [http://www.nowpublishers.com/article/DownloadSummary/MAL-044 Determinantal point processes for machine learning]. Foundations and Trends in Machine Learning, December 2012.</ref>
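
To illustrate why submodularity suits summarization, the following sketch greedily maximizes a simple monotone submodular objective that combines a saturated coverage term with a per-cluster diversity reward. It is a simplified illustration, not the learned mixture from the cited work: the objective, the saturation parameter <code>alpha</code>, the weight <code>lam</code>, the sentence clusters and all names are assumptions. For monotone submodular objectives under a cardinality budget, greedy selection is known to achieve at least a (1 − 1/e) fraction of the optimal value.

```python
import math


def coverage(selected, sim, alpha=0.5):
    """How well `selected` represents every sentence, saturating per sentence.

    Once sentence i is covered up to alpha times its total similarity mass,
    adding more sentences similar to i yields no further gain.
    """
    total = 0.0
    for row in sim:
        covered = sum(row[j] for j in selected)
        total += min(covered, alpha * sum(row))
    return total


def diversity(selected, clusters, reward):
    """Square root of the reward collected in each cluster.

    The concave square root gives diminishing returns for picking several
    sentences from the same cluster.
    """
    return sum(
        math.sqrt(sum(reward[j] for j in selected if j in cluster))
        for cluster in clusters
    )


def objective(selected, sim, clusters, reward, lam=1.0):
    return coverage(selected, sim) + lam * diversity(selected, clusters, reward)


def greedy(sim, clusters, reward, k, lam=1.0):
    """Repeatedly add the sentence with the largest gain in the objective."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(len(sim)) if j not in selected]
        if not rest:
            break
        best = max(
            rest, key=lambda j: objective(selected + [j], sim, clusters, reward, lam)
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Sentences 0 and 1 are similar to each other, as are 2 and 3.
    sim = [
        [1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0],
    ]
    clusters = [{0, 1}, {2, 3}]
    reward = [1.0, 1.0, 1.0, 1.0]
    print(greedy(sim, clusters, reward, k=2))
```

In this toy example the second pick comes from the cluster not yet represented, because both the coverage and the diversity terms offer diminishing returns for a second sentence from the same cluster.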

Another method for multi-lingual multi-document summarization avoids redundancy by generating ideograms to represent the meaning of each sentence in each document, then evaluating similarity by comparing ideogram shape and position. It does not use word frequency, training or preprocessing. It takes two user-supplied parameters: equivalence (when two sentences are to be considered equivalent) and relevance (how long the desired summary should be).