==Approaches==

There are two general approaches to automatic summarization: [[Information extraction|extraction]] and [[abstract (summary)|abstraction]].

===Extraction-based summarization===

Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key-phrases that can be used to "tag" or index a text document, key sentences (including headings) that collectively comprise an abstract, and representative images or video segments, as stated above. For text, extraction is analogous to skimming, in which the summary (if available), headings and subheadings, figures, the first and last paragraphs of a section, and optionally the first and last sentences of each paragraph are read before one chooses to read the entire document in detail.<ref>Richard Sutz, Peter Weverka. How to skim text. https://www.dummies.com/education/language-arts/speed-reading/how-to-skim-text/ Accessed Dec 2019.</ref> Other examples of extraction include key sequences of text selected for their clinical relevance (including patient/problem, intervention, and outcome).<ref name="Afzal_et_al"/>
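As an illustration of the idea that extracted content is selected but never modified, the following is a minimal sketch of a frequency-based sentence extractor. It is not taken from the cited sources: the scoring rule (average word frequency), the small stop-word list, the naive sentence splitter, and the names <code>split_sentences</code>, <code>tokenize</code> and <code>extract_summary</code> are all assumptions chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "it", "that", "on", "for", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extract_summary(text, k=2):
    """Return up to k sentences, copied verbatim and kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured merely for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Restore original order; the selected sentences are returned unmodified.
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = ("Cats sleep a lot. Cats and dogs are popular pets. "
              "The weather was mild. Dogs need daily walks.")
    for sentence in extract_summary(sample, k=2):
        print(sentence)
```

Every sentence in the output appears verbatim in the input, which is the defining property of extraction; practical systems typically replace the scoring rule with more elaborate ranking methods.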

===Abstractive-based summarization===

Abstractive summarization methods generate new text that did not exist in the original text.<ref>{{Cite book |last=Zhai |first=ChengXiang |url=https://www.worldcat.org/oclc/957355971 |title=Text data management and analysis : a practical introduction to information retrieval and text mining |date=2016 |others=Sean Massung |isbn=978-1-970001-19-8 |page=321 |location=[New York, NY] |oclc=957355971}}</ref> This has been applied mainly to text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by [[automated paraphrasing|paraphrasing]] sections of the source document, condensing a text more strongly than extraction can. Such transformation, however, is computationally much more challenging than extraction: it involves [[natural language processing]] and, where the original document relates to a special field of knowledge, often a deep understanding of that domain. "Paraphrasing" is even more difficult to apply to images and videos, which is why most summarization systems are extractive.

===Aided summarization===

Approaches aimed at higher summarization quality rely on combined software and human effort. In Machine Aided Human Summarization, extractive techniques highlight candidate passages for inclusion, and a human then adds or removes text. In Human Aided Machine Summarization, a human post-processes software output, much as one edits the output of an automatic translation tool such as Google Translate.
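The Machine Aided Human Summarization workflow can be sketched as two steps: software proposes candidate passages, and a human reviewer adjusts the selection. The sketch below is illustrative only; the function names <code>propose_candidates</code> and <code>apply_human_edits</code>, the score threshold, and the representation of human edits as index lists are assumptions, and the scores are taken as given rather than computed by any particular method.

```python
def propose_candidates(scores, threshold):
    """Machine step: flag the indices of passages whose score meets the threshold."""
    return {i for i, score in enumerate(scores) if score >= threshold}


def apply_human_edits(passages, candidates, add=(), remove=()):
    """Human step: the reviewer adds or removes passages by index.

    Returns the final selection in document order.
    """
    chosen = (set(candidates) | set(add)) - set(remove)
    return [passages[i] for i in sorted(chosen)]


if __name__ == "__main__":
    passages = ["First passage.", "Second passage.",
                "Third passage.", "Fourth passage."]
    scores = [0.9, 0.2, 0.7, 0.1]  # hypothetical relevance scores

    candidates = propose_candidates(scores, threshold=0.5)
    # The reviewer drops passage 0 and adds passage 3.
    final = apply_human_edits(passages, candidates, add=[3], remove=[0])
    print(final)
```

In Human Aided Machine Summarization the order is reversed: the software produces the complete draft and the human edits it afterwards.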