=== Inter-textual versus intra-textual ===
Intra-textual evaluation assesses the output of a specific summarization system, while inter-textual evaluation performs a contrastive analysis of the outputs of several summarization systems.

Human judgement often varies greatly as to what constitutes a "good" summary, which makes creating an automatic evaluation process particularly difficult. Manual evaluation can be used, but it is both time- and labor-intensive, as it requires humans to read not only the summaries but also the source documents. Other issues concern [[coherence (linguistics)|coherence]] and coverage.

The most common way to evaluate summaries is [[ROUGE (metric)|ROUGE]] (Recall-Oriented Understudy for Gisting Evaluation). It is widely used for summarization and translation systems in [[NIST]]'s Document Understanding Conferences.[https://web.archive.org/web/20060408135021/http://haydn.isi.edu/ROUGE/] ROUGE is a recall-based measure of how well a summary covers the content of human-generated summaries, known as references. It computes [[n-gram]] overlaps between automatically generated summaries and previously written human summaries, and it is recall-based to encourage the inclusion of all important topics in the summaries. Recall can be computed with respect to unigram, bigram, trigram, or 4-gram matching. For example, ROUGE-1 is the number of unigrams that appear in both the reference summary and the automatic summary, divided by the total number of unigrams in the reference summary. If there are multiple reference summaries, the scores are averaged. A high level of overlap should indicate a high degree of shared concepts between the two summaries.

ROUGE cannot determine whether the result is coherent, that is, whether the sentences flow together sensibly. High-order n-gram ROUGE measures help to some degree. Another unsolved problem is [[anaphora (linguistics)|anaphor resolution]]. Similarly, for image summarization, Tschiatschek et al. developed a Visual-ROUGE score which judges the performance of algorithms for image summarization.<ref>Sebastian Tschiatschek, Rishabh Iyer, Haochen Wei and Jeff Bilmes, [http://papers.nips.cc/paper/5415-learning-mixtures-of-submodular-functions-for-image-collection-summarization.pdf Learning Mixtures of Submodular Functions for Image Collection Summarization], in Advances in Neural Information Processing Systems (NIPS), Montreal, Canada, December 2014. (PDF)</ref>
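The ROUGE-N recall described above can be sketched in a few lines of Python. This is a simplified illustration only, not the reference implementation: it assumes whitespace tokenization and lower-casing, whereas real ROUGE toolkits typically also apply stemming and stopword handling.

<syntaxhighlight lang="python">
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: overlapping n-grams between the candidate and a
    reference, divided by the total n-grams in the reference.
    With several references, the per-reference scores are averaged."""
    cand_counts = ngrams(candidate.lower().split(), n)
    scores = []
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(ref_counts.values())
        scores.append(overlap / total if total else 0.0)
    return sum(scores) / len(scores)

# Toy example: ROUGE-1 against a single reference summary.
print(rouge_n_recall("the cat sat on the mat",
                     ["the cat lay on the mat"], n=1))  # 5/6 ≈ 0.833
</syntaxhighlight>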