=== Inter-textual versus intra-textual ===
Intra-textual evaluation assesses the output of a specific summarization system, while inter-textual evaluation performs a contrastive analysis of the outputs of several summarization systems.

Human judgement often varies greatly as to what constitutes a "good" summary, which makes creating an automatic evaluation process particularly difficult. Manual evaluation can be used, but it is both time- and labor-intensive, as it requires humans to read not only the summaries but also the source documents. Other issues concern [[coherence (linguistics)|coherence]] and coverage.

The most common way to evaluate summaries is [[ROUGE (metric)|ROUGE]] (Recall-Oriented Understudy for Gisting Evaluation). It is widely used for summarization and translation systems in [[NIST]]'s Document Understanding Conferences.[https://web.archive.org/web/20060408135021/http://haydn.isi.edu/ROUGE/] ROUGE is a recall-based measure of how well a summary covers the content of human-generated summaries, known as references. It computes [[n-gram]] overlaps between automatically generated summaries and previously written human summaries, and it is recall-based to encourage the inclusion of all important topics in the summaries. Recall can be computed with respect to unigram, bigram, trigram, or 4-gram matching. For example, ROUGE-1 is the number of unigrams that appear in both the reference summary and the automatic summary, divided by the total number of unigrams in the reference summary. If there are multiple reference summaries, the scores are averaged. A high level of overlap should indicate a high degree of shared concepts between the two summaries.

ROUGE cannot determine whether the result is coherent, that is, whether the sentences flow together sensibly. High-order n-gram ROUGE measures help to some degree. Another unsolved problem is [[anaphora (linguistics)|anaphor resolution]]. Similarly, for image summarization, Tschiatschek et al. developed a Visual-ROUGE score which judges the performance of algorithms for image summarization.<ref>Sebastian Tschiatschek, Rishabh Iyer, Haochen Wei and Jeff Bilmes, [http://papers.nips.cc/paper/5415-learning-mixtures-of-submodular-functions-for-image-collection-summarization.pdf Learning Mixtures of Submodular Functions for Image Collection Summarization], in Advances in Neural Information Processing Systems (NIPS), Montreal, Canada, December 2014. (PDF)</ref>
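The ROUGE-N recall described above can be sketched in a few lines of Python. This is a simplified illustration only, not the reference implementation: it assumes whitespace tokenization and lower-casing, whereas real ROUGE toolkits typically also apply stemming and stopword handling.

<syntaxhighlight lang="python">
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: overlapping n-grams between the candidate and a
    reference, divided by the total n-grams in the reference.
    With several references, the per-reference scores are averaged."""
    cand_counts = ngrams(candidate.lower().split(), n)
    scores = []
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(ref_counts.values())
        scores.append(overlap / total if total else 0.0)
    return sum(scores) / len(scores)

# Toy example: ROUGE-1 against a single reference summary.
print(rouge_n_recall("the cat sat on the mat",
                     ["the cat lay on the mat"], n=1))  # 5/6 ≈ 0.833
</syntaxhighlight>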