=== Artificial intelligence ===
Unlike the traditional readability formulas, [[artificial intelligence]] approaches to readability assessment (also known as Automatic Readability Assessment) incorporate a wide range of linguistic features and construct statistical prediction models to predict text readability.<ref name="Text Readability Assessment for Sec">{{cite journal |last1=Xia |first1=Menglin |last2=Kochmar |first2=Ekaterina |last3=Briscoe |first3=Ted |date=June 2016 |title=Text Readability Assessment for Second Language Learners |url=https://www.aclweb.org/anthology/W16-0502 |journal=Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications |pages=12–22 |arxiv=1906.07580 |doi=10.18653/v1/W16-0502 |doi-access=free}}</ref><ref name="aclweb.org">{{cite journal |last1=Lee |first1=Bruce W. |last2=Lee |first2=Jason |title=LXPER Index 2.0: Improving Text Readability Assessment Model for L2 English Students in Korea |journal=Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications |date=Dec 2020 |pages=20–24 |doi=10.18653/v1/2020.nlptea-1.3 |arxiv=2010.13374 |url=https://www.aclweb.org/anthology/2020.nlptea-1.3}}</ref> These approaches typically consist of three components: (1) a training corpus of individual texts, (2) a set of linguistic features computed from each text, and (3) a [[machine learning]] model that predicts readability from the computed linguistic feature values.<ref>{{cite journal |last1=Feng |first1=Lijun |last2=Jansche |first2=Martin |last3=Huenerfauth |first3=Matt |last4=Elhadad |first4=Noémie |title=A Comparison of Features for Automatic Readability Assessment |journal=Coling 2010: Posters |date=August 2010 |pages=276–284 |url=https://www.aclweb.org/anthology/C10-2032}}</ref><ref name="On Improving the Accuracy of Readab">{{cite journal |last1=Vajjala |first1=Sowmya |last2=Meurers |first2=Detmar |title=On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition |journal=Proceedings of the Seventh Workshop on Building Educational Applications Using NLP |date=June 2012 |pages=163–173 |url=https://www.aclweb.org/anthology/W12-2019}}</ref><ref name="aclweb.org" />

In 1998, Edward Gibson showed that syntactic complexity is correlated with longer processing times in text comprehension.<ref>{{cite journal |last1=Gibson |first1=Edward |title=Linguistic complexity: locality of syntactic dependencies |journal=Cognition |date=1998 |volume=68 |issue=1 |pages=1–76|doi=10.1016/S0010-0277(98)00034-1 |pmid=9775516 |s2cid=377292 }}</ref> It is therefore common to use a rich set of syntactic features to predict the readability of a text. The more advanced variants of these syntactic features are frequently computed from [[parse tree]]s.
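As an illustration only, the following minimal sketch (not the implementation of any cited system; NLTK and the hand-written bracketed parses are assumptions made here) computes a few parse-tree-based syntactic features from pre-parsed sentences. A real system would obtain the trees from a constituency parser run over the text being assessed.

<syntaxhighlight lang="python">
# Illustrative sketch: parse-tree syntactic features computed with NLTK from
# pre-parsed (bracketed) sentences; in practice the trees would come from a
# constituency parser applied to the text being assessed.
from nltk import Tree

parsed_sentences = [
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))",
    "(S (NP (PRP It)) (VP (VBD was) (ADJP (JJ warm))))",
]
trees = [Tree.fromstring(s) for s in parsed_sentences]

# Typical parse-tree-based features, averaged over the sentences of the text.
avg_sentence_length = sum(len(t.leaves()) for t in trees) / len(trees)
avg_tree_height = sum(t.height() for t in trees) / len(trees)
avg_noun_phrases = sum(len(list(t.subtrees(lambda st: st.label() == "NP")))
                       for t in trees) / len(trees)
avg_verb_phrases = sum(len(list(t.subtrees(lambda st: st.label() == "VP")))
                       for t in trees) / len(trees)

print(avg_sentence_length, avg_tree_height, avg_noun_phrases, avg_verb_phrases)
</syntaxhighlight>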
Emily Pitler and Ani Nenkova (both of the [[University of Pennsylvania]]) are considered pioneers in evaluating parse-tree syntactic features and making them widely used in readability assessment.<ref>{{cite journal |last1=Pitler |first1=Emily |last2=Nenkova |first2=Ani |title=Revisiting Readability: A Unified Framework for Predicting Text Quality |journal=Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing |date=October 2008 |pages=186–195 |url=https://www.aclweb.org/anthology/D08-1020}}</ref><ref name="Computational assessment of text re" /> Some examples include:
*Average sentence length
*Average parse tree height
*Average number of noun phrases per sentence
*Average number of verb phrases per sentence

Lijun Feng proposed a set of cognitively motivated features (mostly lexical) in 2009, during her [[doctorate|doctoral]] study at the [[City University of New York]].<ref>{{cite book |last1=Feng |first1=Lijun |last2=Elhadad |first2=Noémie |last3=Huenerfauth |first3=Matt |chapter=Cognitively motivated features for readability assessment |title=Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics on – EACL '09 |date=March 2009 |pages=229–237 |doi=10.3115/1609067.1609092 |s2cid=13888774 |chapter-url=https://dl.acm.org/doi/10.5555/1609067.1609092|doi-access=free }}</ref> Although the cognitively motivated features were originally designed to ensure comprehension by adults with [[intellectual disability]], Feng showed that such features also improve readability assessment for the general population. In combination with a [[logistic regression]] model, cognitively motivated features can reduce the average error of the [[Flesch–Kincaid readability tests|Flesch–Kincaid grade level]] by more than 70%. The features introduced by Feng include:
*Number of [[lexical chain]]s in the document
*Average number of unique entities per sentence
*Average number of entity mentions per sentence
*Total number of unique entities in the document
*Total number of entity mentions in the document
*Average lexical chain length
*Average lexical chain span

In 2012, Sowmya Vajjala at the [[University of Tübingen]] created the WeeBit corpus by combining educational articles from the [[Weekly Reader]] website and the [[BBC Bitesize]] website, which provide texts for different age groups.<ref name="On Improving the Accuracy of Readab" /> In total, the corpus contains 3,125 articles divided into five readability levels (for ages 7 to 16).
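As a minimal sketch of how the three components described above fit together (a leveled corpus such as WeeBit, linguistic features computed per text, and a machine-learning model), the following uses hypothetical texts, toy features, and scikit-learn's logistic regression; it is not the setup of any particular cited study.

<syntaxhighlight lang="python">
# Minimal sketch (hypothetical data, not any cited system): predict a
# readability level from simple linguistic features with logistic regression.
from sklearn.linear_model import LogisticRegression

def features(text):
    """Two toy linguistic features: words per sentence and characters per word."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    return [len(words) / max(len(sentences), 1),
            sum(len(w) for w in words) / max(len(words), 1)]

# A tiny leveled corpus in the style of WeeBit: texts labelled by reading level.
corpus = [
    ("The cat sat on the mat. It was warm.", 0),
    ("The weather changed quickly, so the hikers turned back early.", 1),
    ("Notwithstanding prior deliberations, the committee promulgated revised statutory guidance.", 2),
]
X = [features(text) for text, _ in corpus]
y = [level for _, level in corpus]

# Train the model and predict the level of an unseen text.
model = LogisticRegression().fit(X, y)
print(model.predict([features("The dog ran fast. It was fun.")]))
</syntaxhighlight>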
The WeeBit corpus has been used in several studies of AI-based readability assessment.<ref name="Computational assessment of text re">{{cite journal |last1=Collins-Thompson |first1=Kevyn |title=Computational assessment of text readability: A survey of current and future research |journal=International Journal of Applied Linguistics |date=2015 |volume=165 |issue=2 |pages=97–135|doi=10.1075/itl.165.2.01col |s2cid=17571866 }}</ref> Wei Xu ([[University of Pennsylvania]]), Chris Callison-Burch ([[University of Pennsylvania]]), and Courtney Napoles ([[Johns Hopkins University]]) introduced the [[Newsela]] corpus to the academic field in 2015.<ref>{{cite journal |last1=Xu |first1=Wei |last2=Callison-Burch |first2=Chris |last3=Napoles |first3=Courtney |title=Problems in Current Text Simplification Research: New Data Can Help |journal=Transactions of the Association for Computational Linguistics |date=2015 |volume=3 |pages=283–297|doi=10.1162/tacl_a_00139 |s2cid=17817489 |doi-access=free }}</ref> The corpus is a collection of thousands of news articles leveled to different reading complexities by professional editors at [[Newsela]]. It was originally introduced for [[text simplification]] research, but has also been used for text readability assessment.<ref>{{cite journal |last1=Deutsch |first1=Tovly |last2=Jasbi |first2=Masoud |last3=Shieber |first3=Stuart |title=Linguistic Features for Readability Assessment |journal=Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications |date=July 2020 |pages=1–17 |doi=10.18653/v1/2020.bea-1.1 |arxiv=2006.00377 |url=https://www.aclweb.org/anthology/2020.bea-1.1|doi-access=free }}</ref>

The use of advanced semantic features for text readability assessment was pioneered by Bruce W. Lee during his study at the [[University of Pennsylvania]] in 2021. While introducing his feature-hybridization method, he also explored handcrafted advanced semantic features that aim to measure the amount of knowledge contained in a given text.<ref>{{cite book |last1=Lee |first1=Bruce W. |last2=Jang |first2=Yoo Sung |last3=Lee |first3=Jason Hyung-Jong |chapter=Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features |title=Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |series=EMNLP '21 |date=November 2021 |pages=10669–10686 |doi=10.18653/v1/2021.emnlp-main.834 |s2cid=237940206 |chapter-url=https://aclanthology.org/2021.emnlp-main.834/|arxiv=2109.12258 }}</ref> Examples include:
*Semantic richness: <math>\sum_{i=1}^{n} p_i \cdot i</math>
*Semantic clarity: <math>\frac{1}{n} \sum_{i=1}^{n} \left( \max(p) - p_{i} \right)</math>
*Semantic noise: <math>n \cdot \frac{\sum_{i=1}^{n} (p_i - \bar{p})^4}{\left( \sum_{i=1}^{n} (p_i - \bar{p})^2 \right)^2}</math>
where <math>n</math> is the number of discovered topics and <math>p_i</math> is the probability of the <math>i</math>-th topic.
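For illustration only (this is not Lee's implementation), the following sketch evaluates the three formulas above on a hypothetical topic-probability distribution, such as one produced by a topic model for a single document.

<syntaxhighlight lang="python">
# Illustrative sketch: the three semantic measures computed from a hypothetical
# topic-probability distribution p_1..p_n for one document.
p = [0.5, 0.3, 0.1, 0.1]          # hypothetical probabilities of n = 4 topics
n = len(p)
mean_p = sum(p) / n

semantic_richness = sum(p_i * i for i, p_i in enumerate(p, start=1))
semantic_clarity = sum(max(p) - p_i for p_i in p) / n
semantic_noise = n * sum((p_i - mean_p) ** 4 for p_i in p) / \
                 sum((p_i - mean_p) ** 2 for p_i in p) ** 2

print(semantic_richness, semantic_clarity, semantic_noise)
</syntaxhighlight>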