== Readability formulas ==

=== Gray and Leary ===
Gray and Leary analyzed 228 variables that affect reading ease and divided them into four types: content, style, format, and organization. They found that content was most important, followed closely by style. Third was format, followed closely by organization. They found no way to measure content, format, or organization, but they could measure variables of style. Among the 17 significant measurable style variables, they selected five to create a formula:
* Average [[sentence (linguistics)|sentence length]]
* Number of different hard words
* Number of [[personal pronoun]]s
* Percentage of unique words
* Number of [[prepositional phrase]]s

=== Flesch formulas ===
{{Main|Flesch–Kincaid readability tests}}
The original formula is:
:Reading Ease score = 206.835 − (1.015 × ASL) − (84.6 × ASW)
::Where: ASL = average sentence length (number of words divided by number of sentences)
::ASW = average word length in syllables (number of syllables divided by number of words)

The modified formula is:
:New reading ease score = 1.599(nosw) − 1.015(sl) − 31.517
::Where: nosw = number of one-syllable words per 100 words and
::sl = average sentence length in words.<ref name="Farr">Farr, J. N., J. J. Jenkins, and D. G. Paterson. 1951. "Simplification of the Flesch Reading Ease Formula." ''Journal of Applied Psychology'' 35, no. 5: 333–357.</ref>

=== Dale–Chall formula ===
{{Main|Dale–Chall readability formula}}
To apply the formula:
# Select several 100-word samples throughout the text.
# Compute the average sentence length in words (divide the number of words by the number of sentences).
# Compute the percentage of words NOT on the Dale–Chall word list of 3,000 easy words.
# Compute this equation from 1948:
#: Raw score = 0.1579 × (PDW) + 0.0496 × (ASL) if the percentage of PDW is less than 5%, otherwise compute
#: Raw score = 0.1579 × (PDW) + 0.0496 × (ASL) + 3.6365

Where:
:Raw score = uncorrected reading grade of a student who can answer one-half of the test questions on a passage.
:PDW = Percentage of difficult words not on the Dale–Chall word list.
:ASL = Average sentence length

Finally, to compensate for the "grade-equivalent curve", apply the following chart for the Final Score:

{{Aligned table |class=wikitable |row1header=y
 | Raw score | Final score
 | 4.9 and below | Grade 4 and below
 | 5.0–5.9 | Grades 5–6
 | 6.0–6.9 | Grades 7–8
 | 7.0–7.9 | Grades 9–10
 | 8.0–8.9 | Grades 11–12
 | 9.0–9.9 | Grades 13–15 (college)
 | 10 and above | Grades 16 and above.
}}<ref name="Dale-Chall">Dale, E. and J. S. Chall. 1948. "A formula for predicting readability." ''Educational Research Bulletin'' 27 (January 21 and February 17): 1–20, 37–54.</ref>

The new Dale–Chall formula is:
:Raw score = 64 − 0.95 × (PDW) − 0.69 × (ASL)

=== Gunning fog formula ===
{{Main|Gunning fog index}}
The Gunning fog formula is one of the most reliable and simplest formulas to apply:
:Grade level = 0.4 × ((average sentence length) + (percentage of hard words))
:Where: hard words = words with more than two syllables.<ref name="Gunning2">Gunning, R. 1952. ''The Technique of Clear Writing''. New York: McGraw–Hill.</ref>

=== Fry readability graph ===
{{Main|Fry readability formula}}

=== McLaughlin's SMOG formula ===
{{Main|SMOG}}
:SMOG grading = 3 + {{Sqrt|polysyllable count}}
:Where: polysyllable count = number of words of more than two syllables in a sample of 30 sentences.<ref name="McLaughlin1969">McLaughlin, G. H. 1969. "SMOG grading: a new readability formula." ''Journal of Reading'' 22:639–646.</ref>
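
The formulas above all rely on the same basic inputs: sentence count, word count, and syllable count. The following is a minimal sketch of how the Flesch Reading Ease, Gunning fog, and SMOG scores defined above could be computed; the regex tokenizer and the vowel-group syllable counter are simplifying assumptions for illustration, not part of the published formulas.

<syntaxhighlight lang="python">
import re


def naive_syllables(word):
    # Heuristic: count groups of consecutive vowels. This is an assumption for
    # illustration only; practical tools use pronunciation dictionaries such as CMUdict.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    # Reading Ease = 206.835 - (1.015 * ASL) - (84.6 * ASW), the original Flesch formula.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    asl = len(words) / len(sentences)                            # words per sentence
    asw = sum(naive_syllables(w) for w in words) / len(words)    # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


def gunning_fog(text):
    # Grade level = 0.4 * (ASL + percentage of words with more than two syllables).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    asl = len(words) / len(sentences)
    hard = sum(1 for w in words if naive_syllables(w) > 2)
    return 0.4 * (asl + 100.0 * hard / len(words))


def smog_grade(thirty_sentences):
    # SMOG grading = 3 + sqrt(polysyllable count), counted over a 30-sentence sample.
    polysyllables = sum(
        1
        for sentence in thirty_sentences
        for w in re.findall(r"[A-Za-z']+", sentence)
        if naive_syllables(w) > 2
    )
    return 3 + polysyllables ** 0.5


sample = "The cat sat on the mat. It was a remarkably comfortable mat."
print(round(flesch_reading_ease(sample), 1), round(gunning_fog(sample), 1))
</syntaxhighlight>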
=== FORCAST formula ===
The formula is:
:Grade level = 20 − (''N'' / 10)
:Where: ''N'' = number of single-syllable words in a 150-word sample.<ref name="forcast">Caylor, J. S., T. G. Sticht, L. C. Fox, and J. P. Ford. 1973. ''Methodologies for determining reading requirements of military occupational specialties: Technical report No. 73-5''. Alexandria, VA: [[Human Resources Research Organization]].</ref>

=== Golub Syntactic Density Score ===
{{unsourced section|date=September 2024}}
The Golub Syntactic Density Score was developed by Lester Golub in 1974. It is among a smaller subset of readability formulas that concentrate on the syntactic features of a text. To calculate the reading level of a text, a sample of several hundred words is taken from the text. The number of words in the sample is counted, as is the number of T-units. A T-unit is defined as an independent clause and any dependent clauses attached to it. Other syntactic units are then counted and entered into the following table:

{| class="wikitable"
|-
| 1. || Words/T-unit || .95 || X _________ || ___
|-
| 2. || Subordinate clauses/T-unit || .90 || X _________ || ___
|-
| 3. || Main clause word length (mean) || .20 || X _________ || ___
|-
| 4. || Subordinate clause length (mean) || .50 || X _________ || ___
|-
| 5. || Number of modals (will, shall, can, may, must, would...) || .65 || X _________ || ___
|-
| 6. || Number of ''be'' and ''have'' forms in the auxiliary || .40 || X _________ || ___
|-
| 7. || Number of prepositional phrases || .75 || X _________ || ___
|-
| 8. || Number of possessive nouns and pronouns || .70 || X _________ || ___
|-
| 9. || Number of adverbs of time (when, then, once, while...) || .60 || X _________ || ___
|-
| 10. || Number of gerunds, participles, and absolute phrases || .85 || X _________ || ___
|}

Users add the numbers in the right-hand column and divide the total by the number of T-units. Finally, the quotient is entered into the following table to arrive at a final readability score.

{| class="wikitable"
!SDS
|0.5
|1.3
|2.1
|2.9
|3.7
|4.5
|5.3
|6.1
|6.9
|7.7
|8.5
|9.3
|10.1
|10.9
|-
!Grade
|1
|2
|3
|4
|5
|6
|7
|8
|9
|10
|11
|12
|13
|14
|}

=== Lexico-semantic ===
The type-token ratio is one of the features often used to capture lexical richness, a measure of vocabulary range and diversity. The lexical difficulty of a word is often measured by its relative frequency in a representative corpus such as the [[Corpus of Contemporary American English]] (COCA). Some examples of lexico-semantic features used in readability assessment include:<ref name="Computational assessment of text re" />
*Average number of syllables per word
*Out-of-vocabulary rate, in comparison to the full corpus
*Type-token ratio: the ratio of unique terms to total terms observed
*Ratio of function words, in comparison to the full corpus
*Ratio of pronouns, in comparison to the full corpus
*Language model perplexity (comparing the text to generic or genre-specific models)
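
A minimal sketch of how a few of these features (type-token ratio, average syllables per word, out-of-vocabulary rate) might be computed. The regex tokenizer, the vowel-group syllable heuristic, and the toy reference vocabulary are illustrative assumptions; a real system would use a frequency list derived from a representative corpus such as COCA.

<syntaxhighlight lang="python">
import re
from collections import Counter


def lexico_semantic_features(text, reference_vocabulary):
    # `reference_vocabulary` stands in for a corpus-derived word list (e.g. one
    # built from COCA); the set passed below is only a toy placeholder.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    counts = Counter(words)
    # Rough syllable estimate: groups of consecutive vowels (heuristic only).
    syllables = [max(1, len(re.findall(r"[aeiouy]+", w))) for w in words]
    return {
        # unique terms divided by total terms observed
        "type_token_ratio": len(counts) / len(words),
        "avg_syllables_per_word": sum(syllables) / len(words),
        # share of words missing from the reference vocabulary
        "out_of_vocabulary_rate": sum(1 for w in words if w not in reference_vocabulary) / len(words),
    }


toy_vocabulary = {"the", "cat", "sat", "on", "mat", "it", "was", "a"}
print(lexico_semantic_features(
    "The cat sat on the mat. It was a remarkably comfortable mat.",
    toy_vocabulary,
))
</syntaxhighlight>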