{{Short description|Statistical method}}
In [[statistics]], '''Yates's correction for continuity''' (or '''Yates's chi-squared test''') is a [[statistical test]] commonly used when analyzing count data organized in a [[contingency table]], particularly when sample sizes are small. It is specifically designed for testing whether two [[Categorical variable|categorical variables]] are related or [[Independence (probability theory)|independent]] of each other. The correction modifies the standard [[chi-squared test]] to account for the fact that a continuous distribution ([[Chi-squared distribution|chi-squared]]) is used to approximate discrete data. Almost exclusively applied to 2×2 contingency tables, it involves subtracting 0.5 from the absolute difference between observed and expected frequencies before squaring the result.

Unlike the standard [[Pearson's chi-squared test|Pearson chi-squared statistic]], Yates's correction is approximately [[Bias of an estimator|unbiased]] for small sample sizes. It is considered more conservative than the uncorrected chi-squared test, as it increases the [[p-value]] and thus reduces the likelihood of rejecting the [[null hypothesis]] when it is true. While widely taught in introductory [[statistics]] courses, modern [[Statistical computing|computational methods]] like [[Fisher's exact test]] may be preferred for analyzing small samples in 2×2 tables, with Yates's correction serving as a middle ground between uncorrected chi-squared tests and Fisher's exact test. The correction was first published by [[Frank Yates]] in 1934.<ref name="Yates" />

==Correction for approximation error==
Using the [[chi-squared distribution]] to interpret [[Pearson's chi-squared test|Pearson's chi-squared statistic]] requires one to assume that the [[Discrete probability distribution|discrete]] probability of observed [[binomial distribution|binomial frequencies]] in the table can be approximated by the continuous [[chi-squared distribution]].
This assumption is not quite correct, and introduces some error. To reduce the error in approximation, [[Frank Yates]], an [[England|English]] [[statistician]], suggested a correction for continuity that adjusts the formula for [[Pearson's chi-squared test]] by subtracting 0.5 from the difference between each observed value and its expected value in a 2 × 2 contingency table.<ref name=Yates>[[Frank Yates|Yates, F]] (1934). "Contingency tables involving small numbers and the χ<sup>2</sup> test". ''Supplement to the [[Journal of the Royal Statistical Society]]'' '''1'''(2): 217–235. {{JSTOR|2983604}}</ref> This reduces the chi-squared value obtained and thus increases its [[p-value]]. The effect of Yates's correction is to prevent overestimation of statistical significance for small data. This formula is chiefly used when at least one cell of the table has an expected count smaller than 5.

The following is Yates's corrected version of [[Pearson's chi-squared test|Pearson's chi-squared statistic]]:

:<math> \chi_\text{Yates}^2 = \sum_{i=1}^{N} {(|O_i - E_i| - 0.5)^2 \over E_i}</math>

where:
:''O<sub>i</sub>'' = an observed frequency
:''E<sub>i</sub>'' = an expected (theoretical) frequency, asserted by the null hypothesis
:''N'' = number of distinct events

== 2 × 2 table ==
As a short-cut, for a 2 × 2 table with the following entries:

{| class="wikitable"
! !! S !! F !!
|-
! A
| ''a'' || ''b'' || ''a'' + ''b''
|-
! B
| ''c'' || ''d'' || ''c'' + ''d''
|-
!
| ''a'' + ''c'' || ''b'' + ''d'' || ''N''
|}

we can write

: <math>\chi_\text{Yates}^2 = \frac{N(|ad - bc| - N/2)^2}{(a+b) (c+d) (a+c) (b+d)}.</math>
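The cell-by-cell definition and the 2 × 2 short-cut are algebraically equivalent, because in a 2 × 2 table every cell has the same absolute deviation |''O'' − ''E''| = |''ad'' − ''bc''|/''N''. This can be checked numerically with a minimal Python sketch (the function names here are illustrative, not from any standard library):

```python
def yates_chi2(a, b, c, d):
    """Yates-corrected chi-squared for the 2x2 table [[a, b], [c, d]],
    summing (|O - E| - 0.5)^2 / E over the four cells."""
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    observed = [[a, b], [c, d]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            stat += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return stat

def yates_chi2_shortcut(a, b, c, d):
    """The closed-form 2x2 short-cut for the same table."""
    n = a + b + c + d
    return (n * (abs(a * d - b * c) - n / 2) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))

# Example table a=3, b=7, c=8, d=2: both forms give the same value.
print(yates_chi2(3, 7, 8, 2))           # ~3.2323
print(yates_chi2_shortcut(3, 7, 8, 2))  # ~3.2323
```

For this table, |''ad'' − ''bc''| = 50 and ''N''/2 = 10, so the short-cut gives 20 · 40² / (10 · 10 · 11 · 9) = 32000/9900 ≈ 3.23, matching the cell-by-cell sum.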
A variant clamps the corrected difference at zero:

: <math>\chi_\text{Yates}^2 = \frac{N( \max(0, |ad - bc| - N/2) )^2}{N_S N_F N_A N_B},</math>

where ''N<sub>S</sub>'' = ''a'' + ''c'', ''N<sub>F</sub>'' = ''b'' + ''d'', ''N<sub>A</sub>'' = ''a'' + ''b'' and ''N<sub>B</sub>'' = ''c'' + ''d'' are the marginal totals. The max(0, ·) term keeps the statistic at zero when the observed counts are within half a unit of the expected counts, where the uncorrected subtraction would otherwise produce a spurious positive value.

Yates's correction should always be applied, as it will tend to improve the accuracy of the p-value obtained.{{Citation needed|date=February 2007}} However, in situations with large sample sizes, using the correction will have little effect on the value of the test statistic, and hence the p-value.

== See also ==
* [[Continuity correction]]
* [[Binomial proportion confidence interval#Wilson score interval with continuity correction|Wilson score interval with continuity correction]]

==References==
{{reflist}}

[[Category:Statistical hypothesis testing]]
[[Category:Theory of probability distributions]]
[[Category:Computational statistics]]
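The conservativeness of the correction, and its vanishing effect at large sample sizes, can also be checked numerically. In the Python sketch below (function names are mine), the 1-degree-of-freedom tail probability uses the identity P(χ²₁ > ''x'') = erfc(√(''x''/2)):

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d, correction=True, clamp=False):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]].
    correction=True applies Yates's N/2 adjustment; clamp=True
    additionally floors |ad - bc| - N/2 at zero (the max(0, .) variant)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    if correction:
        diff -= n / 2
        if clamp:
            diff = max(0.0, diff)
    return n * diff ** 2 / denom

def p_value_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return erfc(sqrt(x / 2.0))

# Small table: the correction raises the p-value (more conservative).
small_plain = chi2_2x2(3, 7, 8, 2, correction=False)
small_yates = chi2_2x2(3, 7, 8, 2)
print(p_value_1df(small_plain), p_value_1df(small_yates))

# Same proportions at 100x the sample size: corrected and uncorrected
# statistics (and hence p-values) nearly coincide.
big_plain = chi2_2x2(300, 700, 800, 200, correction=False)
big_yates = chi2_2x2(300, 700, 800, 200)
print(big_plain, big_yates)
```

For the balanced table ''a'' = ''b'' = ''c'' = ''d'' = 5 (observed exactly equals expected), the clamped variant returns 0 while the unclamped corrected statistic is positive, illustrating why the max(0, ·) form is sometimes preferred.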