== History ==
In the 19th century, statistical analytical methods were mainly applied in biological data analysis, and it was customary for researchers such as [[Sir George Airy]] and [[Mansfield Merriman]] to assume that observations followed a [[normal distribution]]; their works were criticized by [[Karl Pearson]] in his 1900 paper.<ref name = Pearson1900>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | title = On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling | journal = Philosophical Magazine | series = Series 5 | volume = 50 | issue = 302 | year = 1900 | pages = 157–175 | url = https://www.tandfonline.com/doi/abs/10.1080/14786440009463897 | doi = 10.1080/14786440009463897 }}</ref>

At the end of the 19th century, Pearson noticed the existence of significant [[skewness]] within some biological observations. In order to model the observations regardless of whether they were normal or skewed, Pearson, in a series of articles published from 1893 to 1916,<ref name = Pearson1893>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | title = Contributions to the mathematical theory of evolution [abstract] | journal = Proceedings of the Royal Society | volume = 54 | year = 1893 | pages = 329–333 | jstor = 115538 | doi = 10.1098/rspl.1893.0079 | doi-access = free }}</ref><ref name = Pearson1895>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | title = Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material | journal = Philosophical Transactions of the Royal Society | volume = 186 | year = 1895 | pages = 343–414 | bibcode = 1895RSPTA.186..343P | jstor = 90649 | doi = 10.1098/rsta.1895.0010 | url = https://zenodo.org/record/1432104 | doi-access = free }}</ref><ref name = Pearson1901>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | title = Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation | journal = Philosophical Transactions of the Royal Society A | volume = 197 | issue = 287–299 | year = 1901 | pages = 443–459 | bibcode = 1901RSPTA.197..443P | jstor = 90841 | doi = 10.1098/rsta.1901.0023 }}</ref><ref name = Pearson1916>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | title = Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation | journal = Philosophical Transactions of the Royal Society A | volume = 216 | issue = 538–548 | year = 1916 | pages = 429–457 | bibcode = 1916RSPTA.216..429P | jstor = 91092 | doi = 10.1098/rsta.1916.0009 | doi-access = free }}</ref> devised the [[Pearson distribution]], a family of continuous [[probability distribution]]s that includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis consisting of using the Pearson distribution to model the observations and performing a test of goodness of fit to determine how well the model actually fits the observations.

=== Pearson's chi-squared test ===
{{See also|Pearson's chi-squared test}}
In 1900, Pearson published a paper<ref name = Pearson1900 /> on the {{math|χ<sup>2</sup>}} test which is considered to be one of the foundations of modern statistics.<ref name = Cochran1952>{{cite journal | last = Cochran | first = William G. | author-link = William G. Cochran | title = The Chi-square Test of Goodness of Fit | journal = The Annals of Mathematical Statistics | volume = 23 | issue = 3 | year = 1952 | pages = 315–345 | jstor = 2236678 | doi = 10.1214/aoms/1177729380 | doi-access = free }}</ref> In this paper, Pearson investigated a test of goodness of fit.

Suppose that {{mvar|n}} observations in a random sample from a population are classified into {{mvar|k}} mutually exclusive classes with respective observed numbers of observations {{mvar|x<sub>i</sub>}} (for {{math|''i'' {{=}} 1,2,…,''k''}}), and a null hypothesis gives the probability {{mvar|p<sub>i</sub>}} that an observation falls into the {{mvar|i}}th class. So we have the expected numbers {{math|''m<sub>i</sub>'' {{=}} ''np<sub>i</sub>''}} for all {{mvar|i}}, where

:<math>\begin{align}
& \sum^k_{i=1}{p_i} = 1 \\[8pt]
& \sum^k_{i=1}{m_i} = n\sum^k_{i=1}{p_i} = n
\end{align}</math>

Pearson proposed that, under the circumstance of the null hypothesis being correct, as {{math|''n'' → ∞}} the limiting distribution of the quantity given below is the {{math|χ<sup>2</sup>}} distribution:

:<math>X^2=\sum^k_{i=1}{\frac{(x_i-m_i)^2}{m_i}}=\sum^k_{i=1}{\frac{x_i^2}{m_i}}-n</math>

Pearson dealt first with the case in which the expected numbers {{mvar|m<sub>i</sub>}} are sufficiently large known numbers in all cells, assuming every observation {{mvar|x<sub>i</sub>}} may be taken as [[normal distribution|normally distributed]], and reached the result that, in the limit as {{mvar|n}} becomes large, {{math|''X''{{isup|2}}}} follows the {{math|χ<sup>2</sup>}} distribution with {{math|''k'' − 1}} degrees of freedom.

However, Pearson next considered the case in which the expected numbers depended on parameters that had to be estimated from the sample, and suggested that, with {{mvar|m<sub>i</sub>}} being the true expected numbers and {{math|''m''′<sub>''i''</sub>}} being the estimated expected numbers, the difference

:<math>X^2-{X'}^2=\sum^k_{i=1}{\frac{x_i^2}{m_i}}-\sum^k_{i=1}{\frac{x_i^2}{m'_i}}</math>

will usually be positive and small enough to be omitted. In conclusion, Pearson argued that if we regarded {{math|''X''′{{isup|2}}}} as also distributed as the {{math|χ<sup>2</sup>}} distribution with {{math|''k'' − 1}} degrees of freedom, the error in this approximation would not affect practical decisions. This conclusion caused some controversy in practical applications, and the issue was not settled for 20 years, until Fisher's 1922 and 1924 papers.<ref name = Fisher1922>{{cite journal | last = Fisher | first = Ronald A. | author-link = Ronald A. Fisher | title = On the Interpretation of {{math|χ<sup>2</sup>}} from Contingency Tables, and the Calculation of P | journal = Journal of the Royal Statistical Society | volume = 85 | issue = 1 | year = 1922 | pages = 87–94 | jstor = 2340521 | doi = 10.2307/2340521 }}</ref><ref name = Fisher1924>{{cite journal | last = Fisher | first = Ronald A. | author-link = Ronald A. Fisher | title = The Conditions Under Which {{math|χ<sup>2</sup>}} Measures the Discrepancy Between Observation and Hypothesis | journal = Journal of the Royal Statistical Society | volume = 87 | issue = 3 | year = 1924 | pages = 442–450 | jstor = 2341149 }}</ref>
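The following is a minimal sketch, not taken from Pearson's paper, of how the statistic {{math|''X''{{isup|2}}}} above is computed and referred to the {{math|χ<sup>2</sup>}} distribution with {{math|''k'' − 1}} degrees of freedom; the observed counts and null probabilities are hypothetical, chosen only for illustration.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2

# Hypothetical data: n = 120 die rolls classified into k = 6 faces,
# with the null hypothesis of a fair die (p_i = 1/6 for every class).
observed = np.array([18, 23, 16, 21, 18, 24])  # x_i
probs = np.full(6, 1 / 6)                      # p_i under the null
expected = observed.sum() * probs              # m_i = n * p_i

# Pearson's statistic: X^2 = sum_i (x_i - m_i)^2 / m_i
x2 = np.sum((observed - expected) ** 2 / expected)

# Under the null, X^2 is asymptotically chi-squared with k - 1 degrees of freedom.
df = len(observed) - 1
p_value = chi2.sf(x2, df)

print(f"X^2 = {x2:.3f}, df = {df}, p-value = {p_value:.3f}")
</syntaxhighlight>

The same computation is also available directly as <code>scipy.stats.chisquare</code>, which returns the statistic and its asymptotic p-value.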