Chi-squared test
==Example chi-squared test for categorical data==
Suppose there is a city of 1,000,000 residents with four neighborhoods: {{math|''A''}}, {{math|''B''}}, {{math|''C''}}, and {{math|''D''}}. A random sample of 650 residents of the city is taken and their occupation is recorded as [[Collar workers|"white collar", "blue collar", or "no collar"]]. The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification. The data are tabulated as:

:{| class="wikitable" style="text-align: right;"
|-
! !! {{math|''A''}} !! {{math|''B''}} !! {{math|''C''}} !! {{math|''D''}} !! Total
|-
|style="text-align: left;"| White collar || 90 || 60 || 104 || 95 || 349
|-
|style="text-align: left;"| Blue collar || 30 || 50 || 51 || 20 || 151
|-
|style="text-align: left;"| No collar || 30 || 40 || 45 || 35 || 150
|-
!style="text-align: left;"| Total || 150 || 150 || 200 || 150 || 650
|}

Let us take the sample living in neighborhood {{math|''A''}}, 150, to estimate what proportion of the whole 1,000,000 live in neighborhood {{math|''A''}}. Similarly we take {{sfrac|349|650}} to estimate what proportion of the 1,000,000 are white-collar workers. By the assumption of independence under the hypothesis we should "expect" the number of white-collar workers in neighborhood {{math|''A''}} to be
: <math> 150\times\frac{349}{650} \approx 80.54 </math>

Then in that "cell" of the table, we have
: <math>\frac{\left(\text{observed}-\text{expected}\right)^2}{\text{expected}} = \frac{\left(90-80.54\right)^2}{80.54} \approx 1.11</math>

The sum of these quantities over all of the cells is the test statistic; in this case, <math> \approx 24.57 </math>.
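
The cell-by-cell calculation above can be sketched in Python. This is a minimal illustration using only the counts from the table; the function name <code>chi_squared_statistic</code> and the variable names are hypothetical and not taken from any library.

```python
# Observed counts from the table: rows are occupations, columns are neighborhoods A-D.
observed = [
    [90, 60, 104, 95],  # white collar
    [30, 50, 51, 20],   # blue collar
    [30, 40, 45, 35],   # no collar
]


def chi_squared_statistic(table):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # total sample size, 650 here

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Expected count for white-collar workers in neighborhood A: 150 * 349 / 650.
expected_white_a = 349 * 150 / 650
statistic = chi_squared_statistic(observed)

print(round(expected_white_a, 2))  # 80.54
print(round(statistic, 2))         # 24.57
```

The expected count for each cell is the product of its row total and column total divided by the sample size, which is the same estimate as <math>150\times\tfrac{349}{650}</math> used for the first cell.
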
Under the null hypothesis, this sum has approximately a chi-squared distribution whose number of degrees of freedom is
: <math> (\text{number of rows}-1)(\text{number of columns}-1) = (3-1)(4-1) = 6 </math>

If the test statistic is improbably large according to that chi-squared distribution, then one rejects the null hypothesis of independence.

A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many residents of each neighborhood to include. Then each resident has the same chance of being chosen as do all residents of the same neighborhood, but residents of different neighborhoods would have different probabilities of being chosen if the four sample sizes are not proportional to the populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather than "independence". The question is whether the proportions of blue-collar, white-collar, and no-collar workers in the four neighborhoods are the same. However, the test is done in the same way.
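
Because the independence and homogeneity tests are carried out in the same way, one sketch covers both. To make "improbably large" concrete, the upper-tail probability of the statistic can be computed. The example below assumes the value 24.57 and the 6 degrees of freedom stated in the text. It relies on the closed form of the chi-squared upper tail that holds when the number of degrees of freedom is even; the helper name <code>chi2_upper_tail_even_df</code> is hypothetical, and a statistics library would normally be used for arbitrary degrees of freedom.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable X with an even number of degrees of freedom.

    For df = 2m the upper tail has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * partial_sum


rows, cols = 3, 4                 # occupations by neighborhoods
df = (rows - 1) * (cols - 1)      # 6 degrees of freedom
p_value = chi2_upper_tail_even_df(24.57, df)

# p_value is roughly 0.0004, well below conventional thresholds such as 0.05.
print(df, p_value)
```

A probability this small indicates that a statistic at least as large as 24.57 would rarely arise if neighborhood and occupation were independent, so the null hypothesis would be rejected at conventional significance levels.
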