==Measures of association==
The degree of association between the two variables can be assessed by a number of coefficients. The following subsections describe a few of them. For a more complete discussion of their uses, see the main articles linked under each subsection heading.

===Odds ratio===
{{main|Odds ratio}}
The simplest measure of association for a 2 × 2 contingency table is the [[odds ratio]]. Given two events, A and B, the odds ratio is defined as the ratio of the odds of A in the presence of B to the odds of A in the absence of B, or equivalently (due to symmetry), the ratio of the odds of B in the presence of A to the odds of B in the absence of A. Two events are independent if and only if the odds ratio is 1; if the odds ratio is greater than 1, the events are positively associated; if the odds ratio is less than 1, the events are negatively associated.

The odds ratio has a simple expression in terms of probabilities; given the joint probability distribution:
:<math>
\begin{array}{c|cc}
 & B = 1 & B = 0 \\
\hline
A = 1 & p_{11} & p_{10} \\
A = 0 & p_{01} & p_{00}
\end{array}
</math>
the odds ratio is:
:<math>OR = \frac{p_{11}p_{00}}{p_{10}p_{01}}.</math>

===Phi coefficient===
{{main|Phi coefficient}}
A simple measure, applicable only to the case of 2 × 2 contingency tables, is the [[phi coefficient]] (φ), defined by
: <math>\phi=\pm\sqrt{\frac{\chi^2}{N}},</math>
where {{math|χ<sup>2</sup>}} is computed as in [[Pearson's chi-squared test]], and ''N'' is the grand total of observations. φ varies from 0 (corresponding to no association between the variables) to 1 or −1 (complete association or complete inverse association), provided it is based on frequency data represented in 2 × 2 tables. Its sign equals the sign of the difference between the product of the [[main diagonal]] elements of the table and the product of the off-diagonal elements. φ takes on the minimum value −1.0 or the maximum value of +1.0 [[if and only if]] every marginal proportion is equal to 0.5 (and two diagonal cells are empty).<ref>Ferguson, G. A. (1966). ''Statistical analysis in psychology and education''. New York: McGraw–Hill.</ref>

===Cramér's ''V'' and the contingency coefficient ''C''===
{{Main|Cramér's V}}
Two alternatives are the ''contingency coefficient'' ''C'' and [[Cramér's V]]. The formulae for the ''C'' and ''V'' coefficients are:
: <math>C=\sqrt{\frac{\chi^2}{N+\chi^2}}</math>
and
: <math>V=\sqrt{\frac{\chi^2}{N(k-1)}},</math>
''k'' being the number of rows or the number of columns, whichever is less.

''C'' suffers from the disadvantage that it does not reach a maximum of 1.0; the highest value it can reach in a 2 × 2 table is 0.707. It can reach values closer to 1.0 in contingency tables with more categories; for example, it can reach a maximum of 0.870 in a 4 × 4 table. It should, therefore, not be used to compare associations in tables with different numbers of categories.<ref>Smith, S. C., & Albaum, G. S. (2004). ''Fundamentals of marketing research''. Thousand Oaks, CA: Sage. p. 631.</ref>

''C'' can be adjusted so that it reaches a maximum of 1.0 when there is complete association in a table of any number of rows and columns by dividing ''C'' by <math>\sqrt{\frac{k-1}{k}}</math> where ''k'' is the number of rows or columns, when the table is square,{{citation needed|date=June 2020}} or by <math>\sqrt[\scriptstyle 4]{{r - 1 \over r} \times {c - 1 \over c}}</math> where ''r'' is the number of rows and ''c'' is the number of columns.<ref>Blaikie, N. (2003). ''Analyzing Quantitative Data''. Thousand Oaks, CA: Sage. p. 100.</ref>
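As a purely illustrative example (the counts are hypothetical rather than taken from any particular study), consider a 2 × 2 table of observed frequencies:
:<math>
\begin{array}{c|cc|c}
 & B = 1 & B = 0 & \text{total} \\
\hline
A = 1 & 40 & 10 & 50 \\
A = 0 & 20 & 30 & 50 \\
\hline
\text{total} & 60 & 40 & 100
\end{array}
</math>
Here the odds ratio is <math>OR = \tfrac{40 \times 30}{10 \times 20} = 6</math>, and Pearson's chi-squared statistic, using the standard 2 × 2 shortcut <math>\chi^2 = \tfrac{N(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}</math>, is <math>\chi^2 = \tfrac{100\,(40 \times 30 - 10 \times 20)^2}{50 \times 50 \times 60 \times 40} \approx 16.7</math>. The measures above then give <math>\phi = V \approx \sqrt{16.7/100} \approx 0.41</math> (in a 2 × 2 table ''V'' reduces to |φ| because ''k'' − 1 = 1) and <math>C \approx \sqrt{16.7/116.7} \approx 0.38</math>, consistent with a moderate positive association between A and B.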
===Tetrachoric correlation coefficient===
{{Main|Polychoric correlation}}
Another choice is the [[polychoric correlation|tetrachoric correlation coefficient]], but it is only applicable to 2 × 2 tables. [[Polychoric correlation]] is an extension of the tetrachoric correlation to tables involving variables with more than two levels.

Tetrachoric correlation assumes that the variable underlying each [[dichotomy|dichotomous]] measure is normally distributed.<ref>Ferguson.{{full citation needed|date=April 2019}}</ref> The coefficient provides "a convenient measure of [the Pearson product-moment] correlation when graduated measurements have been reduced to two categories."<ref>Ferguson, 1966, p. 244.</ref>

The tetrachoric correlation coefficient should not be confused with the [[Pearson correlation coefficient]] computed by assigning, say, the values 0.0 and 1.0 to represent the two levels of each variable (which is mathematically equivalent to the φ coefficient).

===Lambda coefficient===
{{main|Goodman and Kruskal's lambda}}
The [[Goodman and Kruskal's lambda|lambda coefficient]] is a measure of the strength of association of the cross tabulations when the variables are measured at the [[Level of measurement|nominal level]]. Values range from 0.0 (no association) to 1.0 (the maximum possible association).

Asymmetric lambda measures the percentage improvement in predicting the dependent variable. Symmetric lambda measures the percentage improvement when prediction is done in both directions. A worked example appears below, after the list of further measures.

===Uncertainty coefficient===
{{Main|Uncertainty coefficient}}
The [[uncertainty coefficient]], or Theil's ''U'', is another measure for variables at the nominal level. Its values range from 0.0 (no association: knowing one variable tells nothing about the other) to 1.0 (complete association: one variable is fully determined by the other).

The uncertainty coefficient is conditional on the direction of prediction and is therefore an asymmetrical measure of association; in general,
:<math> U(X|Y) \neq U(Y|X) .</math>
This asymmetry can lead to insights that are not as evident in symmetrical measures of association.<ref>{{Cite web|url=https://towardsdatascience.com/the-search-for-categorical-correlation-a1cf7f1888c9|title=The Search for Categorical Correlation|date=26 December 2019}}</ref>

===Others===
{{Main|Goodman and Kruskal's gamma|Kendall rank correlation coefficient}}
*[[Goodman and Kruskal's gamma|Gamma test]]: no adjustment for either table size or ties.
*[[Kendall rank correlation coefficient|Kendall's tau]]: adjustment for ties.
**[[Kendall rank correlation coefficient#Tau-b|Tau-b]]: used for square tables.
**[[Kendall rank correlation coefficient#Tau-c|Tau-c]]: used for rectangular tables.
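As a purely illustrative sketch of the lambda calculation, using the same hypothetical counts as in the 2 × 2 example above: lambda for predicting ''B'' from ''A'' sums the largest count in each row, subtracts the largest column total, and divides by ''N'' minus the largest column total; lambda for predicting ''A'' from ''B'' works column-wise with the row totals. This gives
:<math>\lambda_{B \mid A} = \frac{(40 + 30) - 60}{100 - 60} = 0.25, \qquad \lambda_{A \mid B} = \frac{(40 + 30) - 50}{100 - 50} = 0.40,</math>
while the symmetric version pools the two numerators and denominators, giving <math>\tfrac{10 + 20}{40 + 50} \approx 0.33</math>. The two asymmetric values differ, illustrating that, like the uncertainty coefficient, lambda depends on which variable is treated as the one being predicted.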