Fisher's exact test
==Purpose and scope==
[[File:Nice Cup of Tea.jpg|thumb|A [[teapot]], a [[Creamer (vessel)|creamer]] and [[teacup]] full of tea with [[milk]]. Can a taster tell whether the milk went in first?]]
The test is useful for [[categorical data]] that result from classifying objects in two different ways; it is used to examine the significance of the association (contingency) between the two kinds of classification. In Fisher's original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated, that is, whether Bristol really can tell whether milk or tea was poured in first. Most uses of the Fisher test involve, like this example, a 2 × 2 contingency table (discussed below).

The [[p-value|''p''-value]] from the test is computed as if the margins of the table were fixed, i.e. as if, in the tea-tasting example, Bristol knows the number of cups with each treatment (milk or tea first) and will therefore provide guesses with the correct number in each category. As pointed out by Fisher, under a null hypothesis of independence this leads to a [[hypergeometric distribution]] of the numbers in the cells of the table. This setting is, however, rare in scientific practice, and the test is conservative when one or both margins are themselves random variables.<ref name="campbell2007" />

With large samples, a [[chi-squared test]] (or better yet, a [[G-test]]) can be used in this situation. However, the significance value it provides is only an approximation, because the [[sampling distribution]] of the test statistic that is calculated is only approximately equal to the theoretical chi-squared distribution.
The approximation is poor when sample sizes are small, or when the data are very unequally distributed among the cells of the table, so that the cell counts predicted under the null hypothesis (the "expected values") are low. The usual rule for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one [[degrees of freedom (statistics)|degree of freedom]] (this rule is now known to be overly conservative<ref name="Larntz1978">{{Cite journal | doi = 10.2307/2286650 | last = Larntz | first = Kinley | year = 1978 | title = Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics | journal = Journal of the American Statistical Association | volume = 73 | issue = 362 | pages = 253–263 | jstor = 2286650 }}</ref>). In fact, for small, sparse, or unbalanced data, the exact and asymptotic ''p''-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest.<ref name="Mehta1984">{{Cite journal | last1 = Mehta | first1 = Cyrus R | last2 = Patel | first2 = Nitin R | last3 = Tsiatis | first3 = Anastasios A | year = 1984 | title = Exact significance testing to establish treatment equivalence with ordered categorical data | journal = Biometrics | volume = 40 | issue = 3 | pages = 819–825 | doi = 10.2307/2530927 | pmid = 6518249 | jstor = 2530927 }}</ref><ref name="Mehta1995">Mehta, C. R. 1995. SPSS 6.1 Exact test for Windows. Englewood Cliffs, NJ: Prentice Hall.</ref> In contrast, the Fisher exact test is, as its name states, exact as long as the experimental procedure keeps the row and column totals fixed, and it can therefore be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.
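The contrast between the exact and asymptotic ''p''-values can be illustrated with a minimal sketch that uses only the Python standard library. The function names <code>fisher_exact_2x2</code> and <code>pearson_chi2_2x2</code> are hypothetical, the two-sided rule shown (summing the probabilities of all tables no more probable than the observed one) is only one common convention, and the table used is an assumed outcome of an eight-cup tea-tasting experiment in which every cup is classified correctly.

```python
from math import comb, erfc, sqrt


def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the 2 x 2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    # Feasible values of the top-left cell once the margins are fixed.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each feasible table under independence.
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / total
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # One-sided: tables whose top-left count is at least as large as observed.
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Two-sided (one convention): tables no more probable than the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return one_sided, two_sided


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson statistic, asymptotic p-value and expected counts.

    Assumes that no row or column total is zero.
    """
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value, expected


# Assumed outcome: 8 cups, 4 of each kind, every cup classified correctly.
one_sided, two_sided = fisher_exact_2x2(4, 0, 0, 4)
stat, p_asymptotic, expected = pearson_chi2_2x2(4, 0, 0, 4)
print(one_sided, two_sided)
print(stat, p_asymptotic, expected)
```

For this assumed table every expected count is 2, well below the usual threshold of 5. The exact one-sided ''p''-value is 1/70 (about 0.014) and the two-sided value is 2/70 (about 0.029), whereas the uncorrected chi-squared approximation gives about 0.0047.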
For hand calculations, the test is feasible only in the case of a 2 × 2 contingency table. However, the principle of the test can be extended to the general case of an ''m'' × ''n'' table,<ref>{{cite journal |author1=Mehta C.R. |author2=Patel N.R. | year = 1983 | title = A Network Algorithm for Performing Fisher's Exact Test in ''r'' × ''c'' Contingency Tables | journal = Journal of the American Statistical Association | volume = 78 | issue = 382| pages = 427–434 | doi = 10.2307/2288652 |jstor=2288652 }}</ref><ref>[http://mathworld.wolfram.com/FishersExactTest.html mathworld.wolfram.com] Page giving the formula for the general form of Fisher's exact test for ''m'' × ''n'' contingency tables</ref> and some [[statistical packages]] provide a calculation (sometimes using a [[Monte Carlo method]] to obtain an approximation) for the more general case.<ref>{{cite journal|author1=Cyrus R. Mehta |author2=Nitin R. Patel | title= ALGORITHM 643: FEXACT: a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables|journal= ACM Trans. Math. Softw. |volume=12| issue= 2 |year=1986| pages=154–161|doi=10.1145/6497.214326|s2cid=207666979 |doi-access=free}}</ref>

The test can also be used to quantify the ''overlap'' between two sets. For example, in enrichment analyses in statistical genetics, one set of genes may be annotated for a given phenotype, and the user may be interested in testing the overlap of their own set with that annotated set. In this case a 2 × 2 contingency table may be generated, and Fisher's exact test applied, by identifying:
# Genes that are provided in both lists
# Genes that are provided in the first list and not the second
# Genes that are provided in the second list and not the first
# Genes that are not provided in either list
The test assumes that the genes in either list are taken from a broader set of genes (e.g. all remaining genes).
A ''p''-value may then be calculated, summarizing the significance of the overlap between the two lists.<ref>{{cite journal |doi=10.1038/nprot.2013.092|title=Large-scale gene function analysis with the PANTHER classification system |year=2013 |last1=Mi |first1=Huaiyu |last2=Muruganujan |first2=Anushya |last3=Casagrande |first3=John T. |last4=Thomas |first4=Paul D. |journal=Nature Protocols |volume=8 |issue=8 |pages=1551–1566 |pmid=23868073 |pmc=6519453 }}</ref>
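
The four counts listed above and the resulting ''p''-value can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the procedure of any particular enrichment tool: the function names <code>overlap_table</code> and <code>overlap_p_value</code>, the gene identifiers and the background set are all hypothetical, and the ''p''-value shown is the one-sided (enrichment) tail of the hypergeometric distribution.

```python
from math import comb


def overlap_table(list_a, list_b, universe):
    """Return (both, only_a, only_b, neither) relative to a background set."""
    universe = set(universe)
    # Genes outside the assumed background set are ignored.
    set_a = set(list_a) & universe
    set_b = set(list_b) & universe
    both = len(set_a & set_b)
    only_a = len(set_a - set_b)
    only_b = len(set_b - set_a)
    neither = len(universe) - both - only_a - only_b
    return both, only_a, only_b, neither


def overlap_p_value(both, only_a, only_b, neither):
    """One-sided p-value: chance of an overlap at least as large as observed."""
    size_a = both + only_a
    size_b = both + only_b
    n = both + only_a + only_b + neither
    total = comb(n, size_b)
    upper = min(size_a, size_b)
    # Sum hypergeometric terms for overlaps from the observed value upwards.
    tail = sum(comb(size_a, k) * comb(n - size_a, size_b - k)
               for k in range(both, upper + 1))
    return tail / total


# Hypothetical data: 20 background genes, two lists sharing two genes.
universe = {f"g{i}" for i in range(20)}
annotated = [f"g{i}" for i in range(0, 5)]   # g0 .. g4
user_list = [f"g{i}" for i in range(3, 9)]   # g3 .. g8

table = overlap_table(annotated, user_list, universe)
p_value = overlap_p_value(*table)
print(table, p_value)
```

The choice of background set matters: the count of genes in neither list, and therefore the ''p''-value, changes with the size of the assumed universe.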