Multinomial distribution
== Occurrence and applications ==
=== Confidence intervals for the difference in matched-pairs binary data (using multinomial with ''k=4'') ===
For matched-pairs binary data, a common task is to construct a confidence interval for the difference in the proportions of the matched events. For example, we might have a test for some disease and want to check its results for a population at two points in time (1 and 2), to see whether the proportion of positives for the disease changed during that time.

Such scenarios can be represented using a two-by-two [[contingency table]] that records the number of elements with each combination of events. Lowercase ''f'' denotes sample frequencies, <math>f_{11}, f_{10}, f_{01}, f_{00}</math>, and uppercase ''F'' denotes population frequencies, <math>F_{11}, F_{10}, F_{01}, F_{00}</math>. These four combinations can be modeled as coming from a multinomial distribution with four possible outcomes. The sample and population sizes are denoted ''n'' and ''N'', respectively. In this setting, the interest is in building a confidence interval for the difference of proportions from the marginals of the following (sampled) contingency table:

{| class="wikitable" style="text-align:center; margin:1em auto;"
|-
| || Test 2 positive || Test 2 negative || Row total
|-
| Test 1 positive || <math>f_{11}</math> || <math>f_{10}</math> || <math>f_{1*} = f_{11} + f_{10}</math>
|-
| Test 1 negative || <math>f_{01}</math> || <math>f_{00}</math> || <math>f_{0*} = f_{01} + f_{00}</math>
|-
| Column total || <math>f_{*1} = f_{11} + f_{01}</math> || <math>f_{*0} = f_{10} + f_{00}</math> || <math>n</math>
|}

Checking the difference in marginal proportions means we are interested in the following definitions: <math>p_{1*} = \frac{F_{1*}}{N} = \frac{F_{11} + F_{10}}{N}</math> and <math>p_{*1} = \frac{F_{*1}}{N} = \frac{F_{11} + F_{01}}{N}</math>.
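These definitions can be checked numerically on a sample table. The following Python sketch is a minimal illustration, assuming the usual plug-in estimates <math>\hat{p} = f/n</math> in place of the unknown population proportions; the function name <code>marginal_proportions</code> and the counts are hypothetical, chosen only for illustration. Exact fractions are used so that the check is not affected by floating-point rounding.

```python
from fractions import Fraction


def marginal_proportions(f11: int, f10: int, f01: int, f00: int) -> tuple[Fraction, Fraction]:
    """Hypothetical helper: plug-in estimates (p_1*, p_*1) from matched-pairs counts."""
    n = f11 + f10 + f01 + f00
    p_1star = Fraction(f11 + f10, n)  # Test 1 positive: row total f_1* divided by n
    p_star1 = Fraction(f11 + f01, n)  # Test 2 positive: column total f_*1 divided by n
    return p_1star, p_star1


# Hypothetical counts, for illustration only.
f11, f10, f01, f00 = 30, 5, 15, 50
n = f11 + f10 + f01 + f00
p_1star, p_star1 = marginal_proportions(f11, f10, f01, f00)

# The shared cell f11 cancels, so only the off-diagonal cells remain.
assert p_star1 - p_1star == Fraction(f01 - f10, n)
print(p_1star, p_star1, p_star1 - p_1star)  # prints: 7/20 9/20 1/10
```

The concordant cell <math>f_{11}</math> contributes equally to both marginals, which is why the difference depends only on <math>f_{01}</math> and <math>f_{10}</math>.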
The difference for which we want to build a confidence interval is:
:<math>p_{*1} - p_{1*} = \frac{F_{11} + F_{01}}{N} - \frac{F_{11} + F_{10}}{N} = \frac{F_{01}}{N} - \frac{F_{10}}{N} = p_{01} - p_{10}</math>

Hence, building a confidence interval for the difference of the marginal positive proportions (<math>p_{*1} - p_{1*}</math>) is the same as building a confidence interval for the difference of the proportions on the secondary diagonal of the two-by-two contingency table (<math>p_{01} - p_{10}</math>).

Calculating a [[p-value]] for such a difference is known as [[McNemar's test]]. A confidence interval around it can be constructed using the methods described above for [[Multinomial distribution#Confidence intervals for the difference of two proportions|Confidence intervals for the difference of two proportions]].

The Wald confidence intervals from the previous section can be applied to this setting, and they appear in the literature under alternative notations. Specifically, the SE is often presented in terms of the contingency table frequencies instead of the sample proportions. For example, the Wald confidence intervals provided above can be written as:<ref name=pass_sample_size_software />{{rp|102–3}}
:<math>
\widehat{\operatorname{SE}(p_{*1} - p_{1*})} = \widehat{\operatorname{SE}(p_{01} - p_{10})} = \frac{\sqrt{n(f_{10} + f_{01}) - (f_{10} - f_{01})^2}}{n\sqrt{n}}
</math>

Further research in the literature has identified several shortcomings of both the Wald method and the Wald method with continuity correction, and other methods have been proposed for practical application.<ref name=pass_sample_size_software /> One such modification is ''Agresti and Min’s Wald+2'' (similar to other work by Agresti<ref>{{Cite journal | last1 = Agresti | first1 = A. | last2 = Caffo | first2 = B.
| title = Simple and effective confidence intervals for proportions and difference of proportions result from adding two successes and two failures | journal = The American Statistician | year = 2000 | volume = 54 | issue = 4 | pages = 280–288 | doi = 10.1080/00031305.2000.10474560 }}</ref>), in which an extra <math>\tfrac{1}{2}</math> is added to each cell frequency.<ref name=Agresti2005/> This leads to the ''Wald+2'' confidence intervals. In a Bayesian interpretation, this is akin to building the estimators with a [[Dirichlet distribution]] prior whose parameters are all equal to 0.5, which is the [[Jeffreys prior]] for the multinomial distribution. The ''+2'' in the name ''Wald+2'' can then be read as follows: a two-by-two contingency table is a multinomial distribution with four possible outcomes, so adding half an observation to each cell amounts to adding 2 observations overall (due to the prior). This leads to the following modified SE for matched-pairs data:
:<math>
\widehat{\operatorname{SE}(p_{*1} - p_{1*})}_{\text{Wald+2}} = \frac{\sqrt{(n+2)(f_{10} + f_{01} + 1) - (f_{10} - f_{01})^2}}{(n+2)\sqrt{n+2}}
</math>
This can be plugged into the original Wald formula as follows:
:<math>(p_{*1} - p_{1*})\frac{n}{n+2} \pm z_{\alpha/2} \cdot \widehat{\operatorname{SE}(p_{*1} - p_{1*})}_{\text{Wald+2}}</math>

Other modifications include ''Bonett and Price’s Adjusted Wald'' and ''Newcombe’s Score''.
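The Wald and Wald+2 formulas for matched pairs translate directly into code. The following Python sketch is a minimal illustration written from the formulas in this section, not a reference implementation from the cited sources; the function names <code>wald_ci</code> and <code>wald_plus2_ci</code> and the default <math>\alpha = 0.05</math> are hypothetical choices. It uses <code>statistics.NormalDist</code> from the Python standard library for the quantile <math>z_{\alpha/2}</math>. Only the discordant counts <math>f_{10}</math>, <math>f_{01}</math> and the sample size <math>n</math> are needed.

```python
import math
from statistics import NormalDist


def _z(alpha: float) -> float:
    """Two-sided standard normal quantile z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald_ci(f10: int, f01: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Hypothetical helper: Wald CI for p_*1 - p_1* (equivalently p_01 - p_10)."""
    diff = (f01 - f10) / n
    # SE in terms of cell frequencies: sqrt(n(f10 + f01) - (f10 - f01)^2) / (n sqrt(n))
    se = math.sqrt(n * (f10 + f01) - (f10 - f01) ** 2) / (n * math.sqrt(n))
    z = _z(alpha)
    return diff - z * se, diff + z * se


def wald_plus2_ci(f10: int, f01: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Hypothetical helper: Wald+2 CI, adding 1/2 to each of the four cells (n -> n + 2)."""
    m = n + 2
    # (f01 + 1/2) - (f10 + 1/2) = f01 - f10, so the estimate is shrunk by n / (n + 2).
    diff = (f01 - f10) / m
    se = math.sqrt(m * (f10 + f01 + 1) - (f10 - f01) ** 2) / (m * math.sqrt(m))
    z = _z(alpha)
    return diff - z * se, diff + z * se


# Hypothetical discordant counts, for illustration only.
print(wald_ci(5, 15, 100))
print(wald_plus2_ci(5, 15, 100))
```

With the hypothetical counts <math>f_{10} = 5</math>, <math>f_{01} = 15</math> and <math>n = 100</math>, the Wald interval is centered at 0.1, while the Wald+2 interval is centered at <math>10/102 \approx 0.098</math>, shrunk towards zero by the factor <math>n/(n+2)</math>. When <math>f_{10} = f_{01} = 0</math>, the Wald SE is zero and the interval collapses to a single point, whereas the Wald+2 SE equals <math>1/(n+2)</math> and stays positive.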