=== Confidence intervals for the difference of two proportions === In the setting of a multinomial distribution, constructing confidence intervals for the difference between the proportions of observations from two events, <math>p_i-p_j</math>, requires incorporating the negative covariance between the sample estimators <math>\hat{p}_i = \frac{X_i}{n} </math> and <math>\hat{p}_j = \frac{X_j}{n}</math>. Much of the literature on the subject has focused on the use case of matched-pairs binary data, which requires careful attention when translating the formulas to the general case of <math>p_i-p_j</math> for any multinomial distribution. The formulas in the current section are general, while the formulas in the next section focus on the matched-pairs binary data use case. Wald's standard error (SE) of the difference of proportions can be estimated using:<ref>{{Cite book | last1 = Fleiss | first1 = Joseph L. | last2 = Levin | first2 = Bruce | last3 = Paik | first3 = Myunghee Cho | title = Statistical Methods for Rates and Proportions | edition = 3rd | publisher = J. Wiley | year = 2003 | isbn = 9780471526292 | location = Hoboken, N.J | pages = 760 }}</ref>{{rp|378}}<ref>{{Cite journal | last1 = Newcombe | first1 = R. G. 
| title = Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods | journal = Statistics in Medicine | year = 1998 | volume = 17 | issue = 8 | pages = 873–890 | doi = 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I | pmid = 9595617 }}</ref> <math> \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} = \sqrt{\frac{(\hat{p}_i + \hat{p}_j) - (\hat{p}_i - \hat{p}_j)^2}{n}} </math> For a <math>100(1 - \alpha)\%</math> [[Confidence interval#Approximate confidence intervals|approximate confidence interval]], the [[margin of error]] may incorporate the appropriate quantile from the [[standard normal distribution]], as follows: <math>(\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}</math> {{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}} As the sample size (<math>n</math>) increases, the sample proportions will approximately follow a [[multivariate normal distribution]], thanks to the [[Central limit theorem#Multidimensional CLT|multidimensional central limit theorem]] (and it could also be shown using the [[Cramér–Wold theorem]]). Therefore, their difference will also be approximately normal. Also, these estimators are [[Consistent estimator|weakly consistent]] and plugging them into the SE estimator makes it also weakly consistent. Hence, thanks to [[Slutsky's theorem]], the [[pivotal quantity]] <math>\frac{(\hat{p}_i - \hat{p}_j) - (p_i - p_j)}{\widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}}</math> approximately follows the [[standard normal distribution]]. And from that, the above [[Confidence interval#Approximate confidence intervals|approximate confidence interval]] is directly derived. 
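The Wald interval above can be sketched in a few lines of code. This is an illustrative implementation of the formulas just given, not taken from any library; the function name and example counts are hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci_diff(x_i, x_j, n, alpha=0.05):
    """Wald CI for p_i - p_j from multinomial counts x_i, x_j out of n.

    Uses SE = sqrt(((p_i + p_j) - (p_i - p_j)^2) / n), which already
    accounts for the negative covariance -p_i * p_j / n of the estimators.
    """
    p_i, p_j = x_i / n, x_j / n
    diff = p_i - p_j
    se = math.sqrt(((p_i + p_j) - diff ** 2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for 95%
    return diff - z * se, diff + z * se

# Hypothetical example: counts 45 and 35 out of n = 100 observations.
lo, hi = wald_ci_diff(45, 35, 100)
```

With these counts the point estimate is 0.10 and the 95% interval is roughly (-0.074, 0.274); the interval is wide because the two proportions are negatively correlated but each carries substantial sampling noise at n = 100.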
The SE can be constructed using the formula for [[Variance#Addition and multiplication by a constant|the variance of the difference of two random variables]]: <math> \begin{align} \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} & = \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n} + \frac{\hat{p}_j (1 - \hat{p}_j)}{n} - 2\left(-\frac{\hat{p}_i \hat{p}_j}{n}\right)} \\ & = \sqrt{\frac{1}{n} \left(\hat{p}_i + \hat{p}_j - \hat{p}_i^2 - \hat{p}_j^2 + 2\hat{p}_i \hat{p}_j\right)} \\ & = \sqrt{\frac{(\hat{p}_i + \hat{p}_j) - (\hat{p}_i - \hat{p}_j)^2}{n}} \end{align} </math> {{hidden end}} A modification which includes a [[continuity correction]] adds <math>\frac{1}{n}</math> to the margin of error as follows:<ref name=pass_sample_size_software>{{Cite web|url=https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_the_Difference_Between_Two_Correlated_Proportions.pdf|title=Confidence Intervals for the Difference Between Two Correlated Proportions|publisher=NCSS|access-date=2022-03-22}}</ref>{{rp|102–3}} <math>(\hat{p}_i - \hat{p}_j) \pm \left(z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} + \frac{1}{n}\right)</math> An alternative is to rely on a Bayesian estimator using the [[Jeffreys prior]], which leads to using a [[Dirichlet distribution]], with all parameters equal to 0.5, as the prior. The posterior-based interval applies the calculations above after adding 1/2 to each of the ''k'' observed counts, leading to an overall increase of the sample size by <math>\frac{k}{2}</math>. 
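The continuity-corrected variant differs from the plain Wald interval only in that 1/n is added to the margin of error. A minimal sketch, with an illustrative function name and example counts:

```python
import math
from statistics import NormalDist

def wald_cc_ci_diff(x_i, x_j, n, alpha=0.05):
    """Continuity-corrected Wald CI for p_i - p_j.

    Identical to the plain Wald interval except that 1/n is added
    to the margin of error on each side.
    """
    p_i, p_j = x_i / n, x_j / n
    diff = p_i - p_j
    se = math.sqrt(((p_i + p_j) - diff ** 2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se + 1 / n  # continuity correction widens the interval
    return diff - margin, diff + margin

# Hypothetical example: counts 45 and 35 out of n = 100 observations.
lo, hi = wald_cc_ci_diff(45, 35, 100)
```

The correction simply widens the interval by 1/n on each side (here, by 0.01), which improves coverage for small n at the cost of some conservatism.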
This was originally developed for a multinomial distribution with four events, for analyzing matched-pairs data, and is known as ''wald+2'' (see the next section for more details).<ref name=Agresti2005>{{Cite journal | last1 = Agresti | first1 = Alan | last2 = Min | first2 = Yongyi | title = Simple improved confidence intervals for comparing matched proportions | journal = Statistics in Medicine | year = 2005 | volume = 24 | issue = 5 | pages = 729–740 | doi = 10.1002/sim.1781 | pmid = 15696504 | url = https://users.stat.ufl.edu/~aa/articles/agresti_min_2005b.pdf }}</ref> This leads to the following SE: <math> \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}} = \sqrt{\frac{\left(\hat{p}_i + \hat{p}_j + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\hat{p}_i - \hat{p}_j\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} </math> {{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}} <math> \begin{align} \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}} & = \sqrt{\frac{\left(\frac{x_i+1/2}{n+\frac{k}{2}} + \frac{x_j+1/2}{n+\frac{k}{2}}\right) - \left(\frac{x_i+1/2}{n+\frac{k}{2}} - \frac{x_j+1/2}{n+\frac{k}{2}}\right)^2}{n+\frac{k}{2}}} \\ & = \sqrt{\frac{\left(\frac{x_i}{n} + \frac{x_j}{n} + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\frac{x_i}{n} - \frac{x_j}{n}\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} \\ & = \sqrt{\frac{\left(\hat{p}_i + \hat{p}_j + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\hat{p}_i - \hat{p}_j\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} \end{align} </math> {{hidden end}} This SE can be plugged into the original Wald formula, together with the correspondingly shrunk point estimate, as follows: <math>(\hat{p}_i - \hat{p}_j)\frac{n}{n+\frac{k}{2}} \pm z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}}</math>
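As the proof shows, the ''wald+k/2'' interval is exactly the plain Wald interval computed on the adjusted counts <math>x_i + 1/2</math> out of <math>n + k/2</math>, so it is simplest to implement it that way. A minimal sketch, assuming hypothetical counts from a four-event (''k'' = 4) multinomial:

```python
import math
from statistics import NormalDist

def wald_plus_ci_diff(x_i, x_j, n, k, alpha=0.05):
    """'wald+k/2' CI for p_i - p_j.

    Adds 1/2 to each of the k multinomial counts (the Jeffreys /
    Dirichlet(1/2, ..., 1/2) prior) and applies the plain Wald formula
    to the adjusted proportions.  The point estimate (x_i - x_j)/(n + k/2)
    equals the shrunk estimate (p_i_hat - p_j_hat) * n / (n + k/2).
    """
    n_adj = n + k / 2
    p_i = (x_i + 0.5) / n_adj
    p_j = (x_j + 0.5) / n_adj
    diff = p_i - p_j
    se = math.sqrt(((p_i + p_j) - diff ** 2) / n_adj)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se

# Hypothetical example: counts 45 and 35 out of n = 100, with k = 4 events.
lo, hi = wald_plus_ci_diff(45, 35, 100, k=4)
```

With k = 4 this adds 2 pseudo-observations in total (hence the name ''wald+2''), pulling the point estimate slightly toward zero and stabilizing the interval for small counts.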