== Statistical inference ==

{{Expand section|date=March 2024|with=A new sub-section about simultaneous confidence intervals (with proper citations, e.g.: [https://www.stat.cmu.edu/technometrics/59-69/VOL-07-02/v0702247.pdf]).}}

=== Equivalence tests for multinomial distributions ===

The goal of equivalence testing is to establish the agreement between a theoretical multinomial distribution and observed counting frequencies. The theoretical distribution may be a fully specified multinomial distribution or a parametric family of multinomial distributions.

Let <math>q</math> denote a theoretical multinomial distribution and let <math>p</math> be the true underlying distribution. The distributions <math>p</math> and <math>q</math> are considered equivalent if <math>d(p,q)<\varepsilon</math> for a distance <math>d</math> and a tolerance parameter <math>\varepsilon>0</math>. The equivalence test problem is <math>H_0=\{d(p,q)\geq\varepsilon\}</math> versus <math>H_1=\{d(p,q)<\varepsilon\}</math>. The true underlying distribution <math>p</math> is unknown; instead, the counting frequencies <math>p_n</math> are observed, where <math>n</math> is the sample size. An equivalence test uses <math>p_n</math> to reject <math>H_0</math>. If <math>H_0</math> can be rejected, then the equivalence between <math>p</math> and <math>q</math> is shown at the given significance level.

An equivalence test for the Euclidean distance can be found in the textbook by Wellek (2010).<ref>{{Cite book|title=Testing statistical hypotheses of equivalence and noninferiority|last=Wellek|first=Stefan|publisher=Chapman and Hall/CRC|year=2010|isbn=978-1439808184}}</ref> An equivalence test for the total variation distance was developed by Ostrovski (2017).<ref>{{cite journal|last1=Ostrovski|first1=Vladimir|date=May 2017|title=Testing equivalence of multinomial distributions|journal=Statistics & Probability Letters|volume=124|pages=77–82|doi=10.1016/j.spl.2017.01.004|s2cid=126293429}} [http://dx.doi.org/10.1016/j.spl.2017.01.004 Official web link (subscription required)]. [https://www.researchgate.net/publication/312481284_Testing_equivalence_of_multinomial_distributions Alternate, free web link].</ref> An exact equivalence test for a specific cumulative distance was proposed by Frey (2009).<ref>{{cite journal|last1=Frey|first1=Jesse|date=March 2009|title=An exact multinomial test for equivalence|journal=The Canadian Journal of Statistics|volume=37|pages=47–59|doi=10.1002/cjs.10000|s2cid=122486567}} [http://www.jstor.org/stable/25653460 Official web link (subscription required)].</ref>

The distance between the true underlying distribution <math>p</math> and a family of multinomial distributions <math>\mathcal{M}</math> is defined by <math>d(p, \mathcal{M})=\min_{h\in\mathcal{M}}d(p,h)</math>. The equivalence test problem is then given by <math>H_0=\{d(p,\mathcal{M})\geq \varepsilon\}</math> versus <math>H_1=\{d(p,\mathcal{M})< \varepsilon\}</math>. The distance <math>d(p,\mathcal{M})</math> is usually computed using numerical optimization. Tests for this case were developed by Ostrovski (2018).<ref>{{cite journal|last1=Ostrovski|first1=Vladimir|date=March 2018|title=Testing equivalence to families of multinomial distributions with application to the independence model|journal=Statistics & Probability Letters|volume=139|pages=61–66|doi=10.1016/j.spl.2018.03.014|s2cid=126261081}} [https://doi.org/10.1016/j.spl.2018.03.014 Official web link (subscription required)]. [https://www.researchgate.net/publication/324124605_Testing_equivalence_to_families_of_multinomial_distributions_with_application_to_the_independence_model Alternate, free web link].</ref>
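The optimization step can be illustrated with a short Python sketch for the independence family of a 2×2 table, using the total variation distance. This is only a schematic example: the choice of distance, the function names, and the use of SciPy's <code>minimize</code> routine are assumptions for illustration, not code from the cited papers.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.sum(np.abs(np.asarray(p) - np.asarray(q)))

# Observed counting frequencies p_n from a 2x2 contingency table.
counts = np.array([[30.0, 20.0], [25.0, 25.0]])
p_n = counts / counts.sum()

def objective(theta):
    """Distance from p_n to the independent (product) distribution
    with row marginal (a, 1 - a) and column marginal (b, 1 - b)."""
    a, b = theta
    h = np.outer([a, 1.0 - a], [b, 1.0 - b])
    return total_variation(p_n.ravel(), h.ravel())

# d(p_n, M): minimize the distance over the independence family M.
res = minimize(objective, x0=[0.5, 0.5], bounds=[(0.0, 1.0)] * 2)
print(res.fun)  # compare this distance against the tolerance epsilon
</syntaxhighlight>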
=== Confidence intervals for the difference of two proportions ===

In the setting of a multinomial distribution, constructing confidence intervals for the difference between the proportions of observations from two events, <math>p_i-p_j</math>, requires the incorporation of the negative covariance between the sample estimators <math>\hat{p}_i = \frac{X_i}{n}</math> and <math>\hat{p}_j = \frac{X_j}{n}</math>.

Some of the literature on the subject focuses on the use-case of matched-pairs binary data, which requires careful attention when translating the formulas to the general case of <math>p_i-p_j</math> for any multinomial distribution. The formulas in this section are general, while the formulas in the next section focus on the matched-pairs binary data use-case.

Wald's standard error (SE) of the difference of proportions can be estimated using:<ref>{{Cite book | last1 = Fleiss | first1 = Joseph L. | last2 = Levin | first2 = Bruce | last3 = Paik | first3 = Myunghee Cho | title = Statistical Methods for Rates and Proportions | edition = 3rd | publisher = J. Wiley | year = 2003 | isbn = 9780471526292 | location = Hoboken, N.J | pages = 760 }}</ref>{{rp|378}}<ref>{{Cite journal | last1 = Newcombe | first1 = R. G. | title = Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods | journal = Statistics in Medicine | year = 1998 | volume = 17 | issue = 8 | pages = 873–890 | doi = 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I | pmid = 9595617 }}</ref>

<math> \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} = \sqrt{\frac{(\hat{p}_i + \hat{p}_j) - (\hat{p}_i - \hat{p}_j)^2}{n}} </math>

For a <math>100(1 - \alpha)\%</math> [[Confidence interval#Approximate confidence intervals|approximate confidence interval]], the [[margin of error]] may incorporate the appropriate quantile from the [[standard normal distribution]], as follows:

<math>(\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}</math>

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}}
As the sample size (<math>n</math>) increases, the sample proportions approximately follow a [[multivariate normal distribution]], by the [[Central limit theorem#Multidimensional CLT|multidimensional central limit theorem]] (this can also be shown using the [[Cramér–Wold theorem]]); therefore, their difference is also approximately normal. These estimators are [[Consistent estimator|weakly consistent]], and plugging them into the SE estimator makes it weakly consistent as well. Hence, by [[Slutsky's theorem]], the [[pivotal quantity]] <math>\frac{(\hat{p}_i - \hat{p}_j) - (p_i - p_j)}{\widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}}</math> approximately follows the [[standard normal distribution]], from which the above [[Confidence interval#Approximate confidence intervals|approximate confidence interval]] is directly derived.

The SE itself can be constructed using the calculus of [[Variance#Addition and multiplication by a constant|the variance of the difference of two random variables]]:

<math> \begin{align} \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} & = \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n} + \frac{\hat{p}_j (1 - \hat{p}_j)}{n} - 2\left(-\frac{\hat{p}_i \hat{p}_j}{n}\right)} \\ & = \sqrt{\frac{1}{n} \left(\hat{p}_i + \hat{p}_j - \hat{p}_i^2 - \hat{p}_j^2 + 2\hat{p}_i \hat{p}_j\right)} \\ & = \sqrt{\frac{(\hat{p}_i + \hat{p}_j) - (\hat{p}_i - \hat{p}_j)^2}{n}} \end{align} </math>
{{hidden end}}
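A minimal Python sketch of this interval, assuming counts <math>x_i, x_j</math> out of <math>n</math> trials (the function name is hypothetical; the sketch simply evaluates the Wald formula above, with the normal quantile taken from SciPy):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def wald_ci_diff(x_i, x_j, n, alpha=0.05):
    """Wald interval for p_i - p_j in a multinomial sample; the SE
    formula already absorbs the negative covariance of the two
    sample proportions."""
    p_i, p_j = x_i / n, x_j / n
    diff = p_i - p_j
    se = np.sqrt(((p_i + p_j) - diff ** 2) / n)
    margin = norm.ppf(1 - alpha / 2) * se
    return diff - margin, diff + margin

# Example: counts (45, 30, 25) out of n = 100; 95% CI for p_1 - p_2.
print(wald_ci_diff(45, 30, 100))  # approximately (-0.017, 0.317)
</syntaxhighlight>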
A modification which includes a [[continuity correction]] adds <math>\frac{1}{n}</math> to the margin of error, as follows:<ref name=pass_sample_size_software>{{Cite web|url=https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_the_Difference_Between_Two_Correlated_Proportions.pdf|title=Confidence Intervals for the Difference Between Two Correlated Proportions|publisher=NCSS|access-date=2022-03-22}}</ref>{{rp|102–3}}

<math>(\hat{p}_i - \hat{p}_j) \pm \left(z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)} + \frac{1}{n}\right)</math>

Another alternative is to rely on a Bayesian estimator using the [[Jeffreys prior]], which amounts to using a [[Dirichlet distribution]], with all parameters equal to 0.5, as a prior. The posterior yields the calculations above after adding 1/2 to each of the ''k'' counts, which increases the overall sample size by <math>\frac{k}{2}</math>. Originally developed for a multinomial distribution with four events for analyzing matched-pairs data, this method is known as ''wald+2'' (see the next section for more details).<ref name=Agresti2005>{{Cite journal | last1 = Agresti | first1 = Alan | last2 = Min | first2 = Yongyi | title = Simple improved confidence intervals for comparing matched proportions | journal = Statistics in Medicine | year = 2005 | volume = 24 | issue = 5 | pages = 729–740 | doi = 10.1002/sim.1781 | pmid = 15696504 | url = https://users.stat.ufl.edu/~aa/articles/agresti_min_2005b.pdf }}</ref> It leads to the following SE:

<math> \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}} = \sqrt{\frac{\left(\hat{p}_i + \hat{p}_j + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\hat{p}_i - \hat{p}_j\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} </math>

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}}
<math> \begin{align} \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}} & = \sqrt{\frac{\left(\frac{x_i+1/2}{n+\frac{k}{2}} + \frac{x_j+1/2}{n+\frac{k}{2}}\right) - \left(\frac{x_i+1/2}{n+\frac{k}{2}} - \frac{x_j+1/2}{n+\frac{k}{2}}\right)^2}{n+\frac{k}{2}}} \\ & = \sqrt{\frac{\left(\frac{x_i}{n} + \frac{x_j}{n} + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\frac{x_i}{n} - \frac{x_j}{n}\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} \\ & = \sqrt{\frac{\left(\hat{p}_i + \hat{p}_j + \frac{1}{n}\right)\frac{n}{n+\frac{k}{2}} - \left(\hat{p}_i - \hat{p}_j\right)^2 \left(\frac{n}{n+\frac{k}{2}}\right)^2 }{n+\frac{k}{2}}} \end{align} </math>
{{hidden end}}

This SE can be plugged into the Wald formula, together with the correspondingly shrunken point estimate, as follows:

<math>(\hat{p}_i - \hat{p}_j)\frac{n}{n+\frac{k}{2}} \pm z_{\alpha/2} \cdot \widehat{\operatorname{SE}(\hat{p}_i - \hat{p}_j)}_{wald+\frac{k}{2}}</math>
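Both adjustments can be sketched in the same illustrative way (again, the function names are hypothetical and not from the cited sources; the ''wald+k/2'' version simply adds 1/2 to each count and replaces <math>n</math> with <math>n + \frac{k}{2}</math> before applying the Wald formula):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def wald_cc_ci(x_i, x_j, n, alpha=0.05):
    """Continuity-corrected Wald interval: widens the margin by 1/n."""
    p_i, p_j = x_i / n, x_j / n
    diff = p_i - p_j
    se = np.sqrt(((p_i + p_j) - diff ** 2) / n)
    margin = norm.ppf(1 - alpha / 2) * se + 1 / n
    return diff - margin, diff + margin

def wald_plus_ci(x_i, x_j, n, k, alpha=0.05):
    """'wald+k/2' interval: add 1/2 to each of the k counts
    (Jeffreys / Dirichlet(1/2, ..., 1/2) prior), i.e. use the
    posterior-mean proportions with sample size n + k/2."""
    n_adj = n + k / 2
    p_i = (x_i + 0.5) / n_adj  # posterior-mean proportion
    p_j = (x_j + 0.5) / n_adj
    diff = p_i - p_j           # equals (p_i_hat - p_j_hat) * n / (n + k/2)
    se = np.sqrt(((p_i + p_j) - diff ** 2) / n_adj)
    margin = norm.ppf(1 - alpha / 2) * se
    return diff - margin, diff + margin

# Same counts as before, with k = 3 categories:
print(wald_cc_ci(45, 30, 100))         # approximately (-0.027, 0.327)
print(wald_plus_ci(45, 30, 100, k=3))  # slightly shrunk toward zero
</syntaxhighlight>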