==Examples==

===Bernoulli distribution===
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent [[Bernoulli trial|Bernoulli-distributed]] random variables with expected value ''p'', then the sum ''T''(''X'') = ''X''<sub>1</sub> + ... + ''X''<sub>''n''</sub> is a sufficient statistic for ''p'' (here 'success' corresponds to ''X''<sub>''i''</sub> = 1 and 'failure' to ''X''<sub>''i''</sub> = 0; so ''T'' is the total number of successes).

This is seen by considering the joint probability distribution:

:<math> \Pr\{X=x\}=\Pr\{X_1=x_1,X_2=x_2,\ldots,X_n=x_n\}.</math>

Because the observations are independent, this can be written as

:<math> p^{x_1}(1-p)^{1-x_1} p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n} </math>

and, collecting powers of ''p'' and 1&nbsp;&minus;&nbsp;''p'', gives

:<math> p^{\sum x_i}(1-p)^{n-\sum x_i}=p^{T(x)}(1-p)^{n-T(x)}, </math>

which satisfies the factorization criterion, with ''h''(''x'') = 1 being just a constant.

Note the crucial feature: the unknown parameter ''p'' interacts with the data ''x'' only via the statistic ''T''(''x'') = Σ ''x''<sub>''i''</sub>. As a concrete application, this gives a procedure for distinguishing a [[Fair coin#Fair results from a biased coin|fair coin from a biased coin]].

===Uniform distribution===
{{see also|German tank problem}}
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent and [[uniform distribution (continuous)|uniformly distributed]] on the interval [0,&nbsp;''θ''], then ''T''(''X'') = max(''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>) is sufficient for ''θ''; the [[sample maximum]] is a sufficient statistic for the population maximum.

To see this, consider the joint [[probability density function]] of ''X'' = (''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>).
Because the observations are independent, the pdf can be written as a product of individual densities:

:<math>\begin{align} f_{\theta}(x_1,\ldots,x_n) &= \frac{1}{\theta}\mathbf{1}_{\{0\leq x_1\leq\theta\}} \cdots \frac{1}{\theta}\mathbf{1}_{\{0\leq x_n\leq\theta\}} \\[5pt] &= \frac{1}{\theta^n} \mathbf{1}_{\{0\leq\min\{x_i\}\}}\mathbf{1}_{\{\max\{x_i\}\leq\theta\}}, \end{align}</math>

where '''1'''<sub>{''...''}</sub> is the [[indicator function]]. Thus the density takes the form required by the Fisher–Neyman factorization theorem, where ''h''(''x'') = '''1'''<sub>{min{''x<sub>i</sub>''}≥0}</sub>, and the rest of the expression is a function only of ''θ'' and ''T''(''x'') = max{''x<sub>i</sub>''}.

In fact, the [[minimum-variance unbiased estimator]] (MVUE) for ''θ'' is

:<math> \frac{n+1}{n}T(X). </math>

This is the sample maximum, scaled to correct for the [[bias of an estimator|bias]], and is the MVUE by the [[Lehmann–Scheffé theorem]]. The unscaled sample maximum ''T''(''X'') is the [[maximum likelihood estimator]] for ''θ''.

===Uniform distribution (with two parameters)===
If <math>X_1,\ldots,X_n</math> are independent and [[Uniform distribution (continuous)|uniformly distributed]] on the interval <math>[\alpha, \beta]</math> (where <math>\alpha</math> and <math>\beta</math> are unknown parameters), then <math>T(X_1^n)=\left(\min_{1 \leq i \leq n}X_i,\max_{1 \leq i \leq n}X_i\right)</math> is a two-dimensional sufficient statistic for <math>(\alpha, \beta)</math>.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\ldots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n \left({1 \over \beta-\alpha}\right) \mathbf{1}_{ \{ \alpha \leq x_i \leq \beta \} } = \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \leq x_i \leq \beta, \, \forall \, i = 1,\ldots,n\}} \\ &= \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \, \leq \, \min_{1 \leq i \leq n}x_i \} } \mathbf{1}_{ \{ \max_{1 \leq i \leq n}x_i \, \leq \, \beta \} }. \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{(\alpha, \beta)}(x_1^n)= \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \, \leq \, \min_{1 \leq i \leq n}x_i \} } \mathbf{1}_{ \{ \max_{1 \leq i \leq n}x_i \, \leq \, \beta \} }. \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>(\alpha, \beta)</math> and <math>g_{(\alpha, \beta)}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(x_1^n)= \left(\min_{1 \leq i \leq n}x_i,\max_{1 \leq i \leq n}x_i\right),</math> the Fisher–Neyman factorization theorem implies <math>T(X_1^n) = \left(\min_{1 \leq i \leq n}X_i,\max_{1 \leq i \leq n}X_i\right)</math> is a sufficient statistic for <math>(\alpha, \beta)</math>.

===Poisson distribution===
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent and have a [[Poisson distribution]] with parameter ''λ'', then the sum ''T''(''X'') = ''X''<sub>1</sub> + ... + ''X''<sub>''n''</sub> is a sufficient statistic for ''λ''.

To see this, consider the joint probability distribution:

:<math> \Pr(X=x)=\Pr(X_1=x_1,X_2=x_2,\ldots,X_n=x_n). </math>

Because the observations are independent, this can be written as

:<math> {e^{-\lambda} \lambda^{x_1} \over x_1 !} \cdot {e^{-\lambda} \lambda^{x_2} \over x_2 !} \cdots {e^{-\lambda} \lambda^{x_n} \over x_n !}, </math>

which may be written as

:<math> e^{-n\lambda} \lambda^{(x_1+x_2+\cdots+x_n)} \cdot {1 \over x_1 ! x_2 !\cdots x_n ! }, </math>

which shows that the factorization criterion is satisfied, where ''h''(''x'') is the reciprocal of the product of the factorials. Note that the parameter ''λ'' interacts with the data only through the sum ''T''(''X'').

===Normal distribution===
If <math>X_1,\ldots,X_n</math> are independent and [[Normal distribution|normally distributed]] with expected value <math>\theta</math> (a parameter) and known finite variance <math>\sigma^2,</math> then

:<math>T(X_1^n)=\overline{x}=\frac1n\sum_{i=1}^nX_i</math>

is a sufficient statistic for <math>\theta.</math>

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) & = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left (-\frac{(x_i-\theta)^2}{2\sigma^2} \right ) \\ [6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left ( -\sum_{i=1}^n \frac{(x_i-\theta)^2}{2\sigma^2} \right ) \\ [6pt] & = (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left (-\sum_{i=1}^n \frac{ \left ( \left (x_i-\overline{x} \right ) - \left (\theta-\overline{x} \right ) \right )^2}{2\sigma^2} \right ) \\ [6pt] & = (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \left(\sum_{i=1}^n(x_i-\overline{x})^2 + \sum_{i=1}^n(\theta-\overline{x})^2 -2\sum_{i=1}^n(x_i-\overline{x})(\theta-\overline{x})\right) \right) \\ [6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \left (\sum_{i=1}^n(x_i-\overline{x})^2 + n(\theta-\overline{x})^2 \right ) \right ) && \sum_{i=1}^n(x_i-\overline{x})(\theta-\overline{x})=0 \\ [6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \sum_{i=1}^n (x_i-\overline{x})^2 \right ) \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ). \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n) &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \sum_{i=1}^n (x_i-\overline{x})^2 \right ) \\[6pt] g_\theta(x_1^n) &= \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ). \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>\theta</math> and <math>g_{\theta}(x_1^n)</math> depends on <math>x_1^n</math> only through the function

:<math>T(X_1^n)=\overline{x}=\frac1n\sum_{i=1}^nX_i,</math>

the Fisher–Neyman factorization theorem implies <math>T(X_1^n)</math> is a sufficient statistic for <math>\theta</math>.

If <math> \sigma^2 </math> is also unknown, then, writing <math>s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(x_i - \overline{x} \right)^2 </math> for the sample variance, the above likelihood can be rewritten as

:<math>\begin{align} f_{X_1^n}(x_1^n)= (2\pi\sigma^2)^{-n/2} \exp \left( -\frac{n-1}{2\sigma^2}s^2 \right) \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ) . \end{align}</math>

The Fisher–Neyman factorization theorem still holds and implies that <math>(\overline{x},s^2)</math> is a joint sufficient statistic for <math> ( \theta , \sigma^2) </math>.

===Exponential distribution===
If <math>X_1,\dots,X_n</math> are independent and [[Exponential distribution|exponentially distributed]] with expected value ''θ'' (an unknown real-valued positive parameter), then <math>T(X_1^n)=\sum_{i=1}^nX_i</math> is a sufficient statistic for ''θ''.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n {1 \over \theta} \, e^{ {-1 \over \theta}x_i } = {1 \over \theta^n}\, e^{ {-1 \over \theta} \sum_{i=1}^nx_i }. \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{\theta}(x_1^n)= {1 \over \theta^n}\, e^{ {-1 \over \theta} \sum_{i=1}^nx_i }. \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>\theta</math> and <math>g_{\theta}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(x_1^n)=\sum_{i=1}^nx_i,</math> the Fisher–Neyman factorization theorem implies <math>T(X_1^n)=\sum_{i=1}^nX_i</math> is a sufficient statistic for <math>\theta</math>.

===Gamma distribution===
If <math>X_1,\dots,X_n</math> are independent and distributed as a [[Gamma distribution|<math>\Gamma(\alpha, \beta)</math>]], where <math>\alpha</math> and <math>\beta</math> are unknown parameters, then <math>T(X_1^n) = \left( \prod_{i=1}^n{X_i} , \sum_{i=1}^n X_i \right)</math> is a two-dimensional sufficient statistic for <math>(\alpha, \beta)</math>.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n \left({1 \over \Gamma(\alpha) \beta^\alpha}\right) x_i^{\alpha -1} e^{(-1/\beta)x_i} \\[5pt] &= \left({1 \over \Gamma(\alpha) \beta^\alpha}\right)^n \left(\prod_{i=1}^n x_i\right)^{\alpha-1} e^{{-1 \over \beta} \sum_{i=1}^n x_i}. \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{(\alpha, \beta)}(x_1^n)= \left({1 \over \Gamma(\alpha) \beta^{\alpha}}\right)^n \left(\prod_{i=1}^n x_i\right)^{\alpha-1} e^{{-1 \over \beta} \sum_{i=1}^n x_i}. \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>(\alpha, \beta)</math> and <math>g_{(\alpha, \beta)}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(x_1^n)= \left( \prod_{i=1}^n x_i, \sum_{i=1}^n x_i \right),</math> the Fisher–Neyman factorization theorem implies <math>T(X_1^n)= \left( \prod_{i=1}^n X_i, \sum_{i=1}^n X_i \right)</math> is a sufficient statistic for <math>(\alpha, \beta).</math>
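The factorization criterion in the Bernoulli example above can be checked numerically: two samples with the same total number of successes ''T''(''x'') have likelihoods that are equal as functions of ''p'' (since ''h''(''x'') = 1), so their ratio is free of ''p''. A minimal Python sketch, with illustrative function and variable names not drawn from any particular library:

```python
# Illustrative check of the Bernoulli factorization: if T(x_a) = T(x_b),
# the likelihood ratio L(p; x_a) / L(p; x_b) does not depend on p.

def bernoulli_likelihood(xs, p):
    """Joint probability of an i.i.d. Bernoulli sample under parameter p."""
    like = 1.0
    for x in xs:
        like *= p if x == 1 else (1.0 - p)
    return like

x_a = [1, 0, 1, 0, 0]  # T(x_a) = 2 successes out of n = 5
x_b = [0, 0, 0, 1, 1]  # T(x_b) = 2: same sufficient statistic, different order
ratios = [bernoulli_likelihood(x_a, p) / bernoulli_likelihood(x_b, p)
          for p in (0.1, 0.37, 0.5, 0.9)]
# Both likelihoods equal p^2 (1-p)^3, so each ratio is 1
# (up to floating-point rounding).
```

Two samples with different values of ''T'' would instead give a ratio that varies with ''p'', reflecting that the data carry information about ''p'' only through ''T''.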
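The bias correction in the uniform example above can also be illustrated by simulation: the MLE max(''X''<sub>''i''</sub>) underestimates ''θ'' on average (its expectation is ''n''/(''n''&nbsp;+&nbsp;1)·''θ''), while the scaled MVUE (''n''&nbsp;+&nbsp;1)/''n''&nbsp;·&nbsp;max(''X''<sub>''i''</sub>) is unbiased. A hypothetical sketch, with the parameter values chosen only for illustration:

```python
# Illustrative simulation for Uniform[0, theta]: compare the average of the
# MLE max(X_i) with the average of the bias-corrected MVUE (n+1)/n * max(X_i).
import random

random.seed(0)
theta, n, reps = 10.0, 5, 20000
mle_sum = mvue_sum = 0.0
for _ in range(reps):
    t = max(random.uniform(0.0, theta) for _ in range(n))  # sample maximum T(X)
    mle_sum += t
    mvue_sum += (n + 1) / n * t
mle_avg = mle_sum / reps    # close to n/(n+1) * theta ~ 8.33: biased low
mvue_avg = mvue_sum / reps  # close to theta = 10: unbiased
```

The simulation only demonstrates unbiasedness; the stronger claim in the text, that the scaled maximum is the MVUE, comes from the Lehmann–Scheffé theorem applied to this sufficient statistic.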
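The algebraic identity behind the normal example, rewriting the product of densities as a function of (''x̄'', ''s''²) alone, can be verified numerically by comparing the term-by-term product with the factored form. A hypothetical sketch, with the sample values chosen only for illustration:

```python
# Illustrative check that the normal likelihood depends on the data only
# through x_bar and s^2, matching the rewritten form
# (2*pi*sigma^2)^(-n/2) * exp(-(n-1)s^2/(2 sigma^2)) * exp(-n(theta-x_bar)^2/(2 sigma^2)).
import math

def direct_likelihood(xs, theta, sigma2):
    """Product of N(theta, sigma2) densities, one factor per observation."""
    return math.prod(
        math.exp(-(x - theta) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for x in xs
    )

def factored_likelihood(xs, theta, sigma2):
    """The same likelihood expressed only through x_bar and s^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    return ((2 * math.pi * sigma2) ** (-n / 2)
            * math.exp(-(n - 1) * s2 / (2 * sigma2))
            * math.exp(-n * (theta - x_bar) ** 2 / (2 * sigma2)))

sample = [0.3, 1.7, 2.2, -0.4, 1.1]
checks = [(direct_likelihood(sample, th, s2), factored_likelihood(sample, th, s2))
          for th, s2 in [(0.0, 1.0), (1.0, 2.5), (-0.7, 0.3)]]
# Each pair agrees up to floating-point rounding, for every (theta, sigma^2).
```

The agreement for arbitrary (''θ'', ''σ''²) is exactly what the factorization theorem exploits: any two samples sharing (''x̄'', ''s''²) produce the same likelihood function.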