Boole's inequality

In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events. This inequality provides an upper bound on the probability of occurrence of at least one of a countable number of events in terms of the individual probabilities of the events. Boole's inequality is named for its discoverer, George Boole.<ref>Template:Cite book</ref>

Formally, for a countable set of events A₁, A₂, A₃, ..., we have

<math>{\mathbb P}\left(\bigcup_{i=1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} {\mathbb P}(A_i).</math>

In measure-theoretic terms, Boole's inequality follows from the fact that a measure (and certainly any probability measure) is σ-sub-additive. Thus Boole's inequality holds not only for probability measures <math>{\mathbb P}</math>, but more generally when <math>{\mathbb P}</math> is replaced by any finite measure.
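As a concrete illustration, the inequality can be checked exactly on a small finite sample space. The following is a minimal sketch, not part of the original presentation: the two-dice sample space, the helper `prob`, and the three events are assumptions chosen only for the example.

```python
from fractions import Fraction
from itertools import product

# Assumed example: all ordered outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

# Three overlapping events, chosen only for illustration.
events = [
    {w for w in omega if w[0] == 6},          # first die shows 6
    {w for w in omega if w[1] == 6},          # second die shows 6
    {w for w in omega if w[0] + w[1] >= 10},  # sum is at least 10
]

p_union = prob(set().union(*events))         # P(A_1 ∪ A_2 ∪ A_3)
sum_of_probs = sum(prob(e) for e in events)  # P(A_1) + P(A_2) + P(A_3)

assert p_union <= sum_of_probs  # Boole's inequality
print(p_union, sum_of_probs)    # prints: 1/3 1/2
```

The gap between 1/3 and 1/2 comes from outcomes such as (6, 6) that belong to more than one event and are therefore counted several times in the sum.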

Proof

Proof using induction

Boole's inequality may be proved for finite collections of <math>n</math> events by induction.

For the <math>n=1</math> case, the inequality holds trivially, with equality:

<math>\mathbb P(A_1) \le \mathbb P(A_1).</math>

Assume, as the induction hypothesis, that the inequality holds for <math>n</math> events:

<math>{\mathbb P}\left(\bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} {\mathbb P}(A_i).</math>

Since <math>\mathbb P(A \cup B) = \mathbb P(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B),</math> and because the union operation is associative, we have

<math>\mathbb{P}\left(\bigcup_{i=1}^{n+1}A_i\right) = \mathbb{P}\left(\bigcup_{i=1}^n A_i\right) + \mathbb{P}(A_{n+1}) -\mathbb{P}\left(\left(\bigcup_{i=1}^n A_i\right) \cap A_{n+1}\right).</math>

Since

<math>{\mathbb P}\left(\left(\bigcup_{i=1}^n A_i\right) \cap A_{n+1}\right) \ge 0,</math>

by the first axiom of probability, we have

<math>\mathbb{P}\left(\bigcup_{i=1}^{n+1} A_i \right) \le \mathbb{P} \left(\bigcup_{i=1}^n A_i\right) + \mathbb{P}(A_{n+1}),</math>

and therefore

<math>\mathbb{P}\left(\bigcup_{i=1}^{n+1} A_i \right) \le \sum_{i=1}^{n} \mathbb{P}(A_i) + \mathbb{P}(A_{n+1}) = \sum_{i=1}^{n+1} \mathbb{P}(A_i).</math>

Proof without using induction

Let events <math>A_1, A_2, A_3, \dots</math> in our probability space be given. The countable additivity of the measure <math>\mathbb{P}</math> states that if <math>B_1, B_2, B_3, \dots</math> are pairwise disjoint events, then

<math>\mathbb{P}\left(\bigcup_{i} B_i\right) = \sum_i \mathbb P(B_i).</math>

Set

<math>B_i := A_i - \bigcup^{i-1}_{j=1} A_j.</math>

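Then <math>B_1, B_2, B_3, \dots</math> are pairwise disjoint, because for <math>i < j</math> the set <math>B_j</math> excludes every point of <math>A_i \supset B_i</math>. We claim that: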

<math>\bigcup^{\infty}_{i=1} A_i = \bigcup^{\infty}_{i=1} B_i.</math>

One inclusion is clear: since <math>B_i \subset A_i</math> for all <math>i</math>, we have <math>\bigcup^{\infty}_{i=1} B_i \subset \bigcup^{\infty}_{i=1} A_i</math>.

For the other inclusion, let <math>x \in \bigcup^{\infty}_{i=1} A_i</math> be given. Write <math>k</math> for the minimum positive integer such that <math>x \in A_k</math>. Then <math>x \in A_k - \bigcup^{k-1}_{j=1} A_j = B_k</math>. Thus <math>x \in \bigcup^{\infty}_{i=1} B_i</math>. Therefore <math>\bigcup^{\infty}_{i=1} A_i \subset \bigcup^{\infty}_{i=1} B_i</math>.

Therefore

<math>\mathbb P\left(\bigcup_iA_i\right) = \mathbb P\left(\bigcup_iB_i\right) = \sum_i \mathbb P (B_i) \leq \sum_i \mathbb P(A_i),</math>

where the last inequality holds because <math>B_i \subset A_i</math> implies that <math>\mathbb P (B_i) \leq \mathbb P(A_i),</math> for all i.
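The construction of the sets <math>B_i</math> can be traced on a small finite example. The sketch below is illustrative only: the function name `disjointify`, the 12-point uniform sample space, and the four events are assumptions made for the example, and a finite list stands in for the countable sequence.

```python
from fractions import Fraction

def disjointify(events):
    """Return the list of B_i = A_i minus (A_1 ∪ ... ∪ A_{i-1})."""
    seen = set()   # union of all events processed so far
    result = []
    for a in events:
        result.append(a - seen)  # keep only the points not covered earlier
        seen |= a
    return result

# Assumed example: uniform measure on the 12 points 0..11.
omega = set(range(12))

def prob(event):
    return Fraction(len(event), len(omega))

A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6}, {0, 6, 7}]
B = disjointify(A)

# The B_i are pairwise disjoint and have the same union as the A_i.
assert all(B[i].isdisjoint(B[j]) for i in range(len(B)) for j in range(i + 1, len(B)))
assert set().union(*B) == set().union(*A)

# P(union) = sum of P(B_i) <= sum of P(A_i)
p_union = prob(set().union(*A))
sum_b = sum(prob(b) for b in B)
sum_a = sum(prob(a) for a in A)
assert p_union == sum_b <= sum_a
print(p_union, sum_b, sum_a)  # prints: 2/3 2/3 13/12
```

In this example the sum of the individual probabilities exceeds 1, which shows that the bound, although always valid, can be uninformative when the events overlap heavily.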

Bonferroni inequalities

Boole's inequality for a finite number of events may be generalized to certain upper and lower bounds on the probability of finite unions of events.<ref>Template:Cite book</ref> These bounds are known as Bonferroni inequalities, after Carlo Emilio Bonferroni.

Let

<math>S_1 := \sum_{i=1}^n {\mathbb P}(A_i), \quad S_2 := \sum_{1\le i_1 < i_2\le n} {\mathbb P}(A_{i_1} \cap A_{i_2} ),\quad \ldots,\quad S_k := \sum_{1\le i_1<\cdots<i_k\le n} {\mathbb P}(A_{i_1}\cap \cdots \cap A_{i_k} ) </math>

for all integers k in {1, ..., n}.

Then, when <math>K \leq n </math> is odd:

<math> \sum_{j=1}^K (-1)^{j-1} S_j \geq \mathbb{P}\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{j=1}^n (-1)^{j-1} S_j </math>

holds, and when <math>K \leq n</math> is even:

<math> \sum_{j=1}^K (-1)^{j-1} S_j \leq \mathbb{P}\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{j=1}^n (-1)^{j-1} S_j </math>

holds.

The inequalities follow from the inclusion–exclusion principle, and Boole's inequality is the special case of <math>K=1</math>. Since the proof of the inclusion–exclusion principle requires only the finite additivity (and nonnegativity) of <math>\mathbb{P}</math>, the Bonferroni inequalities hold more generally when <math>\mathbb{P}</math> is replaced by any finite content, in the sense of measure theory.
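The alternating behaviour of the truncated sums can be observed numerically. The following sketch is an illustration under assumed data: the two-dice sample space, the helper names `prob` and `S`, and the three events are not part of the original text.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed example: all ordered outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event under the uniform measure."""
    return Fraction(len(event), len(omega))

events = [
    {w for w in omega if w[0] == 6},          # first die shows 6
    {w for w in omega if w[1] == 6},          # second die shows 6
    {w for w in omega if w[0] + w[1] >= 10},  # sum is at least 10
]
n = len(events)

def S(k):
    """S_k: sum of P(A_{i_1} ∩ ... ∩ A_{i_k}) over all k-element index sets."""
    return sum(prob(set.intersection(*combo)) for combo in combinations(events, k))

p_union = prob(set().union(*events))

# Truncated inclusion-exclusion sums for K = 1, ..., n.
partial_sums = [
    sum((-1) ** (j - 1) * S(j) for j in range(1, K + 1))
    for K in range(1, n + 1)
]

for K, value in enumerate(partial_sums, start=1):
    if K % 2 == 1:
        assert value >= p_union  # odd K gives an upper bound
    else:
        assert value <= p_union  # even K gives a lower bound

assert partial_sums[-1] == p_union  # K = n recovers inclusion-exclusion exactly
print(partial_sums, p_union)
```

For these events the truncated sums are 1/2, 11/36 and 1/3, which bracket the true value 1/3 from above and below before reaching it at <math>K = n</math>.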

Proof for odd K

Let <math> E = \bigcap_{i=1}^n B_i </math>, where <math> B_i \in \{A_i, A_i^c\} </math> for each <math> i = 1, \dots, n </math>. The sets <math> E </math> of this form partition the sample space, and for each <math> E </math> and every <math> i </math>, <math> E </math> is either contained in <math> A_i </math> or disjoint from it.

If <math> E = \bigcap_{i=1}^n A_i^c </math>, then <math> E </math> contributes 0 to both sides of the inequality.

Otherwise, assume <math> E </math> is contained in exactly <math> L </math> of the <math> A_i </math>. Then <math> E </math> contributes exactly <math> \mathbb{P}(E) </math> to the right side of the inequality, while it contributes

<math> \sum_{j=1}^K (-1)^{j-1} {L \choose j} \mathbb{P}(E) </math>

to the left side of the inequality. However, by Pascal's rule, this is equal to

<math> \sum_{j=1}^K (-1)^{j-1} \Big({L-1 \choose j-1} + {L-1 \choose j} \Big)\mathbb{P}(E) </math>

which, since <math> K </math> is odd, telescopes to

<math> \Big( 1 + {L-1 \choose K}\Big) \mathbb{P}(E) \geq \mathbb{P}(E) </math>

Thus, the inequality holds for all events <math> E </math>, and so by summing over <math> E </math>, we obtain the desired inequality:

<math> \sum_{j=1}^K (-1)^{j-1} S_j \geq \mathbb{P}\Big(\bigcup_{i=1}^n A_i\Big) </math>

The proof for even <math> K </math> is nearly identical.<ref>Template:Cite book</ref>
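The combinatorial identity behind the telescoping step can be checked by brute force. The sketch below is an illustration, not part of the cited proof. It uses the closed form <math>1 + (-1)^{K-1}{L-1 \choose K}</math>, which is obtained by the same telescoping argument for either parity of <math>K</math>. The function names are hypothetical.

```python
from math import comb

def truncated_alternating_sum(L, K):
    """Coefficient of P(E) on the left side: sum_{j=1}^{K} (-1)^(j-1) * C(L, j)."""
    return sum((-1) ** (j - 1) * comb(L, j) for j in range(1, K + 1))

def closed_form(L, K):
    """Value after telescoping via Pascal's rule: 1 + (-1)^(K-1) * C(L-1, K)."""
    return 1 + (-1) ** (K - 1) * comb(L - 1, K)

# L = number of events A_i containing E (at least 1), K = truncation level.
for L in range(1, 9):
    for K in range(1, 9):
        lhs = truncated_alternating_sum(L, K)
        assert lhs == closed_form(L, K)
        if K % 2 == 1:
            assert lhs >= 1  # odd K: E is counted at least once (upper bound)
        else:
            assert lhs <= 1  # even K: E is counted at most once (lower bound)
```

Note that `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention used in the proof when <math>K > L - 1</math>.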

Example

Suppose that you are estimating five parameters based on a random sample, and you can control the reliability of each estimate separately. If you want the estimates of all five parameters to be good simultaneously with probability 95%, how reliable must each individual estimate be?

Making each estimate good with probability 95% is not enough, because the event "all are good" is a subset of each event "Estimate i is good" and can therefore have smaller probability. We can use Boole's inequality to solve this problem. By taking the complement of the event "all five are good", we can turn the requirement into a condition on the event "at least one estimate is bad":

P(at least one estimation is bad) ≤ P(A₁ is bad) + P(A₂ is bad) + P(A₃ is bad) + P(A₄ is bad) + P(A₅ is bad) ≤ 0.05

One way is to make each of the five probabilities equal to 0.05/5 = 0.01, that is, 1%. In other words, you have to guarantee that each estimate is good with probability 99% (for example, by constructing a 99% confidence interval) to ensure that all five estimates are good simultaneously with probability at least 95%. This is called the Bonferroni method of simultaneous inference.
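The arithmetic of this example can be sketched as follows. The function name `bonferroni_levels` is hypothetical, and the final comparison assumes, purely for illustration, that the five estimates are independent. Boole's inequality itself needs no such assumption.

```python
from fractions import Fraction

def bonferroni_levels(total_error, num_estimates):
    """Split a total error budget equally; return (per-estimate error, per-estimate confidence)."""
    per_estimate_error = total_error / num_estimates
    return per_estimate_error, 1 - per_estimate_error

total_error = Fraction(5, 100)  # allow a 5% chance that at least one estimate is bad
num_estimates = 5

per_error, per_confidence = bonferroni_levels(total_error, num_estimates)
print(per_error, per_confidence)  # prints: 1/100 99/100

# Union bound on P(at least one estimate is bad); valid under any dependence.
union_bound = num_estimates * per_error
assert union_bound == total_error

# Illustration only: if the estimates happened to be independent, the exact
# probability of at least one bad estimate would be slightly below the bound.
independent_case = 1 - per_confidence ** num_estimates
assert independent_case < union_bound
print(float(independent_case))
```

Under the independence assumption the exact failure probability is a little under 5%, so the union bound is slightly conservative here.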

References

Template:Reflist