== Law of large numbers, binomial distribution and central limit theorem ==
{{main article|Law of large numbers|Central limit theorem|Binomial distribution}}

Let us assume the canonical process with <math>H</math> represented by <math>1</math> and <math>T</math> represented by <math>0</math>. The [[law of large numbers]] states that the average of the sequence, i.e., <math>\bar{X}_{n}:=\frac{1}{n}\sum_{i=1}^{n}X_{i}</math>, will approach the [[expected value]] almost surely; that is, the events which do not satisfy this limit have zero probability. The expected value of flipping ''heads'', assumed to be represented by 1, is given by <math>p</math>. In fact, one has

:<math>\mathbb{E}[X_i]=\mathbb{P}([X_i=1])=p,</math>

for any given random variable <math>X_i</math> out of the infinite sequence of [[Bernoulli trial]]s that compose the Bernoulli process.

One is often interested in knowing how often one will observe ''H'' in a sequence of ''n'' coin flips. This is obtained by simple counting: given ''n'' successive coin flips, that is, given the set of all possible [[string (computer science)|strings]] of length ''n'', the number ''N''(''k'',''n'') of such strings that contain ''k'' occurrences of ''H'' is given by the [[binomial coefficient]]

:<math>N(k,n) = {n \choose k}=\frac{n!}{k! (n-k)!}.</math>

If the probability of flipping heads is given by ''p'', then the total probability of seeing a string of length ''n'' with ''k'' heads is

:<math>\mathbb{P}([S_n=k]) = {n\choose k} p^k (1-p)^{n-k} ,</math>

where <math>S_n=\sum_{i=1}^{n}X_i</math>. The probability measure thus defined is known as the [[binomial distribution]]. As the formula shows, for ''n'' = 1 the binomial distribution reduces to a [[Bernoulli distribution]]; the Bernoulli distribution is thus the special case of the binomial distribution with ''n'' = 1.

Of particular interest is the question of the value of <math>S_{n}</math> for sufficiently long sequences of coin flips, that is, for the limit <math>n\to\infty</math>. In this case, one may make use of [[Stirling's approximation]] to the factorial, and write

:<math>n! = \sqrt{2\pi n} \;n^n e^{-n} \left(1 + \mathcal{O}\left(\frac{1}{n}\right)\right).</math>

Inserting this into the expression for ''P''(''k'',''n''), one obtains the [[normal distribution]]; this is the content of the [[central limit theorem]], and it is the simplest example thereof.
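These formulas can be checked numerically. The following is a minimal Python sketch (the helper name <code>binomial_pmf</code> and the values ''p'' = 0.3 and ''n'' = 1000 are arbitrary illustrative choices, not part of the formal presentation): it evaluates the probability mass function <math>\mathbb{P}([S_n=k])</math> directly, recovers the Bernoulli distribution at ''n'' = 1, and compares the peak of the distribution with the corresponding normal density.

<syntaxhighlight lang="python">
import math

def binomial_pmf(k, n, p):
    """P(S_n = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3  # illustrative probability of heads

# With n = 1 the binomial distribution reduces to the Bernoulli distribution:
# P(S_1 = 1) = p and P(S_1 = 0) = 1 - p.
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))  # 0.3  0.7

# For large n, the pmf near its peak k ~ np is close to a normal density with
# mean np and variance np(1 - p), as the central limit theorem asserts.
n = 1000
k = round(n * p)
mean, var = n * p, n * p * (1 - p)
gaussian = math.exp(-(k - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
print(binomial_pmf(k, n, p), gaussian)  # two nearly equal values, about 0.0275
</syntaxhighlight>

For moderate ''n'' the product form can be evaluated directly as above; for very large ''n'' one would instead work with logarithms of the factors to avoid floating-point underflow.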
The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the [[asymptotic equipartition property]]. Put informally, one notes that, yes, over many coin flips, one will observe ''H'' exactly a fraction ''p'' of the time, and that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property essentially states that this peak is infinitely sharp, with infinite fall-off on either side. That is, given the set of all possible infinitely long strings of ''H'' and ''T'' occurring in the Bernoulli process, this set is partitioned into two: those strings that occur with probability 1, and those that occur with probability 0. This partitioning is known as the [[Kolmogorov 0-1 law]].

The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the [[information entropy|entropy]] of the Bernoulli process. Once again, consider the set of all strings of length ''n''. The size of this set is <math>2^n</math>. Of these, only a certain subset are likely; the size of this set is <math>2^{nH}</math> for <math>H\le 1</math>. By using Stirling's approximation, putting it into the expression for ''P''(''k'',''n''), solving for the location and width of the peak, and finally taking <math>n\to\infty</math>, one finds that

:<math>H=-p\log_2 p - (1-p)\log_2(1-p).</math>

This value is the [[Bernoulli entropy]] of a Bernoulli process. Here, ''H'' stands for entropy; it is not to be confused with the same symbol ''H'' standing for ''heads''.

[[John von Neumann]] posed a question about the Bernoulli process regarding the possibility of a given process being [[isomorphic]] to another, in the sense of the [[isomorphism of dynamical systems]]. The question long defied analysis, but was finally and completely answered with the [[Ornstein isomorphism theorem]]. This breakthrough resulted in the understanding that the Bernoulli process is unique and [[universal property|universal]]; in a certain sense, it is the single most random process possible; nothing is 'more' random than the Bernoulli process (although one must be careful with this informal statement; certainly, systems that are [[mixing (mathematics)|mixing]] are, in a certain sense, "stronger" than the Bernoulli process, which is merely ergodic but not mixing. However, such processes do not consist of independent random variables: indeed, many purely deterministic, non-random systems can be mixing).
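Returning to the entropy formula above, a similar minimal Python sketch (again with the arbitrary illustrative choice ''p'' = 0.3, and with the helper name <code>binary_entropy</code> chosen only for this example) computes the Bernoulli entropy and checks that the number of length-''n'' strings with roughly ''np'' heads grows as <math>2^{nH}</math>, a vanishing fraction of all <math>2^n</math> strings when <math>H < 1</math>.

<syntaxhighlight lang="python">
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), the entropy per Bernoulli trial."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3
H = binary_entropy(p)
print(H)  # about 0.8813 bits per flip

# The number of length-n strings with k = round(n*p) heads is C(n, k);
# its base-2 logarithm, divided by n, approaches H as n grows.
for n in (100, 1000, 10000):
    k = round(n * p)
    log2_count = math.log2(math.comb(n, k))
    print(n, log2_count / n, H)  # the ratio climbs toward H
</syntaxhighlight>

The computation uses exact integer binomial coefficients, so taking the logarithm avoids any overflow even for large ''n''.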