==Approximating continuous functions==
Let ''ƒ'' be a [[continuous function]] on the interval [0, 1]. Consider the Bernstein polynomial
:<math>B_n(f)(x) = \sum_{\nu = 0}^n f\left( \frac{\nu}{n} \right) b_{\nu,n}(x).</math>
It can be shown that
:<math>\lim_{n \to \infty} B_n(f) = f</math>
[[uniform convergence|uniformly]] on the interval [0, 1].<ref name=Nat6>Natanson (1964) p. 6</ref><ref name="Lorentz"/><ref name="Feller 1966">{{harvnb|Feller|1966}}</ref><ref name="Beals 2004">{{harvnb|Beals|2004}}</ref>

Bernstein polynomials thus provide one way to prove the [[Stone–Weierstrass theorem#Weierstrass approximation theorem|Weierstrass approximation theorem]] that every real-valued continuous function on a real interval [''a'', ''b''] can be uniformly approximated by polynomial functions over <math>\mathbb R</math>.<ref name=Nat3>Natanson (1964) p. 3</ref>

A more general statement for a function with continuous ''k''<sup>th</sup> derivative is
:<math>{\left\| B_n(f)^{(k)} \right\|}_\infty \le \frac{ (n)_k }{ n^k } \left\| f^{(k)} \right\|_\infty \quad\text{and}\quad \left\| f^{(k)} - B_n(f)^{(k)} \right\|_\infty \to 0,</math>
where additionally
:<math>\frac{ (n)_k }{ n^k } = \left( 1 - \frac{0}{n} \right) \left( 1 - \frac{1}{n} \right) \cdots \left( 1 - \frac{k - 1}{n} \right)</math>
is an [[eigenvalue]] of ''B''<sub>''n''</sub>; the corresponding eigenfunction is a polynomial of degree ''k''.

===Probabilistic proof===
This proof follows Bernstein's original proof of 1912.<ref>{{harvnb|Bernstein|1912}}</ref> See also Feller (1966) or Koralov & Sinai (2007).<ref>{{cite book |first1=L. |last1=Koralov |first2=Y. |last2=Sinai |title=Theory of probability and random processes |edition=2nd |publisher=Springer |year=2007 |page=29 |chapter=Probabilistic proof of the Weierstrass theorem}}</ref><ref name="Feller 1966"/>

====Motivation====
We first give some intuition for Bernstein's original proof. A continuous function on a compact interval must be uniformly continuous; thus, the value of any continuous function can be uniformly approximated by its values on a sufficiently fine finite net of points in the interval. This consideration renders the approximation theorem intuitive, given that polynomials should be flexible enough to match (or nearly match) a finite number of pairs <math>(x, f(x))</math>. To do so, one might (1) construct a function close to <math>f</math> on a lattice, and then (2) smooth out the function outside the lattice to make a polynomial.

The probabilistic proof below provides a constructive method to create a polynomial which is approximately equal to <math>f</math> on such a point lattice, since "smoothing out" a function is not always trivial. Taking the expectation of a random variable with a simple distribution is a common way to smooth. Here, we take advantage of the fact that Bernstein polynomials look like binomial expectations: we split the interval into a lattice of ''n'' discrete values, and then, to evaluate any ''f''(''x''), we evaluate ''f'' at one of the ''n'' lattice points close to ''x'', chosen at random according to the binomial distribution. The expectation of this approximation is a polynomial, as it is the expectation of a function of a binomial random variable. The proof below shows that this yields a uniform approximation of ''f''.
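This smoothing-by-expectation idea can be illustrated numerically. The following minimal sketch (assuming NumPy and SciPy are available; the test function <math>f(t) = |t - \tfrac{1}{2}|</math> and the values of ''n'' and ''x'' are arbitrary choices) averages ''f'' over binomially chosen lattice points and compares the result with the exact expectation and with ''f''(''x''):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

def f(t):
    # an arbitrary continuous test function on [0, 1]
    return np.abs(t - 0.5)

n, x = 50, 0.3                     # lattice size and evaluation point
rng = np.random.default_rng(0)

# Evaluate f at lattice points K/n, with K drawn from Binomial(n, x), and average.
K = rng.binomial(n, x, size=100_000)
monte_carlo = f(K / n).mean()

# The exact expectation of f(K/n) is a polynomial in x: the Bernstein polynomial B_n(f)(x).
k = np.arange(n + 1)
exact_expectation = np.sum(f(k / n) * binom.pmf(k, n, x))

print(monte_carlo, exact_expectation, f(x))   # the three values are close for large n
</syntaxhighlight>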
The crux of the proof is to (1) justify replacing an arbitrary point with a binomially chosen lattice point by the concentration properties of the binomial distribution, and (2) justify the inference from <math>x \approx X</math> to <math>f(x) \approx f(X)</math> by the uniform continuity of ''f''.

====Bernstein's proof====
Suppose ''K'' is a [[random variable]] distributed as the number of successes in ''n'' independent [[Bernoulli trial]]s with probability ''x'' of success on each trial; in other words, ''K'' has a [[binomial distribution]] with parameters ''n'' and ''x''. Then we have the [[expected value]] <math>\operatorname{\mathcal E}\left[\frac{K}{n}\right] = x</math> and
:<math>p(K) = {n \choose K} x^{K} \left( 1 - x \right)^{n - K} = b_{K,n}(x).</math>

By the [[law of large numbers|weak law of large numbers]] of [[probability theory]],
:<math>\lim_{n \to \infty} P\left( \left| \frac{K}{n} - x \right| > \delta \right) = 0</math>
for every ''δ'' > 0. Moreover, this relation holds uniformly in ''x'', which can be seen from its proof via [[Chebyshev's inequality]], taking into account that the variance of {{frac|1|''n''}} ''K'', equal to {{frac|1|''n''}} ''x''(1 − ''x''), is bounded from above by {{frac|1|(4''n'')}} irrespective of ''x''.

Because ''ƒ'', being continuous on a closed bounded interval, must be [[uniform continuity|uniformly continuous]] on that interval, one infers a statement of the form
:<math>\lim_{n \to \infty} P\left( \left| f\left( \frac{K}{n} \right) - f\left( x \right) \right| > \varepsilon \right) = 0</math>
uniformly in ''x'' for each <math>\varepsilon > 0</math>. Taking into account that ''ƒ'' is bounded (on the given interval), one finds that
:<math>\lim_{n \to \infty} \operatorname{\mathcal E}\left( \left| f\left( \frac{K}{n} \right) - f\left( x \right) \right| \right) = 0</math>
uniformly in ''x''.

To justify this statement, we use a common method in probability theory to convert from closeness in probability to closeness in expectation: split the expectation of <math>\left| f\left( \frac{K}{n} \right) - f\left( x \right) \right|</math> into two parts, according to whether or not <math>\left| f\left( \frac{K}{n} \right) - f\left( x \right) \right| < \varepsilon</math>. On the part where the difference does not exceed ''ε'', the contribution to the expectation clearly cannot exceed ''ε''. On the other part, the difference still cannot exceed 2''M'', where ''M'' is an upper bound for |''ƒ''(''x'')| (a continuous function on a closed bounded interval is bounded); by the "closeness in probability" statement, this event has probability at most ''ε'' for all sufficiently large ''n'', so it contributes no more than 2''M'' times ''ε'' to the expectation. The total expectation is therefore no more than <math>\varepsilon + 2M\varepsilon</math>, which can be made arbitrarily small by choosing ''ε'' small.

Finally, one observes that the absolute value of the difference between expectations never exceeds the expectation of the absolute value of the difference (a consequence of the triangle inequality, or of [[Jensen's inequality]] applied to the absolute value). Thus, using the above expectation, we see that (uniformly in ''x'')
:<math>\lim_{n \to \infty} \left| \operatorname{\mathcal E}f\left( \frac{K}{n} \right) - \operatorname{\mathcal E}f\left( x \right) \right| \leq \lim_{n \to \infty} \operatorname{\mathcal E}\left( \left| f\left( \frac{K}{n} \right) - f\left( x \right) \right| \right) = 0.</math>
Since the randomness is over ''K'' while ''x'' is held constant, the expectation of ''f''(''x'') is just ''f''(''x'').
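As a concrete illustration of this convergence, consider <math>f(t) = t^2</math>. Since <math>\operatorname{\mathcal E}\left[\frac{K}{n}\right] = x</math> and the variance of <math>\frac{K}{n}</math> is <math>\frac{x(1-x)}{n}</math>,
:<math>\operatorname{\mathcal E}\left[\left(\frac{K}{n}\right)^2\right] = x^2 + \frac{x(1-x)}{n},</math>
which differs from <math>f(x) = x^2</math> by at most <math>\tfrac{1}{4n}</math>, uniformly in ''x'' on [0, 1].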
We have thus shown that <math>\operatorname{\mathcal E_x}\left[f\left( \frac{K}{n} \right)\right]</math> converges to ''f''(''x'') uniformly in ''x''. The proof is complete once we check that <math>\operatorname{\mathcal E_x}\left[f\left( \frac{K}{n} \right)\right]</math> is a polynomial in ''x'' (the subscript reminding us that ''x'' controls the distribution of ''K''). Indeed it is:
:<math>\operatorname{\mathcal E_x}\left[f\left(\frac{K}{n}\right)\right] = \sum_{K=0}^n f\left(\frac{K}{n}\right) p(K) = \sum_{K=0}^n f\left(\frac{K}{n}\right) b_{K,n}(x) = B_n(f)(x).</math>

====Uniform convergence rates between functions====
In the above proof, recall that convergence in each limit involving ''f'' depends on the uniform continuity of ''f'', which implies a rate of convergence dependent on the [[modulus of continuity]] <math>\omega</math> of ''f''. It also depends on ''M'', a bound on the absolute value of the function, although this can be bypassed if one bounds <math>\omega</math> and the interval size. Thus, the approximation only holds uniformly across ''x'' for a fixed ''f'', but one can readily extend the proof to uniformly approximate a set of functions with a set of Bernstein polynomials in the context of [[equicontinuity]].

=== Elementary proof ===
The probabilistic proof can also be rephrased in an elementary way, using the underlying probabilistic ideas but proceeding by direct verification:<ref>{{harvnb|Lorentz|1953|pages=5–6}}</ref><ref name="Beals 2004"/><ref>{{harvnb|Goldberg|1964}}</ref><ref>{{harvnb|Akhiezer|1956}}</ref><ref>{{harvnb|Burkill|1959}}</ref>

The following identities can be verified:
# <math> \sum_k {n \choose k} x^k (1-x)^{n-k} = 1</math> ("probability")
# <math> \sum_k {k\over n} {n \choose k} x^k (1-x)^{n-k} = x</math> ("mean")
# <math> \sum_k \left( x -{k\over n}\right)^2 {n \choose k} x^k (1-x)^{n-k} = {x(1-x)\over n}.</math> ("variance")

In fact, by the binomial theorem
<math display="block">(1+t)^n = \sum_k {n \choose k} t^k,</math>
and applying the operator <math>t\frac{d}{dt}</math> to this equation once and then twice yields corresponding identities for <math>\textstyle\sum_k k {n \choose k} t^k</math> and <math>\textstyle\sum_k k^2 {n \choose k} t^k</math>. The identities (1), (2), and (3) then follow easily using the substitution <math>t = x/(1 - x)</math>.

Within these three identities, use the above basis polynomial notation
:<math> b_{k,n}(x) = {n\choose k} x^k (1-x)^{n-k},</math>
and let
:<math> f_n(x) = \sum_k f(k/n)\, b_{k,n}(x).</math>

Thus, by identity (1),
:<math>f_n(x) - f(x) = \sum_k [f(k/n) - f(x)] \,b_{k,n}(x), </math>
so that
:<math>|f_n(x) - f(x)| \le \sum_k |f(k/n) - f(x)| \, b_{k,n}(x).</math>

Since ''f'' is uniformly continuous, given <math>\varepsilon > 0</math>, there is a <math>\delta > 0</math> such that <math>|f(a) - f(b)| < \varepsilon</math> whenever <math>|a - b| < \delta</math>. Moreover, by continuity, <math>M = \sup |f| < \infty</math>. But then
:<math> |f_n(x) - f(x)| \le \sum_{|x -{k\over n}|< \delta} |f(k/n) - f(x)|\, b_{k,n}(x) + \sum_{|x -{k\over n}|\ge \delta} |f(k/n) - f(x)|\, b_{k,n}(x) .</math>

The first sum is less than ''ε''. On the other hand, by identity (3) above, and since <math>|x - k/n| \ge \delta</math>, the second sum is bounded by <math>2M</math> times
:<math>\sum_{|x - k/n|\ge \delta} b_{k,n}(x) \le \sum_k \delta^{-2} \left(x -{k\over n}\right)^2 b_{k,n}(x) = \delta^{-2} {x(1-x)\over n} < {1\over4} \delta^{-2} n^{-1}</math>
([[Chebyshev's inequality]]). It follows that the polynomials ''f''<sub>''n''</sub> tend to ''f'' uniformly.
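The three identities above can also be checked for any fixed degree by direct expansion, for instance with a computer algebra system. A minimal sketch using SymPy (the degree ''n'' = 8 is an arbitrary choice):

<syntaxhighlight lang="python">
import sympy as sp

n = 8                                  # arbitrary fixed degree for the check
x = sp.symbols('x')
b = [sp.binomial(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)]

# (1) "probability": the basis polynomials sum to 1
assert sp.expand(sum(b)) == 1
# (2) "mean": the sum of (k/n) b_{k,n}(x) equals x
assert sp.expand(sum(sp.Rational(k, n) * b[k] for k in range(n + 1)) - x) == 0
# (3) "variance": the sum of (x - k/n)^2 b_{k,n}(x) equals x(1-x)/n
assert sp.expand(sum((x - sp.Rational(k, n))**2 * b[k] for k in range(n + 1)) - x*(1 - x)/n) == 0
</syntaxhighlight>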