Editing Rejection sampling (section)

==Adaptive rejection sampling==

For many distributions, finding a proposal distribution that includes the given distribution without a lot of wasted space is difficult.  An extension of rejection sampling that can be used to overcome this difficulty and efficiently sample from a wide variety of distributions (provided that they have [[Logarithmically concave function|log-concave]] density functions, which is in fact the case for most of the common distributions—even those whose ''density'' functions are not concave themselves) is known as '''adaptive rejection sampling (ARS)'''.

There are three basic ideas to this technique as ultimately introduced by Gilks in 1992:<ref>{{cite journal |first1=W. R. |last1=Gilks |first2=P. |last2=Wild |title=Adaptive Rejection Sampling for Gibbs Sampling |journal=Journal of the Royal Statistical Society |series=Series C (Applied Statistics) |volume=41 |issue=2 |year=1992 |pages=337–348 |doi=10.2307/2347565 |jstor=2347565 }}</ref>

# If it helps, define your envelope distribution in log space (e.g. log-probability or log-density) instead. That is, work with <math> h\left(x\right) = \log g\left(x\right)</math> instead of <math> g\left(x\right)</math> directly.
#* Often, distributions that have algebraically messy density functions have reasonably simpler log density functions (i.e. when <math>f\left( x \right) </math> is messy, <math> \log f\left(x\right)</math> may be easier to work with or, at least, closer to piecewise linear). 
# Instead of a single uniform envelope density function, use a piecewise linear density function as your envelope instead.
#* Each time you have to reject a sample, you can use the value of <math> f \left(x \right) </math> that you evaluated, to improve the piecewise approximation <math> h \left(x \right) </math>. This therefore reduces the chance that your next attempt will be rejected. Asymptotically, the probability of needing to reject your sample should converge to zero, and in practice, often very rapidly.
#* As proposed, any time we choose a point that is rejected, we tighten the envelope with another line segment that is tangent to the curve at the point with the same x-coordinate as the chosen point.
#* A piecewise linear model of the proposal log distribution results in a set of piecewise [[exponential distribution]]s (i.e. segments of one or more exponential distributions, attached end to end). Exponential distributions are well behaved and well understood. The logarithm of an exponential distribution is a straight line, and hence this method essentially involves enclosing the logarithm of the density in a series of line segments.  This is the source of the log-concave restriction: if a distribution is log-concave, then its logarithm is concave (shaped like an upside-down U), meaning that a line segment tangent to the curve will always pass over the curve. 
#* If not working in log space, a piecewise linear density function can also be sampled via triangle distributions<ref>{{cite journal |first1=D. B. |last1=Thomas |first2=W. |last2=Luk |title=Non-uniform random number generation through piecewise linear approximations |journal=IET Computers & Digital Techniques |volume=1 |issue=4 |pages=312–321 |year=2007 |doi=10.1049/iet-cdt:20060188 }}</ref>
# We can take even further advantage of the (log) concavity requirement, to potentially avoid the cost of evaluating <math>f \left( x \right)</math> when your sample ''is'' accepted.
#* Just like we can construct a piecewise linear upper bound (the "envelope" function) using the values of <math> h \left(x \right) </math> that we had to evaluate in the current chain of rejections, we can also construct a piecewise linear lower bound (the "squeezing" function) using these values as well.
#* Before evaluating (the potentially expensive) <math>f \left( x \right)</math> to see if your sample will be accepted, we may ''already know'' if it will be accepted by comparing against the (ideally cheaper) <math> g_l \left( x \right) </math> (or <math>h_l \left( x \right)</math> in this case) squeezing function that have available.
#* This squeezing step is optional, even when suggested by Gilks. At best it saves you from only one extra evaluation of your (messy and/or expensive) target density. However, presumably for particularly expensive density functions (and assuming the rapid convergence of the rejection rate toward zero) this can make a sizable difference in ultimate runtime.

The method essentially involves successively determining an envelope of straight-line segments that approximates the logarithm better and better while still remaining above the curve, starting with a fixed number of segments (possibly just a single tangent line). Sampling from a truncated exponential random variable is straightforward. Just take the log of a uniform random variable (with appropriate interval and corresponding truncation).

Unfortunately, ARS can only be applied for sampling from log-concave target densities. For this reason, several extensions of ARS have been proposed in literature for tackling non-log-concave target distributions.<ref>{{Cite journal|title = A Rejection Technique for Sampling from T-concave Distributions|journal = ACM Trans. Math. Softw.|date = 1995-06-01|issn = 0098-3500|pages = 182–193|volume = 21|issue = 2|doi = 10.1145/203082.203089|first = Wolfgang|last = Hörmann|citeseerx = 10.1.1.56.6055}}</ref><ref>{{Cite journal|title = Random Variable Generation Using Concavity Properties of Transformed Densities|jstor = 1390680|journal = Journal of Computational and Graphical Statistics|date = 1998-12-01|pages = 514–528|volume = 7|issue = 4|doi = 10.2307/1390680|first1 = M.|last1 = Evans|first2 = T.|last2 = Swartz|citeseerx = 10.1.1.53.9001}}</ref><ref>{{Cite journal|title = Concave-Convex Adaptive Rejection Sampling|journal = Journal of Computational and Graphical Statistics|date = 2011-01-01|issn = 1061-8600|pages = 670–691|volume = 20|issue = 3|doi = 10.1198/jcgs.2011.09058|first1 = Dilan|last1 = Görür|first2 = Yee Whye|last2 = Teh}}</ref> Furthermore, different combinations of ARS and the Metropolis-Hastings method have been designed in order to obtain a universal sampler that builds a self-tuning proposal densities (i.e., a proposal automatically constructed and adapted to the target). This class of methods are often called as '''Adaptive Rejection Metropolis Sampling (ARMS) algorithms'''.<ref>{{Cite journal|title = Adaptive Rejection Metropolis Sampling within Gibbs Sampling|jstor = 2986138|journal = Journal of the Royal Statistical Society |series=Series C (Applied Statistics)|date = 1995-01-01|pages = 455–472|volume = 44|issue = 4|doi = 10.2307/2986138|first1 = W. R.|last1 = Gilks|first2 = N. G.|last2 = Best|author2-link= Nicky Best |first3 = K. K. C.|last3 = Tan}}</ref><ref>{{Cite journal|title = Adaptive rejection Metropolis sampling using Lagrange interpolation polynomials of degree 2|journal = Computational Statistics & Data Analysis|date = 2008-03-15|pages = 3408–3423|volume = 52|issue = 7|doi = 10.1016/j.csda.2008.01.005|first1 = Renate|last1 = Meyer|first2 = Bo|last2 = Cai|first3 = François|last3 = Perron}}</ref> The resulting adaptive techniques can be always applied but the generated samples are correlated in this case (although the correlation vanishes quickly to zero as the number of iterations grows).