== Rejection sampling using exponential tilting ==

Given a random variable <math>X\sim F(\cdot)</math>, <math>F(x)=\mathbb{P}(X\le x)</math> is the target distribution. Assume, for simplicity, that the density function can be written explicitly as <math>f(x)</math>. Choose the proposal as

<math display="block">\begin{align} F_\theta (x)&=\mathbb{E}\left[\exp(\theta X-\psi (\theta))\mathbb{I}(X\le x)\right]\\ &=\int^x_{-\infty}e^{\theta y-\psi(\theta)}f(y)dy\\ g_\theta(x)&=F'_\theta(x)=e^{\theta x-\psi(\theta)}f(x) \end{align}</math>

where <math>\psi(\theta) = \log\left(\mathbb{E}\exp(\theta X)\right)</math> and <math>\Theta = \{\theta :\psi(\theta)<\infty\}</math>. Clearly, <math>\{F_\theta (\cdot)\}_{\theta \in \Theta}</math> is a [[natural exponential family]]. Moreover, the likelihood ratio is

<math display="block">Z(x)=\frac{f(x)}{g_\theta(x)}=\frac{f(x)}{e^{\theta x-\psi(\theta)}f(x)}=e^{-\theta x+\psi(\theta)}</math>

Note that <math>\psi (\theta)<\infty</math> implies that <math>\psi</math> is indeed a [[Cumulant-generating function|cumulant-generating function]], that is,

:<math>\psi(\theta)=\log\mathbb{E}{\exp(tX)}|_{t=\theta}=\log M_X(t)|_{t=\theta}</math>.

It is easy to derive the cumulant-generating function of the proposal, and therefore the proposal's cumulants:

::<math> \begin{align} \psi_\theta(\eta)&=\log\left(\mathbb{E}_\theta\exp(\eta X)\right)=\psi(\theta+\eta)-\psi(\theta)<\infty\\ \mathbb{E}_\theta(X)&= \left.\frac{\partial \psi_\theta(\eta)}{\partial\eta}\right|_{\eta=0}\\ \mathrm{Var}_\theta(X)&= \left.\frac{\partial^2 \psi_\theta(\eta)}{\partial\eta^2}\right|_{\eta=0} \end{align} </math>

As a simple example, suppose under <math>F(\cdot)</math>, <math>X \sim \mathrm{N}(\mu, \sigma^2)</math>, with <math display="inline">\psi(\theta)= \mu\theta+\frac{\sigma^2\theta^2}{2}</math>. The goal is to sample <math>X\mid X\in \left[b,\infty\right)</math>, where <math>b>\mu</math>.
The analysis goes as follows:

* Choose the form of the proposal distribution <math>F_\theta(\cdot)</math>, with cumulant-generating function
::<math display="inline">\psi_\theta(\eta) = \psi (\theta+\eta) - \psi(\theta) = (\mu+\theta\sigma^2)\eta + \frac{\sigma^2\eta^2}{2}</math>,
:which implies that the proposal is a normal distribution <math>\mathrm{N}(\mu+\theta\sigma^2, \sigma^2)</math>.
* Choose a suitable <math>\theta^*</math> for the proposal distribution. In this setup, an intuitive choice is to set
::<math>\mathbb{E}_{\theta}(X)=\mu+\theta\sigma^2=b</math>,
:that is, <math>\theta^*=\frac{b-\mu}{\sigma^2}.</math> The proposal distribution is thus <math>g_{\theta^*}(x) = \mathrm{N}(b, \sigma^2)</math>.
* Explicitly write out the target, the proposal, and the likelihood ratio:
:<math display="block"> \begin{align} f_{X|X\ge b}(x)&=\frac{f(x)\mathbb{I}(x\ge b)}{\mathbb{P}(X\ge b)}\\ g_{\theta^*}(x)&=f(x)\exp(\theta^* x-\psi(\theta^*))\\ Z(x)&=\frac{f_{X|X\ge b}(x)}{g_{\theta^*}(x)}=\frac{\exp(-\theta^* x+\psi(\theta^*))\mathbb{I}(x\ge b)}{\mathbb{P}(X\ge b)} \end{align} </math>
* Derive the bound <math>M</math> for the likelihood ratio <math>Z(x)</math>, which is a decreasing function for <math>x \in [b, \infty)</math>; therefore
:<math display="block">M=Z(b)=\frac{\exp(-\theta^*b+\psi(\theta^*))}{\mathbb{P}(X\ge b)} = \frac{\exp\left(-\frac{(b-\mu)^2}{2\sigma^2}\right)}{\mathbb{P}(X\ge b)} = \frac{\exp\left(-\frac{(b-\mu)^2}{2\sigma^2}\right)}{\mathbb{P}\left(\mathrm{N}(0,1)\ge\frac{b-\mu}{\sigma}\right)}</math>
* Rejection sampling criterion: for <math>U\sim \mathrm{Unif}(0,1)</math>, if
:<math display="block">U\le \frac{Z(x)}{M}=e^{-\theta^*(x-b)}\mathbb{I}(x\ge b)</math>
:holds, accept the value of <math>X</math>; if not, continue sampling new <math display="inline">X\sim_{i.i.d.}\mathrm{N}(\mu+\theta^*\sigma^2,\sigma^2)</math> and new <math display="inline">U\sim \mathrm{Unif}(0,1)</math> until acceptance.
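The steps above can be sketched in code. The following is a minimal Python implementation of the tilted rejection sampler for <math>X\sim \mathrm{N}(\mu,\sigma^2)</math> conditioned on <math>X\ge b</math> (the function name and signature are illustrative, not from the article):

```python
import math
import random

def tilted_truncated_normal(mu, sigma, b, rng=random):
    """Sample X ~ N(mu, sigma^2) conditioned on X >= b (with b > mu)
    by rejection sampling from the exponentially tilted proposal
    N(mu + theta*sigma^2, sigma^2) = N(b, sigma^2)."""
    theta = (b - mu) / sigma ** 2          # theta* = (b - mu) / sigma^2
    while True:
        x = rng.gauss(b, sigma)            # draw from the tilted proposal N(b, sigma^2)
        # accept when U <= Z(x)/M = exp(-theta*(x - b)) * I(x >= b)
        if x >= b and rng.random() <= math.exp(-theta * (x - b)):
            return x
```

For example, `tilted_truncated_normal(0.0, 1.0, 2.0)` returns a draw from a standard normal conditioned on exceeding 2; every accepted value lies in <math>[b,\infty)</math> by construction.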
For the above example, as a measure of efficiency, the expected number of iterations of the natural-exponential-family-based rejection sampling method is of order <math>b</math>, that is <math>M(b)=O(b)</math>, while under the naive method the expected number of iterations is <math display="inline">\frac{1}{\mathbb{P}(X\ge b)}=O(b\cdot e^{\frac{(b-\mu)^2}{2\sigma^2}})</math>, which is far less efficient.

In general, [[exponential tilting]], which yields a parametric class of proposal distributions, solves such optimization problems conveniently, because its properties directly characterize the distribution of the proposal. For this type of problem, simulating <math>X</math> conditionally on <math>X\in A</math>, the trick is to choose the proposal from a natural exponential family, which gives some control over the complexity and considerably speeds up the computation. Indeed, there are deep mathematical reasons for using a natural exponential family.
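The efficiency gap can be checked empirically. The sketch below (an illustration under assumed parameter values, not part of the article) estimates the per-draw acceptance rate of the tilted scheme against the naive scheme, which keeps a draw from <math>\mathrm{N}(\mu,\sigma^2)</math> only when it already exceeds <math>b</math>:

```python
import math
import random

def acceptance_rates(mu, sigma, b, n=100_000, seed=1):
    """Estimate the acceptance rate of (tilted, naive) rejection
    sampling for the target X | X >= b with X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    theta = (b - mu) / sigma ** 2
    tilted = naive = 0
    for _ in range(n):
        # tilted proposal N(b, sigma^2): accept w.p. exp(-theta*(x - b)) on {x >= b}
        x = rng.gauss(b, sigma)
        if x >= b and rng.random() <= math.exp(-theta * (x - b)):
            tilted += 1
        # naive proposal N(mu, sigma^2): accept only when the draw exceeds b
        if rng.gauss(mu, sigma) >= b:
            naive += 1
    return tilted / n, naive / n
```

With `mu=0`, `sigma=1`, `b=3`, the tilted sampler accepts roughly 12% of proposals while the naive sampler accepts roughly 0.13%, matching the <math>O(b)</math> versus <math>O(b\,e^{(b-\mu)^2/2\sigma^2})</math> comparison above.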