== Inference ==

Gibbs sampling is commonly used for [[statistical inference]] (e.g. estimating the value of a parameter, such as the number of people likely to shop at a particular store on a given day, or the candidate a voter will most likely vote for). The idea is that observed data is incorporated into the sampling process by creating separate variables for each piece of observed data and fixing those variables to their observed values, rather than sampling from them. The distribution of the remaining variables is then effectively a [[posterior distribution]] conditioned on the observed data.

The most likely value of a desired parameter (the [[mode (statistics)|mode]]) could then simply be selected as the sample value that occurs most commonly; this is essentially equivalent to [[maximum a posteriori]] estimation of the parameter. (Since the parameters are usually continuous, it is often necessary to discretize the sampled values into a finite number of ranges or "bins" in order to get a meaningful estimate of the mode.) More commonly, however, the [[expected value]] ([[mean]] or average) of the sampled values is chosen; this is a [[Bayes estimator]] that takes advantage of the additional information about the entire distribution that Bayesian sampling provides, whereas a maximization algorithm such as [[expectation maximization]] (EM) can return only a single point of the distribution. For example, for a unimodal distribution the mean (expected value) is usually similar to the mode (most common value), but if the distribution is [[skewness|skewed]] in one direction, the mean will be shifted in that direction, which effectively accounts for the extra probability mass there. (If a distribution is multimodal, the expected value may not yield a meaningful point, and any of the modes is typically a better choice.)

Although some of the variables typically correspond to parameters of interest, others are uninteresting ("nuisance") variables introduced into the model to properly express the relationships among variables. Although the sampled values represent the [[joint distribution]] over all variables, the nuisance variables can simply be ignored when computing expected values or modes; this is equivalent to [[marginal distribution|marginalizing]] over the nuisance variables. When values for multiple variables are desired, the expected value is simply computed over each variable separately. (When computing the mode, however, all variables must be considered together.)

[[Supervised learning]], [[unsupervised learning]] and [[semi-supervised learning]] (i.e. learning with missing values) can all be handled by simply fixing the values of all variables whose values are known, and sampling from the remainder. For observed data, there will be one variable for each observation, rather than, for example, one variable corresponding to the [[sample mean]] or [[sample variance]] of a set of observations. In fact, there generally will be no variables at all corresponding to concepts such as "sample mean" or "sample variance". Instead, in such a case there will be variables representing the unknown true mean and true variance, and sample values for these variables result automatically from the operation of the Gibbs sampler.
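A minimal sketch of this last setup in Python, assuming normally distributed observations with conjugate priors (a normal prior on the mean, an inverse-gamma prior on the variance); the data and all hyperparameter values below are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Observed data: created once and held fixed; never resampled.
data = rng.normal(loc=5.0, scale=2.0, size=100)
n, xbar = len(data), data.mean()

# Conjugate priors:  mu ~ Normal(mu0, tau0_sq),  sigma_sq ~ Inv-Gamma(a0, b0)
mu0, tau0_sq = 0.0, 100.0
a0, b0 = 2.0, 2.0

n_iter, burn_in = 5000, 1000
mu, sigma_sq = 0.0, 1.0          # arbitrary starting point
samples = []

for it in range(n_iter):
    # Sample mu | sigma_sq, data from its normal full conditional.
    prec = n / sigma_sq + 1.0 / tau0_sq
    mean = (n * xbar / sigma_sq + mu0 / tau0_sq) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))

    # Sample sigma_sq | mu, data from its inverse-gamma full conditional.
    a = a0 + n / 2.0
    b = b0 + 0.5 * np.sum((data - mu) ** 2)
    sigma_sq = 1.0 / rng.gamma(a, 1.0 / b)

    if it >= burn_in:
        samples.append((mu, sigma_sq))

samples = np.array(samples)

# Posterior means (Bayes estimators). Averaging one column while ignoring
# the other implicitly marginalizes over the ignored variable.
print("posterior mean of mu:      ", samples[:, 0].mean())
print("posterior mean of sigma_sq:", samples[:, 1].mean())

# Binned (approximate) posterior mode of mu, as described above.
hist, edges = np.histogram(samples[:, 0], bins=50)
print("approx. posterior mode of mu:", edges[hist.argmax()] + (edges[1] - edges[0]) / 2)
</syntaxhighlight>

Note that no variable corresponding to a "sample mean" appears anywhere: the sampler works directly with the unknown true mean <code>mu</code> and true variance <code>sigma_sq</code>, each updated conditional on the current value of the other.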
[[Generalized linear model]]s (i.e. variations of [[linear regression]]) can sometimes be handled by Gibbs sampling as well. For example, [[probit regression]] for determining the probability of a given binary (yes/no) choice, with [[normal distribution|normally distributed]] priors placed over the regression coefficients, can be implemented with Gibbs sampling because it is possible to add auxiliary variables and take advantage of [[conjugate prior|conjugacy]]. However, [[logistic regression]] cannot be handled this way. One possibility is to approximate the [[logistic function]] with a mixture of normal distributions (typically 7–9 components). More commonly, however, [[Metropolis–Hastings]] is used instead of Gibbs sampling.
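The auxiliary-variable construction for probit regression is the well-known Albert–Chib data augmentation: each binary outcome is tied to a latent normal "utility" whose sign it fixes, after which the normal prior on the coefficients is conjugate. A minimal sketch, with synthetic data and an illustrative prior:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Synthetic data for illustration: intercept plus two predictors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, -1.5])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Prior beta ~ Normal(0, 100 I); an illustrative choice.
B0_inv = np.eye(3) / 100.0
B = np.linalg.inv(B0_inv + X.T @ X)    # posterior covariance of beta | z
chol_B = np.linalg.cholesky(B)

n_iter, burn_in = 4000, 1000
beta = np.zeros(3)
draws = []

for it in range(n_iter):
    # 1. Sample latent utilities z_i | beta, y_i: normal with mean x_i'beta,
    #    truncated below at 0 if y_i = 1 and above at 0 if y_i = 0.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)   # truncnorm bounds are standardized
    hi = np.where(y == 1, np.inf, -mu)
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)

    # 2. Sample beta | z from its normal full conditional (conjugacy of the
    #    normal prior with the now linear-Gaussian latent regression).
    mean = B @ (X.T @ z)                  # prior mean is zero
    beta = mean + chol_B @ rng.normal(size=3)

    if it >= burn_in:
        draws.append(beta.copy())

print("posterior mean of beta:", np.mean(draws, axis=0))
</syntaxhighlight>

Both full conditionals are available in closed form (truncated normal for the latent variables, multivariate normal for the coefficients), so no Metropolis–Hastings step is needed. The analogous latent-variable construction for the logistic model does not produce a conjugate conditional, which is why the approximations mentioned above are required there.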