
=== Link function ===

The link function provides the relationship between the linear predictor and the [[Expected value|mean]] of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined ''canonical'' link function, which is derived from the exponential-family form of the response's [[density function]]. However, in some cases it makes sense to match the [[Domain of a function|domain]] of the link function to the [[range of a function|range]] of the distribution function's mean, or to use a non-canonical link function for algorithmic purposes, as in [[Probit model#Gibbs sampling|Bayesian probit regression]].

When using a distribution function with a canonical parameter <math>\theta,</math> the canonical link function is the function that expresses <math>\theta</math> in terms of <math>\mu,</math> i.e. <math>\theta = g(\mu).</math>  For the most common distributions, the mean <math>\mu</math> is one of the parameters in the standard form of the distribution's [[density function]], and then <math>g(\mu)</math> is the function as defined above that maps the density function into its canonical form.  When using the canonical link function, <math>g(\mu) = \theta = \mathbf{X}\boldsymbol{\beta},</math> which allows <math>\mathbf{X}^{\rm T} \mathbf{Y}</math> to be a [[sufficiency (statistics)|sufficient statistic]] for <math>\boldsymbol{\beta}</math>.
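
As a worked illustration of how a canonical link is read off from the density, consider the Bernoulli distribution with mean <math>\mu</math>. This is a sketch under the usual parameterization, not a general derivation. Its probability mass function can be rewritten in exponential-family form:

:<math>f(y;\mu) = \mu^{y}(1-\mu)^{1-y} = \exp\left( y \ln\left(\frac{\mu}{1-\mu}\right) + \ln(1-\mu) \right), \qquad y \in \{0,1\}.</math>

The coefficient of <math>y</math> in the exponent is the canonical parameter, so <math>\theta = g(\mu) = \ln\left(\frac{\mu}{1-\mu}\right),</math> which is the logit link.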

The following table lists several exponential-family distributions in common use and the data they are typically used for, along with their canonical link functions and the inverses of those links (sometimes referred to as the mean function, as done here).

{| class="wikitable"
|+ Common distributions with typical uses and canonical link functions 
! Distribution !! Support of distribution !! Typical uses !! Link name !! Link function, <math>\mathbf{X}\boldsymbol{\beta}=g(\mu)\,\!</math>  !! Mean function
|-
| [[normal distribution|Normal]]
| rowspan="2" |real: <math>(-\infty,+\infty)</math> || rowspan="2" |Linear-response data || rowspan="2" | Identity 
| rowspan="2" |<math>\mathbf{X}\boldsymbol{\beta}=\mu\,\!</math> || rowspan="2" | <math>\mu=\mathbf{X}\boldsymbol{\beta}\,\!</math>
|-
| [[Laplace distribution|Laplace]]
|-
| [[exponential distribution|Exponential]]
| rowspan="2" | real: <math>(0,+\infty)</math> || rowspan="2" | Exponential-response data, scale parameters
| rowspan="2" | [[Multiplicative inverse|Negative inverse]]
| rowspan="2" | <math>\mathbf{X}\boldsymbol{\beta}=-\mu^{-1}\,\!</math> 
| rowspan="2" | <math>\mu=-(\mathbf{X}\boldsymbol{\beta})^{-1}\,\!</math>
|-
| [[gamma distribution|Gamma]]
|-
| [[Inverse Gaussian distribution|Inverse <br>Gaussian]]
| real: <math>(0, +\infty)</math> || || Inverse <br>squared || <math>\mathbf{X}\boldsymbol{\beta}=\mu^{-2}\,\!</math> || <math>\mu=(\mathbf{X}\boldsymbol{\beta})^{-1/2}\,\!</math>
|-
| [[Poisson distribution|Poisson]]
| integer: <math>0,1,2,\ldots</math> || count of occurrences in fixed amount of time/space || [[Natural logarithm|Log]] || <math>\mathbf{X}\boldsymbol{\beta} = \ln(\mu) \,\!</math> || <math>\mu=\exp (\mathbf{X}\boldsymbol{\beta}) \,\!</math>
|-
| [[Bernoulli distribution|Bernoulli]]
| integer: <math>\{0,1\}</math> || outcome of single yes/no occurrence 
| rowspan="5" | [[Logit]]
| <math>\mathbf{X}\boldsymbol{\beta}=\ln \left(\frac \mu {1-\mu}\right) \,\!</math>
| rowspan="5" | <math>\mu=\frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1 + \exp(\mathbf{X}\boldsymbol{\beta})} = \frac 1 {1 + \exp(-\mathbf{X} \boldsymbol{\beta})} \,\!</math>
|-
| [[binomial distribution|Binomial]]
| integer: <math>0,1,\ldots,N</math> || count of "yes" occurrences out of ''N'' yes/no occurrences 
|<math>\mathbf{X}\boldsymbol{\beta}=\ln \left(\frac \mu {N-\mu}\right) \,\!</math>
|-
| rowspan=2| [[categorical distribution|Categorical]]
| integer: <math>[0,K)</math>|| rowspan=2| outcome of single ''K''-way occurrence 
| rowspan="3" |<math>\mathbf{X}\boldsymbol{\beta}=\ln \left(\frac \mu {1-\mu}\right) \,\!</math>
|-
| ''K''-vector of integer: <math>[0,1]</math>, where exactly one element in the vector has the value 1
|-
| [[multinomial distribution|Multinomial]]
| ''K''-vector of integer: <math>[0,N]</math> || count of occurrences of different types (1, ..., ''K'') out of ''N'' total ''K''-way occurrences 
|}
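
The link and mean functions in the table can be checked numerically. The following is a minimal sketch using only the Python standard library; the dictionary name <code>CANONICAL_LINKS</code>, the helper <code>round_trip</code>, and the variable name <code>eta</code> (standing for the linear predictor <math>\mathbf{X}\boldsymbol{\beta}</math>) are hypothetical names chosen for illustration. It verifies that applying a link and then its mean function returns the original mean.

```python
import math

# Each entry is (link g, mean function g^{-1}), transcribed from the table.
# "eta" is an illustrative name for the linear predictor X*beta.
CANONICAL_LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "negative_inverse": (lambda mu: -1.0 / mu, lambda eta: -1.0 / eta),
    "inverse_squared": (lambda mu: mu ** -2, lambda eta: eta ** -0.5),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def round_trip(name, mu):
    """Map a mean to the linear-predictor scale and back again."""
    link, mean_function = CANONICAL_LINKS[name]
    return mean_function(link(mu))
```

For example, <code>round_trip("logit", 0.3)</code> should return a value equal to 0.3 up to floating-point error, and likewise for the other links on means inside their permitted ranges.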

In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be positive, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a non-canonical link function.
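
The problem can be seen with a small numerical sketch. The function names below are hypothetical, and the log link is used here only as one example of a non-canonical alternative whose mean function is always positive.

```python
import math


def mean_negative_inverse(eta):
    """Mean function of the canonical (negative inverse) link: mu = -1/eta."""
    return -1.0 / eta


def mean_log(eta):
    """Mean function of the log link, an assumed non-canonical alternative."""
    return math.exp(eta)


# A negative linear predictor gives a valid positive mean under both links;
# a positive one gives a negative mean under the canonical link only.
for eta in (-2.0, 0.5):
    print(eta, mean_negative_inverse(eta), mean_log(eta))
```

With the canonical link, any positive value of the linear predictor produces a negative mean, whereas the exponential mean function stays positive for every real input.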

In the case of the Bernoulli, binomial, categorical and multinomial distributions, the support of the distributions is not the same type of data as the parameter being predicted.  In all of these cases, the predicted parameter is one or more probabilities, i.e. real numbers in the range <math>[0,1]</math>. The resulting model is known as ''[[logistic regression]]'' (or ''[[multinomial logistic regression]]'' in the case that ''K''-way rather than binary values are being predicted).

For the Bernoulli and binomial distributions, the parameter is a single probability, indicating the likelihood of occurrence of a single event. The Bernoulli still satisfies the basic condition of the generalized linear model in that, even though a single outcome will always be either 0 or 1, the ''[[expected value]]'' will nonetheless be a real-valued probability, i.e. the probability of occurrence of a "yes" (or 1) outcome. Similarly, in a binomial distribution, the expected value is ''Np'', so the expected proportion of "yes" outcomes, ''Np''/''N'' = ''p'', is the probability to be predicted.
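
The relationship between the binomial expected count and the predicted probability can be confirmed directly from the probability mass function. This is a minimal sketch; the function names and the linear predictor value <code>-0.4</code> are arbitrary choices for illustration.

```python
import math


def logistic(eta):
    """Mean function of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_mean(n_trials, p):
    """Exact expected count: the sum over k of k * P(Y = k)."""
    return sum(
        k * math.comb(n_trials, k) * p**k * (1.0 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    )


p = logistic(-0.4)  # hypothetical value of the linear predictor
n_trials = 20
expected_count = binomial_mean(n_trials, p)
expected_proportion = expected_count / n_trials
```

Up to floating-point error, <code>expected_count</code> equals <code>n_trials * p</code> and <code>expected_proportion</code> equals <code>p</code>.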

For categorical and multinomial distributions, the parameter to be predicted is a ''K''-vector of probabilities, with the further restriction that all probabilities must add up to 1.  Each probability indicates the likelihood of occurrence of one of the ''K'' possible values. For the multinomial distribution, and for the vector form of the categorical distribution, the expected values of the elements of the vector can be related to the predicted probabilities similarly to the binomial and Bernoulli distributions.
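
One common way to obtain a ''K''-vector of probabilities that satisfies the sum-to-one restriction is the softmax function, which is typically used as the mean function in multinomial logistic regression. The sketch below assumes one linear predictor per category. The input values are arbitrary, and subtracting the maximum is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(etas):
    """Map K linear predictors to K probabilities that sum to 1."""
    shift = max(etas)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(eta - shift) for eta in etas]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical linear predictors for K = 3 categories.
probs = softmax([0.5, -1.0, 2.0])
```

Each element of <code>probs</code> lies strictly between 0 and 1, and the category with the largest linear predictor receives the largest probability.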