=== When the likelihood function is a continuous distribution ===

{| class="wikitable"
! Likelihood <br> <math>p(x_i|\theta)</math>!! Model parameters <br> <math>\theta</math>!! Conjugate prior (and posterior) distribution <math>p(\theta|\Theta), p(\theta|\mathbf{x},\Theta) = p(\theta|\Theta') </math>!! Prior hyperparameters <br><math>\Theta</math>!! Posterior hyperparameters<ref name="posterior-hyperparameters" group="note" /><br><math>\Theta'</math>!!Interpretation&nbsp;of&nbsp;hyperparameters!!Posterior&nbsp;predictive<ref name="ppredNt" group="note" /><br><math>p(\tilde{x}|\mathbf{x}, \Theta) = p(\tilde{x}|\Theta')</math>
|-
| [[normal distribution|Normal]]<br>with known variance ''σ''<sup>2</sup> || ''μ'' (mean) || [[normal distribution|Normal]] || <math>\mu_0,\, \sigma_0^2\!</math>|| <math>\frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2}\right),
\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}</math>
| mean was estimated from observations with total precision (sum of all individual precisions) <math>1/\sigma_0^2</math> and with sample mean <math>\mu_0</math>
| <math>\mathcal{N}(\tilde{x}|\mu_0', {\sigma_0^2}' +\sigma^2)</math><ref name="murphy">{{citation |last=Murphy |first=Kevin P. |title=Conjugate Bayesian analysis of the Gaussian distribution |url=http://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf |year=2007}}</ref>
|-
| [[normal distribution|Normal]]<br>with known precision ''τ'' || ''μ'' (mean) || [[normal distribution|Normal]] || <math>\mu_0,\, \tau_0^{-1}\!</math>|| <math> \frac{\tau_0 \mu_0 + \tau \sum_{i=1}^n x_i}{\tau_0 + n \tau},\, \left(\tau_0 + n \tau\right)^{-1}</math>
| mean was estimated from observations with total precision (sum of all individual precisions) <math>\tau_0</math> and with sample mean <math>\mu_0</math>
| <math>\mathcal{N}\left(\tilde{x}\mid\mu_0', \frac{1}{\tau_0'} +\frac{1}{\tau}\right)</math><ref name="murphy" />
|-
| [[Normal distribution|Normal]]<br>with known mean ''μ'' || ''σ''<sup>2</sup> (variance) || [[Inverse gamma distribution|Inverse gamma]] || <math> \mathbf{\alpha,\, \beta} </math> <ref name="beta_scale" group="note" />|| <math> \mathbf{\alpha}+\frac{n}{2},\, \mathbf{\beta} + \frac{\sum_{i=1}^n{(x_i-\mu)^2}}{2} </math>
| variance was estimated from <math>2\alpha</math> observations with sample variance <math>\beta/\alpha</math> (i.e. with sum of [[squared deviations]] <math>2\beta</math>, where deviations are from known mean <math>\mu</math>)
| <math>t_{2\alpha'}(\tilde{x}|\mu,\sigma^2 = \beta'/\alpha')</math><ref name="murphy" />
|-
| [[normal distribution|Normal]]<br>with known mean ''μ'' || ''σ''<sup>2</sup> (variance) || [[Scaled inverse chi-squared distribution|Scaled inverse chi-squared]] || <math>\nu,\, \sigma_0^2\!</math>|| <math>\nu+n,\, \frac{\nu\sigma_0^2 + \sum_{i=1}^n (x_i-\mu)^2}{\nu+n}\!</math>
| variance was estimated from <math>\nu</math> observations with sample variance <math>\sigma_0^2</math>
| <math>t_{\nu'}(\tilde{x}|\mu,{\sigma_0^2}')</math><ref name="murphy" />
|-
| [[normal distribution|Normal]]<br>with known mean ''μ'' || ''τ'' (precision) || [[Gamma distribution|Gamma]] || <math>\alpha,\, \beta\!</math>  <ref name="beta_rate" group="note" />|| <math>\alpha + \frac{n}{2},\, \beta + \frac{\sum_{i=1}^n (x_i-\mu)^2}{2}\!</math>
| precision was estimated from <math>2\alpha</math> observations with sample variance <math>\beta/\alpha</math> (i.e. with sum of [[squared deviations]] <math>2\beta</math>, where deviations are from known mean <math>\mu</math>)
| <math>t_{2\alpha'}(\tilde{x}\mid\mu,\sigma^2 = \beta'/\alpha')</math><ref name="murphy" />
|-
| [[Normal distribution|Normal]]<ref group="note">A different conjugate prior for unknown mean and variance, but with a fixed, linear relationship between them, is found in the [[normal variance-mean mixture]], with the [[Generalized inverse Gaussian distribution|generalized inverse Gaussian]] as conjugate mixing distribution.</ref>|| ''μ'' and ''σ<sup>2</sup>''<br>Assuming [[Exchangeable random variables|exchangeability]]|| [[Normal-inverse gamma distribution|Normal-inverse gamma]]
| <math> \mu_0 ,\, \nu ,\, \alpha ,\, \beta</math>|| <math>\frac{\nu\mu_0+n\bar{x}}{\nu+n} ,\, \nu+n,\, \alpha+\frac{n}{2} ,\, </math><br/><math>
\beta + \tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu+n}\frac{(\bar{x}-\mu_0)^2}{2} </math>
*<math> \bar{x} </math> is the sample mean
| mean was estimated from <math>\nu</math> observations with sample mean <math>\mu_0</math>; variance was estimated from <math>2\alpha</math> observations with sample mean <math>\mu_0</math> and sum of [[squared deviations]] <math>2\beta</math>
| <math>t_{2\alpha'}\left(\tilde{x}\mid\mu',\frac{\beta'(\nu'+1)}{\nu' \alpha'}\right)</math><ref name="murphy" />
|-
| [[Normal distribution|Normal]] || ''μ'' and ''τ''<br>Assuming [[Exchangeable random variables|exchangeability]]|| [[Normal-gamma distribution|Normal-gamma]]
| <math> \mu_0 ,\, \nu ,\, \alpha ,\, \beta</math>|| <math>\frac{\nu\mu_0+n\bar{x}}{\nu+n} ,\, \nu+n,\, \alpha+\frac{n}{2} ,\, </math><br/><math>
\beta + \tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu+n}\frac{(\bar{x}-\mu_0)^2}{2} </math>
*<math> \bar{x} </math> is the sample mean
| mean was estimated from <math>\nu</math> observations with sample mean <math>\mu_0</math>, and precision was estimated from <math>2\alpha</math> observations with sample mean <math>\mu_0</math> and sum of [[squared deviations]] <math>2\beta</math>
| <math>t_{2\alpha'}\left(\tilde{x}\mid\mu',\frac{\beta'(\nu'+1)}{\alpha'\nu'}\right)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] with known covariance matrix '''''Σ''''' || '''''μ''''' (mean vector) || [[multivariate normal distribution|Multivariate normal]] || <math>\boldsymbol\mu_0,\, \boldsymbol\Sigma_0</math>|| <math>\left(\boldsymbol\Sigma_0^{-1} + n\boldsymbol\Sigma^{-1}\right)^{-1}\left( \boldsymbol\Sigma_0^{-1}\boldsymbol\mu_0 + n \boldsymbol\Sigma^{-1} \mathbf{\bar{x}} \right),</math><br/><math>\left(\boldsymbol\Sigma_0^{-1} + n\boldsymbol\Sigma^{-1}\right)^{-1}</math>
*<math>\mathbf{\bar{x}}</math> is the sample mean
| mean was estimated from observations with total precision (sum of all individual precisions) <math>\boldsymbol\Sigma_0^{-1}</math> and with sample mean <math>\boldsymbol\mu_0</math>
| <math>\mathcal{N}(\tilde{\mathbf{x}}\mid{\boldsymbol\mu_0}', {\boldsymbol\Sigma_0}' +\boldsymbol\Sigma)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] with known precision matrix '''''Λ''''' || '''''μ''''' (mean vector) || [[multivariate normal distribution|Multivariate normal]] || <math>\boldsymbol\mu_0,\, \boldsymbol\Lambda_0</math>|| <math>\left(\boldsymbol\Lambda_0 + n\boldsymbol\Lambda\right)^{-1}\left( \boldsymbol\Lambda_0\boldsymbol\mu_0 + n \boldsymbol\Lambda \mathbf{\bar{x}} \right),\, \left(\boldsymbol\Lambda_0 + n\boldsymbol\Lambda\right)</math>
*<math>\mathbf{\bar{x}}</math> is the sample mean
| mean was estimated from observations with total precision (sum of all individual precisions) <math>\boldsymbol\Lambda_0</math> and with sample mean <math>\boldsymbol\mu_0</math>
| <math>\mathcal{N}\left(\tilde{\mathbf{x}}\mid{\boldsymbol\mu_0}', {{\boldsymbol\Lambda_0}'}^{-1} + \boldsymbol\Lambda^{-1}\right)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] with known mean '''''μ''''' || '''''Σ''''' (covariance matrix) || [[Inverse-Wishart distribution|Inverse-Wishart]] || <math>\nu ,\, \boldsymbol\Psi</math>|| <math>n+\nu ,\, \boldsymbol\Psi + \sum_{i=1}^n (\mathbf{x_i} - \boldsymbol\mu) (\mathbf{x_i} - \boldsymbol\mu)^T  </math>
| covariance matrix was estimated from <math>\nu</math> observations with sum of pairwise deviation products <math>\boldsymbol\Psi</math>
| <math>t_{\nu'-p+1}\left(\tilde{\mathbf{x}}|\boldsymbol\mu,\frac{1}{\nu'-p+1}\boldsymbol\Psi'\right)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] with known mean '''''μ''''' || '''''Λ''''' (precision matrix) || [[Wishart distribution|Wishart]] || <math>\nu ,\, \mathbf{V}</math>|| <math>n+\nu ,\, \left(\mathbf{V}^{-1} + \sum_{i=1}^n (\mathbf{x_i} - \boldsymbol\mu) (\mathbf{x_i} - \boldsymbol\mu)^T\right)^{-1}  </math>
| covariance matrix was estimated from <math>\nu</math> observations with sum of pairwise deviation products <math>\mathbf{V}^{-1}</math>
| <math>t_{\nu'-p+1}\left(\tilde{\mathbf{x}}\mid\boldsymbol\mu,\frac{1}{\nu'-p+1}{\mathbf{V}'}^{-1}\right)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] || '''''μ''''' (mean vector) and '''''Σ''''' (covariance matrix) || [[normal-inverse-Wishart distribution|normal-inverse-Wishart]] || <math>\boldsymbol\mu_0 ,\, \kappa_0 ,\, \nu_0 ,\, \boldsymbol\Psi</math>|| <math>\frac{\kappa_0\boldsymbol\mu_0+n\mathbf{\bar{x}}}{\kappa_0+n} ,\, \kappa_0+n,\, \nu_0+n ,\,</math><br/><math>  \boldsymbol\Psi + \mathbf{C} + \frac{\kappa_0 n}{\kappa_0+n}(\mathbf{\bar{x}}-\boldsymbol\mu_0)(\mathbf{\bar{x}}-\boldsymbol\mu_0)^T </math>
*<math> \mathbf{\bar{x}} </math> is the sample mean
*<math>\mathbf{C} = \sum_{i=1}^n (\mathbf{x_i} - \mathbf{\bar{x}}) (\mathbf{x_i} - \mathbf{\bar{x}})^T</math>
| mean was estimated from <math>\kappa_0</math> observations with sample mean <math>\boldsymbol\mu_0</math>; covariance matrix was estimated from <math>\nu_0</math> observations with sample mean <math>\boldsymbol\mu_0</math> and with sum of pairwise deviation products <math>\boldsymbol\Psi=\nu_0\boldsymbol\Sigma_0</math>
| <math>t_{{\nu_0}'-p+1}\left(\tilde{\mathbf{x}}|{\boldsymbol\mu_0}',\frac{{\kappa_0}'+1}{{\kappa_0}'({\nu_0}'-p+1)}\boldsymbol\Psi'\right)</math><ref name="murphy" />
|-
| [[multivariate normal distribution|Multivariate normal]] || '''''μ''''' (mean vector) and '''''Λ''''' (precision matrix)|| [[normal-Wishart distribution|normal-Wishart]] || <math>\boldsymbol\mu_0 ,\, \kappa_0 ,\, \nu_0 ,\, \mathbf{V}</math>|| <math>\frac{\kappa_0\boldsymbol\mu_0+n\mathbf{\bar{x}}}{\kappa_0+n} ,\, \kappa_0+n,\, \nu_0+n ,\,</math><br/><math>  \left(\mathbf{V}^{-1} + \mathbf{C} + \frac{\kappa_0 n}{\kappa_0+n}(\mathbf{\bar{x}}-\boldsymbol\mu_0)(\mathbf{\bar{x}}-\boldsymbol\mu_0)^T\right)^{-1} </math>
*<math> \mathbf{\bar{x}} </math> is the sample mean
*<math>\mathbf{C} = \sum_{i=1}^n (\mathbf{x_i} - \mathbf{\bar{x}}) (\mathbf{x_i} - \mathbf{\bar{x}})^T</math>
| mean was estimated from <math>\kappa_0</math> observations with sample mean <math>\boldsymbol\mu_0</math>; covariance matrix was estimated from <math>\nu_0</math> observations with sample mean <math>\boldsymbol\mu_0</math> and with sum of pairwise deviation products <math>\mathbf{V}^{-1}</math>
| <math>t_{{\nu_0}'-p+1}\left(\tilde{\mathbf{x}}\mid {\boldsymbol\mu_0}', \frac{{\kappa_0}'+1}{{\kappa_0}'({\nu_0}'-p+1)}{\mathbf{V}'}^{-1}\right)</math><ref name="murphy" />
|-
| [[Uniform distribution (continuous)|Uniform]] || <math> U(0,\theta)\!</math>|| [[Pareto distribution|Pareto]] || <math> x_{m},\, k\!</math>|| <math> \max\{\,x_1,\ldots,x_n,x_\mathrm{m}\},\, k+n\!</math>
| <math>k</math> observations with maximum value <math>x_m</math>
|
|-
| [[Pareto distribution|Pareto]] <br/>with known minimum ''x''<sub>''m''</sub> || ''k'' (shape) || [[Gamma distribution|Gamma]] || <math>\alpha,\, \beta\!</math>|| <math>\alpha+n,\, \beta+\sum_{i=1}^n \ln\frac{x_i}{x_{\mathrm{m}}}\!</math>
| <math>\alpha</math> observations with sum <math>\beta</math> of the [[order of magnitude]] of each observation (i.e. the logarithm of the ratio of each observation to the minimum <math>x_m</math>)
|
|-
| [[Weibull distribution|Weibull]] <br/>with known shape ''β'' || ''θ'' (scale) || [[inverse-gamma distribution|Inverse gamma]]<ref name="Fink" />|| <math>a, b\!</math>|| <math>a+n,\, b+\sum_{i=1}^n x_i^{\beta}\!</math>
| <math>a</math> observations with sum <math>b</math> of the ''β''-th power of each observation
|
|-
| [[log-normal distribution|Log-normal]]
| colspan="6" | Same as for the normal distribution, with the posterior hyperparameters computed from the natural logarithm of the data. See {{harvtxt|Fink|1997|pp=21–22}} for details.
|-
| [[exponential distribution|Exponential]] || ''λ'' (rate) || [[Gamma distribution|Gamma]] || <math>\alpha,\, \beta\!</math>  <ref name="beta_rate" group="note" />|| <math>\alpha+n,\, \beta+\sum_{i=1}^n x_i\!</math>
| <math>\alpha</math> observations that sum to <math>\beta</math> <ref>{{cite book |last1=Liu |first1=Han |url=https://www.stat.cmu.edu/~larry/=sml/Bayes.pdf#page=16 |title=Statistical Machine Learning |last2=Wasserman |first2=Larry |year=2014 |page=314}}</ref>
| <math>\operatorname{Lomax}(\tilde{x}\mid\beta',\alpha')</math><br />([[Lomax distribution]])
|-
| [[Gamma Distribution|Gamma]] <br>with known shape ''α''|| ''β'' (rate) || [[Gamma Distribution|Gamma]] || <math>\alpha_0,\, \beta_0\!</math>||<math>\alpha_0+n\alpha,\, \beta_0+\sum_{i=1}^n x_i\!</math>
| <math>\alpha_0/\alpha</math> observations with sum <math>\beta_0</math>
| <math>\operatorname{CG}(\tilde{\mathbf{x}}\mid\alpha,{\alpha_0}',{\beta_0}')=\operatorname{\beta'}(\tilde{\mathbf{x}}|\alpha,{\alpha_0}',1,{\beta_0}')</math> <ref name="CG" group="note" />
|-
| [[Inverse-gamma distribution|Inverse Gamma]] <br>with known shape ''α''|| ''β'' (inverse scale) || [[Gamma Distribution|Gamma]] || <math>\alpha_0,\, \beta_0\!</math>||<math>\alpha_0+n\alpha,\, \beta_0+\sum_{i=1}^n \frac{1}{x_i}\!</math>
| <math>\alpha_0/\alpha</math> observations with sum of reciprocals <math>\beta_0</math>
|
|-
| [[Gamma Distribution|Gamma]] <br>with known rate ''β''|| ''α'' (shape)
| <math>\propto \frac{a^{\alpha-1} \beta^{\alpha c}}{\Gamma(\alpha)^b}</math>
| <math>a,\, b,\, c\!</math>||<math>a \prod_{i=1}^n x_i,\, b + n,\, c + n\!</math>
| <math>b</math> or <math>c</math> observations (<math>b</math> for estimating <math>\alpha</math>, <math>c</math> for estimating <math>\beta</math>) with product <math>a</math>
|
|-
| [[Gamma Distribution|Gamma]]<ref name="Fink" />|| ''α'' (shape), ''β'' (inverse scale) ||  <math>\propto \frac{p^{\alpha-1} e^{-\beta q}}{\Gamma(\alpha)^r \beta^{-\alpha s}}</math>|| <math>p,\, q,\, r,\, s \!</math>|| <math>p \prod_{i=1}^n x_i,\, q + \sum_{i=1}^n x_i,\, r + n,\, s + n \!</math>
| <math>\alpha</math> was estimated from <math>r</math> observations with product <math>p</math>; <math>\beta</math> was estimated from <math>s</math> observations with sum <math>q</math>
|
|-
| [[Beta Distribution|Beta]]|| ''α'', ''β'' ||  <math>\propto \frac{\Gamma(\alpha+\beta)^k \, p^\alpha \, q^\beta}{\Gamma(\alpha)^k\,\Gamma(\beta)^k}</math>|| <math>p,\, q,\, k \!</math>|| <math>p \prod_{i=1}^n x_i,\, q \prod_{i=1}^n (1-x_i),\, k + n \!</math>
| <math>\alpha</math> and <math>\beta</math> were estimated from <math>k</math> observations with product <math>p</math> and product of the complements <math>q</math>
|
|}
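
As an illustration of how a row of the table is used, the update for a normal likelihood with known variance ''σ''<sup>2</sup> and a normal prior on the mean can be sketched in a few lines of Python. The function names and the example numbers below are illustrative assumptions and do not come from the cited sources; the formulas are those of the first table row.

```python
def normal_known_variance_update(mu0, sigma0_sq, sigma_sq, xs):
    """Posterior (mean, variance) of mu for a normal likelihood with known variance.

    mu0, sigma0_sq: prior hyperparameters (prior mean and prior variance of mu).
    sigma_sq: known variance of each observation.
    xs: list of observations.
    """
    n = len(xs)
    # Precisions add: prior precision plus n times the observation precision.
    post_precision = 1.0 / sigma0_sq + n / sigma_sq
    post_var = 1.0 / post_precision
    # Precision-weighted combination of the prior mean and the data.
    post_mean = post_var * (mu0 / sigma0_sq + sum(xs) / sigma_sq)
    return post_mean, post_var


def normal_posterior_predictive(post_mean, post_var, sigma_sq):
    """Mean and variance of the normal posterior predictive distribution."""
    return post_mean, post_var + sigma_sq


# Hypothetical example: standard normal prior, unit observation variance.
mu_n, var_n = normal_known_variance_update(0.0, 1.0, 1.0, [1.0, 2.0, 3.0])
pred_mean, pred_var = normal_posterior_predictive(mu_n, var_n, 1.0)
```

With these inputs the prior counts as one pseudo-observation at 0, so the posterior mean is the average of four values (0, 1, 2, 3) and the posterior variance is one quarter of the observation variance.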
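
The normal-inverse-gamma row, in which both the mean and the variance are unknown, can be sketched in the same way. The following Python function is a minimal sketch under the table's parameterization <math>(\mu_0, \nu, \alpha, \beta)</math>; the function name and the example values are assumptions made for illustration.

```python
def normal_inverse_gamma_update(mu0, nu, alpha, beta, xs):
    """Posterior hyperparameters (mu0', nu', alpha', beta') of a normal-inverse-gamma prior."""
    n = len(xs)
    xbar = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - xbar) ** 2 for x in xs)
    mu_post = (nu * mu0 + n * xbar) / (nu + n)
    nu_post = nu + n
    alpha_post = alpha + n / 2.0
    beta_post = beta + 0.5 * ss + (n * nu / (nu + n)) * (xbar - mu0) ** 2 / 2.0
    return mu_post, nu_post, alpha_post, beta_post


def student_t_predictive_params(mu_post, nu_post, alpha_post, beta_post):
    """Degrees of freedom, location and squared scale of the Student t posterior predictive."""
    dof = 2.0 * alpha_post
    scale_sq = beta_post * (nu_post + 1.0) / (nu_post * alpha_post)
    return dof, mu_post, scale_sq


# Hypothetical example values.
post = normal_inverse_gamma_update(0.0, 1.0, 1.0, 1.0, [1.0, 2.0, 3.0])
dof, loc, scale_sq = student_t_predictive_params(*post)
```

The second function only returns the parameters of the Student ''t'' posterior predictive from the table; evaluating its density would require a ''t''-distribution implementation, which is left out here.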
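
For a non-normal example, the exponential likelihood with a gamma prior on the rate ''λ'' has a particularly simple update, and its posterior predictive is a Lomax distribution. The sketch below assumes the rate parameterization of the gamma prior used in the table and takes the Lomax density with scale <math>\beta'</math> and shape <math>\alpha'</math>; the function names and the numbers are illustrative assumptions.

```python
def exponential_gamma_update(alpha, beta, xs):
    """Posterior (alpha', beta') of a Gamma(alpha, beta) prior on an exponential rate."""
    # alpha counts observations; beta accumulates their sum.
    return alpha + len(xs), beta + sum(xs)


def lomax_pdf(x, scale, shape):
    """Density of the Lomax distribution at x >= 0."""
    return (shape / scale) * (1.0 + x / scale) ** (-(shape + 1.0))


# Hypothetical example: prior worth 2 observations that sum to 1.
alpha_post, beta_post = exponential_gamma_update(2.0, 1.0, [0.5, 1.5, 1.0])
# Posterior predictive density of a new observation at 0.
density_at_zero = lomax_pdf(0.0, beta_post, alpha_post)
```

At zero the predictive density equals <math>\alpha'/\beta'</math>, which is the posterior mean of the rate, as expected for an exponential likelihood.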