Bayesian inference
==Mathematical properties==
{{More footnotes needed|section|date=February 2012}}

===Interpretation of factor===
The factor <math display="inline">\frac{P(E \mid M)}{P(E)}</math> represents the impact of the evidence <math>E</math> on the belief in the model <math>M</math>. <math display="inline"> \frac{P(E \mid M)}{P(E)} > 1 \Rightarrow P(E \mid M) > P(E)</math>. That is, if the model were true, the evidence would be more likely than is predicted by the current state of belief, and the belief in the model increases. The reverse applies for a decrease in belief. If the belief does not change, <math display="inline"> \frac{P(E \mid M)}{P(E)} = 1 \Rightarrow P(E \mid M) = P(E)</math>. That is, the evidence is independent of the model: if the model were true, the evidence would be exactly as likely as predicted by the current state of belief.

===Cromwell's rule===
{{Main|Cromwell's rule}}
If <math>P(M) = 0</math> then <math>P(M \mid E) = 0</math>. If <math>P(M) = 1</math> and <math>P(E) > 0</math>, then <math>P(M \mid E) = 1</math>. This can be interpreted to mean that hard convictions are insensitive to counter-evidence. The former follows directly from Bayes' theorem. The latter can be derived by applying the first rule to the event "not <math>M</math>" in place of "<math>M</math>", yielding "if <math>1 - P(M) = 0</math>, then <math>1 - P(M \mid E) = 0</math>", from which the result immediately follows.

===Asymptotic behaviour of posterior===
Consider the behaviour of a belief distribution as it is updated a large number of times with [[independent and identically distributed]] trials. For sufficiently nice prior probabilities, the [[Bernstein–von Mises theorem]] gives that in the limit of infinite trials the posterior converges to a [[Gaussian distribution]] independent of the initial prior, under conditions first outlined and rigorously proven by [[Joseph L. Doob]] in 1949, namely if the random variable in consideration has a finite [[probability space]]. More general results were obtained later by the statistician [[David A. Freedman (statistician)|David A. Freedman]], who established in two seminal research papers in 1963<ref>{{cite journal| last1=Freedman|first1=DA|title=On the asymptotic behavior of Bayes' estimates in the discrete case|journal=The Annals of Mathematical Statistics|volume=34|issue=4|date=1963|pages=1386–1403|jstor=2238346|doi=10.1214/aoms/1177703871|doi-access=free}}</ref> and 1965<ref>{{cite journal|last1=Freedman|first1=DA|title=On the asymptotic behavior of Bayes estimates in the discrete case II|journal=The Annals of Mathematical Statistics|date=1965|volume=36|issue=2|pages=454–456|jstor=2238150|doi=10.1214/aoms/1177700155|doi-access=free}}</ref> when and under what circumstances the asymptotic behaviour of the posterior is guaranteed. His 1963 paper treats, like Doob (1949), the finite case and comes to a satisfactory conclusion. However, if the random variable has an infinite but countable [[probability space]] (i.e., corresponding to a die with infinitely many faces), the 1965 paper demonstrates that for a dense subset of priors the [[Bernstein–von Mises theorem]] is not applicable: in this case there is [[almost surely]] no asymptotic convergence. Later, in the 1980s and 1990s, [[David A. Freedman (statistician)|Freedman]] and [[Persi Diaconis]] continued to work on the case of infinite countable probability spaces.<ref>{{cite journal|first2=Larry|last2=Wasserman|first1=James|last1=Robins|journal=Journal of the American Statistical Association|date=2000|title=Conditioning, likelihood, and coherence: A review of some foundational concepts|doi=10.1080/01621459.2000.10474344|volume=95|issue=452|pages=1340–1346|s2cid=120767108}}</ref> To summarise, there may be insufficient trials to suppress the effects of the initial choice of prior, and especially for large (but finite) systems the convergence might be very slow.
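This behaviour can be illustrated with a conjugate Beta–Bernoulli update, a minimal sketch in which the true success rate and both prior choices are hypothetical: with few trials, two different priors give very different posterior means, while after many trials both concentrate near the true rate.

```python
def beta_posterior(successes, failures, a, b):
    """Mean and variance of the Beta(a + successes, b + failures) posterior
    obtained by updating a Beta(a, b) prior on a Bernoulli success rate."""
    a_post, b_post = a + successes, b + failures
    mean = a_post / (a_post + b_post)
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

true_rate = 0.3  # hypothetical "true" Bernoulli parameter
for n in (10, 100_000):
    s = round(true_rate * n)  # idealised i.i.d. outcome counts
    flat_mean, _ = beta_posterior(s, n - s, a=1, b=1)     # uniform prior
    biased_mean, _ = beta_posterior(s, n - s, a=50, b=5)  # strongly biased prior
    print(f"n={n}: flat prior -> {flat_mean:.3f}, biased prior -> {biased_mean:.3f}")
```

At n = 10 the biased prior still dominates the estimate, while at n = 100,000 both posterior means agree with the true rate to about three decimal places, consistent with the finite-space convergence described above.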
===Conjugate priors===
{{Main|Conjugate prior}}
In parameterized form, the prior distribution is often assumed to come from a family of distributions called [[conjugate prior]]s. The usefulness of a conjugate prior is that the corresponding posterior distribution is in the same family, and the calculation may be expressed in [[Closed-form expression|closed form]].

===Estimates of parameters and predictions===
It is often desired to use a posterior distribution to estimate a parameter or variable. Several methods of Bayesian estimation select [[central tendency|measures of central tendency]] from the posterior distribution. For one-dimensional continuous problems of practical interest, a unique posterior median exists, and it is attractive as a [[robust statistics|robust estimator]].<ref>{{cite book|title=Pitman's measure of closeness: A comparison of statistical estimators|first1=Pranab K.|last1=Sen|author-link1=Pranab K. Sen|first2=J. P.|last2=Keating|first3=R. L.|last3=Mason|publisher=SIAM|location=Philadelphia|year=1993}}</ref> If the posterior distribution has a finite mean, the posterior mean can be used as an estimate:<ref>{{Cite book| last1=Choudhuri|first1=Nidhan|last2=Ghosal|first2=Subhashis|last3=Roy|first3=Anindya|date=2005-01-01|chapter=Bayesian Methods for Function Estimation|title=Handbook of Statistics|series=Bayesian Thinking|volume=25|pages=373–414|doi=10.1016/s0169-7161(05)25013-7|isbn=9780444515391|citeseerx=10.1.1.324.3052}}</ref>
<math display="block">\tilde \theta = \operatorname{E}[\theta] = \int \theta \, p(\theta \mid \mathbf{X},\alpha) \, d\theta</math>
Taking the value with the greatest posterior probability defines [[maximum a posteriori estimation|maximum ''a posteriori'' (MAP)]] estimates:<ref>{{Cite web|url=https://www.probabilitycourse.com/chapter9/9_1_2_MAP_estimation.php|title=Maximum A Posteriori (MAP) Estimation|website=www.probabilitycourse.com|language=en|access-date=2017-06-02}}</ref>
<math display="block">\{ \theta_{\text{MAP}}\} \subset \arg \max_\theta p(\theta \mid \mathbf{X},\alpha) .</math>
There are examples where no maximum is attained, in which case the set of MAP estimates is [[empty set|empty]]. There are other methods of estimation that minimize the posterior ''[[risk]]'' (expected posterior loss) with respect to a [[loss function]], and these are of interest to [[statistical decision theory]] using the sampling distribution ("frequentist statistics").<ref>{{Cite web|url=http://www.cogsci.ucsd.edu/~ajyu/Teaching/Tutorials/bayes_dt.pdf|title=Introduction to Bayesian Decision Theory|last=Yu|first=Angela|website=cogsci.ucsd.edu/|archive-url=https://web.archive.org/web/20130228060536/http://www.cogsci.ucsd.edu/~ajyu/Teaching/Tutorials/bayes_dt.pdf|archive-date=2013-02-28|url-status=dead}}</ref>
The [[posterior predictive distribution]] of a new observation <math>\tilde{x}</math> (that is independent of previous observations) is determined by<ref>{{Cite web|url=http://people.stat.sc.edu/Hitchcock/stat535slidesday18.pdf|title=Posterior Predictive Distribution Stat Slide|last=Hitchcock|first=David|website=stat.sc.edu}}</ref>
<math display="block">p(\tilde{x} \mid \mathbf{X},\alpha) = \int p(\tilde{x},\theta \mid \mathbf{X},\alpha) \, d\theta = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X},\alpha) \, d\theta .</math>
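In the conjugate Beta–Bernoulli model these quantities all have closed forms, so the integrals above reduce to simple ratios. The sketch below (with hypothetical prior hyperparameters ''a'', ''b'') computes the posterior mean, the MAP estimate, and the posterior predictive probability of a success on the next trial.

```python
def beta_bernoulli_estimates(successes, failures, a=2.0, b=2.0):
    """Point estimates and posterior predictive under a Beta(a, b) prior.

    The posterior is Beta(a + successes, b + failures), so the posterior
    mean, the MAP estimate (the mode of the Beta density), and the
    predictive probability of a success are all simple ratios.
    """
    a_post, b_post = a + successes, b + failures
    post_mean = a_post / (a_post + b_post)            # E[theta | X]
    post_map = (a_post - 1) / (a_post + b_post - 2)   # mode; valid for a_post, b_post > 1
    pred_success = post_mean                          # p(x_new = 1 | X) = E[theta | X]
    return post_mean, post_map, pred_success

mean, mode, pred = beta_bernoulli_estimates(7, 3)  # 7 successes, 3 failures observed
print(f"posterior mean {mean:.3f}, MAP {mode:.3f}, predictive p(success) {pred:.3f}")
```

Note that the mean and the MAP estimate differ whenever the posterior is skewed; they coincide only for a symmetric posterior, which is one reason the choice of point estimate matters in practice.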