==Applications==
The principle of maximum entropy is commonly applied in two ways to inferential problems:

===Prior probabilities===
The principle of maximum entropy is often used to obtain [[prior probability|prior probability distributions]] for [[Bayesian inference]]. Jaynes was a strong advocate of this approach, claiming the maximum entropy distribution represented the least informative distribution.<ref>{{cite journal |last=Jaynes |first=E. T. |author-link=Edwin Thompson Jaynes |year=1968 |url=http://bayes.wustl.edu/etj/articles/brandeis.pdf |title=Prior Probabilities |journal=IEEE Transactions on Systems Science and Cybernetics |volume=4 |issue=3 |pages=227–241 |doi=10.1109/TSSC.1968.300117 }}</ref> A large amount of literature is now dedicated to the elicitation of maximum entropy priors and links with [[channel coding]].<ref>{{cite journal |last=Clarke |first=B. |year=2006 |title=Information optimality and Bayesian modelling |journal=[[Journal of Econometrics]] |volume=138 |issue=2 |pages=405–429 |doi=10.1016/j.jeconom.2006.05.003 }}</ref><ref>{{cite journal |last=Soofi |first=E. S. |year=2000 |title=Principal Information Theoretic Approaches |journal=[[Journal of the American Statistical Association]] |volume=95 |issue=452 |pages=1349–1353 |doi=10.2307/2669786 |mr=1825292 |jstor=2669786 }}</ref><ref>{{cite journal |last=Bousquet |first=N. |year=2008 |title=Eliciting vague but proper maximal entropy priors in Bayesian experiments |journal=Statistical Papers |volume=51 |issue=3 |pages=613–628 |doi=10.1007/s00362-008-0149-9 |s2cid=119657859 }}</ref><ref>{{cite journal |last1=Palmieri |first1=Francesco A. N. |last2=Ciuonzo |first2=Domenico |year=2013 |title=Objective priors from maximum entropy in data classification |journal=Information Fusion |volume=14 |issue=2 |pages=186–198 |doi=10.1016/j.inffus.2012.01.012 |citeseerx=10.1.1.387.4515 }}</ref>
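A minimal numerical sketch of such a prior (an illustration only; the six-sided die and the prescribed mean of 4.5 are assumptions in the spirit of Jaynes's Brandeis example, not values taken from the cited sources): under a single mean constraint the maximum entropy distribution has exponential form, and its Lagrange multiplier can be found with a one-dimensional root solve.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import brentq

# Illustrative setup (assumption, not from the article): a six-sided die
# whose only testable information is a prescribed mean of 4.5.
x = np.arange(1, 7)
target_mean = 4.5

# The maximum entropy pmf under a mean constraint has exponential form
# p_i proportional to exp(lam * x_i); find lam so that E[X] = 4.5 holds.
def mean_error(lam):
    w = np.exp(lam * x)
    p = w / w.sum()
    return p @ x - target_mean

lam = brentq(mean_error, -10.0, 10.0)   # bracketing interval is an assumption
p = np.exp(lam * x)
p /= p.sum()

print(p)        # maximum entropy prior over the six faces
print(p @ x)    # recovers the prescribed mean, approximately 4.5
</syntaxhighlight>

The resulting prior shifts weight toward the higher faces just enough to meet the constraint while otherwise remaining as close to uniform as possible.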
===Posterior probabilities===
Maximum entropy is a sufficient updating rule for [[radical probabilism]]. [[Richard Jeffrey]]'s [[probability kinematics]] is a special case of [[maximum entropy inference]]. However, maximum entropy is not a generalisation of all such sufficient updating rules.<ref>{{cite journal |last=Skyrms |first=B. |author-link=Brian Skyrms |year=1987 |title=Updating, supposing and MAXENT |journal=Theory and Decision |volume=22 |issue=3 |pages=225–46 |doi=10.1007/BF00134086 |s2cid=121847242 }}</ref>

===Maximum entropy models===
Alternatively, the principle is often invoked for model specification: in this case the observed data itself is assumed to be the testable information. Such models are widely used in [[natural language processing]]. An example of such a model is [[logistic regression]], which corresponds to the [[maximum entropy classifier]] for independent observations.

===Probability density estimation===
One of the main applications of the maximum entropy principle is in discrete and continuous [[density estimation]].<ref name="BK08">{{cite journal |last1=Botev |first1=Z. I. |last2=Kroese |first2=D. P. |year=2008 |title=Non-asymptotic Bandwidth Selection for Density Estimation of Discrete Data |journal=Methodology and Computing in Applied Probability |volume=10 |issue=3 |pages=435 |doi=10.1007/s11009-007-9057-z |s2cid=122047337 }}</ref><ref name="BK11">{{cite journal |last1=Botev |first1=Z. I. |last2=Kroese |first2=D. P. |year=2011 |title=The Generalized Cross Entropy Method, with Applications to Probability Density Estimation |journal=Methodology and Computing in Applied Probability |volume=13 |issue=1 |pages=1–27 |doi=10.1007/s11009-009-9133-7 |s2cid=18155189 |url=http://espace.library.uq.edu.au/view/UQ:200564/UQ200564_preprint.pdf }}</ref> Similar to [[support vector machine]] estimators, the maximum entropy principle may require the solution to a [[quadratic programming]] problem, and thus provide a sparse mixture model as the optimal density estimator. One important advantage of the method is its ability to incorporate prior information in the density estimation.<ref>{{cite book |last1=Kesavan |first1=H. K. |last2=Kapur |first2=J. N. |year=1990 |contribution=Maximum Entropy and Minimum Cross-Entropy Principles |title=Maximum Entropy and Bayesian Methods |url=https://archive.org/details/maximumentropyba00jayn_552 |url-access=limited |editor-last=Fougère |editor-first=P. F. |pages=[https://archive.org/details/maximumentropyba00jayn_552/page/n418 419]–432 |doi=10.1007/978-94-009-0683-9_29 |isbn=978-94-010-6792-8 }}</ref>
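As a rough illustration of moment-constrained maximum entropy density estimation (a sketch of the classical formulation, not the generalized cross-entropy or quadratic-programming estimators of the references above; the data, support grid, and feature choices are assumptions): maximizing entropy subject to matching the first two sample moments is equivalent to minimizing a convex dual over the Lagrange multipliers of an exponential-family density.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Hypothetical data whose first two sample moments the estimate must match.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=0.5, size=500)

# Discretize a support grid and define the constraint features f(x) = (x, x^2).
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 400)
dx = grid[1] - grid[0]
feats = np.vstack([grid, grid**2])                 # shape (2, 400)
targets = np.array([data.mean(), (data**2).mean()])

# Convex dual of the maximum entropy problem: minimizing log Z(lam) - lam.targets
# yields the multipliers of the density p(x) proportional to exp(lam_1 x + lam_2 x^2)
# that satisfies the moment constraints.
def dual(lam):
    log_z = logsumexp(lam @ feats) + np.log(dx)    # Riemann-sum approximation of log Z
    return log_z - lam @ targets

res = minimize(dual, x0=np.zeros(2))
density = np.exp(res.x @ feats)
density /= density.sum() * dx                      # normalized max-ent density on the grid

print(density @ feats.T * dx)                      # approximately equal to targets
</syntaxhighlight>

With only the first two moments constrained, the recovered density is a discretized Gaussian; adding further feature constraints enlarges the exponential family in the same way.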