==Overview==
Algorithmic probability is the main ingredient of Solomonoff's theory of inductive inference, the theory of prediction based on observations. It was invented with the goal of using it for machine learning: given a sequence of symbols, which one will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable.

Four principal inspirations for Solomonoff's algorithmic probability were [[Occam's razor]], [[Epicurus#Epistemology|Epicurus' principle of multiple explanations]], modern computing theory (e.g. the use of a universal Turing machine) and Bayes' rule for prediction.<ref>Li and Vitanyi, 2008, p. 347.</ref> Occam's razor and Epicurus' principle are essentially two different non-mathematical approximations of the [[universal prior]].

* Occam's razor: ''among the theories that are consistent with the observed phenomena, one should select the simplest theory''.<ref>Li and Vitanyi, 2008, p. 341.</ref>
* Epicurus' principle of multiple explanations: ''if more than one theory is consistent with the observations, keep all such theories''.<ref>Li and Vitanyi, 2008, p. 339.</ref>

At the heart of the universal prior is an abstract model of a computer, such as a universal Turing machine.<ref>Hutter, M., [http://www.scholarpedia.org/article/Algorithmic_information_theory "Algorithmic Information Theory"], ''Scholarpedia'', 2(3):2519.</ref> Any abstract computer will do, as long as it is Turing-complete, i.e. every [[computable function]] has at least one program that will compute its application on the abstract computer.

The abstract computer is used to give precise meaning to the phrase "simple explanation". In the formalism used, explanations, or theories of phenomena, are computer programs that generate observation strings when run on the abstract computer. Each computer program is assigned a weight corresponding to its length. The universal probability distribution is the probability distribution on all possible output strings with random input, assigning to each finite output prefix ''q'' the sum of the probabilities of the programs that compute something starting with ''q''.<ref>Solomonoff, R., "[http://world.std.com/~rjs/kollect.pdf The Kolmogorov Lecture: The Universal Distribution and Machine Learning]", ''The Computer Journal'', Vol. 46, No. 6, p. 598, 2003.</ref>

Thus, a simple explanation is a short computer program. A complex explanation is a long computer program. Simple explanations are more likely, so a high-probability observation string is one generated by a short computer program, or perhaps by any of a large number of slightly longer computer programs. A low-probability observation string is one that can only be generated by a long computer program.
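In the standard formalization (a sketch, assuming a prefix-free universal Turing machine ''U'' fed uniformly random bits), the probability assigned to a finite prefix ''q'' is

<math display="block">P(q) = \sum_{p \,:\, U(p)\text{ starts with } q} 2^{-|p|},</math>

where the sum runs over the programs ''p'' whose output begins with ''q'', and <math>|p|</math> is the length of ''p'' in bits, so each program contributes the probability <math>2^{-|p|}</math> of being produced by fair coin flips.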
Algorithmic probability is closely related to the concept of [[Kolmogorov complexity]]. Kolmogorov's introduction of complexity was motivated by information theory and problems in randomness, while Solomonoff introduced algorithmic complexity for a different reason: inductive reasoning. A single universal prior probability that can be substituted for each actual prior probability in Bayes' rule was invented by Solomonoff, with Kolmogorov complexity as a side product.<ref>Gács, P. and Vitányi, P., "In Memoriam Raymond J. Solomonoff", ''IEEE Information Theory Society Newsletter'', Vol. 61, No. 1, March 2011, p. 11.</ref> It predicts the most likely continuation of a given observation, and provides a measure of how likely this continuation will be.{{Citation needed|date=September 2017}}

Solomonoff's enumerable measure is [[Universality (philosophy)|universal]] in a certain powerful sense, but the computation time can be infinite. One way of dealing with this issue is a variant of Leonid Levin's search algorithm,<ref>Levin, L.A., "Universal Search Problems", ''Problemy Peredaci Informacii'' 9, pp. 115–116, 1973.</ref> which limits the time spent computing the success of possible programs, with shorter programs given more time (sketched below). When run for longer and longer periods of time, it generates a sequence of approximations that converge to the universal probability distribution. Other methods of dealing with the issue include limiting the search space by including training sequences.

Solomonoff proved this distribution to be machine-invariant within a constant factor (the [[Kolmogorov complexity#Invariance theorem|invariance theorem]]).<ref>Solomonoff, R., "[http://world.std.com/~rjs/publications/solo1.pdf Complexity-Based Induction Systems: Comparisons and Convergence Theorems]", ''IEEE Transactions on Information Theory'', Vol. IT-24, No. 4, pp. 422–432, July 1978.</ref>
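The time-allocation idea behind a Levin-style search can be illustrated with a minimal sketch. The two-bit toy instruction set and the names <code>run_toy</code> and <code>levin_search</code> below are hypothetical stand-ins, not taken from any source; the point is only the budget schedule, in which a program of length <math>\ell</math> in phase <math>t</math> receives <math>2^{t-\ell}</math> steps, so shorter programs get exponentially more time.

<syntaxhighlight lang="python">
from itertools import product

def run_toy(bits, max_steps):
    """Toy stand-in for a universal machine: the program is read as
    2-bit instructions (00: emit '0', 01: emit '1', 10: double the
    output so far, 11: halt). Each instruction costs one step."""
    out, steps = "", 0
    for i in range(0, len(bits) - 1, 2):
        if steps >= max_steps:
            break  # time budget exhausted
        op = bits[i:i + 2]
        if op == "00":
            out += "0"
        elif op == "01":
            out += "1"
        elif op == "10":
            out += out
        else:  # "11" halts
            break
        steps += 1
    return out

def levin_search(target_prefix, max_phase=20):
    """Levin-style schedule: in phase t, each program p with len(p) <= t
    gets a budget of 2**(t - len(p)) steps, so shorter programs receive
    exponentially more time. Returns (program, phase) for the first
    program whose output starts with target_prefix, or None."""
    for t in range(1, max_phase + 1):
        for length in range(1, t + 1):
            budget = 2 ** (t - length)
            for candidate in product("01", repeat=length):
                p = "".join(candidate)
                if run_toy(p, budget).startswith(target_prefix):
                    return p, t
    return None

# '000110' means: emit '0', emit '1', double -> '0101'. It needs 3
# steps, so it is found in the first phase whose budget allows that.
print(levin_search("0101"))  # ('000110', 8)
</syntaxhighlight>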