Editing Kolmogorov complexity

{{short description|Measure of algorithmic complexity}}
[[Image:Mandelpart2 red.png|right|thumb|upright=1.4|This image illustrates part of the [[Mandelbrot set]] [[fractal]]. Simply storing the 24-bit color of each pixel in this image would require 23 million bytes<!--3200 × 2400 × 3 = 23.04e6-->, but a small computer program can reproduce these 23 MB using the definition of the Mandelbrot set, the corner coordinates of the image and the parameters of the color mapping. Thus, the Kolmogorov complexity of this image is much less than 23 MB in any pragmatic [[model of computation]]. [[Portable Network Graphics|PNG]]'s general-purpose image compression only reduces it to 1.6 MB, smaller than the raw data but much larger than the Kolmogorov complexity.]]

In [[algorithmic information theory]] (a subfield of [[computer science]] and [[mathematics]]), the '''Kolmogorov complexity''' of an object, such as a piece of text, is the length of a shortest [[computer program]] (in a predetermined [[programming language]]) that produces the object as output. It is a measure of the [[computation]]al resources needed to specify the object, and is also known as '''algorithmic complexity''', '''Solomonoff–Kolmogorov–Chaitin complexity''', '''program-size complexity''', '''descriptive complexity''', or '''algorithmic entropy'''. It is named after [[Andrey Kolmogorov]], who first published on the subject in 1963<ref>{{cite journal |author-link=Andrey Kolmogorov |first=Andrey|last=Kolmogorov |date=Dec 1963 |title=On Tables of Random Numbers| journal=Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) |volume=25 |issue=4 |pages=369–375 |mr=178484 |jstor=25049284 |issn=0581-572X }}</ref><ref>{{cite journal|author-link=Andrey Kolmogorov|first=Andrey|last=Kolmogorov|year=1998|title=On Tables of Random Numbers| journal=Theoretical Computer Science|volume=207|issue=2|pages=387–395|doi=10.1016/S0304-3975(98)00075-9 |mr=1643414|doi-access=free}}</ref> and is a generalization of classical information theory.

The notion of Kolmogorov complexity can be used to state and [[Proof of impossibility|prove impossibility]] results akin to [[Cantor's diagonal argument]], [[Gödel's incompleteness theorem]], and [[halting problem|Turing's halting problem]].
In particular, no program ''P'' computing a [[lower bound]] for each text's Kolmogorov complexity can return a value essentially larger than ''P''<nowiki>'</nowiki>s own length (see section {{slink||Chaitin's incompleteness theorem}}); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts. Kolmogorov complexity is the length of the ultimately compressed version of a file (i.e., anything which can be put in a computer). Formally, it is the length of a shortest program from which the file can be reconstructed. While Kolmogorov complexity is uncomputable, various approaches have been proposed and reviewed.<ref name="zenil20202">{{cite journal |last1=Zenil |first1=Hector |year=2020 |title=A Review of Methods for Estimating Algorithmic Complexity: Options, Challenges, and New Directions |journal=Entropy |volume=22 |issue=6 |pages=612 |doi=10.3390/e22060612 |doi-access=free |pmid=33286384 |pmc=7517143 }}</ref>

==Definition==

=== Intuition ===
Consider the following two [[string (computer science)|strings]] of 32 lowercase letters and digits:

:  <code>abababababababababababababababab</code> , and
:  <code>4c1j5b2p0cv4w1x8rx2y39umgw5q85s7</code>

The first string has a short English-language description, namely "write ab 16 times", which consists of '''17''' characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, i.e., "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7" which has '''38''' characters. Hence the operation of writing the first string can be said to have "less complexity" than writing the second.

More formally, the [[complexity]] of a string is the length of the shortest possible description of the string in some fixed [[Turing complete|universal]] description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings like the ''abab'' example above, whose Kolmogorov complexity is small relative to the string's size, are not considered to be complex.

The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings.  Such a description language can be based on any computer programming language, such as [[Lisp programming language|Lisp]], [[Pascal (programming language)|Pascal]], or [[Java (programming language)|Java]].  If '''P''' is a program which outputs a string ''x'', then '''P''' is a description of ''x''. The length of the description is just the length of '''P''' as a character string, multiplied by the number of bits in a character (e.g., 7 for [[ASCII]]).

We could, alternatively, choose an encoding for [[Turing machine]]s, where an ''encoding'' is a function which associates to each Turing Machine '''M''' a bitstring <'''M'''>. If '''M''' is a Turing Machine which, on input ''w'', outputs string ''x'', then the concatenated string <'''M'''> ''w'' is a description of ''x''. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.

Any string ''s'' has at least one description. For example, the second string above is output by the [[pseudo-code]]:

 '''function''' GenerateString2()
     '''return''' "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

whereas the first string is output by the (much shorter) pseudo-code:

 '''function''' GenerateString1()
     '''return''' "ab" × 16

If a description ''d''(''s'') of a string ''s'' is of minimal length (i.e., using the fewest bits), it is called a '''minimal description''' of ''s'', and the length of ''d''(''s'') (i.e. the number of bits in the minimal description) is the '''Kolmogorov complexity''' of ''s'', written ''K''(''s''). Symbolically,

:''K''(''s'') = |''d''(''s'')|.

The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the ''invariance theorem'').

=== Plain Kolmogorov complexity ''C'' ===
There are two definitions of Kolmogorov complexity: ''plain'' and ''prefix-free''. The plain complexity is the minimal description length of any program, and denoted <math>C(x)</math> while the prefix-free complexity is the minimal description length of any program encoded in a [[prefix code|prefix-free code]], and denoted <math>K(x)</math>. The plain complexity is more intuitive, but the prefix-free complexity is easier to study.

By default, all equations hold only up to an additive constant. For example, <math>f(x) = g(x)</math> really means that <math>f(x) = g(x) + O(1)</math>, that is, <math>\exists c, \forall x, |f(x) - g(x)| \leq c</math>.

Let <math>U: 2^* \to 2^*</math> be a computable function mapping finite binary strings to binary strings. It is a universal function if, and only if, for any computable <math>f: 2^* \to 2^*</math>, we can encode the function in a "program" <math>s_f</math>, such that <math>\forall x \in 2^*, U(s_fx) = f(x) </math>. We can think of <math>U</math> as a program interpreter, which takes in an initial segment describing the program, followed by data that the program should process.

One problem with plain complexity is that <math>C(xy) \not < C(x) + C(y)</math>, because intuitively speaking, there is no general way to tell where to divide an output string just by looking at the concatenated string. We can divide it by specifying the length of <math>x</math> or <math>y</math>, but that would take <math>O(\min(\ln x, \ln y))</math> extra symbols. Indeed, for any <math>c > 0</math> there exists <math>x, y</math> such that <math>C(xy) \geq C(x) + C(y) + c</math>.<ref>(Downey and Hirschfeldt, 2010), Theorem 3.1.4</ref>

Typically, inequalities with plain complexity have a term like <math>O(\min(\ln x, \ln y))</math> on one side, whereas the same inequalities with prefix-free complexity have only <math>O(1)</math>.

The main problem with plain complexity is that there is something extra sneaked into a program. A program not only represents for something with its code, but also represents its own length. In particular, a program <math>x</math> may represent a binary number up to <math>\log_2 |x|</math>, simply by its own length. Stated in another way, it is as if we are using a termination symbol to denote where a word ends, and so we are not using 2 symbols, but 3. To fix this defect, we introduce the prefix-free Kolmogorov complexity.<ref>(Downey and Hirschfeldt, 2010), Section 3.5</ref>

=== Prefix-free Kolmogorov complexity ''K'' ===
A prefix-free code is a subset of <math>2^*</math> such that given any two different words <math>x, y</math> in the set, neither is a prefix of the other. The benefit of a prefix-free code is that we can build a machine that reads words from the code forward in one direction, and as soon as it reads the last symbol of the word, it ''knows'' that the word is finished, and does not need to backtrack or a termination symbol.

Define a '''prefix-free Turing machine''' to be a Turing machine that comes with a prefix-free code, such that the Turing machine can read any string from the code in one direction, and stop reading as soon as it reads the last symbol. Afterwards, it may compute on a work tape and write to a write tape, but it cannot move its read-head anymore.

This gives us the following formal way to describe ''K''.<ref name=":0">{{Cite journal |last=Hutter |first=Marcus |author-link=Marcus Hutter |date=2007-03-06 |title=Algorithmic information theory |journal=Scholarpedia |language=en |volume=2 |issue=3 |pages=2519 |doi=10.4249/scholarpedia.2519 |doi-access=free |bibcode=2007SchpJ...2.2519H |issn=1941-6016|hdl=1885/15015 |hdl-access=free }}</ref>

* Fix a prefix-free universal Turing machine, with three tapes: a read tape infinite in one direction, a work tape infinite in two directions, and a write tape infinite in one direction.
* The machine can read from the read tape in one direction only (no backtracking), and write to the write tape in one direction only. It can read and write the work tape in both directions.
* The work tape and write tape start with all zeros. The read tape starts with an input prefix code, followed by all zeros.
* Let <math>S</math> be the prefix-free code on <math>2^*</math>, used by the universal Turing machine.

Note that some universal Turing machines may not be programmable with prefix codes. We must pick only a prefix-free universal Turing machine.

The prefix-free complexity of a string <math>x</math> is the shortest prefix code that makes the machine output <math>x</math>:<math display="block">K(x) := \min\{|c| : c \in S, U(c) = x\}</math>

==Invariance theorem==

===Informal treatment===
There are some description languages which are optimal, in the following sense: given any description of an object in a description language, said description may be used in the optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object, nor the object being described.

Here is an example of an optimal description language. A description will have two parts:

* The first part describes another description language.
* The second part is a description of the object in that language.

In more technical terms, the first part of a description is a computer program (specifically: a compiler for the object's language, written in the description language), with the second part being the input to that computer program which produces the object as output.

'''The invariance theorem follows:''' Given any description language ''L'', the optimal description language is at least as efficient as ''L'', with some constant overhead.

'''Proof:''' Any description ''D'' in ''L'' can be converted into a description in the optimal language by first describing ''L'' as a computer program ''P'' (part 1), and then using the original description ''D'' as input to that program (part 2). The
total length of this new description ''D&prime;'' is (approximately):

:|''D&prime;'' | = |''P''| + |''D''|

The length of ''P'' is a constant that doesn't depend on ''D''. So, there is at most a constant overhead, regardless of the object described. Therefore, the optimal language is universal [[up to]] this additive constant.

===A more formal treatment===
'''Theorem''': If ''K''<sub>1</sub> and ''K''<sub>2</sub> are the complexity functions relative to [[Turing complete]] description languages ''L''<sub>1</sub> and ''L''<sub>2</sub>, then there is a constant ''c''&nbsp;– which depends only on the languages ''L''<sub>1</sub> and ''L''<sub>2</sub> chosen&nbsp;– such that

:∀''s''.  −''c'' ≤ ''K''<sub>1</sub>(''s'') − ''K''<sub>2</sub>(''s'') ≤ ''c''.

'''Proof''': By symmetry, it suffices to prove that there is some constant ''c'' such that for all strings ''s''

:''K''<sub>1</sub>(''s'') ≤ ''K''<sub>2</sub>(''s'') + ''c''.

Now, suppose there is a program in the language ''L''<sub>1</sub> which acts as an [[interpreter (computing)|interpreter]] for ''L''<sub>2</sub>:

 '''function''' InterpretLanguage('''string''' ''p'')

where ''p'' is a program in ''L''<sub>2</sub>. The interpreter is characterized by the following property:

: Running <code>InterpretLanguage</code> on input ''p'' returns the result of running ''p''.

Thus, if '''P''' is a program in ''L''<sub>2</sub> which is a minimal description of ''s'', then <code>InterpretLanguage</code>('''P''') returns the string ''s''. The length of this description of ''s'' is the sum of

# The length of the program <code>InterpretLanguage</code>, which we can take to be the constant ''c''.
# The length of '''P''' which by definition is ''K''<sub>2</sub>(''s'').

This proves the desired upper bound.

==History and context==
[[Algorithmic information theory]] is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other [[data structure]]s).

The concept and theory of Kolmogorov Complexity is based on a crucial theorem first discovered by [[Ray Solomonoff]], who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference"<ref>{{cite report |author-link=Ray Solomonoff | last=Solomonoff |first= Ray | url=http://world.std.com/~rjs/rayfeb60.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://world.std.com/~rjs/rayfeb60.pdf |archive-date=2022-10-09 |url-status=live | title=A Preliminary Report on a General Theory of Inductive Inference | work = Report V-131 | date= February 4, 1960|id=[http://world.std.com/~rjs/z138.pdf Revision] published November 1960}}</ref> as part of his invention of [[algorithmic probability]]. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in ''Information and Control''.<ref>{{cite journal
| doi=10.1016/S0019-9958(64)90223-2
| last=Solomonoff
| first= Ray
| title=A Formal Theory of Inductive Inference Part I
| journal = Information and Control
| url=http://world.std.com/~rjs/1964pt1.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://world.std.com/~rjs/1964pt1.pdf |archive-date=2022-10-09 |url-status=live
| volume=7
| issue= 1
| pages= 1–22
| date=March 1964
| doi-access=free
}}</ref><ref>{{cite journal
| doi=10.1016/S0019-9958(64)90131-7
| last=Solomonoff
|first= Ray
| title=A Formal Theory of Inductive Inference Part II
| journal = Information and Control
| url=http://world.std.com/~rjs/1964pt2.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://world.std.com/~rjs/1964pt2.pdf |archive-date=2022-10-09 |url-status=live
|volume=7
|issue= 2
|pages= 224–254
|date=June 1964
|doi-access=free
}}</ref>

Andrey Kolmogorov later [[multiple discovery|independently published]] this theorem in ''Problems Inform. Transmission''<ref>{{cite journal|volume=1 |issue=1 |year=1965 |pages=1–7 |title=Three Approaches to the Quantitative Definition of Information |url=http://www.ece.umd.edu/~abarg/ppi/contents/1-65-abstracts.html#1-65.2 |journal=Problems Inform. Transmission |first=A.N. |last=Kolmogorov |url-status=dead |archive-url=https://web.archive.org/web/20110928032821/http://www.ece.umd.edu/~abarg/ppi/contents/1-65-abstracts.html |archive-date=September 28, 2011 }}</ref> in 1965. [[Gregory Chaitin]] also presents this theorem in ''J. ACM''&nbsp;– Chaitin's paper was submitted October 1966 and revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers.<ref>{{cite journal | last1 = Chaitin | first1 = Gregory J. | title = On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers| journal = Journal of the ACM | volume = 16 | pages = 407–422| year = 1969 | doi = 10.1145/321526.321530 | issue = 3| citeseerx = 10.1.1.15.3821 | s2cid = 12584692 }}</ref>

The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm and the code lengths it allows to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.

When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority.<ref>{{cite journal | last1=Kolmogorov | first1=A. | title=Logical basis for information theory and probability theory | journal=IEEE Transactions on Information Theory | volume=14|issue=5 | pages=662–664 | year=1968 | doi =10.1109/TIT.1968.1054210 | s2cid=11402549 }}</ref> For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence, while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal prior probability distribution. The broader area encompassing descriptional complexity and probability is often called Kolmogorov complexity. The computer scientist [[Ming Li]] considers this an example of the [[Matthew effect (sociology)|Matthew effect]]: "...to everyone who has, more will be given..."<ref>{{Cite book |doi=10.1007/978-0-387-49820-1_1 |chapter=Preliminaries |title=An Introduction to Kolmogorov Complexity and its Applications |url=https://archive.org/details/introductiontoko00limi_695 |url-access=limited |pages=[https://archive.org/details/introductiontoko00limi_695/page/n23 1]–99 |series=Texts in Computer Science |year=2008 |last1=Li |first1=Ming |last2=Vitányi |first2=Paul |isbn=978-0-387-33998-6 }}</ref>

There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on [[self-delimiting program]]s, and is mainly due to [[Leonid Levin]] (1974).

An axiomatic approach to Kolmogorov complexity based on [[Blum axioms]] (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov.<ref name=Burgin1982>{{cite journal |last=Burgin |first=M. |year=1982 |title=Generalized Kolmogorov complexity and duality in theory of computations |journal=Notices of the Russian Academy of Sciences |volume=25 |issue=3 |pages=19–23 |url=https://www.mathnet.ru/eng/dan45265}}</ref>

In the late 1990s and early 2000s, methods developed to approximate Kolmogorov complexity relied on popular compression algorithms like LZW,<ref name="zenil20202">{{cite journal |last1=Zenil |first1=Hector |year=2020 |title=A Review of Methods for Estimating Algorithmic Complexity: Options, Challenges, and New Directions |journal=Entropy |volume=22 |issue=6 |pages=612 |doi=10.3390/e22060612 |doi-access=free |pmid=33286384 |pmc=7517143 }}</ref> which made difficult or impossible to provide any estimation to short strings until a method based on [[Algorithmic probability]] was introduced,<ref>{{cite journal |last1=Delahaye |first1=Jean-Paul |last2=Zenil |first2=Hector |title=Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness |journal=Applied Mathematics and Computation |volume=219 |issue=1 |pages=63–77 |year=2012 |doi=10.1016/j.amc.2011.09.020 |url=https://www.sciencedirect.com/science/article/pii/S0096300311012367 }}</ref><ref>{{cite journal |last1=Soler-Toscano |first1=Fernando |last2=Zenil |first2=Hector |last3=Delahaye |first3=Jean-Paul |last4=Gauvrit |first4=Nicolas |title=Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines |journal=PLOS ONE |volume=9 |issue=5 |pages=74–85 |year=2014 |doi=10.1371/journal.pone.0096223 |doi-access=free |pmid=24787763 |pmc=4013017 |bibcode=2014PLoSO...996223S }}</ref> offering the only alternative to compression-based methods.<ref>Zenil, Hector; Soler Toscano, Fernando; Gauvrit, Nicolas (2022). "Methods and Applications of Algorithmic Complexity: Beyond Statistical Lossless Compression". <i>Emergence, Complexity and Computation</i>. Springer Berlin, Heidelberg. doi:10.1007/978-3-662-64985-5. ISBN 978-3-662-64983-1.
</ref>

==Basic results==
We write <math>K(x, y)</math> to be <math>K((x,y))</math>, where <math>(x, y)</math> means some fixed way to code for a tuple of strings x and y.

=== Inequalities ===
We omit additive factors of <math>O(1)</math>. This section is based on.<ref name=":0" />

'''Theorem.''' <math>K(x) \leq C(x) + 2\log_2 C(x) </math>

'''Proof.''' Take any program for the universal Turing machine used to define plain complexity, and convert it to a prefix-free program by first coding the length of the program in binary, then convert the length to prefix-free coding. For example, suppose the program has length 9, then we can convert it as follows:<math display="block">9 \mapsto 1001 \mapsto 11-00-00-11-\color{red}{01} </math>where we double each digit, then add a termination code. The prefix-free universal Turing machine can then read in any program for the other machine as follows:<math display="block">[\text{code for simulating the other machine}][\text{coded length of the program}][\text{the program}]</math>The first part programs the machine to simulate the other machine, and is a constant overhead <math>O(1)</math>. The second part has length <math>\leq 2 \log_2 C(x) + 3</math>. The third part has length <math>C(x)</math>.

'''Theorem''': There exists <math>c</math> such that <math>\forall x, C(x) \leq |x| + c</math>. More succinctly, <math>C(x) \leq |x|</math>. Similarly, <math>K(x) \leq |x| + 2\log_2 |x|</math>, and <math>K(x | |x|) \leq |x| </math>.{{clarify|reason=This appears to be the first use of conditional complexity notation. Its definition should occur before here. Moreover, notation could better be changed to avoid confusion with string length (maybe, string length is better denoted by '#x').|date=January 2024}}

'''Proof.''' For the plain complexity, just write a program that simply copies the input to the output. For the prefix-free complexity, we need to first describe the length of the string, before writing out the string itself.

'''Theorem. (extra information bounds, subadditivity)'''

* <math>K(x | y) \leq K(x) \leq K(x, y) \leq \max( K(x|y) + K(y), K(y|x) + K(x)) \leq K(x) + K(y) </math>
* <math>K(xy) \leq K(x, y)</math>
Note that there is no way to compare <math>K(xy)</math> and <math>K(x|y)</math> or <math>K(x)</math> or <math>K(y|x)</math> or <math>K(y)</math>. There are strings such that the whole string <math>xy</math> is easy to describe, but its substrings are very hard to describe.

'''Theorem. (symmetry of information)''' <math>K(x, y) = K(x | y, K(y)) + K(y)  = K(y, x)</math>.

'''Proof.''' One side is simple. For the other side with <math>K(x, y) \geq K(x | y, K(y)) + K(y)</math>, we need to use a counting argument (page 38 <ref>{{Cite book |last=Hutter |first=Marcus |title=Universal artificial intelligence: sequential decisions based on algorithmic probability |date=2005 |publisher=Springer |isbn=978-3-540-26877-2 |series=Texts in theoretical computer science |location=Berlin New York}}</ref>).

'''Theorem. (information non-increase)''' For any computable function <math>f</math>, we have <math>K(f(x)) \leq K(x) + K(f)</math>.

'''Proof.''' Program the Turing machine to read two subsequent programs, one describing the function and one describing the string. Then run both programs on the work tape to produce <math>f(x)</math>, and write it out.

===Uncomputability of Kolmogorov complexity===

==== A naive attempt at a program to compute ''K'' ====

At first glance it might seem trivial to write a program which can compute ''K''(''s'') for any ''s'', such as the following:

 '''function''' KolmogorovComplexity('''string''' s)
     '''for''' i = 1 '''to''' infinity:
         '''for each''' string p '''of''' length exactly i
             '''if''' isValidProgram(p) '''and''' evaluate(p) == s
                 '''return''' i

This program iterates through all possible programs (by iterating through all possible strings and only considering those which are valid programs), starting with the shortest. Each program is executed to find the result produced by that program, comparing it to the input ''s''. If the result matches then the length of the program is returned.

However this will not work because some of the programs ''p'' tested will not terminate, e.g. if they contain infinite loops. There is no way to avoid all of these programs by testing them in some way before executing them due to the non-computability of the [[halting problem]].

What is more, no program at all can compute the function ''K'', be it ever so sophisticated. This is proven in the following.

==== Formal proof of uncomputability of ''K''====

'''Theorem''': There exist strings of arbitrarily large Kolmogorov complexity. Formally: for each natural number ''n'', there is a string ''s'' with ''K''(''s'') ≥ ''n''.<ref group="note">However, an ''s'' with ''K''(''s'') = ''n'' need not exist for every ''n''. For example, if ''n'' is not a multiple of 7, no [[ASCII]] program can have a length of exactly ''n'' bits.</ref>

'''Proof:''' Otherwise all of the infinitely many possible finite strings could be generated by the finitely many<ref group="note">There are 1 + 2 + 2<sup>2</sup> + 2<sup>3</sup> + ... + 2<sup>''n''</sup> = 2<sup>''n''+1</sup> − 1 different program texts of length up to ''n'' bits; cf. [[geometric series]]. If program lengths are to be multiples of 7 bits, even fewer program texts exist.</ref> programs with a complexity below ''n'' bits.

'''Theorem''':  ''K'' is not a [[computable function]]. In other words, there is no program which takes any string ''s'' as input and produces the integer ''K''(''s'') as output.

The following [[proof by contradiction|'''proof''' by contradiction]] uses a simple [[Pascal (programming language)|Pascal]]-like language to denote programs; for sake of proof simplicity assume its description (i.e. an [[interpreter (computing)|interpreter]]) to have a length of {{val|1400000}} bits.
Assume for contradiction there is a program

 '''function''' KolmogorovComplexity('''string''' s)

which takes as input a string ''s'' and returns ''K''(''s''). All programs are of finite length so, for sake of proof simplicity, assume it to be {{val|7000000000}} bits.
Now, consider the following program of length {{val|1288}} bits:

 '''function''' GenerateComplexString()
     '''for''' i = 1 '''to''' infinity:
         '''for each''' string s '''of''' length exactly i
             '''if''' KolmogorovComplexity(s) ≥ 8000000000
                 '''return''' s

Using <code>KolmogorovComplexity</code> as a subroutine, the program tries every string, starting with the shortest, until it returns a string with Kolmogorov complexity at least {{val|8000000000}} bits,<ref group="note">By the previous theorem, such a string exists, hence the <code>for</code> loop will eventually terminate.</ref> i.e. a string that cannot be produced by any program shorter than {{val|8000000000}} bits. However, the overall length of the above program that produced ''s'' is only {{val|7001401288}} bits,<ref group=note>including the language interpreter and the subroutine code for <code>KolmogorovComplexity</code></ref> which is a contradiction. (If the code of <code>KolmogorovComplexity</code> is shorter, the contradiction remains. If it is longer, the constant used in <code>GenerateComplexString</code> can always be changed appropriately.)<ref group=note>If <code>KolmogorovComplexity</code> has length ''n'' bits, the constant ''m'' used in <code>GenerateComplexString</code> needs to be adapted to satisfy {{nowrap|''n'' + {{val|1400000}} + {{val|1218}} + 7·log<sub>10</sub>(''m'') < ''m''}}, which is always possible since ''m'' grows faster than log<sub>10</sub>(''m'').</ref>

The above proof uses a contradiction similar to that of the [[Berry paradox]]: "<sub>{{color|#8080ff|1}}</sub>The <sub>{{color|#8080ff|2}}</sub>smallest <sub>{{color|#8080ff|3}}</sub>positive <sub>{{color|#8080ff|4}}</sub>integer <sub>{{color|#8080ff|5}}</sub>that <sub>{{color|#8080ff|6}}</sub>cannot <sub>{{color|#8080ff|7}}</sub>be <sub>{{color|#8080ff|8}}</sub>defined <sub>{{color|#8080ff|9}}</sub>in <sub>{{color|#8080ff|10}}</sub>fewer <sub>{{color|#8080ff|11}}</sub>than <sub>{{color|#8080ff|12}}</sub>twenty <sub>{{color|#8080ff|13}}</sub>English <sub>{{color|#8080ff|14}}</sub>words". It is also possible to show the non-computability of ''K'' by reduction from the non-computability of the halting problem ''H'', since ''K'' and ''H'' are [[turing degree#Turing equivalence|Turing-equivalent]].<ref>Stated without proof in: {{cite web |url=http://www.daimi.au.dk/~bromille/DC05/Kolmogorov.pdf |title=Course notes for Data Compression - Kolmogorov complexity |archive-url=https://web.archive.org/web/20090909132048/http://www.daimi.au.dk/~bromille/DC05/Kolmogorov.pdf |archive-date=2009-09-09 |year=2005 |author=P. B. Miltersen |page=7 | url-status=dead}}</ref>

There is a corollary, humorously called the "[[full employment theorem]]" in the programming language community, stating that there is no perfect size-optimizing compiler.

===Chain rule for Kolmogorov complexity===
{{Main| Chain rule for Kolmogorov complexity}}
The chain rule<ref>{{cite journal
  | first = A.
  | last = Zvonkin
  |author2=L. Levin
  | title = The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms.
  | journal = Russian Mathematical Surveys
  | volume = 25
  | number = 6
  | pages = 83–124
  | year = 1970
  | doi = 10.1070/RM1970v025n06ABEH001269
 | bibcode = 1970RuMaS..25...83Z
 | s2cid = 250850390
 | url = http://alexander.shen.free.fr/library/Zvonkin_Levin_70.pdf}}</ref> for Kolmogorov complexity states that there exists a constant ''c'' such that for all ''X'' and ''Y'':

:''K''(''X'',''Y'') = ''K''(''X'') + ''K''(''Y''|''X'') + c*max(1,log(''K''(''X'',''Y''))).

It states that the shortest program that reproduces ''X'' and ''Y'' is [[Big-O notation|no more]] than a logarithmic term larger than a program to reproduce ''X'' and a program to reproduce ''Y'' given ''X''. Using this statement, one can define [[Mutual information#Absolute mutual information|an analogue of mutual information for Kolmogorov complexity]].

==Compression==
It is straightforward to compute upper bounds for ''K''(''s'')&nbsp;– simply [[data compression|compress]] the string ''s'' with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the length of the resulting string&nbsp;– concretely, the size of a [[self-extracting archive]] in the given language.

A string ''s'' is compressible by a number ''c'' if it has a description whose length does not exceed |''s''| − ''c'' bits. This is equivalent to saying that {{math|''K''(''s'') ≤ {{abs|''s''}} − ''c''}}.  Otherwise, ''s'' is incompressible by ''c''. A string incompressible by 1 is said to be simply ''incompressible''&nbsp;– by the [[pigeonhole principle]], which applies because every compressed string maps to only one uncompressed string, [[incompressible string]]s must exist, since there are 2<sup>''n''</sup> bit strings of length ''n'', but only 2<sup>''n''</sup> − 1 shorter strings, that is, strings of length less than ''n'', (i.e. with length 0, 1, ..., ''n''&nbsp;−&nbsp;1).<ref group=note>As there are {{math|1=''N''<sub>''L''</sub> = 2<sup>''L''</sup>}} strings of length ''L'', the number of strings of lengths {{math|1=''L'' = 0, 1, ..., ''n'' − 1}} is {{math|''N''<sub>0</sub> + ''N''<sub>1</sub> + ... + ''N''<sub>''n''−1</sub>}} = {{math|2<sup>0</sup> + 2<sup>1</sup> + ... + 2<sup>''n''−1</sup>}}, which is a finite [[geometric series]] with sum {{math|2<sup>0</sup> + 2<sup>1</sup> + ... + 2<sup>''n''−1</sup>}} = {{math|1 = 2<sup>0</sup> × (1 − 2<sup>''n''</sup>) / (1 − 2) = 2<sup>''n''</sup> − 1}}</ref>

For the same reason, most strings are complex in the sense that they cannot be significantly compressed&nbsp;– their ''K''(''s'') is not much smaller than |''s''|, the length of ''s'' in bits. To make this precise, fix a value of ''n''. There are 2<sup>''n''</sup> bitstrings of length ''n''. The [[Uniform distribution (discrete)|uniform]] [[probability]] distribution on the space of these bitstrings assigns exactly equal weight 2<sup>−''n''</sup> to each string of length ''n''.

'''Theorem''': With the uniform probability distribution on the space of bitstrings of length ''n'', the probability that a string is incompressible by ''c'' is at least {{math|1 − 2<sup>−''c''+1</sup> + 2<sup>−''n''</sup>}}.

To prove the theorem, note that the number of descriptions of length not exceeding ''n'' − ''c'' is given by the geometric series:

: 1 + 2 + 2<sup>2</sup> + ... + 2<sup>''n'' − ''c''</sup> = 2<sup>''n''−''c''+1</sup> − 1.

There remain at least

: 2<sup>''n''</sup> − 2<sup>''n''−''c''+1</sup> + 1

bitstrings of length ''n'' that are incompressible by ''c''.  To determine the probability, divide by 2<sup>''n''</sup>.

==Chaitin's incompleteness theorem==
[[File:Kolmogorov complexity and computable lower bounds svg.svg|thumb|right|600px|Kolmogorov complexity {{color|#ff0000|''K''(''s'')}}, and two computable lower bound functions <code>{{color|#00e000|prog1(s)}}</code>, <code>{{color|#0000ff|prog2(s)}}</code>. The horizontal axis ([[logarithmic scale]]) enumerates all [[string (computer science)|strings]] ''s'', ordered by length; the vertical axis ([[linear scale]]) measures Kolmogorov complexity in [[bit]]s. Most strings are incompressible, i.e. their Kolmogorov complexity exceeds their length by a constant amount. 9 compressible strings are shown in the picture, appearing as almost vertical slopes. Due to Chaitin's incompleteness theorem (1974), the output of any program computing a lower bound of the Kolmogorov complexity cannot exceed some fixed limit, which is independent of the input string ''s''.]]
By the above theorem ({{slink||Compression}}), most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proven, if the complexity of the string is above a certain threshold. The precise formalization is as follows. First, fix a particular [[axiomatic system]] '''S''' for the [[natural number]]s. The axiomatic system has to be powerful enough so that, to certain assertions  '''A''' about complexity of strings, one can associate a formula '''F'''<sub>'''A'''</sub> in '''S'''. This association must have the following property:

If '''F'''<sub>'''A'''</sub> is provable from the axioms of '''S''', then the corresponding assertion '''A''' must be true. This "formalization" can be achieved based on a [[Gödel numbering]].

'''Theorem''': There exists a constant ''L'' (which only depends on '''S''' and on the choice of description language) such that there does not exist a string ''s'' for which the statement

:''K''(''s'') ≥  ''L'' &nbsp;  &nbsp;  &nbsp; (as formalized in '''S''')

can be proven within '''S'''.<ref>{{cite journal | url=http://www.cas.mcmaster.ca/~sancheg/EE_UCU2006_thesis/biblio/Information-theoretic%20limitations%20of%20Formal%20Systems%20(acm74).pdf | author=Gregory J. Chaitin | title=Information-theoretic limitations of formal systems | journal=Journal of the ACM | volume=21 | number=3 | pages=403&ndash;434 | date=Jul 1974 | doi=10.1145/321832.321839 | s2cid=2142553 }} Here: Thm.4.1b</ref><ref>{{Cite book|last=Calude |first=Cristian S. |title=Information and Randomness: an algorithmic perspective |date=12 September 2002 |publisher=Springer |url=https://www.springer.com/br/book/9783540434665 |language=en |isbn=9783540434665 }}</ref>

'''Proof Idea''': The proof of this result is modeled on a self-referential construction used in [[Berry's paradox]]. We firstly obtain a program which enumerates the proofs within '''S''' and we specify a procedure ''P'' which takes as an input an integer ''L'' and prints the strings ''x'' which are within proofs within '''S''' of the statement ''K''(''x'') ≥ ''L''. By then setting ''L'' to greater than the length of this procedure ''P'', we have that the required length of a program to print ''x'' as stated in ''K''(''x'') ≥ ''L'' as being at least ''L'' is then less than the amount ''L'' since the string ''x'' was printed by the procedure ''P''. This is a contradiction. So it is not possible for the proof system '''S''' to prove ''K''(''x'') ≥ ''L'' for ''L'' arbitrarily large, in particular, for ''L'' larger than the length of the procedure ''P'', (which is finite).

'''Proof''':

We can find an effective enumeration of all the formal proofs in '''S''' by some procedure

 '''function''' NthProof('''int''' ''n'')

which takes as input ''n'' and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here, since every possible proof in the language of '''S''' is produced for some ''n''. Some of these are complexity formulas of the form ''K''(''s'')&nbsp;≥&nbsp;''n'' where ''s'' and ''n'' are constants in the language of '''S'''. There is a procedure

 '''function''' NthProofProvesComplexityFormula('''int''' ''n'')

which determines whether the ''n''th proof actually proves a complexity formula ''K''(''s'')&nbsp;≥&nbsp;''L''. The strings ''s'', and the integer ''L'' in turn, are computable by procedure:

 '''function''' StringNthProof('''int''' ''n'')

 '''function''' ComplexityLowerBoundNthProof('''int''' ''n'')

Consider the following procedure:

 '''function''' GenerateProvablyComplexString('''int''' ''n'')
     '''for''' i = 1 to infinity:
         '''if''' NthProofProvesComplexityFormula(i) '''and''' ComplexityLowerBoundNthProof(i) ≥ ''n''
             '''return''' StringNthProof(''i'')

Given an ''n'', this procedure tries every proof until it finds a string and a proof in the formal system '''S''' of the formula ''K''(''s'')&nbsp;≥&nbsp;''L'' for some ''L''&nbsp;≥&nbsp;''n''; if no such proof exists, it loops forever.

Finally, consider the program consisting of all these procedure definitions, and a main call:

 GenerateProvablyComplexString(''n''<sub>0</sub>)

where the constant ''n''<sub>0</sub> will be determined later on. The overall program length can be expressed as ''U''+log<sub>2</sub>(''n''<sub>0</sub>), where ''U'' is some constant and log<sub>2</sub>(''n''<sub>0</sub>) represents the length of the integer value ''n''<sub>0</sub>, under the reasonable assumption that it is encoded in binary digits. We will choose ''n''<sub>0</sub> to be greater than the program length, that is, such that ''n''<sub>0</sub> > ''U''+log<sub>2</sub>(''n''<sub>0</sub>).  This is clearly true for ''n''<sub>0</sub> sufficiently large, because the left hand side grows linearly in ''n''<sub>0</sub> whilst the right hand side grows logarithmically in ''n''<sub>0</sub> up to the fixed constant ''U''.

Then no proof of the form "''K''(''s'')≥''L''" with ''L''≥''n''<sub>0</sub> can be obtained in '''S''', as can be seen by an [[indirect argument]]:
If <code>ComplexityLowerBoundNthProof(i)</code> could return a value ≥''n''<sub>0</sub>, then the loop inside <code>GenerateProvablyComplexString</code> would eventually terminate, and that procedure would return a string ''s'' such that
{| style="border: 1px solid darkgray;"
|-
| || ''K''(''s'')
|-
| ≥ || ''n''<sub>0</sub> || &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; || by construction of <code>GenerateProvablyComplexString</code>
|-
| > || ''U''+log<sub>2</sub>(''n''<sub>0</sub>) || || by the choice of ''n''<sub>0</sub> 
|-
| ≥ || ''K''(''s'') || || since ''s'' was described by the program with that length
|}
This is a contradiction, [[Q.E.D.]]

As a consequence, the above program, with the chosen value of ''n''<sub>0</sub>, must loop forever.

Similar ideas are used to prove the properties of [[Chaitin's constant]].

==Minimum message length==
{{Main|Minimum message length}}
The minimum message length principle of statistical and inductive inference and machine learning was developed by [[Chris Wallace (computer scientist)|C.S. Wallace]] and D.M. Boulton in 1968. MML is [[Bayesian probability|Bayesian]] (i.e. it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (i.e. the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (i.e. even for very hard problems, MML will converge to any underlying model) and efficiency (i.e. the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe (1999) showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity).<ref>{{cite journal |citeseerx=10.1.1.17.321 |title=Minimum Message Length and Kolmogorov Complexity |journal=Computer Journal |volume=42 |issue=4 |pages=270–283 |year=1999 |last1=Wallace |first1=C. S. |last2=Dowe |first2=D. L. |doi=10.1093/comjnl/42.4.270 }}</ref>

==Kolmogorov randomness==
{{See also|Algorithmically random sequence}}
''Kolmogorov randomness'' defines a string (usually of [[bit]]s) as being [[randomness|random]] if and only if every [[computer program]] that can produce that string is at least as long as the string itself.  To make this precise, a universal computer (or [[universal Turing machine]]) must be specified, so that "program" means a program for this universal machine. A random string in this sense is "incompressible" in that it is impossible to "compress" the string into a program that is shorter than the string itself. For every universal computer, there is at least one algorithmically random string of each length.<ref>There are 2<sup>''n''</sup> bit strings of length ''n'' but only 2<sup>''n''</sup>-1 shorter bit strings, hence at most that much compression results.</ref>  Whether a particular string is random, however, depends on the specific universal computer that is chosen. This is because a universal computer can have a particular string hard-coded in itself, and a program running on this universal computer can then simply refer to this hard-coded string using a short sequence of bits (i.e. much shorter than the string itself).

This definition can be extended to define a notion of randomness for ''infinite'' sequences from a finite alphabet. These [[algorithmically random sequence]]s can be defined in three equivalent ways. One way uses an effective analogue of [[measure theory]]; another uses effective [[Martingale (probability theory)|martingales]].  The third way defines an infinite sequence to be random if the prefix-free Kolmogorov complexity of its initial segments grows quickly enough&nbsp;— there must be a constant ''c'' such that the complexity of an initial segment of length ''n'' is always at least ''n''−''c''.  This definition, unlike the definition of randomness for a finite string, is not affected by which universal machine is used to define prefix-free Kolmogorov complexity.<ref>{{cite journal |doi=10.1016/s0019-9958(66)80018-9 |title=The definition of random sequences |journal=Information and Control |volume=9 |issue=6 |pages=602–619 |year=1966 |last=Martin-Löf |first=Per |doi-access=free }}</ref>

==Relation to entropy==
For dynamical systems, entropy rate and algorithmic complexity of the trajectories are related by a theorem of Brudno, that the equality <math>K(x;T) =  h(T)</math> holds for almost all <math>x</math>.<ref>{{cite journal |first1=Stefano|last1=Galatolo |first2=Mathieu|last2=Hoyrup |first3=Cristóbal|last3=Rojas |title=Effective symbolic dynamics, random points, statistical behavior, complexity and entropy | journal=Information and Computation | volume=208 | pages=23–41 | year=2010| url=http://www.loria.fr/~hoyrup/random_ergodic.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://www.loria.fr/~hoyrup/random_ergodic.pdf |archive-date=2022-10-09 |url-status=live | doi=10.1016/j.ic.2009.05.001|arxiv=0801.0209 |s2cid=5555443 }}</ref>

It can be shown<ref>{{cite arXiv |author=Alexei Kaltchenko |title=Algorithms for Estimating Information Distance with Application to Bioinformatics and Linguistics  |year=2004 |eprint=cs.CC/0404039}}</ref> that for the output of [[Markov information source]]s, Kolmogorov complexity is related to the [[Entropy (information theory)|entropy]] of the information source. More precisely, the Kolmogorov complexity of the output of a Markov information source, normalized by the length of the output, converges almost surely (as the length of the output goes to infinity) to the [[Entropy (information theory)|entropy]] of the source.

'''Theorem.''' (Theorem 14.2.5 <ref name=":1">{{cite book |last1=Cover |first1=Thomas M. |title=Elements of information theory |last2=Thomas |first2=Joy A. |publisher=Wiley-Interscience |year=2006 |isbn=0-471-24195-4 |edition=2nd}}</ref>) The conditional Kolmogorov complexity of a binary string <math>x_{1:n}</math> satisfies<math display="block">\frac 1n K(x_{1:n} | n ) \leq H_b\left(\frac 1n \sum_i x_i\right) + \frac{\log n}{2n} + O(1/n)
</math>where <math>H_b</math> is the [[binary entropy function]] (not to be confused with the entropy rate).

== Halting problem ==
The Kolmogorov complexity function is equivalent to deciding the halting problem.

If we have a halting oracle, then the Kolmogorov complexity of a string can be computed by simply trying every halting program, in lexicographic order, until one of them outputs the string.

The other direction is much more involved.<ref>{{Cite journal |last1=Chaitin |first1=G. |last2=Arslanov |first2=A. |last3=Calude |first3=Cristian S. |date=1995-09-01 |title=Program-size Complexity Computes the Halting Problem |journal=Bull. EATCS|s2cid=39718973 }}</ref><ref>{{Cite book |last1=Li |first1=Ming |last2=Vitányi |first2=Paul |date=2008 |title=An Introduction to Kolmogorov Complexity and Its Applications |url=https://link.springer.com/book/10.1007/978-0-387-49820-1 |series=Texts in Computer Science |language=en |at=Exercise 2.7.7. |doi=10.1007/978-0-387-49820-1 |bibcode=2008ikca.book.....L |isbn=978-0-387-33998-6 |issn=1868-0941}}</ref> It shows that given a Kolmogorov complexity function, we can construct a function <math>p</math>, such that <math>p(n) \geq BB(n)</math> for all large <math>n</math>, where <math>BB</math> is the [[Busy beaver|Busy Beaver]] shift function (also denoted as <math>S(n)</math>). By modifying the function at lower values of <math>n</math> we get an upper bound on <math>BB</math>, which solves the halting problem.

Consider this program <math display="inline">p_K</math>, which takes input as <math display="inline">n</math>, and uses <math display="inline">K</math>.

* List all strings of length <math display="inline">\leq 2n + 1</math>.
* For each such string <math display="inline">x</math>, enumerate all (prefix-free) programs of length <math>K(x)</math> until one of them does output <math display="inline">x</math>. Record its runtime <math display="inline">n_x</math>.
* Output the largest <math display="inline">n_x</math>.

We prove by contradiction that <math display="inline">p_K(n) \geq BB(n)</math> for all large <math display="inline">n</math>.

Let <math display="inline">p_{n}</math> be a Busy Beaver of length <math>n</math>. Consider this (prefix-free) program, which takes no input:

* Run the program <math display="inline">p_{n}</math>, and record its runtime length <math display="inline">BB(n)</math>.
* Generate all programs with length <math display="inline">\leq 2n</math>. Run every one of them for up to <math display="inline">BB(n)</math> steps. Note the outputs of those that have halted.
* Output the string with the lowest lexicographic order that has not been output by any of those.

Let the string output by the program be <math display="inline">x</math>.

The program has length <math display="inline">\leq n + 2\log_2 n + O(1)</math>, where <math>n</math> comes from the length of the Busy Beaver <math display="inline">p_{n}</math>, <math>2\log_2 n</math> comes from using the (prefix-free) [[Elias delta code]] for the number <math>n</math>, and <math>O(1)</math> comes from the rest of the program. Therefore,<math display="block">K(x) \leq n + 2\log_2 n + O(1) \leq 2n</math>for all big <math display="inline">n</math>. Further, since there are only so many possible programs with length <math display="inline">\leq 2n</math>, we have <math display="inline">l(x) \leq 2n + 1</math> by [[pigeonhole principle]].
By assumption, <math display="inline">p_K(n) < BB(n)</math>, so every string of length <math display="inline">\leq 2n + 1</math> has a minimal program with runtime <math display="inline">< BB(n)</math>. Thus, the string <math display="inline">x</math> has a minimal program with runtime <math display="inline">< BB(n)</math>. Further, that program has length <math display="inline">K(x) \leq 2n</math>. This contradicts how <math display="inline">x</math> was constructed.

== Universal probability ==
Fix a universal Turing machine <math>U</math>, the same one used to define the (prefix-free) Kolmogorov complexity. Define the (prefix-free) universal probability of a string <math>x</math> to be<math display="block">P(x) = \sum_{U(p) = x} 2^{-l(p)}</math>In other words, it is the probability that, given a uniformly random binary stream as input, the universal Turing machine would halt after reading a certain prefix of the stream, and output <math>x</math>.

Note. <math>U(p) = x</math> does not mean that the input stream is <math>p000\cdots</math>, but that the universal Turing machine would halt at some point after reading the initial segment <math>p</math>, without reading any further input, and that, when it halts, its has written <math>x</math> to the output tape.

'''Theorem.''' (Theorem 14.11.1<ref name=":1" />) <math>\log \frac{1}{P(x)} = K(x) + O(1)</math>

==Implications in biology==
In the context of biology to argue that the symmetries and modular arrangements observed in multiple species emerge from the tendency of evolution to prefer minimal Kolmogorov complexity.<ref>{{Cite journal |last1=Johnston |first1=Iain G. |last2=Dingle |first2=Kamaludin |last3=Greenbury |first3=Sam F. |last4=Camargo |first4=Chico Q. |last5=Doye |first5=Jonathan P. K. |last6=Ahnert |first6=Sebastian E. |last7=Louis |first7=Ard A. |date=2022-03-15 |title=Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution |journal=Proceedings of the National Academy of Sciences |volume=119 |issue=11 |pages=e2113883119 |doi=10.1073/pnas.2113883119 |doi-access=free |pmc=8931234 |pmid=35275794|bibcode=2022PNAS..11913883J }}</ref> Considering the genome as a program that must solve a task or implement a series of functions, shorter programs would be preferred on the basis that they are easier to find by the mechanisms of evolution.<ref>{{Cite journal |last=Alon |first=Uri |date=Mar 2007 |title=Simplicity in biology |url=https://www.nature.com/articles/446497a |journal=Nature |language=en |volume=446 |issue=7135 |pages=497 |doi=10.1038/446497a |pmid=17392770 |bibcode=2007Natur.446..497A |issn=1476-4687}}</ref> An example of this approach is the eight-fold symmetry of the compass circuit that is found across insect species, which correspond to the circuit that is both functional and requires the minimum Kolmogorov complexity to be generated from self-replicating units.<ref>{{Cite journal |last1=Vilimelis Aceituno |first1=Pau |last2=Dall'Osto |first2=Dominic |last3=Pisokas |first3=Ioannis |date=2024-05-30 |editor-last=Colgin |editor-first=Laura L |editor2-last=Vafidis |editor2-first=Pantelis |title=Theoretical principles explain the structure of the insect head direction circuit |url=https://elifesciences.org/articles/91533 |journal=eLife |volume=13 |pages=e91533 |doi=10.7554/eLife.91533 |doi-access=free |pmid=38814703 |issn=2050-084X}}</ref>

==Conditional versions==
{{expand section|date=July 2014}}
The conditional Kolmogorov complexity of two strings <math>K(x|y)</math> is, roughly speaking, defined as the Kolmogorov complexity of ''x'' given ''y'' as an auxiliary input to the procedure.<ref name="Rissanen2007">{{cite book |author=Jorma Rissanen |url=https://link.springer.com/book/10.1007/978-0-387-68812-1 |title=Information and Complexity in Statistical Modeling |series=Information Science and Statistics |publisher=Springer S |year=2007 |isbn=978-0-387-68812-1 |page=[https://archive.org/details/informationcompl00riss_364/page/n59 53] |doi=10.1007/978-0-387-68812-1 |url-access=limited}}</ref><ref>{{cite book |author1=Ming Li |url=https://link.springer.com/book/10.1007/978-0-387-49820-1 |title=An Introduction to Kolmogorov Complexity and Its Applications |author2=Paul M.B. Vitányi |publisher=Springer |year=2009 |isbn=978-0-387-49820-1 |pages=[https://archive.org/details/introductiontoko00limi_695/page/n127 105]–106 |doi=10.1007/978-0-387-49820-1 |url-access=limited}}</ref>

There is also a length-conditional complexity <math>K(x|L(x))</math>, which is the complexity of ''x'' given the length of ''x'' as known/input.<ref>{{cite book|author1=Ming Li|author2=Paul M.B. Vitányi|title=An Introduction to Kolmogorov Complexity and Its Applications|url=https://archive.org/details/introductiontoko00limi_695|url-access=limited|year=2009|publisher=Springer  |isbn=978-0-387-49820-1|page=[https://archive.org/details/introductiontoko00limi_695/page/n141 119]}}</ref><ref>{{cite journal |doi=10.1016/j.tcs.2013.07.009 |title=Conditional Kolmogorov complexity and universal probability |journal=Theoretical Computer Science |volume=501 |pages=93–100 |year=2013 |last1=Vitányi |first1=Paul M.B. |url=https://ir.cwi.nl/pub/26818 |arxiv=1206.0983 |s2cid=12085503 }}</ref>

== Time-bounded complexity ==
Time-bounded Kolmogorov complexity is a modified version of Kolmogorov complexity where the space of programs to be searched for a solution is confined to only programs that can run within some pre-defined number of steps.<ref>{{Cite journal |last1=Hirahara |first1=Shuichi |last2=Kabanets |first2=Valentine |last3=Lu |first3=Zhenjian |last4=Oliveira |first4=Igor C. |date=2024 |title=Exact Search-To-Decision Reductions for Time-Bounded Kolmogorov Complexity |journal=39th Computational Complexity Conference (CCC 2024) |series=Leibniz International Proceedings in Informatics (LIPIcs) |volume=300 |language=en |publisher=Schloss Dagstuhl – Leibniz-Zentrum für Informatik |pages=29:1–29:56 |doi=10.4230/LIPIcs.CCC.2024.29|doi-access=free |isbn=978-3-95977-331-7 }}</ref> It is hypothesised that the possibility of the existence of an efficient algorithm for determining approximate time-bounded Kolmogorov complexity is related to the question of whether true [[one-way function]]s exist.<ref>{{Cite web |last=Klarreich |first=Erica |date=2022-04-06 |title=Researchers Identify 'Master Problem' Underlying All Cryptography |url=https://www.quantamagazine.org/researchers-identify-master-problem-underlying-all-cryptography-20220406/ |access-date=2024-11-16 |website=Quanta Magazine |language=en}}</ref><ref>{{Citation |last1=Liu |first1=Yanyi |title=On One-way Functions and Kolmogorov Complexity |date=2020-09-24 |arxiv=2009.11514 |last2=Pass |first2=Rafael}}</ref>

==See also==
* [[Berry paradox]]
* [[Code golf]]
* [[Data compression]]
* [[Descriptive complexity theory]]
* [[Grammar induction]]
* [[Inductive reasoning]]
* [[Kolmogorov structure function]]
* [[Levenshtein distance]]
* [[Manifold hypothesis]]
* [[Solomonoff's theory of inductive inference]]
* [[Sample entropy]]

==Notes==
{{Reflist|group=note}}

==References==
{{Reflist|30em}}

==Further reading==
* {{cite journal | author-link=Manuel Blum|last=Blum | title=On the size of machines | journal=Information and Control |first= M. | volume=11 | issue=3 | page=257 | year=1967 | doi = 10.1016/S0019-9958(67)90546-3 | doi-access=free }}
* {{cite journal |first=A. |last=Brudno |title=Entropy and the complexity of the trajectories of a dynamical system |journal=Transactions of the Moscow Mathematical Society |volume=2 |pages=127–151 |year=1983 }}
* {{cite book |last1=Cover |first1=Thomas M. |last2=Thomas |first2=Joy A. |title=Elements of information theory |publisher=Wiley-Interscience |edition=2nd |year=2006 |isbn=0-471-24195-4 }}
* {{cite book |last1=Lajos |first1=Rónyai |last2=Gábor |first2=Ivanyos |last3=Réka |first3=Szabó |title=Algoritmusok |publisher=TypoTeX  |year=1999 |isbn=963-279-014-6 }}
* {{cite book |last1=Li |first1=Ming  |last2=Vitányi |first2=Paul|title=An Introduction to Kolmogorov Complexity and Its Applications|publisher=Springer |year=1997 |isbn= 978-0387339986 }}
* {{cite book |first=Manin |last=Yu |title=A Course in Mathematical Logic |publisher=Springer-Verlag |year=1977 |isbn=978-0-7204-2844-5 |url-access=registration |url=https://archive.org/details/courseinmathemat0000bell }}
* {{cite book |first=Michael |last=Sipser |title=Introduction to the Theory of Computation |url=https://archive.org/details/introductiontoth00sips |url-access=registration |publisher=PWS |year=1997 |isbn=0-534-95097-3 }}
* {{Cite journal |last1=Downey |first1=Rodney G. |last2=Hirschfeldt |first2=Denis R. |date=2010 |title=Algorithmic Randomness and Complexity |url=https://link.springer.com/book/10.1007/978-0-387-68441-3 |journal=Theory and Applications of Computability |language=en |doi=10.1007/978-0-387-68441-3 |isbn=978-0-387-95567-4 |issn=2190-619X}}

==External links==
* [https://web.archive.org/web/20180321163508/http://kolmogorov.com/ The Legacy of Andrei Nikolaevich Kolmogorov]
* [https://web.archive.org/web/20150215210504/http://www.cs.umaine.edu/~chaitin/ Chaitin's online publications]
* [http://www.idsia.ch/~juergen/ray.html Solomonoff's IDSIA page]
* [http://www.idsia.ch/~juergen/kolmogorov.html Generalizations of algorithmic information] by [[Juergen Schmidhuber|J. Schmidhuber]]
* {{cite web  |title=Review of Li Vitányi 1997  |url=http://homepages.cwi.nl/~paulv/kolmogorov.html}}
* {{cite web |first=John |last=Tromp |url=https://tromp.github.io/cl/cl.html |title=John's Lambda Calculus and Combinatory Logic Playground}} Tromp's lambda calculus computer model offers a concrete definition of K()]
* Universal AI based on Kolmogorov Complexity {{isbn|3-540-22139-5}} by [[Marcus Hutter|M. Hutter]]:  {{isbn|3-540-22139-5}}
* [http://www.csse.monash.edu.au/~dld David Dowe]'s [http://www.csse.monash.edu.au/~dld/MML.html Minimum Message Length (MML)] and [http://www.csse.monash.edu.au/~dld/Occam.html Occam's razor] pages.
* {{cite book |first1=P. |last1=Grunwald |first2=M.A. |last2=Pitt |editor-first=I. J. |editor-last=Myung |title=Advances in Minimum Description Length: Theory and Applications |publisher=MIT Press |year=2005 |isbn=0-262-07262-9 }}

{{Mathematical logic}}
{{Compression Methods}}

{{DEFAULTSORT:Kolmogorov Complexity}}
[[Category:Algorithmic information theory|*]]
[[Category:Information theory|*]]
[[Category:Computability theory]]
[[Category:Descriptive complexity]]
[[Category:Measures of complexity]]
[[Category:Computational complexity theory]]
[[Category:Data compression]]