Editing Kolmogorov complexity (section)

==Definition==

=== Intuition ===
Consider the following two [[string (computer science)|strings]] of 32 lowercase letters and digits:

:  <code>abababababababababababababababab</code> , and
:  <code>4c1j5b2p0cv4w1x8rx2y39umgw5q85s7</code>

The first string has a short English-language description, namely "write ab 16 times", which consists of '''17''' characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, i.e., "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7" which has '''38''' characters. Hence the operation of writing the first string can be said to have "less complexity" than writing the second.

More formally, the [[complexity]] of a string is the length of the shortest possible description of the string in some fixed [[Turing complete|universal]] description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings like the ''abab'' example above, whose Kolmogorov complexity is small relative to the string's size, are not considered to be complex.

The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings.  Such a description language can be based on any computer programming language, such as [[Lisp programming language|Lisp]], [[Pascal (programming language)|Pascal]], or [[Java (programming language)|Java]].  If '''P''' is a program which outputs a string ''x'', then '''P''' is a description of ''x''. The length of the description is just the length of '''P''' as a character string, multiplied by the number of bits in a character (e.g., 7 for [[ASCII]]).

We could, alternatively, choose an encoding for [[Turing machine]]s, where an ''encoding'' is a function which associates to each Turing Machine '''M''' a bitstring <'''M'''>. If '''M''' is a Turing Machine which, on input ''w'', outputs string ''x'', then the concatenated string <'''M'''> ''w'' is a description of ''x''. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.

Any string ''s'' has at least one description. For example, the second string above is output by the [[pseudo-code]]:

 '''function''' GenerateString2()
     '''return''' "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

whereas the first string is output by the (much shorter) pseudo-code:

 '''function''' GenerateString1()
     '''return''' "ab" × 16

If a description ''d''(''s'') of a string ''s'' is of minimal length (i.e., using the fewest bits), it is called a '''minimal description''' of ''s'', and the length of ''d''(''s'') (i.e. the number of bits in the minimal description) is the '''Kolmogorov complexity''' of ''s'', written ''K''(''s''). Symbolically,

:''K''(''s'') = |''d''(''s'')|.

The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the ''invariance theorem'').

=== Plain Kolmogorov complexity ''C'' ===
There are two definitions of Kolmogorov complexity: ''plain'' and ''prefix-free''. The plain complexity is the minimal description length of any program, and denoted <math>C(x)</math> while the prefix-free complexity is the minimal description length of any program encoded in a [[prefix code|prefix-free code]], and denoted <math>K(x)</math>. The plain complexity is more intuitive, but the prefix-free complexity is easier to study.

By default, all equations hold only up to an additive constant. For example, <math>f(x) = g(x)</math> really means that <math>f(x) = g(x) + O(1)</math>, that is, <math>\exists c, \forall x, |f(x) - g(x)| \leq c</math>.

Let <math>U: 2^* \to 2^*</math> be a computable function mapping finite binary strings to binary strings. It is a universal function if, and only if, for any computable <math>f: 2^* \to 2^*</math>, we can encode the function in a "program" <math>s_f</math>, such that <math>\forall x \in 2^*, U(s_fx) = f(x) </math>. We can think of <math>U</math> as a program interpreter, which takes in an initial segment describing the program, followed by data that the program should process.

One problem with plain complexity is that <math>C(xy) \not < C(x) + C(y)</math>, because intuitively speaking, there is no general way to tell where to divide an output string just by looking at the concatenated string. We can divide it by specifying the length of <math>x</math> or <math>y</math>, but that would take <math>O(\min(\ln x, \ln y))</math> extra symbols. Indeed, for any <math>c > 0</math> there exists <math>x, y</math> such that <math>C(xy) \geq C(x) + C(y) + c</math>.<ref>(Downey and Hirschfeldt, 2010), Theorem 3.1.4</ref>

Typically, inequalities with plain complexity have a term like <math>O(\min(\ln x, \ln y))</math> on one side, whereas the same inequalities with prefix-free complexity have only <math>O(1)</math>.

The main problem with plain complexity is that there is something extra sneaked into a program. A program not only represents for something with its code, but also represents its own length. In particular, a program <math>x</math> may represent a binary number up to <math>\log_2 |x|</math>, simply by its own length. Stated in another way, it is as if we are using a termination symbol to denote where a word ends, and so we are not using 2 symbols, but 3. To fix this defect, we introduce the prefix-free Kolmogorov complexity.<ref>(Downey and Hirschfeldt, 2010), Section 3.5</ref>

=== Prefix-free Kolmogorov complexity ''K'' ===
A prefix-free code is a subset of <math>2^*</math> such that given any two different words <math>x, y</math> in the set, neither is a prefix of the other. The benefit of a prefix-free code is that we can build a machine that reads words from the code forward in one direction, and as soon as it reads the last symbol of the word, it ''knows'' that the word is finished, and does not need to backtrack or a termination symbol.

Define a '''prefix-free Turing machine''' to be a Turing machine that comes with a prefix-free code, such that the Turing machine can read any string from the code in one direction, and stop reading as soon as it reads the last symbol. Afterwards, it may compute on a work tape and write to a write tape, but it cannot move its read-head anymore.

This gives us the following formal way to describe ''K''.<ref name=":0">{{Cite journal |last=Hutter |first=Marcus |author-link=Marcus Hutter |date=2007-03-06 |title=Algorithmic information theory |journal=Scholarpedia |language=en |volume=2 |issue=3 |pages=2519 |doi=10.4249/scholarpedia.2519 |doi-access=free |bibcode=2007SchpJ...2.2519H |issn=1941-6016|hdl=1885/15015 |hdl-access=free }}</ref>

* Fix a prefix-free universal Turing machine, with three tapes: a read tape infinite in one direction, a work tape infinite in two directions, and a write tape infinite in one direction.
* The machine can read from the read tape in one direction only (no backtracking), and write to the write tape in one direction only. It can read and write the work tape in both directions.
* The work tape and write tape start with all zeros. The read tape starts with an input prefix code, followed by all zeros.
* Let <math>S</math> be the prefix-free code on <math>2^*</math>, used by the universal Turing machine.

Note that some universal Turing machines may not be programmable with prefix codes. We must pick only a prefix-free universal Turing machine.

The prefix-free complexity of a string <math>x</math> is the shortest prefix code that makes the machine output <math>x</math>:<math display="block">K(x) := \min\{|c| : c \in S, U(c) = x\}</math>