=== Learning a Boolean function ===
Consider a dataset where the <math>x</math> are drawn from <math>\{-1, +1\}^n</math>, that is, the vertices of an ''n''-dimensional hypercube centered at the origin, and <math>y = \theta(x_i)</math> for some fixed coordinate <math>i</math>. That is, all data points with positive <math>x_i</math> have <math>y=1</math>, and vice versa. By the perceptron convergence theorem, a perceptron would converge after making at most <math>n</math> mistakes. If we were to write a logical program to perform the same task, each positive example shows that the target coordinate is among those equal to <math>+1</math> in that example, and each negative example shows that its ''complement'' may be treated as a positive example. By collecting all the known positive examples (including the complements of negative examples), we eventually eliminate all but one coordinate, at which point the dataset is learned.<ref name=":3">{{Cite book |last1=Simon |first1=Herbert A. |title=The Sciences of the Artificial, reissue of the third edition with a new introduction by John Laird |last2=Laird |first2=John E. |date=2019-08-13 |publisher=The MIT Press |isbn=978-0-262-53753-7 |edition=Reissue |location=Cambridge, Massachusetts London, England |language=English |chapter=Limits on Speed of Concept Attainment}}</ref>

This bound is asymptotically tight in the worst case. In the worst case, the first presented example is entirely new and gives <math>n</math> bits of information, but each subsequent example differs minimally from previous examples and gives only 1 bit. After <math>n+1</math> examples, the learner has received <math>2n</math> bits of information, which is sufficient to determine the perceptron (itself carrying <math>2n</math> bits of information).<ref name=":2" /> However, it is not tight in expectation if the examples are presented uniformly at random, since the first example gives <math>n</math> bits, the second about <math>n/2</math> bits, and so on, so that the required information is accumulated after only <math>O(\ln n)</math> examples.<ref name=":3" />
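The two learners described above can be illustrated with a short simulation. The following is a minimal Python sketch, not taken from the cited sources: the dimension, the hidden coordinate, and all function names are illustrative choices. It streams the hypercube vertices through the standard mistake-driven perceptron update and, separately, runs the coordinate-elimination "logical program" sketched in the text.

<syntaxhighlight lang="python">
import itertools
import random

def perceptron_mistakes(n, target, examples):
    """Classic perceptron update on a stream of {-1,+1}-vectors labelled by
    the sign of one hidden coordinate; returns the number of mistakes made."""
    w = [0.0] * n
    mistakes = 0
    for x in examples:
        y = 1 if x[target] > 0 else -1                      # label y = theta(x_i)
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        if y_hat != y:                                       # update only on a mistake
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return mistakes

def eliminate(n, target, examples):
    """The 'logical program' described above: every example rules out each
    coordinate whose sign disagrees with the label (a negative example is
    used via its complement), until one candidate coordinate is left."""
    candidates = set(range(n))
    for x in examples:
        y = 1 if x[target] > 0 else -1
        candidates = {j for j in candidates if (x[j] > 0) == (y > 0)}
        if len(candidates) == 1:
            break
    return candidates

if __name__ == "__main__":
    random.seed(0)
    n, target = 8, 3                                         # illustrative values
    cube = list(itertools.product((-1, 1), repeat=n))        # all 2^n vertices
    random.shuffle(cube)
    print("perceptron mistakes:", perceptron_mistakes(n, target, cube))  # at most n
    print("surviving coordinates:", eliminate(n, target, cube))          # {target}
</syntaxhighlight>

Running the sketch prints a mistake count of at most <math>n</math> for the perceptron, as guaranteed by the convergence theorem in this setting, and a single surviving coordinate for the elimination learner.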