=== Learning a Boolean function ===
Consider a dataset where the <math>x</math> are drawn from <math>\{-1, +1\}^n</math>, that is, the vertices of an ''n''-dimensional hypercube centered at the origin, and <math>y = \theta(x_i)</math> for some fixed coordinate <math>i</math>. That is, all data points with positive <math>x_i</math> have <math>y=1</math>, and vice versa. By the perceptron convergence theorem, a perceptron would converge after making at most <math>n</math> mistakes. If we were to write a logical program to perform the same task, each positive example shows that the target coordinate is among those equal to <math>+1</math> in that example, and each negative example shows that its ''complement'' may be treated as a positive example. By collecting all the known positive examples (including the complements of negative examples), we eventually eliminate all but one coordinate, at which point the dataset is learned.<ref name=":3">{{Cite book |last1=Simon |first1=Herbert A. |title=The Sciences of the Artificial, reissue of the third edition with a new introduction by John Laird |last2=Laird |first2=John E. |date=2019-08-13 |publisher=The MIT Press |isbn=978-0-262-53753-7 |edition=Reissue |location=Cambridge, Massachusetts London, England |language=English |chapter=Limits on Speed of Concept Attainment}}</ref>

This bound is asymptotically tight in the worst case. In the worst case, the first presented example is entirely new and gives <math>n</math> bits of information, but each subsequent example differs minimally from previous examples and gives only 1 bit. After <math>n+1</math> examples, the learner has received <math>2n</math> bits of information, which is sufficient to determine the perceptron (itself carrying <math>2n</math> bits of information).<ref name=":2" /> However, it is not tight in expectation if the examples are presented uniformly at random, since the first example gives <math>n</math> bits, the second about <math>n/2</math> bits, and so on, so that the required information is accumulated after only <math>O(\ln n)</math> examples.<ref name=":3" />
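The two learners described above can be illustrated with a short simulation. The following is a minimal Python sketch, not taken from the cited sources: the dimension, the hidden coordinate, and all function names are illustrative choices. It streams the hypercube vertices through the standard mistake-driven perceptron update and, separately, runs the coordinate-elimination "logical program" sketched in the text.

<syntaxhighlight lang="python">
import itertools
import random

def perceptron_mistakes(n, target, examples):
    """Classic perceptron update on a stream of {-1,+1}-vectors labelled by
    the sign of one hidden coordinate; returns the number of mistakes made."""
    w = [0.0] * n
    mistakes = 0
    for x in examples:
        y = 1 if x[target] > 0 else -1                      # label y = theta(x_i)
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        if y_hat != y:                                       # update only on a mistake
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return mistakes

def eliminate(n, target, examples):
    """The 'logical program' described above: every example rules out each
    coordinate whose sign disagrees with the label (a negative example is
    used via its complement), until one candidate coordinate is left."""
    candidates = set(range(n))
    for x in examples:
        y = 1 if x[target] > 0 else -1
        candidates = {j for j in candidates if (x[j] > 0) == (y > 0)}
        if len(candidates) == 1:
            break
    return candidates

if __name__ == "__main__":
    random.seed(0)
    n, target = 8, 3                                         # illustrative values
    cube = list(itertools.product((-1, 1), repeat=n))        # all 2^n vertices
    random.shuffle(cube)
    print("perceptron mistakes:", perceptron_mistakes(n, target, cube))  # at most n
    print("surviving coordinates:", eliminate(n, target, cube))          # {target}
</syntaxhighlight>

Running the sketch prints a mistake count of at most <math>n</math> for the perceptron, as guaranteed by the convergence theorem in this setting, and a single surviving coordinate for the elimination learner.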