==== Self-learning ====
Self-learning as a machine learning paradigm was introduced in 1982, together with a neural network capable of self-learning named the ''crossbar adaptive array'' (CAA).<ref>Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In Trappl, Robert (ed.). ''Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research''. North-Holland. pp. 397–402. {{ISBN|978-0-444-86488-8}}.</ref><ref>Bozinovski, S. (1999). "Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem". In A. Dobnikar, N. Steele, D. Pearson, R. Albert (eds.). ''Artificial Neural Networks and Genetic Algorithms''. Springer Verlag. pp. 320–325. {{ISBN|3-211-83364-1}}.</ref> It offers a solution to the problem of learning without any external reward, by introducing emotion as an internal reward. Emotion is used as the state evaluation of a self-learning agent. The CAA self-learning algorithm computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence situations. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, Stevo (2014). "Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981". ''Procedia Computer Science''. pp. 255–263.</ref>

The self-learning algorithm updates a memory matrix ''W'' = ||''w''(''a'',''s'')|| such that in each iteration it executes the following machine learning routine (sketched in code below):

# In situation ''s'', perform action ''a''.
# Receive the consequence situation ''s''′.
# Compute the emotion of being in the consequence situation, ''v''(''s''′).
# Update the crossbar memory: ''w''′(''a'',''s'') = ''w''(''a'',''s'') + ''v''(''s''′).

It is a system with only one input (the situation) and only one output (the action, or behaviour). There is neither a separate reinforcement input nor an advice input from the environment. The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments: the behavioural environment, where it behaves, and the genetic environment, from which it initially, and only once, receives initial emotions about the situations to be encountered in the behavioural environment. After receiving the genome (species) vector from the genetic environment, the CAA learns a goal-seeking behaviour in an environment that contains both desirable and undesirable situations.<ref>Bozinovski, S. (2001). "Self-learning agents: A connectionist theory of emotion based on crossbar value judgment". ''Cybernetics and Systems'' 32(6): 637–667.</ref>
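The routine above is compact enough to sketch directly. The following Python fragment is an illustration only: the environment interface step(a, s), the greedy action choice, seeding the crossbar memory from the genome vector, and computing the emotion ''v''(''s''′) as the maximum of the ''s''′ column of ''W'' are assumptions made for concreteness; the cited sources give the update rule but not a complete implementation.

<syntaxhighlight lang="python">
import numpy as np

def caa_learn(step, genome, n_actions, start, n_iters=100):
    """Illustrative crossbar adaptive array (CAA) self-learning loop.

    step(a, s) -> s'  : behavioural environment transition (assumed interface)
    genome            : initial emotions of situations, received once from
                        the genetic environment
    """
    # Crossbar memory W = ||w(a, s)||, seeded from the genome vector
    # (assumption: the genome emotion is copied into every row).
    w = np.tile(np.asarray(genome, dtype=float), (n_actions, 1))
    s = start
    for _ in range(n_iters):
        a = int(np.argmax(w[:, s]))   # 1. in situation s, perform action a
        s_next = step(a, s)           # 2. receive consequence situation s'
        v = w[:, s_next].max()        # 3. emotion v(s') of s' (assumed form)
        w[a, s] += v                  # 4. update: w'(a, s) = w(a, s) + v(s')
        s = s_next
    return w

# Example: a tiny deterministic behavioural environment given as a table.
# transitions[s][a] = consequence situation; the genome marks situation 2
# as desirable (+1) and situation 0 as undesirable (-1).
transitions = [[1, 0], [2, 0], [2, 2]]
w = caa_learn(lambda a, s: transitions[s][a], genome=[-1.0, 0.0, 1.0],
              n_actions=2, start=0)
</syntaxhighlight>

Deriving the emotion from the memory column of the consequence situation is one reading of computing both decisions and emotions "in a crossbar fashion" from a single matrix; other evaluation functions over that column would fit the same scheme.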