=== Self-reinforcement learning ===
Self-reinforcement learning (or self-learning) is a learning paradigm that does not use the concept of immediate reward <math>R_a(s,s')</math> received after the transition from <math>s</math> to <math>s'</math> under action <math>a</math>. It uses no external reinforcement, only the agent's internal self-reinforcement, which is provided by a mechanism of feelings and emotions. During learning, emotions are backpropagated by a mechanism of secondary reinforcement. The learning equation therefore contains no immediate reward term, only the state evaluation.

The self-reinforcement algorithm updates a memory matrix <math>W=||w(a,s)||</math> by executing the following routine in each iteration:
# In situation <math>s</math> perform action <math>a</math>.
# Receive the consequence situation <math>s'</math>.
# Compute the state evaluation <math>v(s')</math> of how good it is to be in the consequence situation <math>s'</math>.
# Update the crossbar memory: <math>w'(a,s) = w(a,s) + v(s')</math>.
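
The four steps above can be sketched in Python as follows. This is only an illustrative sketch, not the CAA implementation: the routine does not state how the action in step 1 is chosen, so the greedy choice over <math>w(a,s)</math> with random tie-breaking is an assumption, and the names <code>self_reinforcement_step</code>, <code>toy_transition</code> and <code>toy_evaluate</code>, as well as the two-situation toy environment, are hypothetical.

```python
import random


def self_reinforcement_step(w, s, transition, evaluate, rng):
    """Run one iteration of the routine on memory w[a][s].

    Returns the action taken and the consequence situation.
    """
    actions = range(len(w))
    # Step 1: perform an action in situation s.
    # Assumption: pick the action with the largest w(a, s), ties broken at random.
    best = max(w[a][s] for a in actions)
    a = rng.choice([a for a in actions if w[a][s] == best])
    # Step 2: receive the consequence situation s'.
    s_next = transition(s, a)
    # Step 3: compute the state evaluation v(s').
    v = evaluate(s_next)
    # Step 4: update the memory, w'(a, s) = w(a, s) + v(s').
    # No immediate reward term appears in the update.
    w[a][s] += v
    return a, s_next


def toy_transition(s, a):
    """Hypothetical environment: the action index is the next situation."""
    return a


def toy_evaluate(s):
    """Hypothetical state evaluation: situation 1 feels good, situation 0 bad."""
    return 1.0 if s == 1 else -1.0


# The initial memory is supplied as input (two actions, two situations).
w = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
s = 0
for _ in range(10):
    _, s = self_reinforcement_step(w, s, toy_transition, toy_evaluate, rng)
```

In this sketch the only learning signal is <code>toy_evaluate</code>, the agent's own evaluation of the consequence situation; the environment returns a situation and nothing else.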

The initial conditions of the memory are received as input from the genetic environment. The system has only one input (the situation) and only one output (the action, or behavior).

Self-reinforcement (self-learning) was introduced in 1982 together with a neural network capable of self-reinforcement learning, the Crossbar Adaptive Array (CAA).<ref>Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In Trappl, Robert (ed.). Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North-Holland. pp. 397–402. ISBN 978-0-444-86488-8.</ref><ref>Bozinovski, S. (1995). "Neuro genetic agents and structural theory of self-reinforcement learning systems". CMPSCI Technical Report 95-107, University of Massachusetts at Amherst. [https://web.cs.umass.edu/publication/docs/1995/UM-CS-1995-107.pdf]</ref> The CAA computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence states. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, S. (2014). "Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981". Procedia Computer Science. pp. 255–263.</ref>