===Rectifier===
{{Main|Rectifier (neural networks)}}
In the context of [[artificial neural network]]s, the '''rectifier''' or '''ReLU (Rectified Linear Unit)''' is an [[activation function]] defined as the positive part of its argument:
: <math>f(x) = x^+ = \max(0, x),</math>
where <math>x</math> is the input to a neuron. This is also known as a [[ramp function]] and is analogous to [[half-wave rectification]] in electrical engineering.

This [[activation function]] was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in ''[[Nature (journal)|Nature]]''<ref name="Hahnloser2000">{{cite journal | last1=Hahnloser | first1=Richard H. R. | last2=Sarpeshkar | first2=Rahul | last3=Mahowald | first3=Misha A. | last4=Douglas | first4=Rodney J. | last5=Seung | first5=H. Sebastian | title=Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit | journal=Nature | volume=405 | issue=6789 | year=2000 | issn=0028-0836 | doi=10.1038/35016072 | pmid=10879535 | pages=947–951 | bibcode=2000Natur.405..947H | s2cid=4399014 }}</ref> with strong [[biological]] motivations and mathematical justifications.<ref name="Hahnloser2001">{{cite conference |author=R Hahnloser |author2=H.S. Seung |year=2001 |title=Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks |conference=NIPS 2001}}</ref> In 2011 it was first demonstrated to enable better training of deeper networks<ref name="glorot2011">{{cite conference |author1=Xavier Glorot |author2=Antoine Bordes |author3=[[Yoshua Bengio]] |year=2011 |title=Deep sparse rectifier neural networks |conference=AISTATS |url=http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf}}</ref> than the activation functions widely used before then, namely the [[Logistic function|logistic sigmoid]] (which is inspired by [[probability theory]]; see [[logistic regression]]) and its more practical<ref>{{cite encyclopedia |author=[[Yann LeCun]] |author2=[[Leon Bottou]] |author3=Genevieve B. Orr |author4=[[Klaus-Robert Müller]] |year=1998 |url=http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf |title=Efficient BackProp |editor=G. Orr |editor2=K. Müller |encyclopedia=Neural Networks: Tricks of the Trade |publisher=Springer}}</ref> counterpart, the [[hyperbolic tangent]].

A commonly used variant of the ReLU activation function is the leaky ReLU, which allows a small positive gradient when the unit is not active:
: <math>f(x) = \begin{cases} x & \text{if } x > 0, \\ ax & \text{otherwise}, \end{cases}</math>
where <math>x</math> is the input to the neuron and <math>a</math> is a small positive constant (set to 0.01 in the original paper).<ref name="maas2014">Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2013). [https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf Rectifier Nonlinearities Improve Neural Network Acoustic Models].</ref>
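The following is a minimal Python sketch of the two definitions above; the function names, the NumPy dependency, and the sample inputs are illustrative choices rather than part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    """Leaky ReLU: returns x unchanged where x > 0, otherwise a*x,
    with a a small positive constant (0.01 as in Maas et al.)."""
    return np.where(x > 0, x, a * x)

# Example: apply both activations to a small vector of pre-activations.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # negative inputs map to 0, positive inputs pass through
print(leaky_relu(z))  # negative inputs are scaled by 0.01 instead of clipped
</syntaxhighlight>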