Leaky Rectified Linear Activation (LReLU) Function
A Leaky Rectified Linear Activation (LReLU) Function is a rectified-based activation function that is based on the mathematical function:
[math]\displaystyle{ f(x) = \begin{cases} x, & \mbox{if } x \gt 0 \\ \beta x, & \mbox{otherwise} \end{cases} }[/math]
where [math]\displaystyle{ \beta }[/math] is a small non-zero gradient.
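A minimal NumPy sketch of this definition (the function name leaky_relu and the default slope of 0.01 are illustrative choices, not part of the definition above):

import numpy as np

def leaky_relu(x, beta=0.01):
    # Identity for x >= 0; small non-zero slope beta for x < 0 (illustrative default).
    return np.where(x >= 0, x, beta * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # -> [-0.02, -0.005, 0., 1.5]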
- Context:
- It can (typically) be used in the activation of Leaky Rectified Linear Neurons.
- Example(s):
- Counter-Example(s):
- a Clipped Rectifier Unit Activation Function,
- a Concatenated Rectified Linear Activation Function,
- an Exponential Linear Activation Function,
- a Noisy Rectified Linear Activation Function,
- a Parametric Rectified Linear Activation Function,
- a Scaled Exponential Linear Activation Function,
- a Softplus Activation Function,
- a S-shaped Rectified Linear Activation Function.
- See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
References
2018a
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.leaky_relu.html Retrieved:2018-2-18
- QUOTE:
chainer.functions.leaky_relu(x, slope=0.2)
Leaky Rectified Linear Unit function.
This function is expressed as
[math]\displaystyle{ f(x) = \begin{cases} x, & \mbox{if } x \ge 0 \\ ax, & \mbox{if } x \lt 0 \end{cases} }[/math].
where [math]\displaystyle{ a }[/math] is a configurable slope value.
Parameters:
- x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- slope (float) – Slope value [math]\displaystyle{ a }[/math].
- Returns: Output variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- Return type: Variable
- Example:
>>> x = np.array([[-1, 0], [2, -3], [-2, 1]], 'f')
>>> x
array([[-1.,  0.],
       [ 2., -3.],
       [-2.,  1.]], dtype=float32)
>>> F.leaky_relu(x, slope=0.2).data
array([[-0.2,  0. ],
       [ 2. , -0.6],
       [-0.4,  1. ]], dtype=float32)
2018b
- (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#leakyrelu Retrieved: 2018-2-10.
- QUOTE:
class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)
Applies element-wise, [math]\displaystyle{ f(x) = \max(0, x) + negative\_slope * \min(0, x) }[/math]
Parameters:
- negative_slope – Controls the angle of the negative slope. Default: 1e-2
- inplace – can optionally do the operation in-place. Default: False
- Shape:
- Input: (N,∗) where * means any number of additional dimensions
- Output: (N,∗), same shape as the input
- Examples:
>>> m = nn.LeakyReLU(0.1)
>>> input = autograd.Variable(torch.randn(2))
>>> print(input)
>>> print(m(input))
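As an illustrative follow-up (not part of the quoted documentation), the formula [math]\displaystyle{ f(x) = \max(0, x) + negative\_slope * \min(0, x) }[/math] can be checked with a fixed input; in current PyTorch releases a plain tensor can be passed directly, without the autograd.Variable wrapper shown above:

import torch
import torch.nn as nn

m = nn.LeakyReLU(0.1)
x = torch.tensor([-3.0, -0.5, 0.0, 2.0])
print(m(x))                                                  # tensor([-0.3000, -0.0500,  0.0000,  2.0000])
print(torch.clamp(x, min=0) + 0.1 * torch.clamp(x, max=0))   # same values, per the formula above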
2018c
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Leaky_ReLUs Retrieved:2018-2-4.
- Leaky ReLUs allow a small, non-zero gradient when the unit is not active.[1] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ 0.01x & \mbox{otherwise} \end{cases} }[/math]
Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.[2] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ a x & \mbox{otherwise} \end{cases} }[/math]
Note that for [math]\displaystyle{ a\leq1 }[/math], this is equivalent to : [math]\displaystyle{ f(x) = \max(x, ax) }[/math] and thus has a relation to "maxout" networks.
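A quick numerical check of the [math]\displaystyle{ \max(x, ax) }[/math] equivalence noted above (NumPy sketch; the coefficient value is illustrative):

import numpy as np

a = 0.01                                 # leakage coefficient, a <= 1
x = np.linspace(-5, 5, 11)
piecewise = np.where(x > 0, x, a * x)    # case definition above
max_form = np.maximum(x, a * x)          # f(x) = max(x, ax)
print(np.allclose(piecewise, max_form))  # True, because a <= 1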
2018d
- (CS231n, 2018) ⇒ Commonly used activation functions. In: CS231n Convolutional Neural Networks for Visual Recognition Retrieved: 2018-01-28.
- QUOTE: Leaky ReLU. Leaky ReLUs are one attempt to fix the “dying ReLU” problem. Instead of the function being zero when [math]\displaystyle{ x \lt 0 }[/math], a leaky ReLU will instead have a small negative slope (of 0.01, or so). That is, the function computes [math]\displaystyle{ f(x)=1(x\lt 0)(\alpha x)+1(x \ge 0)(x) }[/math] where [math]\displaystyle{ \alpha }[/math] is a small constant. Some people report success with this form of activation function, but the results are not always consistent. The slope in the negative region can also be made into a parameter of each neuron, as seen in PReLU neurons, introduced in Delving Deep into Rectifiers, by Kaiming He et al., 2015. However, the consistency of the benefit across tasks is presently unclear.
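The indicator-function form quoted above can be evaluated with boolean masks acting as the indicators; a small NumPy sketch (the value of [math]\displaystyle{ \alpha }[/math] is the illustrative 0.01 mentioned in the quote):

import numpy as np

alpha = 0.01
x = np.array([-4.0, -1.0, 0.0, 3.0])
# 1(x<0)*(alpha*x) + 1(x>=0)*x, with boolean arrays as the indicator functions
f = (x < 0) * (alpha * x) + (x >= 0) * x
print(f)   # -> [-0.04, -0.01, 0., 3.]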
2017
- (Mate Labs, 2017) ⇒ Mate Labs (Aug 23, 2017). “Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions”.
- QUOTE: Leaky rectified linear unit (Leaky ReLU) — Leaky ReLUs allow a small, non-zero gradient when the unit is not active. 0.01 is the small non-zero gradient here
[math]\displaystyle{ f(x) = \begin{cases} 0.01x, & \mbox{for } x \lt 0 \\ x, & \mbox{for } x \geq 0 \end{cases} }[/math]
Range: [math]\displaystyle{ (-\infty, +\infty) }[/math]
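To illustrate the “small non-zero gradient” on the negative side, a short autograd sketch (not from the quoted post; it assumes the functional interface torch.nn.functional.leaky_relu, which mirrors the nn.LeakyReLU module quoted earlier on this page):

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 3.0], requires_grad=True)
y = F.leaky_relu(x, negative_slope=0.01)
y.sum().backward()
print(x.grad)   # tensor([0.0100, 1.0000]): gradient is 0.01 for x < 0 and 1 for x >= 0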
- ↑ Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014). “Rectifier Nonlinearities Improve Neural Network Acoustic Models”.
- ↑ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. arXiv:1502.01852 [cs.CV].