Leaky Rectified Linear Activation (LReLU) Function
A Leaky Rectified Linear Activation (LReLU) Function is a rectified-based activation function that is based on the mathematical function:
[math]\displaystyle{ f(x) = \begin{cases} x, & \mbox{if } x \gt 0 \\ \beta x, & \mbox{otherwise} \end{cases} }[/math]
where [math]\displaystyle{ \beta }[/math] is a small non-zero gradient.
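A minimal NumPy sketch of this definition (the function name leaky_relu and the default slope of 0.01 are illustrative choices, not part of the definition above):

import numpy as np

def leaky_relu(x, beta=0.01):
    # Identity for x >= 0; small non-zero slope beta for x < 0 (illustrative default).
    return np.where(x >= 0, x, beta * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # -> [-0.02, -0.005, 0., 1.5]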
- Context:
- It can (typically) be used in the activation of Leaky Rectified Linear Neurons.
- Example(s):
- Counter-Example(s):
- a Clipped Rectifier Unit Activation Function,
- a Concatenated Rectified Linear Activation Function,
- an Exponential Linear Activation Function,
- a Noisy Rectified Linear Activation Function,
- a Parametric Rectified Linear Activation Function,
- a Scaled Exponential Linear Activation Function,
- a Softplus Activation Function,
- a S-shaped Rectified Linear Activation Function.
- See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
References
2018a
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.leaky_relu.html Retrieved:2018-2-18
- QUOTE:
chainer.functions.leaky_relu(x, slope=0.2)
Leaky Rectified Linear Unit function.
This function is expressed as
[math]\displaystyle{ f(x) = \begin{cases} x, & \mbox{if } x \ge 0 \\ ax, & \mbox{if } x \lt 0 \end{cases} }[/math].
where [math]\displaystyle{ a }[/math] is a configurable slope value.
Parameters:
- x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- slope (float) – Slope value [math]\displaystyle{ a }[/math].
- Returns: Output variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- Return type: Variable
- Example:
>>> x = np.array([[-1, 0], [2, -3], [-2, 1]], 'f')
>>> x
array([[-1.,  0.],
       [ 2., -3.],
       [-2.,  1.]], dtype=float32)
>>> F.leaky_relu(x, slope=0.2).data
array([[-0.2,  0. ],
       [ 2. , -0.6],
       [-0.4,  1. ]], dtype=float32)
2018b
- (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#leakyrelu Retrieved: 2018-2-10.
- QUOTE:
class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)
Applies element-wise, [math]\displaystyle{ f(x) = \max(0, x) + negative\_slope * \min(0, x) }[/math]
Parameters:
- negative_slope – Controls the angle of the negative slope. Default: 1e-2
- inplace – can optionally do the operation in-place. Default: False
- Shape:
- Input: (N,∗) where * means any number of additional dimensions
- Output: (N,∗), same shape as the input
- Examples:
>>> m = nn.LeakyReLU(0.1)
>>> input = autograd.Variable(torch.randn(2))
>>> print(input)
>>> print(m(input))
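As an illustrative follow-up (not part of the quoted documentation), the formula [math]\displaystyle{ f(x) = \max(0, x) + negative\_slope * \min(0, x) }[/math] can be checked with a fixed input; in current PyTorch releases a plain tensor can be passed directly, without the autograd.Variable wrapper shown above:

import torch
import torch.nn as nn

m = nn.LeakyReLU(0.1)
x = torch.tensor([-3.0, -0.5, 0.0, 2.0])
print(m(x))                                                  # tensor([-0.3000, -0.0500,  0.0000,  2.0000])
print(torch.clamp(x, min=0) + 0.1 * torch.clamp(x, max=0))   # same values, per the formula above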
2018c
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Leaky_ReLUs Retrieved:2018-2-4.
- Leaky ReLUs allow a small, non-zero gradient when the unit is not active.[1] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ 0.01x & \mbox{otherwise} \end{cases} }[/math]
Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.[2] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ a x & \mbox{otherwise} \end{cases} }[/math]
Note that for [math]\displaystyle{ a\leq1 }[/math], this is equivalent to : [math]\displaystyle{ f(x) = \max(x, ax) }[/math] and thus has a relation to "maxout" networks.
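A quick numerical check of the [math]\displaystyle{ \max(x, ax) }[/math] equivalence noted above (NumPy sketch; the coefficient value is illustrative):

import numpy as np

a = 0.01                                 # leakage coefficient, a <= 1
x = np.linspace(-5, 5, 11)
piecewise = np.where(x > 0, x, a * x)    # case definition above
max_form = np.maximum(x, a * x)          # f(x) = max(x, ax)
print(np.allclose(piecewise, max_form))  # True, because a <= 1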
2018d
- (CS231n, 2018) ⇒ Commonly used activation functions. In: CS231n Convolutional Neural Networks for Visual Recognition Retrieved: 2018-01-28.
- QUOTE: Leaky ReLU. Leaky ReLUs are one attempt to fix the “dying ReLU” problem. Instead of the function being zero when [math]\displaystyle{ x \lt 0 }[/math], a leaky ReLU will instead have a small negative slope (of 0.01, or so). That is, the function computes [math]\displaystyle{ f(x)=1(x\lt 0)(\alpha x)+1(x \ge 0)(x) }[/math] where [math]\displaystyle{ \alpha }[/math] is a small constant. Some people report success with this form of activation function, but the results are not always consistent. The slope in the negative region can also be made into a parameter of each neuron, as seen in PReLU neurons, introduced in Delving Deep into Rectifiers, by Kaiming He et al., 2015. However, the consistency of the benefit across tasks is presently unclear.
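The indicator-function form quoted above can be evaluated with boolean masks acting as the indicators; a small NumPy sketch (the value of [math]\displaystyle{ \alpha }[/math] is the illustrative 0.01 mentioned in the quote):

import numpy as np

alpha = 0.01
x = np.array([-4.0, -1.0, 0.0, 3.0])
# 1(x<0)*(alpha*x) + 1(x>=0)*x, with boolean arrays as the indicator functions
f = (x < 0) * (alpha * x) + (x >= 0) * x
print(f)   # -> [-0.04, -0.01, 0., 3.]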
2017
- (Mate Labs, 2017) ⇒ Mate Labs (Aug 23, 2017). “Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions”.
- QUOTE: Leaky rectified linear unit (Leaky ReLU) — Leaky ReLUs allow a small, non-zero gradient when the unit is not active. 0.01 is the small non-zero gradient here
[math]\displaystyle{ f(x) = \begin{cases} 0.01x, & \mbox{for } x \lt 0 \\ x, & \mbox{for } x \geq 0 \end{cases} }[/math]
Range: [math]\displaystyle{ (-\infty, +\infty) }[/math]
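To illustrate the “small non-zero gradient” on the negative side, a short autograd sketch (not from the quoted post; it assumes the functional interface torch.nn.functional.leaky_relu, which mirrors the nn.LeakyReLU module quoted earlier on this page):

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 3.0], requires_grad=True)
y = F.leaky_relu(x, negative_slope=0.01)
y.sum().backward()
print(x.grad)   # tensor([0.0100, 1.0000]): gradient is 0.01 for x < 0 and 1 for x >= 0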
- ↑ Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014). “Rectifier Nonlinearities Improve Neural Network Acoustic Models”.
- ↑ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. arXiv:1502.01852 [cs.CV].