Parametric Rectified Linear Activation Function
A Parametric Rectified Linear Activation Function is a Rectified-based Activation Function that is based on the mathematical function: [math]\displaystyle{ f(x)=\max(0,x)+\alpha \cdot \min(0,x) }[/math], where [math]\displaystyle{ \alpha }[/math] is a Neural Network Learnable Parameter.
- AKA: PReLU.
- Context:
- It can (typically) be used in the activation of Parametric Rectified Linear Neurons.
- Example(s):
- a torch.nn.PReLU (a PyTorch implementation),
- a chainer.functions.prelu (a Chainer implementation),
- …
- Counter-Example(s):
- a Clipped Rectifier Unit Activation Function,
- a Concatenated Rectified Linear Activation Function,
- an Exponential Linear Activation Function,
- a Leaky Rectified Linear Activation Function,
- a Noisy Rectified Linear Activation Function,
- a Randomized Leaky Rectified Linear Activation Function,
- a Scaled Exponential Linear Activation Function,
- a Softplus Activation Function,
- a S-shaped Rectified Linear Activation Function,
- a Maxout Activation Function.
- See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
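The function above can be illustrated with a minimal sketch (an illustrative NumPy implementation, not taken from any particular library; the names prelu and alpha are chosen here only for exposition):
import numpy as np

def prelu(x, alpha=0.25):
    # f(x) = max(0, x) + alpha * min(0, x): identity for x >= 0, slope alpha for x < 0.
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x))             # -> -0.5, -0.125, 0.0, 1.5
print(prelu(x, alpha=0.1))  # -> -0.2, -0.05, 0.0, 1.5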
References
2018a
- (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#prelu Retrieved: 2018-2-18.
- QUOTE:
class torch.nn.PReLU(num_parameters=1, init=0.25)
Applies element-wise the function [math]\displaystyle{ PReLU(x)=\max(0,x)+a \cdot \min(0,x) }[/math]. Here “[math]\displaystyle{ a }[/math]” is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter “[math]\displaystyle{ a }[/math]” across all input channels. If called with nn.PReLU(nChannels), a separate “[math]\displaystyle{ a }[/math]” is used for each input channel.
Note: weight decay should not be used when learning “[math]\displaystyle{ a }[/math]” for good performance.
Parameters:
- num_parameters – number of “[math]\displaystyle{ a }[/math]” to learn. Default: 1
- init – the initial value of “[math]\displaystyle{ a }[/math]”. Default: 0.25
- Shape:
- Input: (N,∗) where * means, any number of additional dimensions
- Output: (N,∗), same shape as the input
- Examples:
>>> m = nn.PReLU()
>>> input = autograd.Variable(torch.randn(2))
>>> print(input)
>>> print(m(input))
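A slightly fuller usage sketch of the quoted module (an illustration based on the API described above, not part of the quoted documentation; in current PyTorch releases tensors are passed directly and the autograd.Variable wrapper is no longer needed):
import torch
import torch.nn as nn

m_shared = nn.PReLU()                       # one shared "a" for all channels (init=0.25)
m_per_channel = nn.PReLU(num_parameters=3)  # one "a" per channel for 3-channel inputs

x = torch.randn(2, 3, 4, 4)                 # (N, C, H, W) batch of 3-channel inputs
y = m_per_channel(x)
print(y.shape)                              # torch.Size([2, 3, 4, 4]) -- same shape as the input
print(m_per_channel.weight.shape)           # torch.Size([3]) -- the learnable "a" values
print(m_shared(torch.randn(5)).shape)       # torch.Size([5])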
2018b
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.prelu.html Retrieved: 2018-2-18.
- QUOTE:
chainer.functions.prelu(x, W)
It accepts two arguments: an input x and a weight array W, and computes the output as [math]\displaystyle{ PReLU(x)=\max(x, W \ast x) }[/math], where [math]\displaystyle{ \ast }[/math] is an elementwise multiplication for each sample in the batch.
When the PReLU function is combined with two-dimensional convolution, the elements of parameter W are typically shared across the same filter of different pixels. In order to support such usage, this function supports the shape of parameter array that indicates leading dimensions of input arrays except the batch dimension. For example, if [math]\displaystyle{ W }[/math] has the shape of [math]\displaystyle{ (2,3,4) }[/math], [math]\displaystyle{ x }[/math] must have the shape of [math]\displaystyle{ (B,2,3,4,S_1,\ldots,S_N) }[/math] where [math]\displaystyle{ B }[/math] is the batch size and the number of trailing [math]\displaystyle{ S }[/math]'s, [math]\displaystyle{ N }[/math], is an arbitrary non-negative integer.
Parameters:
- x (Variable) – Input variable. Its first argument is assumed to be the minibatch dimension.
- W (Variable) – Weight variable.
- Returns: Output variable.
- Return type: Variable.
- See also: PReLU
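A minimal usage sketch of the quoted function (an illustration under the shape convention described above, assuming Chainer and NumPy are installed; the slope value 0.25 is an arbitrary choice):
import numpy as np
import chainer.functions as F

x = np.random.randn(4, 3).astype(np.float32)   # batch of 4 samples with 3 channels
W = np.full((3,), 0.25, dtype=np.float32)      # one learnable slope per channel
y = F.prelu(x, W)
print(y.shape)  # (4, 3) -- same shape as the input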
2018c
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Leaky_ReLUs Retrieved:2018-2-4.
- Leaky ReLUs allow a small, non-zero gradient when the unit is not active.[1] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ 0.01x & \mbox{otherwise} \end{cases} }[/math]
Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.[2] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ a x & \mbox{otherwise} \end{cases} }[/math]
Note that for [math]\displaystyle{ a\leq1 }[/math], this is equivalent to : [math]\displaystyle{ f(x) = \max(x, ax) }[/math] and thus has a relation to "maxout" networks.
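The equivalence noted for [math]\displaystyle{ a\leq1 }[/math] can be checked numerically with a short sketch (illustrative code, not part of the quoted article):
import numpy as np

def prelu_piecewise(x, a):
    # x for x > 0, a * x otherwise
    return np.where(x > 0, x, a * x)

def prelu_max(x, a):
    return np.maximum(x, a * x)

x = np.linspace(-3.0, 3.0, 13)
for a in (0.0, 0.1, 0.25, 1.0):
    assert np.allclose(prelu_piecewise(x, a), prelu_max(x, a))  # both forms agree for a <= 1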
2017
- (Mate Labs, 2017) ⇒ Mate Labs Aug 23, 2017. Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions
- QUOTE: Parametric Rectified Linear Unit (PReLU) — It makes the coefficient of leakage into a parameter that is learned along with the other neural network parameters. Alpha (α) is the coefficient of leakage here.
For [math]\displaystyle{ \alpha\leq 1 \quad f(x) = max(x, \alpha x) }[/math]
Range:[math]\displaystyle{ (-\infty, +\infty) }[/math]
[math]\displaystyle{ f(\alpha, x) = \begin{cases} \alpha x, & \mbox{for } x \lt 0 \\ x, & \mbox{for } x \geq 0 \end{cases} }[/math]
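Since [math]\displaystyle{ \alpha }[/math] is learned jointly with the other parameters, its gradient follows directly from the piecewise form above: [math]\displaystyle{ \partial f/\partial\alpha = x }[/math] for [math]\displaystyle{ x \lt 0 }[/math] and [math]\displaystyle{ 0 }[/math] otherwise, i.e. [math]\displaystyle{ \min(0,x) }[/math]. A short PyTorch sketch (an illustration, not from the quoted post) verifying this with autograd:
import torch
import torch.nn as nn

m = nn.PReLU(init=0.25)                 # a single learnable slope alpha
x = torch.tensor([-2.0, -0.5, 3.0])
m(x).sum().backward()

# d f / d alpha summed over the batch equals sum(min(0, x)) = -2.5 here.
print(m.weight.grad)                                # tensor([-2.5000])
print(torch.minimum(torch.zeros_like(x), x).sum())  # tensor(-2.5000)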
2015
- (He et al., 2015) ⇒ He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on computer vision (pp. 1026-1034).
- ABSTRACT: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
- ↑ Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014). Rectifier Nonlinearities Improve Neural Network Acoustic Models
- ↑ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].