Scaled Exponential Linear Activation Function
A Scaled Exponential Linear Activation Function is a Rectified-based Activation Function that is based on an Exponential Linear Activation Function.
- AKA: SELU Function, Scaled Exponential Linear Unit Function.
- Context:
- It can (typically) be used in the activation of Scaled Exponential Linear Units (SELUs), the neuron type used in Self-Normalizing Neural Networks (SNNs).
- Example(s):
- torch.nn.SELU,
- chainer.functions.selu,
- https://github.com/bioinf-jku/SNNs/blob/master/selu.py (Klambauer et al., 2017).
- …
- Counter-Example(s):
- a Clipped Rectifier Unit Activation Function,
- a Concatenated Rectified Linear Activation Function,
- an Exponential Linear Activation Function,
- a Leaky Rectified Linear Activation Function,
- a Noisy Rectified Linear Activation Function,
- a Parametric Rectified Linear Activation Function,
- a Randomized Leaky Rectified Linear Activation Function,
- a Softplus Activation Function,
- an S-shaped Rectified Linear Activation Function.
- See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
References
2018a
- (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#selu Retrieved: 2018-2-10.
- QUOTE:
class torch.nn.SELU(inplace=False)
Applies element-wise, [math]\displaystyle{ f(x)=\text{scale}*(\max(0,x)+\min(0,\alpha*(\exp(x)-1))) }[/math], with [math]\displaystyle{ \alpha=1.6732632423543772848170429916717 }[/math] and [math]\displaystyle{ \text{scale}=1.0507009873554804934193349852946 }[/math]. More details can be found in the paper Self-Normalizing Neural Networks.
Parameters:
- inplace (bool, optional) – can optionally do the operation in-place. Default: False
- Shape:
- Input: (N,∗) where ∗ means any number of additional dimensions
- Output: (N,∗), same shape as the input
- Examples:
>>> m = nn.SELU()
>>> input = autograd.Variable(torch.randn(2))
>>> print(input)
>>> print(m(input))
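The constants above can be checked directly against the element-wise formula. Below is a minimal sketch (assuming a current PyTorch install; it uses plain tensors rather than the older autograd.Variable shown in the quoted example) that compares torch.nn.SELU with [math]\displaystyle{ \text{scale}*(\max(0,x)+\min(0,\alpha*(\exp(x)-1))) }[/math]:

import torch

# SELU constants as quoted in the PyTorch documentation above.
ALPHA = 1.6732632423543772848170429916717
SCALE = 1.0507009873554804934193349852946

x = torch.randn(5)

# Library implementation.
y_module = torch.nn.SELU()(x)

# Element-wise formula: scale * (max(0, x) + min(0, alpha * (exp(x) - 1))).
y_manual = SCALE * (torch.clamp(x, min=0) + torch.clamp(ALPHA * (torch.exp(x) - 1), max=0))

print(torch.allclose(y_module, y_manual))  # expected: True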
2018b
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.selu.html Retrieved: 2018-2-18.
- QUOTE:
chainer.functions.selu(x, alpha=1.6732632423543772, scale=1.0507009873554805)
Scaled Exponential Linear Unit function.
For parameters [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ \lambda }[/math], it is expressed as
[math]\displaystyle{ f(x) = \lambda \begin{cases} x, & \mbox{if } x \ge 0 \\ \alpha(\exp(x)-1), & \mbox{if } x \lt 0 \end{cases} }[/math].
See: https://arxiv.org/abs/1706.02515
Parameters:
- x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- alpha (float) – Parameter [math]\displaystyle{ \alpha }[/math].
- scale (float) – Parameter [math]\displaystyle{ \lambda }[/math].
- Returns: Output variable. A [math]\displaystyle{ (s_1,s_2,\cdots,s_N) }[/math]-shaped float array.
- Return type: Variable
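As a library-independent restatement of the piecewise definition quoted above, the following minimal NumPy sketch (an illustration, not Chainer's implementation) applies [math]\displaystyle{ f(x)=\lambda x }[/math] for [math]\displaystyle{ x \ge 0 }[/math] and [math]\displaystyle{ \lambda\alpha(\exp(x)-1) }[/math] for [math]\displaystyle{ x \lt 0 }[/math], using Chainer's default parameter values:

import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Piecewise definition: f(x) = scale * (x if x >= 0 else alpha * (exp(x) - 1)).
    x = np.asarray(x, dtype=np.float64)
    return scale * np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

print(selu([-2.0, -0.5, 0.0, 0.5, 2.0]))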
2017a
- (Mate Labs, 2017) ⇒ Mate Labs (Aug 23, 2017). "Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions".
- QUOTE: Scaled Exponential Linear Unit (SELU)
Range: [math]\displaystyle{ (-\lambda\alpha,+\infty) }[/math]
[math]\displaystyle{ f(x) = \lambda \begin{cases} \alpha(e^x-1) & \mbox{if } x \lt 0 \\ x & \mbox{if } x\ge 0 \end{cases} }[/math]
with [math]\displaystyle{ \lambda=1.0507 }[/math] and [math]\displaystyle{ \alpha=1.67326 }[/math]
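The quoted range follows from the formula: for [math]\displaystyle{ x \lt 0 }[/math] the output [math]\displaystyle{ \lambda\alpha(e^x-1) }[/math] decreases toward [math]\displaystyle{ -\lambda\alpha }[/math] as [math]\displaystyle{ x \to -\infty }[/math] without reaching it, while for [math]\displaystyle{ x \ge 0 }[/math] the output [math]\displaystyle{ \lambda x }[/math] is unbounded above. A quick numeric check of the lower bound (a worked example, not part of the quoted post):

import numpy as np

LAMBDA, ALPHA = 1.0507, 1.67326

lower_bound = -LAMBDA * ALPHA
print(lower_bound)                              # ≈ -1.758
# For very negative inputs the output approaches, but never reaches, -lambda*alpha.
print(LAMBDA * ALPHA * (np.exp(-50.0) - 1.0))   # ≈ -1.758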
2017b
- (Klambauer et al., 2017) ⇒ Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. (2017). Self-normalizing neural networks. In Advances in Neural Information Processing Systems (pp. 972-981) arXiv:1706.02515.
- ABSTRACT: Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs are "scaled exponential linear units" (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance -- even under the presence of noise and perturbations. This convergence property of SNNs allows to (1) train deep networks with many layers, (2) employ strong regularization, and (3) to make learning highly robust. Furthermore, for activations not close to unit variance, we prove an upper and lower bound on the variance, thus, vanishing and exploding gradients are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set. The winning SNN architectures are often very deep. Implementations are available at: https://github.com/bioinf-jku/SNNs.
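To illustrate the self-normalizing behaviour described in the abstract, the sketch below (an illustration under the paper's recommended initialization, not the authors' released code) passes standard-normal inputs through many dense layers with SELU activations and weights drawn from N(0, 1/fan_in), then prints the activation mean and variance, which stay close to 0 and 1:

import numpy as np

rng = np.random.default_rng(0)
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    # f(x) = scale * (x if x >= 0 else alpha * (exp(x) - 1))
    return SCALE * np.where(x >= 0, x, ALPHA * (np.exp(x) - 1.0))

n_samples, n_units, n_layers = 4096, 256, 16
x = rng.standard_normal((n_samples, n_units))  # zero-mean, unit-variance inputs

for layer in range(1, n_layers + 1):
    # Weights ~ N(0, 1/fan_in), the initialization recommended for SNNs.
    w = rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)
    x = selu(x @ w)
    if layer % 4 == 0:
        print(f"layer {layer:2d}: mean={x.mean():+.3f}, var={x.var():.3f}")
# Activation statistics remain near zero mean and unit variance across layers.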