Bent Identity Activation Function
A Bent Identity Activation Function is a neuron activation function that is based on the mathematical function: [math]\displaystyle{ f(x)=\frac{\sqrt{x^2 + 1} - 1}{2} + x }[/math] (a minimal code sketch appears after the listing below).
- Context:
- It can (typically) be used in the activation of Bent Identity Neurons.
- Example(s):
- …
- Counter-Example(s):
- a Softmax-based Activation Function,
- a Rectified-based Activation Function,
- a Heaviside Step Activation Function,
- a Ramp Function-based Activation Function,
- a Logistic Sigmoid-based Activation Function,
- a Hyperbolic Tangent-based Activation Function,
- a Gaussian-based Activation Function,
- a Softsign Activation Function,
- a Softshrink Activation Function,
- an Adaptive Piecewise Linear Activation Function,
- a Maxout Activation Function,
- a Long Short-Term Memory Unit-based Activation Function,
- an Inverse Square Root Unit-based Activation Function,
- a SoftExponential Activation Function,
- a Sinusoid-based Activation Function.
- See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
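The following is a minimal NumPy sketch of the bent identity function and its derivative, matching the formula above; the helper names (`bent_identity`, `bent_identity_derivative`) are illustrative rather than taken from any particular library.

```python
import numpy as np

def bent_identity(x):
    """Bent identity activation: f(x) = (sqrt(x^2 + 1) - 1)/2 + x."""
    x = np.asarray(x, dtype=float)
    return (np.sqrt(x ** 2 + 1) - 1) / 2 + x

def bent_identity_derivative(x):
    """Derivative: f'(x) = x / (2 * sqrt(x^2 + 1)) + 1."""
    x = np.asarray(x, dtype=float)
    return x / (2 * np.sqrt(x ** 2 + 1)) + 1

# Near the origin the function approximates the identity; it is smooth,
# unbounded, and strictly monotonic, so it never saturates.
xs = np.array([-2.0, 0.0, 2.0])
print(bent_identity(xs))             # [-1.381966  0.        2.618034]
print(bent_identity_derivative(xs))  # [0.5527864 1.        1.4472136]
```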
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions Retrieved:2018-2-18.
- The following table compares the properties of several activation functions that are functions of a single input x from the previous layer or layers:
Name | Equation | Derivative (with respect to x) | Range | Order of continuity | Monotonic | Derivative Monotonic | Approximates identity near the origin
---|---|---|---|---|---|---|---
Identity | [math]\displaystyle{ f(x)=x }[/math] | [math]\displaystyle{ f'(x)=1 }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | Yes
Binary step | [math]\displaystyle{ f(x) = \begin{cases} 0 & \text{for } x \lt 0\\ 1 & \text{for } x \ge 0\end{cases} }[/math] | [math]\displaystyle{ f'(x) = \begin{cases} 0 & \text{for } x \ne 0\\ ? & \text{for } x = 0\end{cases} }[/math] | [math]\displaystyle{ \{0,1\} }[/math] | [math]\displaystyle{ C^{-1} }[/math] | Yes | No | No
Logistic (a.k.a. Soft step) | [math]\displaystyle{ f(x)=\frac{1}{1+e^{-x}} }[/math] | [math]\displaystyle{ f'(x)=f(x)(1-f(x)) }[/math] | [math]\displaystyle{ (0,1) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | No | No
(...) | (...) | (...) | (...) | (...) | (...) | (...) | (...)
Adaptive piecewise linear (APL) [1] | [math]\displaystyle{ f(x) = \max(0,x) + \sum_{s=1}^{S}a_i^s \max(0, -x + b_i^s) }[/math] | [math]\displaystyle{ f'(x) = H(x) - \sum_{s=1}^{S}a_i^s H(-x + b_i^s) }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^0 }[/math] | No | No | No
SoftPlus [2] | [math]\displaystyle{ f(x)=\ln(1+e^x) }[/math] | [math]\displaystyle{ f'(x)=\frac{1}{1+e^{-x}} }[/math] | [math]\displaystyle{ (0,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | No
Bent identity | [math]\displaystyle{ f(x)=\frac{\sqrt{x^2 + 1} - 1}{2} + x }[/math] | [math]\displaystyle{ f'(x)=\frac{x}{2\sqrt{x^2 + 1}} + 1 }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | Yes
SoftExponential [3] | [math]\displaystyle{ f(\alpha,x) = \begin{cases} -\frac{\ln(1-\alpha (x + \alpha))}{\alpha} & \text{for } \alpha \lt 0\\ x & \text{for } \alpha = 0\\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \text{for } \alpha \gt 0\end{cases} }[/math] | [math]\displaystyle{ f'(\alpha,x) = \begin{cases} \frac{1}{1-\alpha (\alpha + x)} & \text{for } \alpha \lt 0\\ e^{\alpha x} & \text{for } \alpha \ge 0\end{cases} }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | Yes iff [math]\displaystyle{ \alpha = 0 }[/math]
Sinusoid [4] | [math]\displaystyle{ f(x)=\sin(x) }[/math] | [math]\displaystyle{ f'(x)=\cos(x) }[/math] | [math]\displaystyle{ [-1,1] }[/math] | [math]\displaystyle{ C^\infty }[/math] | No | No | Yes
Sinc | [math]\displaystyle{ f(x)=\begin{cases} 1 & \text{for } x = 0\\ \frac{\sin(x)}{x} & \text{for } x \ne 0\end{cases} }[/math] | [math]\displaystyle{ f'(x)=\begin{cases} 0 & \text{for } x = 0\\ \frac{\cos(x)}{x} - \frac{\sin(x)}{x^2} & \text{for } x \ne 0\end{cases} }[/math] | [math]\displaystyle{ [\approx-.217234,1] }[/math] | [math]\displaystyle{ C^\infty }[/math] | No | No | No
Gaussian | [math]\displaystyle{ f(x)=e^{-x^2} }[/math] | [math]\displaystyle{ f'(x)=-2xe^{-x^2} }[/math] | [math]\displaystyle{ (0,1] }[/math] | [math]\displaystyle{ C^\infty }[/math] | No | No | No
Here, H is the Heaviside step function, and α is a stochastic variable sampled from a uniform distribution at training time and fixed to the expectation value of the distribution at test time (this note applies to the randomized activation rows elided from the table above).
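As a quick sanity check on the tabulated derivatives, the hedged sketch below compares a few of the closed-form derivatives from the table (logistic, SoftPlus, bent identity, sinusoid) against central finite differences; it is a verification aid written for this page, not part of the cited sources.

```python
import numpy as np

# (function, closed-form derivative) pairs taken from the table rows above.
activations = {
    "logistic":      (lambda x: 1 / (1 + np.exp(-x)),
                      lambda x: (1 / (1 + np.exp(-x))) * (1 - 1 / (1 + np.exp(-x)))),
    "softplus":      (lambda x: np.log1p(np.exp(x)),
                      lambda x: 1 / (1 + np.exp(-x))),
    "bent identity": (lambda x: (np.sqrt(x ** 2 + 1) - 1) / 2 + x,
                      lambda x: x / (2 * np.sqrt(x ** 2 + 1)) + 1),
    "sinusoid":      (np.sin, np.cos),
}

x = np.linspace(-3.0, 3.0, 13)
h = 1e-6
for name, (f, df) in activations.items():
    numeric = (f(x + h) - f(x - h)) / (2 * h)  # central finite difference
    assert np.allclose(numeric, df(x), atol=1e-5), name
    print(f"{name}: closed-form derivative matches finite differences")
```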
2017
- (Mate Labs, 2017) ⇒ Mate Labs Aug 23, 2017. Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions
- QUOTE: Bent identity
Range: [math]\displaystyle{ (-\infty,+\infty) }[/math]
[math]\displaystyle{ f(x)=\frac{\sqrt{x^2 + 1} - 1}{2} + x }[/math]
2016
- (Chen, 2016) ⇒ Chen, J. (2016). "Combinatorially Generated Piecewise Activation Functions". arXiv preprint arXiv:1605.05216.
- QUOTE: For this study, I created a pool of seven non-parametric canonical activation functions (Sine, Sigmoid, ArcTan, TanH, Bent identity, ReLU, and ELU) that could be paired together to generate piecewise smooth functions, so I could observe different mathematical properties. Each time the algorithm mutated a new node into the network, the node selected two functions uniformly at random from the pool for the resting state and the active state. This meant the algorithm could generate 49 different piecewise activation functions. For a network of [math]\displaystyle{ n }[/math] nodes, there are [math]\displaystyle{ 49^n }[/math] possible configurations of activation functions for a single topology. This exponential search space further justifies my choice for NEAT, which was designed to handle high-dimensional search.
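To make the combinatorial construction concrete, the sketch below pairs two functions drawn uniformly at random from the seven-function pool named in the quote, yielding the 7 × 7 = 49 possible pairs; the assumptions that the "resting state" covers negative inputs, that the "active state" covers non-negative inputs, and that the pieces are simply spliced at zero are illustrative choices, not the paper's exact construction.

```python
import random
import numpy as np

# Pool of the seven canonical activations named in the quoted study.
POOL = {
    "sine":    np.sin,
    "sigmoid": lambda x: 1 / (1 + np.exp(-x)),
    "arctan":  np.arctan,
    "tanh":    np.tanh,
    "bent":    lambda x: (np.sqrt(x ** 2 + 1) - 1) / 2 + x,
    "relu":    lambda x: np.maximum(0.0, x),
    "elu":     lambda x: np.where(x >= 0, x, np.expm1(x)),
}

def random_piecewise_activation(rng=random):
    """Draw a resting-state and an active-state function uniformly at random
    (49 distinct pairs) and splice them at zero -- an assumed construction."""
    resting_name = rng.choice(list(POOL))
    active_name = rng.choice(list(POOL))
    def activation(x):
        x = np.asarray(x, dtype=float)
        return np.where(x < 0, POOL[resting_name](x), POOL[active_name](x))
    activation.__name__ = f"{resting_name}_{active_name}"
    return activation

act = random_piecewise_activation()
print(act.__name__, act(np.array([-1.0, 0.0, 1.0])))
```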
- ↑ Forest Agostinelli; Matthew Hoffman; Peter Sadowski; Pierre Baldi (21 Dec 2014). "Learning Activation Functions to Improve Deep Neural Networks". arXiv:1412.6830 [cs.NE].
- ↑ Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011). "Deep sparse rectifier neural networks" (PDF). International Conference on Artificial Intelligence and Statistics.
- ↑ Godfrey, Luke B.; Gashler, Michael S. (2016-02-03). "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks". 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR 1602: 481–486. arXiv:1602.01321. Bibcode 2016arXiv160201321G.
- ↑ Gashler, Michael S.; Ashmore, Stephen C. (2014-05-09). "Training Deep Fourier Neural Networks To Fit Time-Series Data". arXiv:1405.2262 [cs.NE].