Soft Exponential Activation Function

Context:
- It is usually defined as
  [math]\displaystyle{ f(\alpha,x) = \begin{cases} -\frac{\ln(1-\alpha (x + \alpha))}{\alpha} & \text{for } \alpha \lt 0\\ x & \text{for } \alpha = 0\\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \text{for } \alpha \gt 0\end{cases} }[/math],
  where [math]\displaystyle{ \alpha }[/math] is a learnable parameter.
- It can (typically) be used in Radial Basis Function Neural Networks and Fourier Neural Networks.
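A minimal sketch of the piecewise definition above, in plain Python. The function name `soft_exponential` and the explicit domain check on the logarithmic branch are illustrative assumptions, not part of the cited sources.

```python
import math


def soft_exponential(alpha: float, x: float) -> float:
    """Evaluate f(alpha, x) following the three-case definition above."""
    if alpha < 0:
        # Logarithmic branch: the log argument must be strictly positive.
        arg = 1.0 - alpha * (x + alpha)
        if arg <= 0:
            raise ValueError("x is outside the domain of the logarithmic branch")
        return -math.log(arg) / alpha
    if alpha == 0:
        # Linear branch: the identity function.
        return x
    # Exponential branch.
    return (math.exp(alpha * x) - 1.0) / alpha + alpha


# Evaluate the same input under the three regimes.
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, soft_exponential(alpha, 2.0))
```

Substituting into the definition, α = -1 reduces the function to ln(x) and α = 1 reduces it to e^x, so the three printed values correspond to the logarithmic, linear, and exponential regimes.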
Example(s):
- …
Counter-Example(s):
See: Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.

References

(Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions Retrieved:2018-2-18.
- The following table compares the properties of several activation functions that are functions of one fold x from the previous layer or layers:

Name	Plot	Equation	Derivative (with respect to x)	Range	Order of continuity	Monotonic	Derivative Monotonic	Approximates identity near the origin
(...)	(...)	(...)	(...)	(...)	(...)	(...)	(...)	(...)
SoftExponential ^[1]		[math]\displaystyle{ f(\alpha,x) = \begin{cases} -\frac{\ln(1-\alpha (x + \alpha))}{\alpha} & \text{for } \alpha \lt 0\\ x & \text{for } \alpha = 0\\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \text{for } \alpha \gt 0\end{cases} }[/math]	[math]\displaystyle{ f'(\alpha,x) = \begin{cases} \frac{1}{1-\alpha (\alpha + x)} & \text{for } \alpha \lt 0\\ e^{\alpha x} & \text{for } \alpha \ge 0\end{cases} }[/math]	[math]\displaystyle{ (-\infty,\infty) }[/math]	[math]\displaystyle{ C^\infty }[/math]	Yes	Yes	Yes iff α = 0
Sinusoid^[2]		[math]\displaystyle{ f(x)=\sin(x) }[/math]	[math]\displaystyle{ f'(x)=\cos(x) }[/math]	[math]\displaystyle{ [-1,1] }[/math]	[math]\displaystyle{ C^\infty }[/math]	No	No	Yes
Sinc		[math]\displaystyle{ f(x)=\begin{cases} 1 & \text{for } x = 0\\ \frac{\sin(x)}{x} & \text{for } x \ne 0\end{cases} }[/math]	[math]\displaystyle{ f'(x)=\begin{cases} 0 & \text{for } x = 0\\ \frac{\cos(x)}{x} - \frac{\sin(x)}{x^2} & \text{for } x \ne 0\end{cases} }[/math]	[math]\displaystyle{ [\approx-.217234,1] }[/math]	[math]\displaystyle{ C^\infty }[/math]	No	No	No
Gaussian		[math]\displaystyle{ f(x)=e^{-x^2} }[/math]	[math]\displaystyle{ f'(x)=-2xe^{-x^2} }[/math]	[math]\displaystyle{ (0,1] }[/math]	[math]\displaystyle{ C^\infty }[/math]	No	No	No

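Table footnote (in the source table this note belongs to the randomized leaky rectified linear unit row, not to SoftExponential): α is a stochastic variable sampled from a uniform distribution at training time and fixed to the expectation value of the distribution at test time.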
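The derivative listed in the SoftExponential row can be sanity-checked numerically. The sketch below compares it against a central finite difference; the helper names, the step size `h`, and the sample point x = 0.3 are arbitrary choices made for illustration.

```python
import math


def soft_exponential(alpha: float, x: float) -> float:
    """Piecewise soft exponential, as in the Equation column."""
    if alpha < 0:
        return -math.log(1.0 - alpha * (x + alpha)) / alpha
    if alpha == 0:
        return x
    return (math.exp(alpha * x) - 1.0) / alpha + alpha


def soft_exponential_dx(alpha: float, x: float) -> float:
    """Derivative with respect to x, as in the Derivative column."""
    if alpha < 0:
        return 1.0 / (1.0 - alpha * (alpha + x))
    return math.exp(alpha * x)


def central_difference(alpha: float, x: float, h: float = 1e-6) -> float:
    """Numerical estimate of df/dx used only as a cross-check."""
    return (soft_exponential(alpha, x + h) - soft_exponential(alpha, x - h)) / (2.0 * h)


for alpha in (-0.5, 0.0, 0.5):
    analytic = soft_exponential_dx(alpha, 0.3)
    numeric = central_difference(alpha, 0.3)
    print(alpha, analytic, numeric, soft_exponential(alpha, 0.0))
```

The last printed column is f(α, 0). By the definition it equals α when α > 0, so the function passes through the origin only for α = 0, which is consistent with the final column of the table depending on α.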

(Godfrey & Gashler, 2015) ⇒ Godfrey, L. B., & Gashler, M. S. (2015, November). A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. In Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), 2015 7th International Joint Conference on (Vol. 1, pp. 481-486). IEEE, arXiv:1602.01321.
- ABSTRACT: We present the soft exponential activation function for artificial neural networks that continuously interpolates between logarithmic, linear, and exponential functions. This activation function is simple, differentiable, and parameterized so that it can be trained as the rest of the network is trained. We hypothesize that soft exponential has the potential to improve neural network learning, as it can exactly calculate many natural operations that typical neural networks can only approximate, including addition, multiplication, inner product, distance, polynomials, and sinusoids.
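As an illustration of the abstract's claim about multiplication, the sketch below composes two regimes of the function: with α = -1 the definition reduces to ln(x), and with α = 1 it reduces to e^x, so exp(ln a + ln b) = a·b. This is a hand-written demonstration for positive inputs, not the network construction from the paper; `multiply_via_soft_exponential` is a hypothetical helper name, and the result is exact only up to floating-point rounding.

```python
import math


def soft_exponential(alpha: float, x: float) -> float:
    """Piecewise soft exponential f(alpha, x)."""
    if alpha < 0:
        return -math.log(1.0 - alpha * (x + alpha)) / alpha
    if alpha == 0:
        return x
    return (math.exp(alpha * x) - 1.0) / alpha + alpha


def multiply_via_soft_exponential(a: float, b: float) -> float:
    """Multiply two positive numbers using only soft exponential units and a sum."""
    log_a = soft_exponential(-1.0, a)  # alpha = -1 behaves as ln(a)
    log_b = soft_exponential(-1.0, b)  # alpha = -1 behaves as ln(b)
    # Summing the logs and applying alpha = 1 (e^x) recovers the product.
    return soft_exponential(1.0, log_a + log_b)


print(multiply_via_soft_exponential(3.0, 4.0))
```

With α = 0 in every unit, the same structure would instead compute plain addition, since the linear branch passes its input through unchanged.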

↑ Godfrey, Luke B.; Gashler, Michael S. (2016-02-03). "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks". 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR 1602: 481–486. arXiv:1602.01321. Bibcode 2016arXiv160201321G.
↑ Gashler, Michael S.; Ashmore, Stephen C. (2014-05-09). "Training Deep Fourier Neural Networks To Fit Time-Series Data". arXiv:1405.2262 [cs.NE].