Gaussian Activation Function
A Gaussian Activation Function is a neuron activation function based on a Gaussian function, i.e. [math]\displaystyle{ f(x)=\exp(-x^2) }[/math].
- Context:
- It can (often) be used in Neural Network Hidden Layers (to introduce non-linearity).
- It can (typically) be used in the activation of Gaussian Neurons.
- It can support gradient-based Optimization Algorithms, because it is smooth and differentiable everywhere (as sketched in the example below).
- …
- Example(s):
- …
- Counter-Example(s):
- See: Artificial Neuron, Neural Network Topology, Neural Network Layer.
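As a minimal sketch (assuming NumPy; the function names are illustrative, not a standard library API), the Gaussian activation [math]\displaystyle{ f(x)=\exp(-x^2) }[/math] and its derivative can be written as:

```python
# Minimal sketch of a Gaussian activation and its derivative (NumPy assumed).
import numpy as np

def gaussian(x):
    """Gaussian activation: f(x) = exp(-x**2), with range (0, 1]."""
    return np.exp(-x ** 2)

def gaussian_grad(x):
    """Analytic derivative: f'(x) = -2 * x * exp(-x**2)."""
    return -2.0 * x * np.exp(-x ** 2)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(gaussian(x))       # peaks at 1 for x = 0, decays toward 0 elsewhere
print(gaussian_grad(x))  # positive for x < 0, zero at 0, negative for x > 0 (non-monotonic)
```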
References
2023
- chat
- Q: what is a Gaussian Activation Function?
- A: A Gaussian activation function is a mathematical function that applies a Gaussian (bell-shaped) curve to a neuron's input, producing outputs in the range (0, 1] that peak when the input is near zero. It is often used in the hidden layers of a neural network to introduce non-linearity, helping the network fit more complex patterns in the data. It is one of the activation functions used in artificial neural networks.
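As an illustrative sketch of the hidden-layer usage described above (NumPy assumed; the layer sizes and variable names are made up for the example), a single forward pass with a Gaussian hidden layer might look like:

```python
# One-hidden-layer forward pass using a Gaussian activation (NumPy assumed).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features
W1 = rng.normal(size=(3, 5)) * 0.1   # input -> hidden weights
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)) * 0.1   # hidden -> output weights
b2 = np.zeros(1)

hidden = np.exp(-(X @ W1 + b1) ** 2)  # Gaussian non-linearity in the hidden layer
output = hidden @ W2 + b2             # linear output layer
print(output.shape)                   # (4, 1)
```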
2023
- chat
- Q: What are its pros and cons?
- A:
- Pros of Gaussian Activation Function:
- Smoothness: The Gaussian activation function is continuous and smooth, which means that it is differentiable everywhere and has a well-behaved gradient. This property makes it well suited to gradient-based optimization.
- Non-Linearity: Gaussian activation function introduces non-linearity into the network, allowing it to model complex relationships between inputs and outputs.
- Cons of Gaussian Activation Function:
- Complexity: The Gaussian activation function can increase the complexity of the network and may require more computational resources to train.
- Overfitting: If the network becomes too complex, it may start to overfit the training data, meaning that it memorizes the training data instead of learning general patterns.
- Local Minima: because the Gaussian activation is non-monotonic and its gradient decays toward zero away from the origin, optimization can stall in flat regions or local minima, which can make training more difficult and may result in sub-optimal solutions.
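As a sketch of the gradient behaviour behind the smoothness pro and the local-minima con above (NumPy assumed), a finite-difference check confirms the analytic derivative and shows how quickly its magnitude decays away from the origin:

```python
# Finite-difference check of f'(x) = -2x * exp(-x**2), and its magnitude
# at increasingly large inputs (NumPy assumed).
import numpy as np

f = lambda x: np.exp(-x ** 2)
df = lambda x: -2.0 * x * np.exp(-x ** 2)

x = np.array([0.5, 1.0, 2.0, 4.0])
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(np.max(np.abs(numeric - df(x))))  # small (~1e-10): analytic and numeric gradients agree
print(np.abs(df(x)))                    # shrinks rapidly as |x| grows -> tiny weight updates
```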
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions Retrieved:2018-2-12.
- The following table compares the properties of several activation functions that are functions of a single input x from the previous layer or layers:
| Name | Equation | Derivative (with respect to x) | Range | Order of continuity | Monotonic | Derivative Monotonic | Approximates identity near the origin |
|---|---|---|---|---|---|---|---|
| Identity | [math]\displaystyle{ f(x)=x }[/math] | [math]\displaystyle{ f'(x)=1 }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | Yes |
| Binary step | [math]\displaystyle{ f(x) = \begin{cases} 0 & \text{for } x \lt 0\\ 1 & \text{for } x \ge 0\end{cases} }[/math] | [math]\displaystyle{ f'(x) = \begin{cases} 0 & \text{for } x \ne 0\\ ? & \text{for } x = 0\end{cases} }[/math] | [math]\displaystyle{ \{0,1\} }[/math] | [math]\displaystyle{ C^{-1} }[/math] | Yes | No | No |
| Logistic (a.k.a. Soft step) | [math]\displaystyle{ f(x)=\frac{1}{1+e^{-x}} }[/math] | [math]\displaystyle{ f'(x)=f(x)(1-f(x)) }[/math] | [math]\displaystyle{ (0,1) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | No | No |
| (...) | (...) | (...) | (...) | (...) | (...) | (...) |
| Gaussian | [math]\displaystyle{ f(x)=e^{-x^2} }[/math] | [math]\displaystyle{ f'(x)=-2xe^{-x^2} }[/math] | [math]\displaystyle{ (0,1] }[/math] | [math]\displaystyle{ C^\infty }[/math] | No | No | No |
Here, H is the Heaviside step function, and α is a stochastic variable sampled from a uniform distribution at training time and fixed to the expectation value of the distribution at test time; both symbols appear only in table rows omitted from this excerpt.