Gaussian Activation Function


A Gaussian Activation Function is a neuron activation function based on a Gaussian function, i.e. [math]\displaystyle{ f(x)=\exp(-x^2) }[/math].
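The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `gaussian` and `gaussian_derivative` are chosen here for illustration and do not come from any particular framework.

```python
import math


def gaussian(x: float) -> float:
    """Gaussian activation f(x) = exp(-x^2)."""
    return math.exp(-x * x)


def gaussian_derivative(x: float) -> float:
    """Derivative f'(x) = -2x * exp(-x^2), obtained with the chain rule."""
    return -2.0 * x * math.exp(-x * x)


# Evaluate the activation and its slope at a few sample inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  f(x)={gaussian(x):.4f}  f'(x)={gaussian_derivative(x):+.4f}")
```

The function is even, so `gaussian(-x)` equals `gaussian(x)`, and it reaches its maximum value of 1 at `x = 0`.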



  • chat
    • Q: what is a Gaussian Activation Function?
    • A: A Gaussian activation function is an activation function shaped like a Gaussian (bell) curve, f(x) = exp(-x^2). It outputs 1 when its input is 0 and decays smoothly toward 0 as the input moves away from 0 in either direction, so its outputs lie in (0, 1]. It does not produce outputs that follow a Gaussian distribution. It can be used in the hidden layers of a neural network to introduce non-linearity, which helps the network fit more complex patterns in the data.


  • chat
    • Q: What are its pros and cons?
    • A:
    • Pros of Gaussian Activation Function:
      • Smoothness: The Gaussian activation function is infinitely differentiable, so its gradient f'(x) = -2x exp(-x^2) is defined and continuous everywhere. This suits gradient-based optimization, although smoothness alone does not guarantee convergence to an optimal solution.
      • Non-Linearity: The Gaussian activation function introduces non-linearity into the network, allowing it to model complex relationships between inputs and outputs.
    • Cons of Gaussian Activation Function:
      • Complexity: Each evaluation of the Gaussian activation function requires an exponential, which costs more than a linear or thresholding function and may increase the computational resources needed to train the network.
      • Overfitting: As with other non-linear activation functions, if the network becomes too complex it may start to overfit the training data, meaning that it memorizes the training data instead of learning general patterns.
      • Local Minima: The function is non-monotonic, and its gradient is zero at x = 0 and tends to zero for large |x|. A network that uses it may therefore get stuck in local minima or flat regions, which can make optimization more difficult and may result in sub-optimal solutions.
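The gradient behaviour discussed in the list above can be checked numerically. The sketch below is illustrative only: the helper `central_difference`, the step size `h`, and the sample points are assumptions made here, not part of any library. It compares the analytic gradient with a finite-difference estimate, then looks at where the gradient is largest and how quickly it fades for large inputs.

```python
import math


def gaussian(x):
    """Gaussian activation f(x) = exp(-x^2)."""
    return math.exp(-x * x)


def gaussian_grad(x):
    """Analytic gradient f'(x) = -2x * exp(-x^2)."""
    return -2.0 * x * math.exp(-x * x)


def central_difference(f, x, h=1e-5):
    """Numerical slope estimate (hypothetical helper; h is an assumed step size)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Smoothness: the analytic and numerical gradients agree at every sample point.
points = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
max_error = max(abs(gaussian_grad(x) - central_difference(gaussian, x)) for x in points)

# The gradient magnitude peaks at |x| = 1/sqrt(2), where it equals sqrt(2/e).
peak_grad = abs(gaussian_grad(1.0 / math.sqrt(2.0)))

# Far from the origin the gradient is nearly zero, so learning signals fade there.
far_grad = abs(gaussian_grad(5.0))

print(f"max |analytic - numeric| = {max_error:.2e}")
print(f"peak |f'| = {peak_grad:.4f}")
print(f"|f'(5)|  = {far_grad:.2e}")
```

The gradient magnitude never exceeds about 0.86 and is on the order of 1e-10 at `x = 5`, which is one reason inputs far from zero contribute little to gradient-based updates.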


| Name | Equation | Derivative (with respect to x) | Range | Order of continuity | Monotonic | Derivative monotonic | Approximates identity near the origin |
|---|---|---|---|---|---|---|---|
| Identity | [math]\displaystyle{ f(x)=x }[/math] | [math]\displaystyle{ f'(x)=1 }[/math] | [math]\displaystyle{ (-\infty,\infty) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | Yes | Yes |
| Binary step | [math]\displaystyle{ f(x) = \begin{cases} 0 & \text{for } x \lt 0\\ 1 & \text{for } x \ge 0\end{cases} }[/math] | [math]\displaystyle{ f'(x) = \begin{cases} 0 & \text{for } x \ne 0\\ ? & \text{for } x = 0\end{cases} }[/math] | [math]\displaystyle{ \{0,1\} }[/math] | [math]\displaystyle{ C^{-1} }[/math] | Yes | No | No |
| Logistic (a.k.a. Soft step) | [math]\displaystyle{ f(x)=\frac{1}{1+e^{-x}} }[/math] | [math]\displaystyle{ f'(x)=f(x)(1-f(x)) }[/math] | [math]\displaystyle{ (0,1) }[/math] | [math]\displaystyle{ C^\infty }[/math] | Yes | No | No |
| (...) | (...) | (...) | (...) | (...) | (...) | (...) | (...) |
| Gaussian | [math]\displaystyle{ f(x)=e^{-x^2} }[/math] | [math]\displaystyle{ f'(x)=-2xe^{-x^2} }[/math] | [math]\displaystyle{ (0,1] }[/math] | [math]\displaystyle{ C^\infty }[/math] | No | No | No |
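The entries in the Gaussian row can be sanity-checked on a grid of sample inputs. This is a rough numerical sketch, not a proof. The grid over [-3, 3], the helper `is_monotonic`, and the reading of "approximates identity near the origin" as f(0) = 0 together with f'(0) = 1 are assumptions made here for illustration.

```python
import math


def gaussian(x):
    """Gaussian activation f(x) = exp(-x^2)."""
    return math.exp(-x * x)


def gaussian_derivative(x):
    """Derivative f'(x) = -2x * exp(-x^2)."""
    return -2.0 * x * math.exp(-x * x)


def is_monotonic(seq):
    """True if the sequence never decreases or never increases (hypothetical helper)."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Sample grid on [-3, 3] with step 0.01 (an assumed, illustrative choice).
xs = [i / 100.0 for i in range(-300, 301)]
values = [gaussian(x) for x in xs]
slopes = [gaussian_derivative(x) for x in xs]

in_range = all(0.0 < v <= 1.0 for v in values)      # Range (0, 1]
monotonic = is_monotonic(values)                    # Monotonic column
derivative_monotonic = is_monotonic(slopes)         # Derivative monotonic column
# Assumed criterion for "approximates identity near the origin".
approximates_identity = gaussian(0.0) == 0.0 and gaussian_derivative(0.0) == 1.0

print(in_range, monotonic, derivative_monotonic, approximates_identity)
```

On this grid the check prints `True False False False`, which matches the range (0, 1] and the three "No" entries in the Gaussian row.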

Here, H is the Heaviside step function.

α is a stochastic variable sampled from a uniform distribution at training time and fixed to the expectation value of the distribution at test time.