Softmax Activation Function


A Softmax Activation Function is a neuron activation function that is based on a Softmax function, which converts a vector of inputs into a posterior probability distribution, i.e. [math]\displaystyle{ f_i(x)=\dfrac{\exp(x_i)}{\sum_j\exp(x_j)} }[/math].
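The following is a minimal NumPy sketch of this formula (the helper name softmax and the use of NumPy are illustrative, not from the source); subtracting the maximum before exponentiating is a standard trick to avoid overflow and does not change the result:

import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; exp() of large
    # inputs would otherwise overflow.
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(p)        # [0.09003057 0.24472847 0.66524096]
print(p.sum())  # 1.0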



References

2018a

  • (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#softmax
    • QUOTE: class torch.nn.Softmax(dim=None)

      Applies the Softmax function to an n-dimensional input Tensor, rescaling the elements so that the n-dimensional output Tensor lies in the range (0, 1) and sums to 1.

       Softmax is defined as [math]\displaystyle{ f_i(x)=\dfrac{\exp(x_i)}{\sum_j\exp(x_j)} }[/math]

      Shape:
        • Input: any shape
        • Output: same as input

      Returns: a Tensor of the same dimension and shape as the input, with values in the range [0, 1].

      Parameters: dim (int) – a dimension along which Softmax will be computed (so every slice along dim will sum to 1).

      Note

      This module doesn't work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use LogSoftmax instead (it's faster and has better numerical properties).

Examples:

>>> import torch
>>> import torch.nn as nn
>>> from torch import autograd
>>> m = nn.Softmax(dim=1)  # normalize along dim 1, so each row sums to 1
>>> input = autograd.Variable(torch.randn(2, 3))
>>> print(input)
>>> print(m(input))
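As the note above suggests, applying Softmax and then taking a separate log is numerically fragile; the sketch below (illustrative, not from the PyTorch docs) checks that LogSoftmax combined with NLLLoss matches the fused CrossEntropyLoss:

import torch
import torch.nn as nn

logits = torch.randn(2, 3)       # batch of 2 examples, 3 classes
targets = torch.tensor([0, 2])   # gold class indices

log_probs = nn.LogSoftmax(dim=1)(logits)
loss_a = nn.NLLLoss()(log_probs, targets)

# CrossEntropyLoss fuses LogSoftmax and NLLLoss in one call.
loss_b = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(loss_a, loss_b))  # True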

2018b

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Softmax_function#Artificial_neural_networks Retrieved: 2018-02-11.
    • The softmax function is often used in the final layer of a neural network-based classifier. Such networks are commonly trained under a log loss (or cross-entropy) regime, giving a non-linear variant of multinomial logistic regression. Since the function maps a vector and a specific index i to a real value, the derivative needs to take the index into account: [math]\displaystyle{ \frac{\partial}{\partial q_k}\sigma(\textbf{q}, i) = \cdots = \sigma(\textbf{q}, i)(\delta_{ik} - \sigma(\textbf{q}, k)) }[/math]
      Here, the Kronecker delta is used for simplicity (cf. the derivative of a sigmoid function, being expressed via the function itself).
      See Multinomial logit for a probability model which uses the softmax activation function.
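The derivative formula above can be checked numerically; in this illustrative sketch (not from the source), the analytic Jacobian [math]\displaystyle{ \sigma(\textbf{q}, i)(\delta_{ik} - \sigma(\textbf{q}, k)) }[/math] is compared against central finite differences:

import numpy as np

def softmax(q):
    z = np.exp(q - np.max(q))
    return z / z.sum()

q = np.array([0.5, -1.0, 2.0])
s = softmax(q)

# Analytic Jacobian from the formula: J[i, k] = s_i * (delta_ik - s_k).
jac = np.diag(s) - np.outer(s, s)

# Central finite-difference approximation of the same Jacobian.
eps = 1e-6
num = np.zeros((3, 3))
for k in range(3):
    dq = np.zeros(3)
    dq[k] = eps
    num[:, k] = (softmax(q + dq) - softmax(q - dq)) / (2 * eps)

print(np.allclose(jac, num, atol=1e-8))  # True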
