Multinomial Logistic (Softmax) Function

From GM-RKB

A Multinomial Logistic (Softmax) Function is a logistic function that is a multinomial function.



References

2018b

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Softmax_function Retrieved:2018-2-11.
    • In mathematics, the softmax function, or normalized exponential function, is a generalization of the logistic function that "squashes" a K-dimensional vector [math]\displaystyle{ \mathbf{z} }[/math] of arbitrary real values to a K-dimensional vector [math]\displaystyle{ \sigma(\mathbf{z}) }[/math] of real values in the range [0, 1] that add up to 1. The function is given by:

      [math]\displaystyle{ \sigma:\mathbb{R}^K \to [0,1]^K }[/math]

      : [math]\displaystyle{ \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} }[/math]    for j = 1, …, K.
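
      A minimal Python sketch of this definition (illustrative code, not part of the quoted article; NumPy assumed) is:

        import numpy as np

        def softmax(z):
            # Shift by the maximum for numerical stability; this does not change the result.
            exps = np.exp(z - np.max(z))
            # Normalize so the K outputs lie in (0, 1) and sum to 1.
            return exps / exps.sum()

        z = np.array([1.0, 2.0, 3.0])
        print(softmax(z))        # approximately [0.090 0.245 0.665]
        print(softmax(z).sum())  # 1.0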

      In probability theory, the output of the softmax function can be used to represent a categorical distribution – that is, a probability distribution over K different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution. The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression),[1] multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks.[2] Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of K distinct linear functions, and the predicted probability for the j-th class given a sample vector x and a weighting vector w is:

      [math]\displaystyle{ P(y=j\mid \mathbf{x}) = \frac{e^{\mathbf{x}^\mathsf{T}\mathbf{w}_j}}{\sum_{k=1}^K e^{\mathbf{x}^\mathsf{T}\mathbf{w}_k}} }[/math]

      This can be seen as the composition of K linear functions [math]\displaystyle{ \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_1, \ldots, \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_K }[/math] and the softmax function (where [math]\displaystyle{ \mathbf{x}^\mathsf{T}\mathbf{w} }[/math] denotes the inner product of [math]\displaystyle{ \mathbf{x} }[/math] and [math]\displaystyle{ \mathbf{w} }[/math]). The operation is equivalent to applying a linear operator defined by [math]\displaystyle{ \mathbf{w} }[/math] to vectors [math]\displaystyle{ \mathbf{x} }[/math], thus transforming the original, possibly high-dimensional, input into vectors in a K-dimensional space [math]\displaystyle{ \mathbb{R}^K }[/math].
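
      A short Python sketch of this computation (the function name, weight matrix, and toy data below are hypothetical) takes the K inner products and applies the softmax function:

        import numpy as np

        def predict_proba(x, W):
            # x: feature vector of shape (d,); W: weight matrix of shape (d, K),
            # with one column w_k per class.
            scores = x @ W                           # the K inner products x^T w_k
            exps = np.exp(scores - np.max(scores))   # stabilized exponentials
            return exps / exps.sum()                 # K class probabilities summing to 1

        rng = np.random.default_rng(0)
        x = rng.normal(size=3)        # d = 3 features (toy data)
        W = rng.normal(size=(3, 4))   # K = 4 classes
        p = predict_proba(x, W)
        print(p, p.sum())             # four probabilities that sum to 1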

  1. Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer.
  2. ai-faq: What is a softmax activation function?

2017