Multinomial Logistic (Softmax) Function
A Multinomial Logistic (Softmax) Function is a vector-valued logistic function that maps a [math]\displaystyle{ K }[/math]-dimensional vector of arbitrary real values to a [math]\displaystyle{ K }[/math]-dimensional vector of values in [math]\displaystyle{ [0, 1] }[/math] that sum to 1.
- Context:
- It can be used in Multiclass Classification Methods, such as multinomial logistic regression, multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks.
- It can (often) be used as a Multinomial Probability Function.
- Example(s):
- Counter-Example(s):
- See: Multiclass Classification, Linear Discriminant Analysis, Naive Bayes Classifier, Function Composition.
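The mapping described above can be made concrete with a minimal sketch in Python (assuming NumPy is available; this is an illustrative sketch, not part of the source). The subtraction of the maximum entry is a common numerical-stability device and does not change the result:

```python
import numpy as np

def softmax(z):
    """Map a K-dimensional vector of arbitrary real values to a
    K-dimensional probability vector (entries in [0, 1] summing to 1)."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()        # shift by the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Illustrative call: three arbitrary real scores
print(softmax([1.0, 2.0, 3.0]))  # ≈ [0.090, 0.245, 0.665]
```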
References
2018b
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Softmax_function Retrieved:2018-2-11.
- In mathematics, the softmax function, or normalized exponential function, is a generalization of the logistic function that "squashes" a [math]\displaystyle{ K }[/math]-dimensional vector [math]\displaystyle{ \mathbf{z} }[/math] of arbitrary real values to a [math]\displaystyle{ K }[/math]-dimensional vector [math]\displaystyle{ \sigma(\mathbf{z}) }[/math] of real values in the range [0, 1] that add up to 1. The function is given by:
[math]\displaystyle{ \sigma:\mathbb{R}^K \to [0,1]^K }[/math]
: [math]\displaystyle{ \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} }[/math] for j = 1, …, K.
In probability theory, the output of the softmax function can be used to represent a categorical distribution – that is, a probability distribution over [math]\displaystyle{ K }[/math] different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution. The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression),[1] multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks.[2] Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of [math]\displaystyle{ K }[/math] distinct linear functions, and the predicted probability for the [math]\displaystyle{ j }[/math]th class given a sample vector [math]\displaystyle{ \mathbf{x} }[/math] and a weighting vector [math]\displaystyle{ \mathbf{w} }[/math] is:
[math]\displaystyle{ P(y=j\mid \mathbf{x}) = \frac{e^{\mathbf{x}^\mathsf{T}\mathbf{w}_j}}{\sum_{k=1}^K e^{\mathbf{x}^\mathsf{T}\mathbf{w}_k}} }[/math] This can be seen as the composition of [math]\displaystyle{ K }[/math] linear functions [math]\displaystyle{ \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_1, \ldots, \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_K }[/math] and the softmax function (where [math]\displaystyle{ \mathbf{x}^\mathsf{T}\mathbf{w} }[/math] denotes the inner product of [math]\displaystyle{ \mathbf{x} }[/math] and [math]\displaystyle{ \mathbf{w} }[/math]). The operation is equivalent to applying a linear operator defined by [math]\displaystyle{ \mathbf{w} }[/math] to vectors [math]\displaystyle{ \mathbf{x} }[/math], thus transforming the original, probably highly-dimensional, input to vectors in a [math]\displaystyle{ K }[/math]-dimensional space [math]\displaystyle{ \mathbb{R}^K }[/math].
- ↑ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer.
- ↑ ai-faq What is a softmax activation function?
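As a rough illustration of the composition described in the quoted passage above (the [math]\displaystyle{ K }[/math] linear scores [math]\displaystyle{ \mathbf{x}^\mathsf{T}\mathbf{w}_j }[/math] followed by the softmax), here is a small sketch in Python with NumPy; the weight values are arbitrary placeholders for illustration, not fitted parameters:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - z.max())      # shift by the max for numerical stability
    return exp_z / exp_z.sum()

def predict_proba(x, W):
    """P(y = j | x) for j = 1..K, where column j of W is the weight vector w_j."""
    scores = x @ W                   # the K linear functions x^T w_1, ..., x^T w_K
    return softmax(scores)           # composition with the softmax function

# Illustrative (not fitted) weights for a 2-feature, 3-class problem
W = np.array([[0.5, -0.2,  0.1],
              [0.3,  0.8, -0.4]])
x = np.array([1.0, 2.0])
print(predict_proba(x, W))           # three probabilities that sum to 1
```

In a fitted multinomial logistic regression model the columns of W would be learned from data; here they only serve to show the linear-then-softmax composition.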
2017
- https://isaacchanghau.github.io/2017/05/22/Activation-Functions-in-Artificial-Neural-Networks/
- QUOTE: The Softmax function (used for multi-classification neural network output), or normalized exponential function, in mathematics, is a generalization of the logistic function that “squashes” a [math]\displaystyle{ K }[/math]-dimensional vector [math]\displaystyle{ z }[/math] of arbitrary real values to a [math]\displaystyle{ K }[/math]-dimensional vector [math]\displaystyle{ \sigma(z) }[/math] of real values in the range [math]\displaystyle{ [0,1] }[/math] that add up to 1. The function is given by: [math]\displaystyle{ \sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1,2,\ldots,K }[/math]
In probability theory, the output of the Softmax function can be used to represent a categorical distribution, that is, a probability distribution over [math]\displaystyle{ K }[/math] different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution.
Here is an example of Softmax application (the original figure is not reproduced; a worked example is sketched after this quote).
The softmax function is used in various multiclass classification methods, such as multinomial logistic regression, multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of [math]\displaystyle{ K }[/math] distinct linear functions, and the predicted probability for the [math]\displaystyle{ j }[/math]th class given a sample vector [math]\displaystyle{ x }[/math] and a weighting vector [math]\displaystyle{ w }[/math] is: [math]\displaystyle{ P(y=j \mid x) = \frac{e^{x^\mathsf{T} w_j}}{\sum_{k=1}^K e^{x^\mathsf{T} w_k}} }[/math] This can be seen as the composition of [math]\displaystyle{ K }[/math] linear functions [math]\displaystyle{ x \mapsto x^\mathsf{T} w_1, \ldots, x \mapsto x^\mathsf{T} w_K }[/math] and the softmax function (where [math]\displaystyle{ x^\mathsf{T} w }[/math] denotes the inner product of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ w }[/math]). The operation is equivalent to applying a linear operator defined by [math]\displaystyle{ w }[/math] to vectors [math]\displaystyle{ x }[/math], thus transforming the original, probably highly-dimensional, input to vectors in a [math]\displaystyle{ K }[/math]-dimensional space [math]\displaystyle{ \mathbb{R}^K }[/math].
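As a stand-in for the example the quote alludes to, here is one small worked calculation (the input values are chosen arbitrarily for illustration): for [math]\displaystyle{ z = (1, 2, 3) }[/math], the exponentials are [math]\displaystyle{ (e^1, e^2, e^3) \approx (2.718, 7.389, 20.086) }[/math], the normalizer is [math]\displaystyle{ \sum_{k=1}^{3} e^{z_k} \approx 30.193 }[/math], and therefore [math]\displaystyle{ \sigma(z) \approx (0.090, 0.245, 0.665) }[/math], a vector of values in [math]\displaystyle{ [0,1] }[/math] that sum to 1.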