Multiclass Cross-Entropy Measure
A Multiclass Cross-Entropy Measure is a dispersion measure that quantifies the average number of bits needed to identify an event drawn from a set of possibilities when the coding scheme is optimized for a predicted probability distribution rather than the true distribution.
- AKA: Categorical Cross-Entropy, [math]\displaystyle{ H(P,Q) }[/math].
- Context:
- It can range from being a Normalized Cross-Entropy to being an Unnormalized Cross-Entropy.
- ...
- It can generalize the Log-Loss Function for Multi-Class Classification tasks.
- It can measure the performance of a Classification Model with multiple classes.
- It can evaluate the discrepancy between the True Probability Distribution and the Predicted Probability Distribution over multiple classes.
- It can be defined mathematically as [math]\displaystyle{ H(P, Q) = -\sum_{i} P(i) \log Q(i) }[/math], where [math]\displaystyle{ P }[/math] is the True Distribution and [math]\displaystyle{ Q }[/math] is the Predicted Distribution.
- It can relate to the concept of Information Entropy and extend the Binary Cross-Entropy to multi-class problems.
- It can be used in conjunction with the Softmax Activation Function in the output layer of a Neural Network (as illustrated in the sketch after this list).
- ...
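The context items above can be illustrated with a minimal NumPy sketch (not taken from any of the cited sources; the class scores below are made-up values): a softmax converts raw scores into the predicted distribution [math]\displaystyle{ Q }[/math], and [math]\displaystyle{ H(P, Q) = -\sum_{i} P(i) \log Q(i) }[/math] is then computed against a one-hot true distribution [math]\displaystyle{ P }[/math].
```python
import numpy as np

def softmax(z):
    """Convert raw scores into a predicted probability distribution Q."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def multiclass_cross_entropy(p_true, q_pred, eps=1e-12):
    """H(P, Q) = -sum_i P(i) * log Q(i), in nats (use np.log2 for bits)."""
    q_pred = np.clip(q_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(p_true * np.log(q_pred))

# Hypothetical 3-class example: the true class is class 1 (one-hot P),
# and the raw network scores (logits) are assumed values.
p = np.array([0.0, 1.0, 0.0])
logits = np.array([1.0, 2.5, 0.3])
q = softmax(logits)
print(multiclass_cross_entropy(p, q))  # for one-hot P this equals -log Q(1)
```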
- Example(s):
- Theano's implementation: theano.tensor.nnet.nnet.categorical_crossentropy(), which computes the multiclass cross-entropy loss between predicted and true distributions.
- An implementation in PyTorch using torch.nn.CrossEntropyLoss() for multi-class classification problems (see the usage sketch after this list).
- An implementation in TensorFlow using tf.keras.losses.CategoricalCrossentropy().
- ...
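A minimal usage sketch for the PyTorch example above (the logit and target values are made up for illustration): torch.nn.CrossEntropyLoss applies a log-softmax internally, so it expects raw, unnormalized logits together with integer class indices rather than probabilities.
```python
import torch
import torch.nn as nn

# Hypothetical raw network outputs (logits) for 2 samples over 3 classes,
# and their true class indices; all values are assumed for illustration.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])

# CrossEntropyLoss applies log-softmax internally, so it takes
# unnormalized logits and integer class indices, not probabilities.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)
print(loss.item())  # mean multiclass cross-entropy over the batch
```
tf.keras.losses.CategoricalCrossentropy() is used analogously, but by default it expects one-hot encoded targets and predicted probabilities (raw logits can be passed by setting from_logits=True).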
- Counter-Example(s):
- An Accuracy Measure, which only accounts for the number of correct predictions and ignores probability distributions.
- A Binary Cross-Entropy, used for binary classification tasks.
- A Mean Squared Error, commonly used in regression tasks rather than classification.
- See: Cross-Entropy Loss Function, Information Entropy, Probability Distribution, Bit, Kullback–Leibler Divergence, Discrete Random Variable, Continuous Random Variable, Joint Entropy, Perplexity Measure, Squared Error
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/cross_entropy Retrieved:2017-6-7.
- In information theory, the cross entropy between two probability distributions [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution [math]\displaystyle{ q }[/math] , rather than the "true" distribution [math]\displaystyle{ p }[/math] .
The cross entropy for the distributions [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] over a given set is defined as follows: : [math]\displaystyle{ H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \| q),\! }[/math] where [math]\displaystyle{ H(p) }[/math] is the entropy of [math]\displaystyle{ p }[/math] , and [math]\displaystyle{ D_{\mathrm{KL}}(p \| q) }[/math] is the Kullback–Leibler divergence of [math]\displaystyle{ q }[/math] from [math]\displaystyle{ p }[/math] (also known as the relative entropy of p with respect to q — note the reversal of emphasis).
For discrete [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] this means : [math]\displaystyle{ H(p, q) = -\sum_x p(x)\, \log q(x). \! }[/math] The situation for continuous distributions is analogous. We have to assume that [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] are absolutely continuous with respect to some reference measure [math]\displaystyle{ r }[/math] (usually [math]\displaystyle{ r }[/math] is a Lebesgue measure on a Borel σ-algebra). Let [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] be probability density functions of [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] with respect to [math]\displaystyle{ r }[/math] . Then : [math]\displaystyle{ -\int_X P(x)\, \log Q(x)\, dr(x) = \operatorname{E}_p[-\log Q]. \! }[/math] NB: The notation [math]\displaystyle{ H(p,q) }[/math] is also used for a different concept, the joint entropy of [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math]
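As an illustrative check of the identity quoted above (the two distributions below are made-up examples, not from the cited source), the cross entropy decomposes into the entropy of [math]\displaystyle{ p }[/math] plus the Kullback–Leibler divergence of [math]\displaystyle{ q }[/math] from [math]\displaystyle{ p }[/math]:
```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # assumed "true" distribution
q = np.array([0.5, 0.3, 0.2])   # assumed "unnatural" coding distribution

cross_entropy = -np.sum(p * np.log2(q))      # H(p, q), in bits
entropy       = -np.sum(p * np.log2(p))      # H(p)
kl_divergence =  np.sum(p * np.log2(p / q))  # D_KL(p || q)

# H(p, q) == H(p) + D_KL(p || q)
assert np.isclose(cross_entropy, entropy + kl_divergence)
print(cross_entropy, entropy, kl_divergence)
```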
2017
- http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html#theano.tensor.nnet.nnet.categorical_crossentropy
- QUOTE: Return the cross-entropy between an approximating distribution and a true distribution. The cross entropy between two probability distributions measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme is used based on a given probability distribution q, rather than the “true” distribution p. Mathematically, this function computes H(p,q) = - \sum_x p(x) \log(q(x)), where p=true_dist and q=coding_dist.
2011a
- (Mikolov et al., 2011) ⇒ Tomáš Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, and Jan Černocký. (2011). “Empirical Evaluation and Combination of Advanced Language Modeling Techniques.” In: Proceedings of INTERSPEECH 2011.
- QUOTE: … Thus, the measure that we will aim to minimize is the cross entropy of the test data given the language model. The cross entropy is equal to [math]\displaystyle{ \log_2 }[/math] perplexity (PPL) ...
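- For illustration (not from the cited paper): under this relation, a test-set cross entropy of 7 bits per word corresponds to a perplexity of [math]\displaystyle{ 2^7 = 128 }[/math].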
2011b
- (Yu et al., 2011) ⇒ Dong Yu, Jinyu Li, and Li Deng. (2011). “Calibration of Confidence Measures in Speech Recognition.” In: IEEE Transactions on Audio, Speech, and Language Processing, 19(8). doi:10.1109/TASL.2011.2141988
2004
- (Caruana & Niculescu-Mizil, 2004) ⇒ Rich Caruana, and Alexandru Niculescu-Mizil. (2004). “Data Mining in Metric Space: An Empirical Analysis of Supervised Learning Performance Criteria.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ISBN:1-58113-888-1 doi:10.1145/1014052.1014063
- QUOTE: … compare nine boolean classification performance metrics: Accuracy, Lift, F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low dimensional manifold.