Logistic (Log) Loss Function
A Logistic (Log) Loss Function is a convex loss function that is defined as the negative log-likelihood of a logistic model.
- AKA: Binary Cross-Entropy.
- Context:
- input: [math]\displaystyle{ y_{train} }[/math], the [[training data]] [[label]]s, and [math]\displaystyle{ y_{pred} }[/math], the [[predicted probabilities]] (each ranging from 0 to 1).
- [[Function Output|output]]: a [[Log Loss Value]].
- It can [[measure]] the [[performance]] of a [[classification model]] whose [[output]] is a [[probability]] value between 0 and 1.
- It can be used in [[binary classification]] tasks with [[predicted probabilities]].
- It can be calculated using the formula: [math]\displaystyle{ L = -[y \log(p) + (1 - y) \log(1 - p)] }[/math], where [math]\displaystyle{ y }[/math] is the true label and [math]\displaystyle{ p }[/math] is the predicted probability (see the worked example after this list).
- It can be minimized during Logistic Regression Training and during the training of other probabilistic classification models.
- It can produce a Convex Optimization Problem, which allows efficient optimization algorithms such as gradient descent to be used.
- It can be generalized to multiclass classification using the cross-entropy loss function.
- It can be interpreted as the cross-entropy between the true label distribution and the predicted distribution, i.e. a measure of how far the predicted probabilities are from the true labels.
- It can, because it operates on predicted probabilities rather than hard class decisions, remain informative on imbalanced datasets, though class weighting is often still required in practice.
- ...
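As a worked example of the formula above (with illustrative values): for a true label [math]\displaystyle{ y = 1 }[/math] and a predicted probability [math]\displaystyle{ p = 0.9 }[/math], the loss is [math]\displaystyle{ L = -[1 \cdot \log(0.9) + 0 \cdot \log(0.1)] \approx 0.105 }[/math]; for the same prediction with true label [math]\displaystyle{ y = 0 }[/math], the loss is [math]\displaystyle{ L = -\log(1 - 0.9) \approx 2.303 }[/math], showing how confidently wrong predictions are penalized heavily.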
- Example(s):
- An implementation in Theano: theano.tensor.nnet.nnet.sigmoid_binary_crossentropy(), which computes the logistic loss in a numerically stable way.
- An implementation in Scikit-Learn: sklearn.metrics.log_loss(), which computes the log loss given true labels and predicted probabilities.
- A Python-based Logistic (Log) Loss Function (a usage sketch follows this list):
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    # Clip predictions away from 0 and 1 to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Mean binary cross-entropy over all samples.
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
- ...
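A usage sketch for the Python-based log_loss function above, with illustrative arrays (the numbers are made up):
import numpy as np

y_true = np.array([1, 0, 1, 1])           # true binary labels
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities of class 1
# Uses the log_loss function defined in the example above.
print(log_loss(y_true, y_pred))           # mean binary log loss over the four samples, ~0.30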
- Counter-Example(s):
- An Exponential Loss Function, used in AdaBoost algorithms for boosting weak learners.
- A Hinge Loss Function, as used by Support Vector Machines for maximizing the margin between classes.
- A Mean Squared Error loss function, commonly used in regression tasks rather than classification.
- A Kullback-Leibler Divergence Loss Function, which measures how one probability distribution diverges from a second, expected probability distribution.
- A Huber Loss Function.
- A Savage Loss Function.
- A Square Loss Function.
- A Tangent Loss Function.
- See: Squared Error Function, Cross-Entropy Measure, Mean Absolute Error, Mean Squared Error.
References
2021a
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss Retrieved:2021-3-7.
- The logistic loss function can be generated using (2) and Table-I as follows: [math]\displaystyle{ \begin{align} \phi(v) &= C[f^{-1}(v)]+\left(1-f^{-1}(v)\right)\, C'\left[f^{-1}(v)\right] \\ &= \frac{1}{\log(2)}\left[\frac{-e^v}{1+e^v}\log\frac{e^v}{1+e^v}-\left(1-\frac{e^v}{1+e^v}\right)\log\left(1-\frac{e^v}{1+e^v}\right)\right]+\left(1-\frac{e^v}{1+e^v}\right)\left[\frac{-1}{\log(2)}\log\left(\frac{\frac{e^v}{1+e^v}}{1-\frac{e^v}{1+e^v}}\right)\right] \\ &=\frac{1}{\log(2)}\log(1+e^{-v}). \end{align} }[/math] The logistic loss is convex and grows linearly for negative values, which makes it less sensitive to outliers. The logistic loss is used in the LogitBoost algorithm.
The minimizer of [math]\displaystyle{ I[f] }[/math] for the logistic loss function can be directly found from equation (1) as: [math]\displaystyle{ f^*_\text{Logistic}= \log\left(\frac{\eta}{1-\eta}\right)=\log\left(\frac{p(1\mid x)}{1-p(1\mid x)}\right). }[/math] This function is undefined when [math]\displaystyle{ p(1\mid x)=1 }[/math] or [math]\displaystyle{ p(1\mid x)=0 }[/math] (tending toward ∞ and −∞ respectively), but predicts a smooth curve which grows when [math]\displaystyle{ p(1\mid x) }[/math] increases and equals 0 when [math]\displaystyle{ p(1\mid x)=0.5 }[/math].
It's easy to check that the logistic loss and binary cross-entropy loss (log loss) are in fact the same (up to a multiplicative constant [math]\displaystyle{ \frac{1}{\log(2)} }[/math]). The cross-entropy loss is closely related to the Kullback–Leibler divergence between the empirical distribution and the predicted distribution. The cross-entropy loss is ubiquitous in modern deep neural networks.
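The equivalence noted above can be sanity-checked numerically; the following is a minimal NumPy sketch (the score v and the label are illustrative, not from the quoted source):
import numpy as np

v = 1.7                                    # arbitrary real-valued score
p = 1.0 / (1.0 + np.exp(-v))               # sigmoid(v), the predicted probability
logistic_loss = np.log(1 + np.exp(-v)) / np.log(2)   # (1/log 2) * log(1 + e^{-v})
bce_y1 = -np.log(p)                        # binary cross-entropy with true label y = 1
print(np.isclose(logistic_loss, bce_y1 / np.log(2)))  # True: equal up to 1/log(2)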
2021b
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Cross_entropy Retrieved:2021-3-7.
- In information theory, the cross-entropy between two probability distributions [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution [math]\displaystyle{ q }[/math], rather than the true distribution [math]\displaystyle{ p }[/math].
2021c
- (ML Glossary, 2021) ⇒ https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html Retrieved:2021-03-06.
- QUOTE: Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. (...).
Cross-entropy and log loss are slightly different depending on context, but in machine learning when calculating error rates between 0 and 1 they resolve to the same thing.
Code
from math import log

def CrossEntropy(yHat, y):
    # y is the true binary label; yHat is the predicted probability of class 1.
    if y == 1:
        return -log(yHat)
    else:
        return -log(1 - yHat)
In binary classification, where the number of classes $M$ equals 2, cross-entropy can be calculated as:
$−\left(y\log\left(p\right)+\left(1−y\right)\log\left(1−p\right)\right)$
If $M>2$ (i.e. multiclass classification), we calculate a separate loss for each class label per observation and sum the result.
$−\displaystyle \sum_{c=1}^My_{o,c}\log\left(p_{o,c}\right)$
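A minimal NumPy sketch of the multiclass form above, assuming one-hot encoded labels and a matrix of predicted class probabilities (the function name and shapes are illustrative, not from the quoted glossary):
import numpy as np

def multiclass_cross_entropy(y_onehot, p, eps=1e-15):
    # y_onehot: (N, M) one-hot true labels; p: (N, M) predicted class probabilities.
    p = np.clip(p, eps, 1 - eps)
    # Per-observation loss: sum of -y_{o,c} * log(p_{o,c}) over the M classes.
    return -np.sum(y_onehot * np.log(p), axis=1)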
2018a
- (Fast AI, 2018a) ⇒ http://wiki.fast.ai/index.php/Log_Loss
- QUOTE: Logarithmic loss (related to cross-entropy) measures the performance of a classification model where the prediction input is a probability value between 0 and 1. The goal of our machine learning models is to minimize this value. A perfect model would have a log loss of 0. Log loss increases as the predicted probability diverges from the actual label. So predicting a probability of 0.012 when the actual observation label is 1 would be bad and result in a high log loss. There is a more detailed explanation of the justifications and math behind log loss here. …
… To calculate log loss from scratch, we need to include the MinMax function (see below). Numpy implements this for us with np.clip()
import numpy as np
from math import log

def logloss(true_label, predicted, eps=1e-15):
    # Clip the predicted probability into (0, 1) to avoid log(0).
    p = np.clip(predicted, eps, 1 - eps)
    if true_label == 1:
        return -log(p)
    else:
        return -log(1 - p)
2018b
- (DeepLearning,2018) ⇒ http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html#theano.tensor.nnet.nnet.sigmoid_binary_crossentropy
- QUOTE: It is equivalent to binary_crossentropy(sigmoid(output), target), but with more efficient and numerically stable computation, especially when taking gradients.
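Theano's kernel itself is not reproduced here; the following NumPy sketch illustrates the standard numerically stable formulation the quote refers to, computing the binary cross-entropy directly from a logit rather than from sigmoid(logit) (names and values are illustrative):
import numpy as np

def sigmoid_binary_crossentropy_ref(logit, target):
    # Naive form -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))] overflows for large |x|.
    # Using softplus(x) = log(1 + e^x) = logaddexp(0, x) gives the equivalent,
    # numerically stable expression softplus(x) - t*x.
    return np.logaddexp(0.0, logit) - target * logit

# Quick check against the naive formula at a moderate logit value.
x, t = 3.0, 1.0
p = 1.0 / (1.0 + np.exp(-x))
naive = -(t * np.log(p) + (1 - t) * np.log(1 - p))
print(np.isclose(sigmoid_binary_crossentropy_ref(x, t), naive))  # True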
2017a
- (WikiFastAI) ⇒ http://wiki.fast.ai/index.php/Log_Loss#Log_Loss_vs_Cross-Entropy
- QUOTE: Log loss and cross-entropy are slightly different depending on the context, but in machine learning when calculating error rates between 0 and 1 they resolve to the same thing. As a demonstration, where p and q are the sets p∈{y, 1−y} and q∈{ŷ, 1−ŷ} we can rewrite cross-entropy as:
- p = set of true labels
- q = set of predictions
- y = true label
- ŷ = predicted probability
[math]\displaystyle{ H(p, q) = -\sum_{x} p(x)\log q(x) = -\big(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\big) }[/math]
Which is exactly the same as log loss!
2017b
- (Kaggle, 2017) ⇒ https://www.kaggle.com/c/bioresponse/discussion/1831
- QUOTE:
from math import log
def log_loss(predicted, target):
    if len(predicted) != len(target):
        print 'lengths not equal!'
        return
    target = [float(x) for x in target]   # make sure all float values
    predicted = [min([max([x, 1e-15]), 1-1e-15]) for x in predicted]   # within (0,1) interval
    return -(1.0/len(target))*sum([target[i]*log(predicted[i]) + \
                                   (1.0-target[i])*log(1.0-predicted[i]) \
                                   for i in xrange(len(target))])

if __name__ == '__main__':   # if you run at the command line as 'python utils.py'
    actual = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1]
    pred = [0.24160452, 0.41107934, 0.37063768, 0.48732519, 0.88929869, 0.60626423, 0.09678324, 0.38135864, 0.20463064, 0.21945892]
    print log_loss(pred, actual)
2016
- (Program Creek, 2016) ⇒ https://www.programcreek.com/python/example/86075/sklearn.metrics.log_loss
- QUOTE:
def log_loss(solution, prediction, task='binary.classification'):
    """Log loss for binary and multiclass."""
    [sample_num, label_num] = solution.shape
    eps = 1e-15
    pred = np.copy(prediction)   # beware: changes in prediction occur through this
    sol = np.copy(solution)
    if (task == 'multiclass.classification') and (label_num > 1):
        # Make sure the lines add up to one for multi-class classification
        norma = np.sum(prediction, axis=1)
        for k in range(sample_num):
            pred[k, :] /= sp.maximum(norma[k], eps)
        # Make sure there is a single label active per line for multi-class classification
        sol = binarize_predictions(solution, task='multiclass.classification')
        # For the base prediction, this solution is ridiculous in the multi-label case
    # Bounding of predictions to avoid log(0), 1/0, ...
    pred = sp.minimum(1 - eps, sp.maximum(eps, pred))
    # Compute the log loss
    pos_class_log_loss = -mvmean(sol * np.log(pred), axis=0)
    if (task != 'multiclass.classification') or (label_num == 1):
        # The multi-label case is a bunch of binary problems.
        # The second class is the negative class for each column.
        neg_class_log_loss = -mvmean((1 - sol) * np.log(1 - pred), axis=0)
        log_loss = pos_class_log_loss + neg_class_log_loss
        # Each column is an independent problem, so we average.
        # The probabilities in one line do not add up to one.
        # log_loss = mvmean(log_loss)
        # print('binary {}'.format(log_loss))
        # In the multilabel case, the right thing is to AVERAGE, not sum.
        # We return all the scores so we can normalize correctly later on
    else:
        # For the multiclass case the probabilities in one line add up to one.
        log_loss = pos_class_log_loss
        # We sum the contributions of the columns.
        log_loss = np.sum(log_loss)
        # print('multiclass {}'.format(log_loss))
    return log_loss
2015
- (SciKit-Learn, 2015) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
- Log loss, aka logistic loss or cross-entropy loss.
- QUOTE: This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier’s predictions. For a single sample with true label [math]\displaystyle{ y_t \in \{0,1\} }[/math] and estimated probability [math]\displaystyle{ y_p }[/math] that [math]\displaystyle{ y_t = 1 }[/math], the log loss is: [math]\displaystyle{ -\log P(y_t \mid y_p) = -(y_t \log(y_p) + (1 - y_t) \log(1 - y_p)) }[/math]
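A brief usage sketch of sklearn.metrics.log_loss with illustrative labels and probabilities (the values are made up for demonstration):
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]             # true binary labels
y_prob = [0.1, 0.8, 0.7, 0.3]     # predicted probability of class 1
print(log_loss(y_true, y_prob))   # averaged negative log-likelihood, ~0.26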
2014
- (Kaggle, 2014) ⇒ https://www.kaggle.com/wiki/LogarithmicLoss
- QUOTE: [math]\displaystyle{ \operatorname{log loss} = -\frac{1}{N}\sum_{i=1}^N\sum_{j=1}^My_{ij}\log(p_{ij}) }[/math]
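A minimal NumPy sketch of the Kaggle formula above, assuming y is an (N, M) one-hot label matrix and p an (N, M) matrix of predicted probabilities (names are illustrative):
import numpy as np

def logarithmic_loss(y, p, eps=1e-15):
    # -1/N * sum_i sum_j y_ij * log(p_ij), with clipping to avoid log(0).
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(np.sum(y * np.log(p), axis=1))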