Loss Function Selection Task

From GM-RKB

A Loss Function Selection Task is a function selection task that selects a loss function.



References

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/loss_function#Selecting_a_loss_function Retrieved:2021-3-8.
    • Sound statistical practice requires selecting an estimator consistent with the actual acceptable variation experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances. A common example involves estimating “location”. Under typical statistical assumptions, the mean or average is the statistic for estimating location that minimizes the expected loss experienced under the squared-error loss function, while the median is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances. In economics, when an agent is risk neutral, the objective function is simply expressed as the expected value of a monetary quantity, such as profit, income, or end-of-period wealth. For risk-averse or risk-loving agents, loss is measured as the negative of a utility function, and the objective function to be optimized is the expected value of utility. Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering. For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.
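The mean/median claim above can be checked numerically. A minimal sketch (the sample values and the candidate grid are invented for illustration): minimizing the expected squared-error loss over a grid of candidate locations recovers the sample mean, while minimizing the expected absolute-difference loss recovers the sample median.

```python
import statistics

data = [1.0, 2.0, 2.0, 3.0, 10.0]  # illustrative sample, not from the source

def expected_loss(c, loss):
    """Average loss of using location estimate c for the sample."""
    return sum(loss(x - c) for x in data) / len(data)

# Candidate location estimates 0.000 .. 12.000 in steps of 0.001.
grid = [i / 1000 for i in range(12001)]

best_squared = min(grid, key=lambda c: expected_loss(c, lambda a: a * a))
best_absolute = min(grid, key=lambda c: expected_loss(c, abs))

print(best_squared, statistics.mean(data))     # grid minimizer ≈ mean (3.6)
print(best_absolute, statistics.median(data))  # grid minimizer = median (2.0)
```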

      Two very commonly used loss functions are the squared loss, [math]\displaystyle{ L(a) = a^2 }[/math] , and the absolute loss, [math]\displaystyle{ L(a)=|a| }[/math] . However, the absolute loss has the disadvantage that it is not differentiable at [math]\displaystyle{ a=0 }[/math] . The squared loss has the disadvantage that it tends to be dominated by outliers — when summing over a set of [math]\displaystyle{ a }[/math] 's (as in [math]\displaystyle{ \sum_{i=1}^n L(a_i) }[/math] ), the final sum tends to be the result of a few particularly large a-values, rather than an expression of the average a-value.
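The outlier-domination point can be illustrated with a toy set of residuals (the values are invented): a single large residual contributes almost the entire squared-loss sum, but a noticeably smaller share of the absolute-loss sum.

```python
# Illustrative residuals: four small errors and one outlier (hypothetical values).
residuals = [0.1, -0.2, 0.15, -0.1, 10.0]

squared = [a * a for a in residuals]
absolute = [abs(a) for a in residuals]

# Fraction of each total loss contributed by the single outlier (last entry).
outlier_share_sq = squared[-1] / sum(squared)
outlier_share_abs = absolute[-1] / sum(absolute)

print(f"outlier's share of squared loss:  {outlier_share_sq:.1%}")   # ≈ 99.9%
print(f"outlier's share of absolute loss: {outlier_share_abs:.1%}")  # ≈ 94.8%
```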

      The choice of a loss function is not arbitrary. It is very restrictive and sometimes the loss function may be characterized by its desirable properties. [1] Among the choice principles are, for example, the requirement of completeness of the class of symmetric statistics in the case of i.i.d. observations, the principle of complete information, and some others.

      W. Edwards Deming and Nassim Nicholas Taleb argue that empirical reality, not nice mathematical properties, should be the sole basis for selecting loss functions, and that real losses often are not mathematically nice: not differentiable, continuous, symmetric, etc. For example, a person who arrives before a plane gate closure can still make the plane, but a person who arrives after cannot, a discontinuity and asymmetry which makes arriving slightly late much more costly than arriving slightly early. In drug dosing, the cost of too little drug may be lack of efficacy, while the cost of too much may be tolerable toxicity, another example of asymmetry. Traffic, pipes, beams, ecologies, climates, etc. may tolerate increased load or stress with little noticeable change up to a point, then become backed up or break catastrophically. These situations, Deming and Taleb argue, are common in real-life problems, perhaps more common than classical smooth, continuous, symmetric, differentiable cases.
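The gate-closure example can be sketched as a loss function; the specific cost numbers below are hypothetical, chosen only to exhibit the discontinuity and asymmetry the passage describes.

```python
def gate_loss(minutes_after_closure: float) -> float:
    """Loss for arriving `minutes_after_closure` minutes after the gate closes
    (negative means arriving early). Costs are invented for illustration."""
    if minutes_after_closure > 0:
        return 500.0                        # missed flight: large fixed cost
    return -minutes_after_closure * 0.5     # early arrival: small waiting cost per minute

print(gate_loss(-30))  # 30 min early -> 15.0 (just some waiting)
print(gate_loss(1))    # 1 min late  -> 500.0 (missed the plane)
```

The jump at zero makes the loss discontinuous and non-differentiable there, and the two branches make it asymmetric — exactly the features the squared and absolute losses lack.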

  1. Detailed information on mathematical principles of the loss function choice is given in Chapter 2 of the book (and references there).