F-Measure
An F-Measure is a performance metric for binary classification functions that is based on the weighted harmonic mean of the classifier's precision and recall.
- AKA: Fβ-Metric, F-Score.
- Context:
- Input(s):
- True Positives, False Positives, and False Negatives (or recall and precision).
- an optional Positive Real weight β (set to One when unspecified).
- Output: an F Score.
- It can be calculated as: [math]\displaystyle{ F_\beta = \frac{(1 + \beta^2)\,TP}{(1 + \beta^2)\,TP + \beta^2\,FN + FP} }[/math] (see the sketch following this list).
- Example(s):
- an F1 Score (the balanced case, with β = 1).
- an F2 Score.
- an F0.5 Score.
- Counter-Example(s):
- a Recall Metric.
- a Precision Metric.
- an Accuracy Metric.
- a PMI Measure.
- an F-Statistic.
- See: Tagging Task, Annotation Task.
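The count-based formula above can be illustrated with a minimal Python sketch; the function name f_beta_score and the example counts are illustrative choices, not part of any standard library API:
```python
def f_beta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion-matrix counts.

    Implements F_beta = (1 + beta^2)*TP / ((1 + beta^2)*TP + beta^2*FN + FP).
    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    b2 = beta ** 2
    denominator = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denominator if denominator > 0 else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(f_beta_score(8, 2, 4))            # F1 ~= 0.727
print(f_beta_score(8, 2, 4, beta=2.0))  # F2 ~= 0.690 (recall-weighted)
```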
References
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/F1_score
- QUOTE:In statistics, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct results divided by the number of all returned results and r is the number of correct results divided by the number of results that should have been returned. The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0.
The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall: :[math]\displaystyle{ F = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} }[/math].
The general formula for positive real [math]\displaystyle{ β }[/math] is: :[math]\displaystyle{ F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{(\beta^2 \cdot \mathrm{precision}) + \mathrm{recall}} }[/math].
The formula in terms of Type I and type II errors: :[math]\displaystyle{ F_\beta = \frac {(1 + \beta^2) \cdot \mathrm{true\ positive} }{((1 + \beta^2) \cdot \mathrm{true\ positive} + \beta^2 \cdot \mathrm{false\ negative} + \mathrm{false\ positive})}\, }[/math].
Two other commonly used F measures are the [math]\displaystyle{ F_{2} }[/math] measure, which weights recall higher than precision, and the [math]\displaystyle{ F_{0.5} }[/math] measure, which puts more emphasis on precision than recall.
The F-measure was derived so that [math]\displaystyle{ F_\beta }[/math] "measures the effectiveness of retrieval with respect to a user who attaches [math]\displaystyle{ β }[/math] times as much importance to recall as precision" [1]. It is based on van Rijsbergen's effectiveness measure :[math]\displaystyle{ E = 1 - \left(\frac{\alpha}{P} + \frac{1-\alpha}{R}\right)^{-1} }[/math].
Their relationship is [math]\displaystyle{ F_\beta = 1 - E }[/math] where [math]\displaystyle{ \alpha=\frac{1}{1 + \beta^2} }[/math].
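The three formulations quoted above (the precision/recall form, the count form, and F_β = 1 − E with α = 1/(1 + β²)) can be checked numerically with a short Python sketch; the counts below are hypothetical:
```python
tp, fp, fn = 8, 2, 4                  # hypothetical confusion-matrix counts
beta = 2.0
b2 = beta ** 2

p = tp / (tp + fp)                    # precision
r = tp / (tp + fn)                    # recall

# Precision/recall form of F_beta.
f_pr = (1 + b2) * p * r / (b2 * p + r)

# Count (Type I / Type II error) form.
f_counts = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

# van Rijsbergen's effectiveness measure E, with alpha = 1 / (1 + beta^2).
alpha = 1 / (1 + b2)
e = 1 - 1 / (alpha / p + (1 - alpha) / r)

assert abs(f_pr - f_counts) < 1e-12
assert abs(f_pr - (1 - e)) < 1e-12    # F_beta = 1 - E
print(f_pr)                           # ~0.690
```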
2009
- (Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
- QUOTE:Cluster quality is evaluated by three metrics, purity [14], F-score [10], and normalized mutual information (NMI) [15]. … F-score combines the information of precision and recall which is extensively applied in information retrieval. … All the three metrics range from 0 to 1, and the higher their value, the better the clustering quality is.
- ↑ van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth.