Supervised Multilabel Text Classification Task

AKA: Multi-Label Text Categorization.
Context:
- It can be solved by a Supervised Multilabel Text Classification Algorithm.
Example(s):
- Assigning each document to one or more predefined document topics, such as "Health", "Sports", "Local Politics", "National Politics", "Global Politics".
- …
Counter-Example(s):
- a Supervised Unilabel Text Classification Task.
- a Supervised Multiclass Text Classification Task.
See: Multilabel Text Classification Algorithm, Unilabel Text Classification Task.

References

(Bekkerman & Gavish, 2011) ⇒ Ron Bekkerman, and Matan Gavish. (2011). “High-precision Phrase-based Document Classification on a Modern Scale.” In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2011). doi:10.1145/2020408.2020449
- QUOTE: The problem of multilabel text classification is defined as follows. Each document from an unlabeled corpus $D$ is to be categorized into one or more classes from a class set $C$. More formally, a rule $L : D \rightarrow 2^C$ is to be learned from training data. Also available is a labeled test set $D_{test} = \{(d_i, C^*_i)\}$, where $\vert D_{test} \vert \ll \vert D \vert$ and each $C^*_i \subset C$ is a set of $d_i$’s ground truth classes. Performance of the classification rule $L$ is evaluated on the test set by comparing each $L(d_i)$ with $C^*_i$.
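The definition quoted above can be illustrated with a minimal Python sketch. It is not the method of the cited paper: the keyword table `KEYWORDS`, the function names `rule` and `evaluate`, the three test documents, and the choice of micro-averaged precision and recall as the comparison between L(d_i) and C*_i are all assumptions made for illustration. The class names reuse the topics from the example list on this page.

```python
# Class set C (topic names taken from the example on this page).
CLASSES = frozenset({
    "Health", "Sports", "Local Politics", "National Politics", "Global Politics",
})

# Hypothetical keyword table standing in for a rule learned from training data.
KEYWORDS = {
    "Health": {"vaccine", "hospital"},
    "Sports": {"match", "league"},
    "Local Politics": {"mayor", "council"},
    "National Politics": {"parliament", "senate"},
    "Global Politics": {"treaty", "summit"},
}


def rule(document):
    """A rule L: D -> 2^C that maps a document to a (possibly empty) subset of C."""
    tokens = set(document.lower().split())
    return frozenset(c for c, words in KEYWORDS.items() if tokens & words)


def evaluate(rule_fn, test_set):
    """Compare each L(d_i) with C*_i and return micro-averaged (precision, recall)."""
    tp = fp = fn = 0
    for doc, truth in test_set:
        predicted = rule_fn(doc)
        tp += len(predicted & truth)   # labels predicted and correct
        fp += len(predicted - truth)   # labels predicted but not in the ground truth
        fn += len(truth - predicted)   # ground-truth labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Labeled test set D_test = {(d_i, C*_i)}; the documents are invented for illustration.
D_TEST = [
    ("the mayor opened a new hospital", frozenset({"Health", "Local Politics"})),
    ("league match postponed after summit", frozenset({"Sports"})),
    ("senate debates vaccine funding", frozenset({"Health", "National Politics"})),
]

if __name__ == "__main__":
    precision, recall = evaluate(rule, D_TEST)
    print(f"micro-precision={precision:.3f} micro-recall={recall:.3f}")
```

Because `rule` returns a set rather than a single label, a document can receive several classes or none, which is what separates this task from the unilabel and multiclass counter-examples listed on this page.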