Supervised Sequence-Member Classification Algorithm
A Supervised Sequence-Member Classification Algorithm is a supervised classification algorithm that can solve a supervised sequence-member labeling task.
- Context:
- It can range from (typically) being a Supervised Model-based Structured-Input Classification Algorithm to being a Supervised Instance-based Structured-Input Classification Algorithm (such as one based on kNN).
- It can be a Supervised Tuple-based Classification Algorithm.
- It can make use of a Supervised Tagging Feature.
- Example(s):
- an LSTM-CRF or BI-LSTM-CRF Sequence Tagging Algorithm (Huang et al., 2015).
- a Recurrent Neural Network-based Supervised Sequence Labelling Algorithm (Graves, 2012).
- Counter-Example(s):
- a Supervised Sequence Classification Algorithm, which labels an entire sequence rather than its individual members.
- See: Supervised Classification Algorithm.
References
2015
- (Huang et al., 2015) ⇒ Zhiheng Huang, Wei Xu, and Kai Yu. (2015). “Bidirectional LSTM-CRF Models for Sequence Tagging.” In: arXiv preprint arXiv:1508.01991.
- QUOTE: In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. ...
2012
- (Graves, 2012) ⇒ Alex Graves. (2012). “Supervised Sequence Labelling with Recurrent Neural Networks.” Springer Berlin Heidelberg.
2010
- (Mejer & Crammer, 2010) ⇒ Avihai Mejer, and Koby Crammer. (2010). “Confidence in Structured-prediction Using Confidence-weighted Models.” In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010).
- QUOTE: We employ a general approach (Collins, 2002; Crammer et al., 2009a) to generalize binary classification and use a joined feature mapping of an instance [math]\displaystyle{ x }[/math] and a labeling [math]\displaystyle{ y }[/math] into a common vector space, [math]\displaystyle{ \Phi(x, y) \in \mathbb{R}^d }[/math].
Given an input instance [math]\displaystyle{ x }[/math] and a model [math]\displaystyle{ \mu \in \mathbb{R}^d }[/math] we predict the labeling with the highest score, [math]\displaystyle{ \hat{y} = \arg\max_z \mu \cdot \Phi(x,z) }[/math]. A brute-force approach evaluates the value of the score [math]\displaystyle{ \mu \cdot \Phi(x, z) }[/math] for each possible labeling [math]\displaystyle{ z \in \mathcal{Y}^n }[/math], which is not feasible for large values of [math]\displaystyle{ n }[/math]. Instead, we follow standard factorization and restrict the joint mapping to be of the form, [math]\displaystyle{ \Phi(x,y) = \sum^{n}_{p=1} \Phi(x,y_p) + \sum^{n}_{q=2} \Phi(x, y_q, y_{q-1}) }[/math]. That is, the mapping is a sum of mappings, each taking into consideration only a label of a single part, or two consecutive parts. The time required to compute the max operator is linear in [math]\displaystyle{ n }[/math] and quadratic in [math]\displaystyle{ K }[/math] using the dynamic-programming Viterbi algorithm.
1999
- (Tufis, 1999) ⇒ Dan Tufis. (1999). “Tiered Tagging and Combined Language Models Classifiers.” In: Proceedings of the Second International Workshop on Text, Speech and Dialogue.