2005 DiscriminativeRerankingForNLP
- (Collins & Koo, 2005) ⇒ Michael Collins, Terry Koo. (2005). “Discriminative Reranking for Natural Language Parsing.” In: Computational Linguistics, 31(1) doi:10.1162/0891201053630273
Subject Headings: Reranking Algorithm, Probabilistic Natural Language Parser.
Notes
Cited By
Quotes
Abstract
- This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the log-likelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75% F-measure, a 13% relative decrease in F-measure error over the baseline model's score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative, in terms of both simplicity and efficiency, to work on feature selection methods within log-linear (maximum-entropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
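To make the reranking setup above concrete, here is a minimal sketch in Python of the scoring scheme the abstract describes: each candidate's score combines the baseline model's log-probability with the weights of arbitrary, possibly overlapping indicator features. The `Candidate` structure, feature names, and weight values are hypothetical illustrations, not the paper's actual model.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One candidate parse from the base parser (hypothetical structure)."""
    log_prob: float     # log-likelihood under the baseline model
    features: set[str]  # indicator features extracted from the tree

def rerank(candidates: list[Candidate],
           feature_weights: dict[str, float],
           w0: float = 1.0) -> Candidate:
    """Return the highest-scoring candidate under the reranking model.

    Score = w0 * (baseline log-probability) + sum of the weights of the
    tree's indicator features.  Features not in `feature_weights`
    contribute zero, so features may overlap arbitrarily, with no
    generative story needed.
    """
    def score(c: Candidate) -> float:
        return w0 * c.log_prob + sum(feature_weights.get(f, 0.0) for f in c.features)

    return max(candidates, key=score)

# Hypothetical usage: two candidate parses for one sentence.
cands = [
    Candidate(log_prob=-10.2, features={"NP->DT NN", "ROOT->S"}),
    Candidate(log_prob=-10.5, features={"NP->DT JJ NN", "ROOT->S"}),
]
weights = {"NP->DT JJ NN": 0.7}  # weight learned by the reranker
best = rerank(cands, weights)
print(best.log_prob)  # -10.5: the reranker overturns the base ranking
```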
2. History-based Models
- Before discussing the reranking approaches, we describe history-based models (Black et al. 1992). They are important for a few reasons. First, several of the best-performing parsers on the WSJ treebank (e.g., Ratnaparkhi 1997; Charniak 1997, 2000; Collins 1997, 1999; Henderson 2003) are cases of history-based models. Many systems applied to part-of-speech tagging, speech recognition, and other language or speech tasks also fall into this class of model. Second, a particular history-based model (that of Collins [1999]) is used as the initial model for our approach. Finally, it is important to describe history-based models — and to explain their limitations — to motivate our departure from them.
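The defining property of a history-based model can be stated in a few lines: a tree is mapped to a sequence of decisions, and its probability is the product of the probability of each decision conditioned on the decisions that precede it. The sketch below assumes a hypothetical `cond_prob` estimator and placeholder decision objects.

```python
import math

def history_based_log_prob(decisions, cond_prob):
    """Log-probability of a parse under a history-based model.

    The tree is mapped to a derivation d_1 ... d_n, and
    P(tree) = prod_i P(d_i | d_1 ... d_{i-1}).  `cond_prob(d, history)`
    is a hypothetical stand-in for the model's conditional estimates.
    """
    log_p = 0.0
    history = []
    for d in decisions:
        log_p += math.log(cond_prob(d, tuple(history)))
        history.append(d)
    return log_p
```

The limitation that motivates the departure to reranking is visible here: any feature of the tree must be expressible through the conditioning structure of these individual decisions, whereas a reranker can use arbitrary features of the whole tree.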
3. Logistic Regression and Boosting
- We now turn to machine-learning methods for the ranking task. In this section we review two methods for binary classification problems: logistic regression (or maximum-entropy) models and boosting. These methods form the basis for the reranking approaches described in later sections of the article. Maximum-entropy models are a very popular method within the computational linguistics community; see, for example, Berger, Della Pietra, and Della Pietra (1996) for an early article which introduces the models and motivates them. Boosting approaches to classification have received considerable attention in the machine-learning community since the introduction of AdaBoost by Freund and Schapire (1997).
- Boosting algorithms, and in particular the relationship between boosting algorithms and maximum-entropy models, are perhaps not familiar topics in the NLP literature. However, there has recently been much work drawing connections between the two methods (Friedman, Hastie, and Tibshirani 2000; Lafferty 1999; Duffy and Helmbold 1999; Mason, Bartlett, and Baxter 1999; Lebanon and Lafferty 2001; Collins, Schapire, and Singer 2002); in this section we review this work. Much of this work has focused on binary classification problems, and this section is also restricted to problems of this type. Later in the article we show how several of the ideas can be carried across to reranking problems.
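For reference, the two loss functions at issue can be placed side by side. Writing each training example's margin as y_i * F(x_i) with labels y_i in {-1, +1} and a linear score F(x_i), logistic regression minimizes LogLoss and boosting greedily minimizes ExpLoss. The snippet below simply evaluates the standard definitions; nothing beyond those definitions is assumed.

```python
import math

def log_loss(margins):
    """LogLoss = sum_i log(1 + exp(-y_i * F(x_i))); margins are y_i * F(x_i)."""
    return sum(math.log(1.0 + math.exp(-m)) for m in margins)

def exp_loss(margins):
    """ExpLoss = sum_i exp(-y_i * F(x_i)): the loss AdaBoost greedily minimizes."""
    return sum(math.exp(-m) for m in margins)

# Both losses are convex upper bounds on the 0-1 classification error;
# ExpLoss penalizes negative margins much more aggressively than LogLoss.
margins = [2.0, 0.5, -1.0]
print(log_loss(margins), exp_loss(margins))
```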
6.5 Boosting, Perceptron, and Support Vector Machine Approaches for Ranking Problems
- Freund et al. (1998) introduced a formulation of boosting for ranking problems. The problem we have considered is a special case of the problem in Freund et al. (1998), in that we have considered a binary distinction between candidates (i.e., the best parse vs. other parses), whereas Freund et al. consider learning full or partial orderings over candidates. The improved algorithm that we introduced in Figure 4 is, however, a new algorithm that could perhaps be generalized to the full problem of Freund et al. (1998); we leave this to future research.
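One round of the ExpLoss boosting update for this binary reranking setting can be sketched as follows. Each (best parse, competitor) pair contributes a margin; updating feature k's weight by delta changes only the margins on which the two parses differ in feature k, and the optimal unsmoothed update has the closed form delta = 0.5 * log(W+ / W-). This is a simplified sketch: the paper smooths this update, and the efficient Figure 4 algorithm maintains the W+ and W- totals incrementally rather than recomputing them, which is omitted here.

```python
import math

def best_feature_update(examples):
    """One round of ExpLoss boosting for reranking (unsmoothed sketch).

    `examples` is a list of (phi, margin) pairs, one per
    (best parse, competitor) pair: `margin` is the current score
    difference, and phi[k] in {-1, 0, +1} is the difference in
    feature k's value between the best parse and the competitor.
    Only features with phi[k] != 0 are stored, which is the sparsity
    the paper's efficient algorithm exploits.
    """
    w_pos, w_neg = {}, {}
    for phi, margin in examples:
        q = math.exp(-margin)  # current ExpLoss weight of this margin
        for k, v in phi.items():
            if v > 0:
                w_pos[k] = w_pos.get(k, 0.0) + q
            elif v < 0:
                w_neg[k] = w_neg.get(k, 0.0) + q

    # Updating feature k by delta = 0.5*log(W+/W-) reduces ExpLoss by
    # (sqrt(W+) - sqrt(W-))**2; pick the feature with the largest gain.
    def gain(k):
        return (math.sqrt(w_pos.get(k, 0.0)) - math.sqrt(w_neg.get(k, 0.0))) ** 2

    k = max(set(w_pos) | set(w_neg), key=gain)
    # The tiny epsilon only guards against division by zero; the paper
    # uses a principled smoothing term instead.
    delta = 0.5 * math.log((w_pos.get(k, 0.0) + 1e-12) / (w_neg.get(k, 0.0) + 1e-12))
    return k, delta
```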
- Altun, Hofmann, and Johnson (2003) and Altun, Johnson, and Hofmann (2003) describe experiments on tagging tasks using the ExpLoss function, in contrast to the LogLoss function used in Lafferty, McCallum, and Pereira (2001). Altun, Hofmann, and Johnson (2003) describe how dynamic programming methods can be used to calculate gradients of the ExpLoss function even in cases in which the set of candidates again includes all possible tagged sequences, a set which grows exponentially in size with the length of the sentence being tagged. Results in Altun, Johnson, and Hofmann (2003) suggest that the choice of ExpLoss versus LogLoss does not have a major impact on accuracy for the tagging task in question.
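The key dynamic-programming subroutine behind such gradient computations is a forward-style sum over the exponentially many tag sequences. A minimal sketch, assuming a hypothetical first-order `local_score` function and ignoring numerical stability:

```python
import math

def log_sum_score(sent_len, tags, local_score):
    """Forward algorithm: log of the sum of exp(score) over all tag sequences.

    `local_score(i, prev_tag, tag)` is a hypothetical per-position score
    (e.g. a sum of feature weights); prev_tag is None at position 0.
    Summing exp(score) over the exponentially many sequences in
    O(n * |tags|^2) time is the kind of dynamic program that makes
    ExpLoss/LogLoss gradients tractable for tagging.
    """
    # alpha[t] = log sum of exp(score) over all prefixes ending in tag t
    alpha = {t: local_score(0, None, t) for t in tags}
    for i in range(1, sent_len):
        alpha = {
            t: math.log(sum(math.exp(alpha[p] + local_score(i, p, t)) for p in tags))
            for t in tags
        }
    return math.log(sum(math.exp(a) for a in alpha.values()))
```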
- Perceptron-based algorithms, or the voted perceptron approach of Freund and Schapire (1999), are another alternative to boosting and LogLoss methods. See Collins (2002a, 2002b) and Collins and Duffy (2001, 2002) for applications of the perceptron algorithm. Collins (2002b) gives convergence proofs for the methods; Collins (2002a) directly compares the boosting and perceptron approaches on a named entity task; and Collins and Duffy (2001, 2002) use a reranking approach with kernels, which allow representations of parse trees or labeled sequences in very-high-dimensional spaces.
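In the reranking setting, the basic (unvoted) perceptron update mentioned here is especially simple: whenever the current weights fail to rank the correct parse above its rivals, promote the correct parse's features and demote the rival's. A minimal sketch with hypothetical feature dictionaries; the voted/averaged bookkeeping of Freund and Schapire (1999) is omitted.

```python
def perceptron_rerank_epoch(training_data, weights):
    """One pass of the ranking perceptron (sketch, no voting/averaging).

    `training_data` is a list of (gold_feats, other_candidates) pairs,
    where each feature vector is a dict from feature names to counts and
    `other_candidates` holds the feature vectors of the non-gold parses.
    `weights` is updated in place and returned.
    """
    def score(feats):
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())

    for gold_feats, other_candidates in training_data:
        rival = max(other_candidates, key=score)
        if score(rival) >= score(gold_feats):
            # Standard perceptron update: promote gold, demote the rival.
            for f, v in gold_feats.items():
                weights[f] = weights.get(f, 0.0) + v
            for f, v in rival.items():
                weights[f] = weights.get(f, 0.0) - v
    return weights
```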
References
Author | Title | Journal | TitleUrl | DOI | Year
---|---|---|---|---|---
Michael Collins, Terry Koo | Discriminative Reranking for Natural Language Parsing | Computational Linguistics, 31(1) | http://www.aclweb.org/anthology/J/J05/J05-1003.pdf | 10.1162/0891201053630273 | 2005