2002 RankingAlgorithmsforNamedEntity
- (Collins, 2002) ⇒ Michael Collins. (2002). “Ranking Algorithms for Named-entity Extraction: Boosting and the Voted Perceptron.” In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. doi:10.3115/1073083.1073165
Subject Headings: Voted Perceptron Algorithm, Boosting Algorithm, Named Entity Recognition Task.
Notes
- This is a companion paper to (Collins, 2002b).
Cited By
- ~198 http://scholar.google.com/scholar?q=%22Ranking+algorithms+for+named-entity+extraction%3A+boosting+and+the+voted+perceptron%22+2002
- http://dl.acm.org/citation.cfm?id=1073083.1073165&preflayout=flat#citedby
2007
- (Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, 30(1).
- QUOTE: Pattern features were introduced by M. Collins (2002) and then used by others (W. Cohen & Sarawagi 2004 and B. Settles 2004). Their role is to map words onto a small set of patterns over character types. For instance, a pattern feature might map all uppercase letters to “A”, all lowercase letters to “a”, all digits to “0” and all punctuation to “-”:
x = "G.M.": GetPattern(x) = "A-A-"
x = "Machine-223": GetPattern(x) = "Aaaaaaa-000"
The summarized pattern feature is a condensed form of the above in which consecutive character types are not repeated in the mapped string. For instance, the preceding examples become:
x = "G.M.": GetSummarizedPattern(x) = "A-A-"
;x = "Machine-223": GetSummarizedPattern(x) = "Aa-0"
2005
- (Collins & Koo, 2005) ⇒ Michael Collins, and Terry Koo. (2005). “Discriminative Reranking for Natural Language Parsing.” In: Computational Linguistics, 31(1). doi:10.1162/0891201053630273
- QUOTE: Perceptron-based algorithms, or the voted perceptron approach of Freund and Schapire (1999), are another alternative to boosting and LogLoss methods. See Collins (2002a, 2002b) and Collins and Duffy (2001, 2002) for applications of the perceptron algorithm. Collins (2002b) gives convergence proofs for the methods; Collins (2002a) directly compares the boosting and perceptron approaches on a named entity task; and Collins and Duffy (2001, 2002) use a reranking approach with kernels, which allow representations of parse trees or labeled sequences in very-high-dimensional spaces.
2004
- (Peng & McCallum, 2004) ⇒ Fuchun Peng, and Andrew McCallum. (2004). “Accurate Information Extraction from Research Papers using Conditional Random Fields.” In: Proceedings of HLT-NAACL Conference (HLT-NAACL 2004).
Quotes
Abstract
This paper describes algorithms which rerank the top N hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
1. Introduction
Recent work in statistical approaches to parsing and tagging has begun to consider methods which incorporate global features of candidate structures. Examples of such techniques are Markov Random Fields (Abney 1997; Della Pietra et al. 1997; Johnson et al. 1999), and boosting algorithms (Freund et al. 1998; Collins 2000; Walker et al. 2001). One appeal of these methods is their flexibility in incorporating features into a model: essentially any features which might be useful in discriminating good from bad structures can be included. A second appeal of these methods is that their training criterion is often discriminative, attempting to explicitly push the score or probability of the correct structure for each training sentence above the score of competing structures. This discriminative property is shared by the methods of (Johnson et al. 1999; Collins 2000), and also the Conditional Random Field methods of (Lafferty et al. 2001).
In a previous paper (Collins 2000), a boosting algorithm was used to rerank the output from an existing statistical parser, giving significant improvements in parsing accuracy on Wall Street Journal data. Similar boosting algorithms have been applied to natural language generation, with good results, in (Walker et al. 2001). In this paper we apply reranking methods to named-entity extraction. A state-of-the-art (maximum-entropy) tagger is used to generate 20 possible segmentations for each input sentence, along with their probabilities. We describe a number of additional global features of these candidate segmentations. These additional features are used as evidence in reranking the hypotheses from the max-ent tagger. We describe two learning algorithms: the boosting method of (Collins 2000), and a variant of the voted perceptron algorithm, which was initially described in (Freund & Schapire 1999). We applied the methods to a corpus of over one million words of tagged web data. The methods give significant improvements over the maximum-entropy tagger (a 17.7% relative reduction in error-rate for the voted perceptron, and a 15.6% relative improvement for the boosting method).
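A rough sketch of this reranking setup (a hypothetical minimal implementation, not the paper's exact parameterization; the feature names below are invented for illustration): each candidate segmentation is scored by a linear combination of the tagger's log-probability and weighted global features, and the highest-scoring candidate is selected.

```python
def candidate_score(log_prob, features, w0, w):
    """Linear score for one candidate segmentation: the tagger's
    log-probability plus a weighted sum of its global features.
    w0 and the per-feature weights in w are what the reranker learns."""
    return w0 * log_prob + sum(w.get(f, 0.0) for f in features)

def rerank(candidates, w0, w):
    """candidates: list of (log_prob, feature_list) pairs for one
    sentence's top-N segmentations. Returns the index of the best one."""
    scores = [candidate_score(lp, fs, w0, w) for lp, fs in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Example with two hypothetical global features: the second candidate
# overtakes the tagger's favorite once global evidence is weighed in.
w = {"entity_ends_in_punct": -1.5, "whole_entity_seen_in_training": 2.0}
cands = [(-1.2, ["entity_ends_in_punct"]),
         (-1.6, ["whole_entity_seen_in_training"])]
print(rerank(cands, w0=1.0, w=w))  # -> 1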
One contribution of this paper is to show that existing reranking methods are useful for a new domain, named-entity tagging, and to suggest global features which give improvements on this task. We should stress that another contribution is to show that a new algorithm, the voted perceptron, gives very credible results on a natural language task. It is an extremely simple algorithm to implement, and is very fast to train (the testing phase is slower, but by no means sluggish). It should be a viable alternative to methods such as the boosting or Markov Random Field algorithms described in previous work.
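To illustrate that simplicity, here is a minimal sketch of a voted perceptron trained for reranking, assuming each training sentence comes as a matrix of candidate feature vectors plus the index of the correct candidate; the survival-weighted average shown at the end is the standard cheap approximation to taking a full vote over all weight snapshots at test time:

```python
import numpy as np

def train_voted_perceptron(train, epochs=5):
    """train: list of (feats, gold) pairs, where feats is an
    (n_candidates, n_features) array and gold is the index of the
    correct candidate. Returns (weight snapshot, survival count) pairs."""
    w = np.zeros(train[0][0].shape[1])
    snapshots, count = [], 1
    for _ in range(epochs):
        for feats, gold in train:
            pred = int(np.argmax(feats @ w))
            if pred != gold:
                # Record how long the current weights survived, then update.
                snapshots.append((w.copy(), count))
                w = w + feats[gold] - feats[pred]
                count = 1
            else:
                count += 1
    snapshots.append((w.copy(), count))
    return snapshots

def averaged_weights(snapshots):
    """Survival-weighted mean of the snapshots, used to score test
    candidates in place of the more expensive explicit vote."""
    total = sum(c for _, c in snapshots)
    return sum(c * w for w, c in snapshots) / total
```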
2.2 The baseline tagger
The problem can be framed as a tagging task – to tag each word as being either the start of an entity, a continuation of an entity, or not to be part of an entity at all (we will use the tags S, C and N respectively for these three cases). As a baseline model we used a maximum-entropy tagger, very similar to the ones described in (Ratnaparkhi 1996; Borthwick et al. 1998; McCallum et al. 2000). Max-ent taggers have been shown to be highly competitive on a number of tagging tasks, such as part-of-speech tagging (Ratnaparkhi 1996), named-entity recognition (Borthwick et al. 1998), and information extraction tasks (McCallum et al. 2000). Thus the maximum-entropy tagger we used represents a serious baseline for the task.
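To make the S/C/N tag set concrete, here is a small hypothetical decoder (not from the paper) that turns a tag sequence into entity spans:

```python
def tags_to_spans(tags):
    """Decode an S/C/N tag sequence into (start, end) entity spans,
    with end exclusive: S opens an entity, C extends the open one,
    N closes any open entity."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "S":
            if start is not None:      # adjacent entities: close the first
                spans.append((start, i))
            start = i
        elif tag == "N":
            if start is not None:
                spans.append((start, i))
                start = None
        # tag == "C": keep extending the currently open entity
    if start is not None:
        spans.append((start, len(tags)))
    return spans

assert tags_to_spans(["S", "C", "N", "S", "N"]) == [(0, 2), (3, 4)]
```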
…
References
Author | Title | DOI | Year
---|---|---|---
Michael Collins | Ranking Algorithms for Named-entity Extraction: Boosting and the Voted Perceptron | 10.3115/1073083.1073165 | 2002