2003 HeadDrivenStatModelsForNLP
- (Collins, 2003) ⇒ Michael Collins. (2003). “Head-Driven Statistical Models for Natural Language Parsing.” In: Computational Linguistics, 29(4). doi:10.1162/089120103322753356.
Subject Headings: Statistical Parsing Algorithm, Head-Driven Grammar, Generative Statistical Model, Probabilistic Context-Free Grammar.
Notes
Cited By
~1,300 http://scholar.google.com/scholar?cites=5109188117478372275
2005
- (Collins & Koo, 2005) ⇒ Michael Collins, and Terry Koo. (2005). “Discriminative Reranking for Natural Language Parsing.” In: Computational Linguistics, 31(1). doi:10.1162/0891201053630273.
Quotes
Abstract
- This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
1. Introduction
- Ambiguity is a central problem in natural language parsing. Combinatorial effects mean that even relatively short sentences can receive a considerable number of parses under a wide-coverage grammar. Statistical parsing approaches tackle the ambiguity problem by assigning a probability to each parse tree, thereby ranking competing trees in order of plausibility. In many statistical models the probability for each candidate tree is calculated as a product of terms, each term corresponding to some substructure within the tree. The choice of parameterization is essentially the choice of how to represent parse trees. There are two critical questions regarding the parameterization of a parsing approach:
- 1. Which linguistic objects (e.g., context-free rules, parse moves) should the model’s parameters be associated with? In other words, which features should be used to discriminate among alternative parse trees?
- 2. How can this choice be instantiated in a sound probabilistic model?
- In this article we explore these issues within the framework of generative models, more precisely, the history-based models originally introduced to parsing by Black et al. (1992). In a history-based model, a parse tree is represented as a sequence of decisions, the decisions being made in some derivation of the tree. Each decision has an associated probability, and the product of these probabilities defines a probability distribution over possible derivations.
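As a concrete illustration, the sketch below scores a derivation as the product of its decision probabilities. The `prob` model and the (history, decision) encoding are hypothetical placeholders, not the paper's actual parameterization:

```python
import math

def derivation_log_prob(decisions, prob):
    """Score one derivation of a parse tree in a history-based model.

    `decisions` is the sequence of (history, decision) pairs made in
    some canonical derivation of the tree; `prob` is a conditional
    model P(decision | history).  Both are illustrative placeholders.
    Working in log space avoids numerical underflow on long derivations.
    """
    return sum(math.log(prob(decision, history))
               for history, decision in decisions)

# The parser ranks candidate trees by this score; in these models each
# tree has exactly one canonical derivation, so the argmax over
# derivations is also the argmax over trees.
```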
- We first describe three parsing models based on this approach. The models were originally introduced in Collins (1997); the current article gives considerably more detail about the models and discusses them in greater depth. In Model 1 we show one approach that extends methods from probabilistic context-free grammars (PCFGs) to lexicalized grammars. Most importantly, the model has parameters corresponding to dependencies between pairs of headwords. We also show how to incorporate a “distance” measure into these models, by generalizing the model to a history-based approach. The distance measure allows the model to learn a preference for close attachment, or right-branching structures.
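In outline, Model 1 factors the probability of a lexicalized rule P(h) → L_n(l_n) … L_1(l_1) H(h) R_1(r_1) … R_m(r_m) into a head choice followed by left and right modifier choices, each conditioned on the parent P, the head child H, and the headword h. A rough rendering, with the distance terms Δ included and STOP terminating each modifier sequence:

```latex
P_h(H \mid P, h)
\times \prod_{i=1}^{n+1} P_l\!\left(L_i(l_i) \mid P, H, h, \Delta_l(i-1)\right)
\times \prod_{j=1}^{m+1} P_r\!\left(R_j(r_j) \mid P, H, h, \Delta_r(j-1)\right)
```

with L_{n+1}(l_{n+1}) = R_{m+1}(r_{m+1}) = STOP. The bigram lexical dependencies mentioned in the abstract are the (h, l_i) and (h, r_j) headword pairs appearing in these terms.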
- In Model 2, we extend the parser to make the complement/adjunct distinction, which will be important for most applications using the output from the parser. Model 2 is also extended to have parameters corresponding directly to probability distributions over subcategorization frames for headwords. The new parameters lead to an improvement in accuracy.
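Schematically, Model 2 adds two choices immediately after the head is generated, one subcategorization frame for each side, and then conditions each modifier on the complements still owed on that side: frames shrink as complements are generated, and STOP becomes possible only once a frame is empty. A rough sketch, simplifying the distance terms:

```latex
P_h(H \mid P, h)
\times P_{lc}(LC \mid P, H, h) \times P_{rc}(RC \mid P, H, h)
\times \prod_{i} P_l\!\left(L_i(l_i) \mid P, H, h, LC_i\right)
\times \prod_{j} P_r\!\left(R_j(r_j) \mid P, H, h, RC_j\right)
```

where LC_i and RC_j denote the left and right frames remaining when the i-th or j-th modifier is chosen.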
- In Model 3 we give a probabilistic treatment of wh-movement that is loosely based on the analysis of wh-movement in generalized phrase structure grammar (GPSG) (Gazdar et al. 1985). The output of the parser is now enhanced to show trace coindexations in wh-movement cases. The parameters in this model are interesting in that they correspond directly to the probability of propagating GPSG-style slash features through parse trees, potentially allowing the model to learn island constraints (a schematic rendering of this parameter class appears below).
- In all three models a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then follow naturally, leading to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. For this reason we refer to the models as head-driven statistical models.
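Returning to Model 3, its extra machinery can be summarized as one additional parameter class: when a nonterminal carries a GPSG-style slash (+gap) feature, the model decides where the gap goes. A schematic rendering, with the exact conditioning context simplified:

```latex
P_G(G \mid P, H, h), \qquad G \in \{\textsc{Head},\ \textsc{Left},\ \textsc{Right}\}
```

Here Head passes the gap to the head child, while Left/Right add it to the corresponding modifier sequence, where it is eventually either passed further down or discharged as a TRACE. Consistently low probabilities for passing a gap into particular configurations are what would let the model learn island constraints.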
- We describe the evaluation of the three models on the Penn Wall Street Journal Treebank (Marcus, Santorini, and Marcinkiewicz 1993). Model 1 achieves 87.7% constituent precision and 87.5% constituent recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88.3% constituent precision and 88.0% constituent recall. These results are competitive with those of other models that have been applied to parsing the Penn Treebank. Models 2 and 3 also produce trees with information about wh-movement or subcategorization, which many NLP applications will need in order to extract predicate-argument structure from parse trees.
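The constituent precision/recall figures are PARSEVAL-style measures over labeled spans. A minimal sketch of the computation (the published numbers follow the standard evalb conventions, which add bracketing details not reproduced here):

```python
def labeled_precision_recall(gold_spans, guess_spans):
    """PARSEVAL-style labeled constituent precision and recall.

    Each argument is a set of (label, start, end) tuples for one
    sentence.  A minimal sketch only; the official evalb tool applies
    additional conventions (e.g., punctuation handling) omitted here.
    """
    correct = len(gold_spans & guess_spans)
    precision = correct / len(guess_spans) if guess_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall

# Example: one wrong bracket out of three proposed.
gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
guess = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
print(labeled_precision_recall(gold, guess))  # (0.666..., 0.666...)
```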
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2003 HeadDrivenStatModelsForNLP | Michael Collins | 29(4) | 2003 | Head-Driven Statistical Models for Natural Language Parsing | | Computational Linguistics | http://www.aclweb.org/anthology/J/J03/J03-4003.pdf | 10.1162/089120103322753356 | | 2003 |