2007 LargeScaleSupervisedModelsforNo

(Vadas & Curran, 2007) ⇒ David Vadas, and James R. Curran. (2007). “Large-Scale Supervised Models for Noun Phrase Bracketing.” In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING).

Subject Headings: Noun Compound Bracketing Algorithm; Noun Compound Bracketing Task

Notes

Using a large corpus of manually annotated Penn Treebank NPs we have developed a supervised model that brackets simple NPs with 93.01% F-score. We extend the evaluation to include longer, more complex NPs that are rarely dealt with in the literature, attaining 91.44% F-score. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser, which outperforms the parser itself by 8.13% F-score.

Noun phrase (NP) bracketing is a requirement for the syntactic and semantic analysis of NPs. In the literature, e.g. Marcus (1980, p. 253) and Lauer (1995), the task is generally framed as follows: given a three-word noun phrase like those below, decide whether it is left branching (1) or right branching (2).

((crude oil) prices) (1)

(world (oil prices)) (2)

NP bracketing is crucial for many Natural Language Processing (NLP) tasks. For example, question answering (QA) and anaphora resolution both require (potentially nested) candidate NPs, typically identified using a parser. If the answer or antecedent is not the complete NP, e.g. crude oil above, then it cannot be found …

NP bracketing is similar to chunking (Ramshaw and Marcus, 1995), as both tasks aim to identify NP structure. Recursive NP bracketing, as in the CoNLL 1999 shared task and as performed by Daume III and Marcu (2004), is closer still. However, both of these tasks are strictly less difficult than NP bracketing as defined in this paper, as they do not attempt to recover the full extent of sub-NP structure. This is in part because gold-standard annotations for this task have not been available in the past.

A basic method for solving the simple NP bracketing task was first described in Marcus (1980). This adjacency model compares the semantic association of words 1–2 to that between words 2–3. If the former is more likely, then the compound is left branching; otherwise it is right branching. Various methods of measuring the semantic association between a pair of words have been proposed for NP bracketing (Pustejovsky et al., 1993; Resnik, 1993), but they all depend on counting occurrences of bigrams in some corpus. Metrics such as χ² and mutual information can be used instead of the raw counts, and have been shown to perform well (Nakov and Hearst, 2005a).
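The adjacency decision rule can be illustrated with a minimal Python sketch. It assumes raw bigram counts as the association measure; the counts in TOY_BIGRAM_COUNTS and the function names are hypothetical, chosen only for illustration, and a real system would take its counts from a large corpus and might substitute a metric such as χ² or mutual information.

```python
from collections import Counter

# Hypothetical bigram counts, invented for illustration only.
# A real system would collect these from a large corpus.
TOY_BIGRAM_COUNTS = Counter({
    ("crude", "oil"): 120,
    ("oil", "prices"): 90,
    ("world", "oil"): 15,
})


def bracket_adjacency(w1, w2, w3, counts):
    """Adjacency model: compare the association of words 1-2 with words 2-3.

    Raw bigram counts stand in for the association measure (an assumption
    of this sketch). Ties fall through to right branching.
    """
    assoc_12 = counts[(w1, w2)]  # Counter yields 0 for unseen bigrams
    assoc_23 = counts[(w2, w3)]
    return "left" if assoc_12 > assoc_23 else "right"


def render(w1, w2, w3, branching):
    """Format a three-word NP with explicit brackets."""
    if branching == "left":
        return f"(({w1} {w2}) {w3})"
    return f"({w1} ({w2} {w3}))"


if __name__ == "__main__":
    for np in [("crude", "oil", "prices"), ("world", "oil", "prices")]:
        print(render(*np, bracket_adjacency(*np, TOY_BIGRAM_COUNTS)))
```

With these toy counts, the sketch reproduces the two bracketings given in examples (1) and (2); with different counts the decisions could of course differ.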

Lauer (1995) proposes a new variation: the dependency model. In this case, we compare the semantic association of words 1–2 to that of words 1–3. This change is motivated by the dependencies that arise from the structure of the NP. We would expect a dependency between words 2–3 whether the compound was left or right branching, so there is no reason to analyse it.
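The dependency comparison can be sketched in the same style. This is a minimal illustration, not Lauer's actual formulation: raw co-occurrence counts stand in for the association measure, and the counts in TOY_PAIR_COUNTS and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical counts of how often each modifier-head pair co-occurs,
# invented for illustration only.
TOY_PAIR_COUNTS = Counter({
    ("crude", "oil"): 120,
    ("crude", "prices"): 5,
    ("world", "oil"): 15,
    ("world", "prices"): 40,
})


def bracket_dependency(w1, w2, w3, counts):
    """Dependency model: compare the association of words 1-2 with words 1-3.

    The pair (w2, w3) is never consulted, since a dependency between
    words 2-3 is expected under either bracketing. Ties fall through
    to right branching (a choice made for this sketch).
    """
    assoc_12 = counts[(w1, w2)]  # w1 modifies w2: left branching
    assoc_13 = counts[(w1, w3)]  # w1 modifies w3: right branching
    return "left" if assoc_12 > assoc_13 else "right"


if __name__ == "__main__":
    print(bracket_dependency("crude", "oil", "prices", TOY_PAIR_COUNTS))
    print(bracket_dependency("world", "oil", "prices", TOY_PAIR_COUNTS))
```

The only difference from an adjacency-style comparison is which second pair is looked up: words 1–3 rather than words 2–3.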


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 LargeScaleSupervisedModelsforNo	David Vadas James R. Curran			Large-Scale Supervised Models for Noun Phrase Bracketing						2007