2007 LargeScaleSupervisedModelsforNo


Subject Headings: Noun Compound Bracketing Algorithm; Noun Compound Bracketing Task

Notes

Cited By

Quotes

Abstract

Interpreting the structure of noun phrases (NPs) is important for many Natural Language Processing (NLP) tasks. This work extends the state-of-the-art in NP bracketing by: creating supervised models trained on a large annotated corpus; applying these to longer, more complex NPs; and using the resulting system to improve the output of the Bikel (2004) parser.

Using a large corpus of manually annotated Penn Treebank NPs we have developed a supervised model that brackets simple NPs with 93.01% F-score. We extend the evaluation to include longer, more complex NPs that are rarely dealt with in the literature, attaining 91.44% F-score. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser, which outperforms the parser itself by 8.13% F-score.

1 Introduction

Noun phrase (NP) bracketing is a requirement for the syntactic and semantic analysis of NPs. In the literature, e.g. Marcus (1980, p253) and Lauer (1995), the task is generally framed as follows: given a 3 word noun phrase like those below, decide whether it is left branching (1) or right branching (2).

((crude oil) prices) (1)

(world (oil prices)) (2)

NP bracketing is crucial for many Natural Language Processing (NLP) tasks. For example, question answering (QA) and anaphora resolution both require (potentially nested) candidate NPs, typically identified using a parser. If the answer or antecedent is not the complete NP, e.g. crude oil above, then it cannot be found …

2 Background

NP bracketing is similar to chunking (Ramshaw and Marcus, 1995), as both tasks aim to identify NP structure. Recursive NP bracketing, as in the CoNLL 1999 shared task and as performed by Daumé III and Marcu (2004), is closer still. However, both of these tasks are strictly less difficult than NP bracketing as defined in this paper, as they do not attempt to recover the full extent of sub-NP structure. This is in part because gold-standard annotations for this task have not been available in the past.

A basic method for solving the simple NP bracketing task was first described in Marcus (1980). This adjacency model compares the semantic association of words 1–2 to that between words 2–3. If the former is more likely, then the compound is left branching, otherwise it is right branching. Various methods of measuring the semantic association between a pair of words have been proposed for NP bracketing (Pustejovsky et al., 1993; Resnik, 1993) but they all depend on counting occurrences of bigrams in some corpus. Metrics such as χ² and mutual information can be used instead of the raw counts, and have been shown to perform well (Nakov and Hearst, 2005a).
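The adjacency comparison can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `adjacency_bracket`, the table `BIGRAM_COUNTS`, and every count in it are hypothetical, invented only to show the decision rule, and raw bigram counts stand in for whichever association measure is actually used.

```python
from collections import Counter

# Hypothetical toy bigram counts, invented purely for illustration.
# A real system would collect these from a large corpus.
BIGRAM_COUNTS = Counter({
    ("crude", "oil"): 50,
    ("oil", "prices"): 30,
    ("world", "oil"): 5,
})


def adjacency_bracket(w1, w2, w3, counts=BIGRAM_COUNTS):
    """Bracket a three-word NP with the adjacency model.

    Compares the association of words 1-2 with that of words 2-3,
    using raw bigram counts as the (assumed) association measure.
    Unseen bigrams count as zero; ties fall through to 'right'.
    """
    assoc_12 = counts[(w1, w2)]
    assoc_23 = counts[(w2, w3)]
    return "left" if assoc_12 > assoc_23 else "right"


print(adjacency_bracket("crude", "oil", "prices"))
print(adjacency_bracket("world", "oil", "prices"))
```

With these toy counts, "crude oil prices" comes out left branching and "world oil prices" right branching, matching examples (1) and (2). Swapping the raw counts for another association score would only change how `assoc_12` and `assoc_23` are computed.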

Lauer (1995) proposes a new variation: the dependency model. In this case, we compare the semantic association of words 1–2 to that of words 1–3. This change is motivated by the dependencies that arise from the structure of the NP. We would expect a dependency between words 2–3 whether the compound was left or right branching, so there is no reason to analyse it.
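A minimal Python sketch of the dependency comparison follows. It is an illustration under stated assumptions rather than the method as published: the names `dependency_bracket` and `PAIR_COUNTS` and all counts are hypothetical, raw pair counts stand in for the association measure, and the sketch assumes that a stronger 1–2 association indicates left branching (word 1 modifying word 2) while a stronger 1–3 association indicates right branching.

```python
from collections import Counter

# Hypothetical toy word-pair counts, invented purely for illustration.
PAIR_COUNTS = Counter({
    ("crude", "oil"): 50,
    ("crude", "prices"): 2,
    ("world", "oil"): 5,
    ("world", "prices"): 20,
})


def dependency_bracket(w1, w2, w3, counts=PAIR_COUNTS):
    """Bracket a three-word NP with the dependency model.

    Compares the association of words 1-2 with that of words 1-3.
    The 2-3 pair is ignored, since that dependency is expected under
    either bracketing. Unseen pairs count as zero; ties fall through
    to 'right' (an assumption of this sketch).
    """
    assoc_12 = counts[(w1, w2)]
    assoc_13 = counts[(w1, w3)]
    return "left" if assoc_12 > assoc_13 else "right"


print(dependency_bracket("crude", "oil", "prices"))
print(dependency_bracket("world", "oil", "prices"))
```

The only difference from an adjacency-style comparison is which second pair is looked up: words 1–3 rather than words 2–3.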

References

David Vadas and James R. Curran (2007). "Large-Scale Supervised Models for Noun Phrase Bracketing."