Noun Compound Bracketing Task
A Noun Compound Bracketing Task is a Linguistic Syntactic Parsing Task that detects groups of nouns within a Compound Noun of three or more nouns.
- AKA: NCBT, Noun Phrase Parsing Task, NC Bracketing, Noun Compound Bracketing, Compound Noun Bracketing Task.
- Context:
- Task Input: a Linguistic Expression, noun phrase, or Compound Noun.
- Task Output:
- a Word Mention String that marks how the input nouns are grouped; for a three-noun input this is a binary decision between right-bracketing and left-bracketing.
- optional: indication of the Head Noun.
- Task Requirements:
- It can be solved by a Noun Compound Bracketing System (that implements a Noun Compound Bracketing Algorithm).
- It can be supported by:
- Example(s):
- A 3-word NCBT. The input is a 3-word linguistic expression (A B C); the output is either [A [B C]] (right-bracketing interpretation) or [[A B] C] (left-bracketing interpretation). e.g.:
  - NCBT(woman aid worker) = [woman [aid worker]] (right-bracketing).
  - NCBT(copper alloy rod) = [[copper alloy] rod] (left-bracketing).
  - NCBT(world oil prices) = [world [oil prices]] (right-bracketing).
  - NCBT(crude oil prices) = [[crude oil] prices] (left-bracketing).
- NCBT("human embryonic stem cell research is topical") ⇒ "[[human [embryonic [stem cell]]] research] is topical."
- NCBT("This personal data policy has six main sections.") ⇒ "This [[personal data] policy] has six main sections."
- NCBT("This personal data base has six main tables.") ⇒ "This [personal [data base]] has six main tables."
- … ⇒ "The [popular [[high school] musical]] is …"
- … ⇒ "That [funny [[[life insurance] company] employee]] …"
  - notice that "insurance company" will likely also exist in a lexicon, but the grouping "life [insurance company]" is incorrect.
- … ⇒ "The [ex-[attorney general]] is inside."
- … ⇒ "The test was positive for [[anti-[[yellow fever] virus]] antibodies] and negative for [anti-DENV antibodies]."
- … ⇒ "That is a [[red herring fallacy] argument]."
- … ⇒ "It is a [very big [[fast food restaurant] chain]]."
- … ⇒ "[U.S. [[commercial [real estate]] [loan defaults and delinquencies]]] skyrocketed in 2008."
- … ⇒ "We need an [integrated [[mass [rapid transit]] system]]."
- … ⇒ "[[[Non rapid eye movement] sleep] homeostasis] plays a role in perceptual learning."
- … ⇒ "[[Cancer causing] [food additives]] are everywhere."
- … ⇒ "The [never finished] [too expensive] [terrace house] extension is there."
- Counter-Example(s):
- See: Prepositional Phrase Attachment, Noun Phrase Coordination, Natural Language Processing Task, Computational Linguistics, Computer Speech Processing, Word Sense Disambiguation, Sentiment Analysis, Word Similarity, Keyword Extraction, Text Summarization, Text Analysis.
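The bracketed output strings in the examples above can be read into a nested structure and checked for balanced brackets. The following is a minimal sketch, not part of any cited system; the function names `parse_bracketing`, `render`, and `label` are hypothetical and chosen only for illustration.

```python
def parse_bracketing(text):
    """Parse a string such as '[woman [aid worker]]' into nested tuples.

    Raises ValueError when the square brackets are not balanced.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]  # stack of groups that are still open
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unmatched ']'")
            group = tuple(stack.pop())
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched '['")
    top = stack[0]
    # A single bracketed group is returned directly, without extra wrapping.
    return top[0] if len(top) == 1 else tuple(top)


def render(tree):
    """Turn nested tuples back into the square-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(render(t) for t in tree) + "]"


def label(tree):
    """Name the bracketing of a binary tree over three words."""
    if len(tree) == 2 and isinstance(tree[0], tuple) and isinstance(tree[1], str):
        return "left-bracketing"
    if len(tree) == 2 and isinstance(tree[0], str) and isinstance(tree[1], tuple):
        return "right-bracketing"
    return "other"


if __name__ == "__main__":
    for s in ["[woman [aid worker]]", "[[copper alloy] rod]"]:
        tree = parse_bracketing(s)
        print(s, "->", tree, "->", label(tree))
```

A string with a missing or extra bracket, such as `[copper alloy] rod]`, raises `ValueError` in this sketch instead of being silently accepted.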
References
2016
- (Fares, 2016) ⇒ Murhaf Fares. (2016). “A Dataset for Joint Noun-Noun Compound Bracketing and Interpretation.” In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016 Student Research Workshop.
- QUOTE: Noun-noun compound bracketing can be defined as the disambiguation of the internal structure of compounds with three nouns or more. For example, we can bracket the compound noon fashion show in two ways:
  - 1. Left-bracketing: [[noon fashion] show]
  - 2. Right-bracketing: [noon [fashion show]]
- In this example, the right-bracketing interpretation (a fashion show happening at noon) is more likely than the left-bracketing one (a show of noon fashion). However, the correct bracketing need not always be as obvious, some compounds can be subtler to bracket, e.g. car radio equipment (Girju et al., 2005).
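For compounds with more than three nouns, the decision is no longer binary, because the number of candidate binary bracketings grows quickly. The sketch below enumerates them for illustration; the helper names `bracketings` and `render` are assumptions made here and do not come from the cited paper.

```python
def bracketings(words):
    """Yield every binary bracketing of a word sequence as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    # Choose where the top-level split falls, then bracket each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield (left, right)


def render(tree):
    """Turn nested tuples into the square-bracket notation used above."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(render(t) for t in tree) + "]"


if __name__ == "__main__":
    for tree in bracketings(["noon", "fashion", "show"]):
        print(render(tree))
    # Candidate counts for 3-, 4- and 5-noun compounds.
    for n in (3, 4, 5):
        print(n, len(list(bracketings(["n%d" % i for i in range(1, n + 1)]))))
```

A three-noun compound yields the two candidates from the quote, while four and five nouns yield 5 and 14 candidates (the Catalan numbers).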
2014a
- (Barriere & Menard, 2014) ⇒ Caroline Barriere, and Pierre Andre Menard. (2014). “Multiword Noun Compound Bracketing Using Wikipedia.” In: Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014).
- QUOTE: The noun compound bracketing task consists in determining related subgroups of nouns within a larger compound. For example (from Lauer (1995)), (woman (aid worker)) requires a right-bracketing interpretation, contrarily to ((copper alloy) rod) requiring a left-bracketing interpretation. When only three words are used, n1 n2 n3, bracketing is defined as a binary decision between grouping (n1,n2) or grouping (n2,n3). Two models, described in early work by Lauer (1995), are commonly used to inform such decision: the adjacency model and the dependency model. The former compares probabilities (or more loosely, strength of association) of two alternative adjacent noun compounds, that of n1 n2 and of n2 n3. The latter compares probabilities of two alternative dependencies, either between n1 and n3 or between n2 and n3.
2014b
- (Menard & Barriere, 2014) ⇒ Pierre Andre Menard, and Caroline Barriere. (2014). “Linked Open Data and Web Corpus Data for Noun Compound Bracketing.” In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).
- QUOTE: In the field of computational linguistics, large corpora have been shown to be quite good for the task of noun compound bracketing. Such task consists in determining which nouns within a larger noun compound form subgroups. For example (from Lauer (1995)), woman aid worker would be bracketed as woman [aid worker], called a right-bracketing, contrarily to copper alloy rod, which would be bracketed as [copper alloy] rod, called a left-bracketing. In compound bracketing, when only three words are used, $n_1\; n_2\; n_3$, the task becomes a binary decision between grouping $n_1$ and $n_2$ or grouping $n_2$ and $n_3$. Two models, described in early work by Lauer (1995) and still used in recent work, are the adjacency model and the dependency model. The former compares probabilities (or more loosely strength of association) of two alternative adjacent noun compounds, that of $n_1$ $n_2$ and of $n_2$ $n_3$. The latter compares probabilities of two alternative attachment (modifying) noun relations, that of $n_1$ $n_3$ and of $n_2$ $n_3$ (...)
Noun compound bracketing, sometimes referred to as NP parsing (Pitler et al., 2010), has been studied as a task in itself (e.g. Lauer (1995), Vadas and Curran (2007a), Nakov and Hearst (2005)). It is also studied as the first step of semantic analysis of NPs (Girju et al., 2005) where not only subgroups of words are found within the compound, but semantic relations between these groups are looked at (Nastase et al., 2013).
2008
- (Vadas, 2008) ⇒ David Vadas. (2008). “Noun Phrase Bracketing Guidelines, Version 1.0." The University of Sydney, School of Information Technologies.
2007a
- (Vadas & Curran, 2007) ⇒ David Vadas, and James R. Curran. (2007). “Adding Noun Phrase Structure to the Penn Treebank.” In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007).
2007b
- (Vadas & Curran, 2007b) ⇒ David Vadas, and James R. Curran. (2007). “Large-Scale Supervised Models for Noun Phrase Bracketing.” In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING).
- QUOTE: Noun phrase (NP) bracketing is a requirement for the syntactic and semantic analysis of NPs. In the literature, e.g. Marcus (1980, p253) and Lauer (1995), the task is generally framed as follows: given a 3 word noun phrase like those below, decide whether it is left branching (1) or right branching (2).
((crude oil) prices) (1)
(world (oil prices)) (2)
NP bracketing is crucial for many Natural Language Processing (NLP) tasks. For example, question answering (QA) and anaphora resolution both require (potentially nested) candidate NPs, typically identified using a parser ...
NP bracketing is similar to chunking (Ramshaw and Marcus, 1995), as both tasks aim to identify NP structure...
A basic method for solving the simple NP bracketing task was first described in Marcus (1980). This adjacency model compares the semantic association of words 1–2 to that between words 2–3. If the former is more likely, then the compound is left branching, otherwise it is right branching ...
Lauer (1995) proposes a new variation: the dependency model. In this case, we compare the semantic association of words 1–2 to that of words 1–3. This change is motivated by the dependencies that arise from the structure of the NP. We would expect a dependency between words 2–3 whether the compound was left or right branching, so there is no reason to analyse it.
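The two decision rules described in this quote can be sketched in a few lines. This is an illustrative sketch only: the function `bracket`, the `assoc` callback, and the counts in `TOY_COUNTS` are hypothetical and invented for the example, and a real system would estimate association strength from a corpus or another resource.

```python
# Invented pair counts standing in for a real association measure.
TOY_COUNTS = {
    ("crude", "oil"): 50,
    ("oil", "prices"): 40,
    ("crude", "prices"): 2,
    ("world", "oil"): 10,
    ("world", "prices"): 30,
}


def toy_assoc(a, b):
    """Hypothetical association score: a raw pair count, 0 if unseen."""
    return TOY_COUNTS.get((a, b), 0)


def bracket(n1, n2, n3, assoc, model="dependency"):
    """Bracket a three-word compound with the adjacency or dependency rule."""
    first = assoc(n1, n2)  # words 1-2, used by both models
    if model == "adjacency":
        second = assoc(n2, n3)  # compare against words 2-3
    elif model == "dependency":
        second = assoc(n1, n3)  # compare against words 1-3
    else:
        raise ValueError("unknown model: %r" % model)
    # Left branching only if the 1-2 association is stronger; otherwise right.
    if first > second:
        return ((n1, n2), n3)
    return (n1, (n2, n3))


if __name__ == "__main__":
    for compound in [("crude", "oil", "prices"), ("world", "oil", "prices")]:
        for model in ("adjacency", "dependency"):
            print(compound, model, bracket(*compound, toy_assoc, model=model))
```

With these invented counts both rules agree on the two examples from the quote; on real data the two models can disagree, because they compare the words 1–2 association against different pairs.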
2006
- (Kim & Baldwin, 2006) ⇒ Su Nam Kim, and Timothy Baldwin. (2006). “Interpreting Semantic Relations in Noun Compounds via Verb Semantics.” In: Proceedings of the COLING/ACL on Main conference poster sessions (COLING-ACL 2006). ACM DL:1273137
- ABSTRACT: We propose a novel method for automatically interpreting compound nouns based on a predefined set of semantic relations. First we map verb tokens in sentential contexts to a fixed set of seed verbs using WordNet::Similarity and Moby's Thesaurus. We then match the sentences with semantic relations based on the semantics of the seed verbs and grammatical roles of the head noun and modifier. Based on the semantics of the matched sentences, we then build a classifier using TiMBL. The performance of our final system at interpreting NCs is 52.6%.
2005a
- (Girju et al., 2005) ⇒ Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. (2005). “On the Semantics of Noun Compounds.” In: Computer Speech & Language, 19(4).
- ABSTRACT: This paper provides new insights on the semantic characteristics of two and three noun compounds. An analysis is performed using two sets of semantic classification categories: a list of 8 prepositional paraphrases previously proposed by Lauer (Designing statistical language learners: experiments on noun compounds, Ph.D. Thesis, Macquarie University, Australia) and a new set of 35 semantic relations introduced by us. We show the distribution of these semantic categories on a corpus of noun compounds and present several models for the bracketing and the semantic classification of noun compounds. The results are compared against state-of-the-art models reported in the literature.
- NOTES: supervised model.
- NOTES: bracketing in context.
- NOTES: requires WordNet senses
2005b
- (Nakov & Hearst, 2005) ⇒ Preslav Nakov, and Marti Hearst. (2005). “Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing.” In: Proceedings of CoNLL-2005.
- QUOTE: An important but understudied language analysis problem is that of noun compound bracketing, which is generally viewed as a necessary step towards noun compound interpretation. Consider the following contrastive pair of noun compounds:
  - (1) liver cell antibody
  - (2) liver cell line
- In example (1) an antibody targets a liver cell, while (2) refers to a cell line which is derived from the liver.
  - (1b) [[liver cell] antibody] (left bracketing)
  - (2b) [liver [cell line]] (right bracketing)
2005c
- (Lapata & Keller, 2005) ⇒ Mirella Lapata, and Frank Keller. (2005). “Web-based Models for Natural Language Processing.” In: ACM Transactions on Speech and Language Processing (TSLP), 2(1).
- QUOTE: The first analysis task we consider is the syntactic disambiguation of compound nouns, which has received a fair amount of attention in the NLP literature (Pustejovsky et al. 1993; Resnik 1993; Lauer 1995). The task can be summarized as follows: given a three word compound $n_1$ $n_2$ $n_3$, determine the correct binary bracketing of the word sequence (see (4) for an example).
  - 4a. [[backup compiler] disk]
  - 4b. [backup [compiler disk]]
1995a
- (Lauer, 1995a) ⇒ Mark Lauer. (1995). “Corpus Statistics Meet the Noun Compound: Some empirical results.” In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics.
- ABSTRACT: A variety of statistical methods for noun compound analysis are implemented and compared. The results support two main conclusions. First, the use of conceptual association not only enables a broad coverage, but also improves the accuracy. Second, an analysis model based on dependency grammar is substantially more accurate than one based on deepest constituents, even though the latter is more prevalent in the literature.
1995b
- (Lauer, 1995b) ⇒ Mark Lauer. (1995). “Designing Statistical Language Learners: Experiments on Noun Compounds.” Ph.D. thesis, Macquarie University.
1993a
- (Pustejovsky et al., 1993) ⇒ James Pustejovsky, Sabine Bergler, and Peter G. Anick. (1993). “Lexical Semantic Techniques for Corpus Analysis.” In: Computational Linguistics, 19(2).
1993b
- (Resnik, 1993) ⇒ Philip S. Resnik. (1993). “Selection and information: A Class-based Approach to Lexical Relationships." Ph.D. thesis, University of Pennsylvania.
2003
- (Keller & Lapata, 2003) ⇒ Frank Keller, and Mirella Lapata. (2003). “Using the Web to Obtain Frequencies for Unseen Bigrams.” In: Computational Linguistics, 29(3).
1992
- (Liberman & Sproat, 1992) ⇒ Mark Liberman, and Richard Sproat. (1992). “The Stress and Structure of Modified Noun Phrases in English.” In: I. Sag and A. Szabolcsi (eds.), Lexical Matters, CSLI Lecture Notes No. 24.
1983
- (Spärck Jones, 1983) ⇒ Karen Spärck Jones. (1983). “Compound Noun Interpretation Problems.” In: Fallside, F. and Woods, W.A., editors, Computer Speech Processing. Prentice-Hall.
1980
- (Marcus, 1980) ⇒ Mitchell Marcus. (1980). “A Theory of Syntactic Recognition for Natural Language.” Cambridge, MA: MIT Press.