NP Chunking Task

An NP Chunking Task is a phrase chunking task that is restricted to the identification of all base noun phrases.

AKA: Noun Phrase Chunking, NP Chunking, NPCT, Base NP Chunking Task, BaseNP Chunking.
Context:
- Its performance can be measured with the CoNLL-2000 chunking evaluation script (conlleval):
  - http://www.cnts.ua.ac.be/conll2000/chunking/conlleval.txt
- It can range from being a Supervised NP Chunking Task to being an Unsupervised NP Chunking Task.
- It can support:
  - a Noun Compound Bracketing Task.
  - a Noun Phrase Parsing Task.
- It can be solved by a Noun Phrase Chunking System (that implements a BaseNP Chunking Algorithm/Noun Compound Bracketing Algorithm).
- It can be supported by a Morphological Parsing Task.
Example(s):
- f("He reckons the current account deficit will narrow to only # 1.8 billion in September.") ⇒ ([He], reckons, [the current account deficit], will narrow to, [only # 1.8 billion], in, [September], .).
- …
Counter-Example(s):
- a Chinking Task,
- a Text Tokenization Task,
- a Noun Compound Bracketing Task,
- a Sentiment Analysis Task,
- a Sentence Segmentation Task,
- a Part-of-Speech Tagging Task,
- a Named Entity Recognition Task.
See: Term Recognition Task, Information Extraction System.

References

(Vadas, 2008) ⇒ David Vadas. (2008). “Noun Phrase Bracketing Guidelines, Version 1.0.” The University of Sydney, School of Information Technologies.

(Vadas & Curran, 2007) ⇒ David Vadas, and James R. Curran. (2007). “Adding Noun Phrase Structure to the Penn Treebank.” In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007).

(Rus & Ravi, 2006) ⇒ Vasile Rus, and Sireesha Ravi. (2006). “Towards a Base Noun Phrase Parser using Web Counts.” In: Journal of Computing Sciences in Colleges, 21(5).
- ABSTRACT: Syntactic parsing is an important processing step for various language processing applications including Information Extraction, Question Answering, and Machine Translation. Parsing base Noun Phrases is one particular parsing case that has not been addressed so far in the literature. In this paper we present a semester-long research project that aimed at investigating the base Noun Phrase parsing problem and efficiently implementing a base Noun Phrase parser based on a statistical model and web counts. Using web counts, instead of manually annotated data, to induce the parameters of the statistical model makes our method unsupervised. Although, this was a project for a graduate independent study class we plan to use it as a team project for an undergraduate class.

(Sha & Pereira, 2003a) ⇒ Fei Sha, and Fernando Pereira. (2003). “Shallow Parsing with Conditional Random Fields.” In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). doi:10.3115/1073445.1073473
- QUOTE: Figure 1 shows the base NPs in an example sentence. Following Ramshaw and Marcus (1995), the input to the NP chunker consists of the words in a sentence annotated automatically with part-of-speech (POS) tags. The chunker's task is to label each word with a label indicating whether the word is outside a chunk (O), starts a chunk (B), or continues a chunk (I). For example, the tokens in first line of Figure 1 would be labeled BIIBIIOBOBIIO.
  [Rockwell International Corp.] ['s Tulsa unit] said [it] signed [a tentative agreement] extending [its contract] with [Boeing Co.] to provide structural parts for [Boeing] 's [747 jetliners] .
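
The B/I/O encoding described in the quote can be sketched as follows. This is a minimal illustration, not code from the paper: the function names `bio_to_chunks` and `chunks_to_bio` are hypothetical, and the input is only the first 13 tokens of the example sentence together with the label string BIIBIIOBOBIIO given in the quote.

```python
from typing import List, Tuple

def bio_to_chunks(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O tags into (start, end) token spans, end exclusive."""
    chunks = []
    start = None  # start index of the chunk currently open, if any
    for i, tag in enumerate(tags):
        # A B or O tag closes whatever chunk is open.
        if tag in ("B", "O") and start is not None:
            chunks.append((start, i))
            start = None
        # B opens a chunk; a stray I with no open chunk is treated as B.
        if tag == "B" or (tag == "I" and start is None):
            start = i
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks

def chunks_to_bio(n_tokens: int, chunks: List[Tuple[int, int]]) -> List[str]:
    """Encode non-overlapping (start, end) spans as B/I/O tags."""
    tags = ["O"] * n_tokens
    for start, end in chunks:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

tokens = ("Rockwell International Corp. 's Tulsa unit said it signed "
          "a tentative agreement extending").split()
tags = list("BIIBIIOBOBIIO")  # label string quoted from the paper

chunks = bio_to_chunks(tags)
phrases = [" ".join(tokens[s:e]) for s, e in chunks]
print(phrases)
```

Because base NPs do not nest, the span list and the tag sequence carry the same information, which is why the task can be treated as per-token sequence labeling.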

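(Tjong Kim Sang, 2002) ⇒ Erik F. Tjong Kim Sang. (2002). “Memory-Based Shallow Parsing.” In: Journal of Machine Learning Research, 2. http://www.ai.mit.edu/projects/jmlr/papers/volume2/tks02a/html/node14.html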
- Noun phrase parsing is similar to noun phrase chunking but this time the goal is to find noun phrases at all levels. This means that just like in the clause identification task we need to be able to recognize embedded phrases. The following example sentence will illustrate this:
  - In (early trading ) in (Hong Kong ) (Monday ), (gold ) was quoted at (( $ 366.50 ) (an ounce ) ) .
- This sentence contains seven noun phrases of which the one containing the final four words of the sentence consists of two embedded noun phrases. If we use the same approach as for clause identification, retrieving brackets of all phrase levels in one step and balancing these, we will probably not detect this noun phrase because it starts and ends together with other noun phrases. Therefore we will use a different approach here.
- We will recover noun phrases at different levels by performing repeated chunking [Tjong Kim Sang(2000a)].
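
The level structure that repeated chunking has to recover can be made concrete with a small sketch. It does not implement the chunker itself; it only reads the bracketed annotation of the example sentence and groups the noun phrases by level, where level 0 holds the phrases with no embedded phrase (what a first chunking pass would target) and level 1 holds phrases built over them. The function name `nested_phrases` and the level numbering are assumptions made for illustration.

```python
import re
from typing import List, Tuple

def nested_phrases(bracketed: str) -> List[Tuple[int, str]]:
    """Return (level, phrase text) for every parenthesised phrase.

    Level 0 means the phrase contains no embedded phrase; a phrase that
    contains level-k phrases gets level k + 1.
    """
    words: List[str] = []
    stack: List[List[int]] = []  # [start word index, max child level]
    found: List[Tuple[int, str]] = []
    for tok in re.findall(r"\(|\)|[^\s()]+", bracketed):
        if tok == "(":
            stack.append([len(words), -1])
        elif tok == ")":
            start, child_level = stack.pop()
            level = child_level + 1
            if stack:  # tell the enclosing phrase about this child
                stack[-1][1] = max(stack[-1][1], level)
            found.append((level, " ".join(words[start:])))
        else:
            words.append(tok)
    return found

sentence = ("In (early trading ) in (Hong Kong ) (Monday ), (gold ) "
            "was quoted at (( $ 366.50 ) (an ounce ) ) .")
phrases = nested_phrases(sentence)
for level, text in phrases:
    print(level, text)
```

On this sentence the sketch finds seven phrases: six at level 0 and one at level 1 spanning the final four words, which is the phrase a single-pass bracket-balancing approach would likely miss.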

(Kudo & Matsumoto, 2001) ⇒ Taku Kudo, and Yuji Matsumoto. (2001). “Chunking with Support Vector Machines.” In: Proceedings of NAACL 2001.
- As of 2002, it had achieved the highest reported performance on the RM95 NP Chunking Benchmark Task.
  - P=94.15%, R=94.29%, F=94.22%
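
The F value above is the harmonic mean of precision and recall, which a short sketch can confirm. The helper names `f_score` and `chunk_prf` are hypothetical, and the gold and predicted spans in the second part are made-up illustration data, not results from any paper. The sketch assumes the usual chunk-level convention that a predicted chunk counts as correct only when both of its boundaries match a gold chunk.

```python
from typing import Set, Tuple

def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of P and R."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def chunk_prf(gold: Set[Tuple[int, int]], predicted: Set[Tuple[int, int]]):
    """Chunk-level precision, recall and F1 under exact span matching."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r, f_score(p, r)

# Reported figures: P=94.15, R=94.29.
reported_f = f_score(94.15, 94.29)
print(round(reported_f, 2))

# Hypothetical spans: the last predicted chunk has a wrong right boundary.
gold = {(0, 3), (3, 6), (7, 8), (9, 12)}
predicted = {(0, 3), (3, 6), (7, 8), (9, 11)}
p, r, f = chunk_prf(gold, predicted)
print(p, r, f)
```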

CoNLL-2000 Shared Task: Chunking Webpage.
- http://www.cnts.ua.ac.be/conll2000/chunking/

(Tjong Kim Sang & Buchholz, 2000) ⇒ Erik Tjong Kim Sang, and Sabine Buchholz. (2000). “Introduction to the CoNLL-2000 Shared Task: Chunking.” In: Proceedings of CoNLL-2000.

(Tjong Kim Sang et al., 2000) ⇒ Erik F. Tjong Kim Sang, Walter Daelemans, Hervé Déjean, Rob Koeling, Yuval Krymolowski, Vasin Punyakanok, and Dan Roth. (2000). “Applying System Combination to Base Noun Phrase Identification.” In: Proceedings of the 18th Conference on Computational Linguistics (COLING 2000).
- We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.
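
One simple reading of "majority vote" is a per-token vote over the B/I/O tags proposed by each system, sketched below. This is an assumption for illustration; the paper's own combination procedure may differ in what is voted on and how ties are handled. The five tag sequences are hypothetical, and `majority_vote` is a made-up name. Ties are broken in favour of the earliest system in the list, so the list is assumed to be ordered best system first.

```python
from collections import Counter
from typing import List

def majority_vote(system_tags: List[List[str]]) -> List[str]:
    """Combine per-token tags from several systems by majority vote.

    Ties go to the tag proposed by the earliest system in the list.
    """
    combined = []
    for votes in zip(*system_tags):  # one tuple of votes per token
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(t for t in votes if counts[t] == best))
    return combined

# Hypothetical outputs of five chunkers for the same six tokens.
systems = [
    list("BIIOBI"),
    list("BIOOBI"),
    list("BIIOBB"),
    list("BBIOOI"),
    list("BIIOBI"),
]
combined = majority_vote(systems)
print("".join(combined))
```

Voting token by token can in principle produce an inconsistent sequence, such as an I tag directly after an O tag, so a practical combiner would need a repair step or would vote over whole chunks instead.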

(Evans & Zhai, 1996) ⇒ David A. Evans, and Chengxiang Zhai. (1996). “Noun-Phrase Analysis in Unrestricted Text for Information Retrieval.” In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL-1996).

(Ramshaw & Marcus, 1995) ⇒ Lance Ramshaw, and Mitch Marcus. (1995). “Text Chunking Using Transformation-based Learning.” In: Proceedings of the Third ACL Workshop on Very Large Corpora (WVLC 1995).
- The goal of the "baseNP" chunks was to identify essentially the initial portions of nonrecursive noun phrases up to the head, including determiners but not including postmodifying prepositional phrases or clauses.
- [less time], the [other hand], [binary addressing and instruction formats], a [purely binary computer].

(Voutilainen, 1993) ⇒ Atro Voutilainen. (1993). “NPTool, a detector of English Noun Phrases.” In: Proceedings of the ACL-1993 Workshop on Very Large Corpora.

(Abney, 1991) ⇒ Steven P. Abney. (1991). “Parsing by Chunks.” In: Berwick, Abney, and Tenny, editors, Principle-based Parsing. Kluwer Academic Publishers.