Phrase Chunking System
A Phrase Chunking System is an Automatic Text Chunking System that is also a phrase classification system and that implements a phrase chunking algorithm to solve a phrase chunking task.
- Example(s):
- Counter-Example(s):
- See: NLP System, Prepositional Phrase, Entropy Guided Transformation Learning.
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Phrase_chunking Retrieved:2019-6-9.
- Phrase chunking is a natural language process that separates and segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases.
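The segmentation described above can be sketched as a toy rule-based chunker. This is an illustration of the task only, not any cited system; real chunkers are trained sequence labelers, and the tag-to-chunk mapping below is an assumption chosen for the example sentence.

```python
# Toy rule-based phrase chunker: group adjacent POS-tagged tokens into
# non-overlapping NP/VP/PP chunks. Illustrative sketch only; the
# tag-to-chunk mapping is an assumption, not from any cited system.

# Coarse Penn Treebank POS tags mapped to a base-phrase type.
CHUNK_TYPE = {
    "DT": "NP", "JJ": "NP", "NN": "NP", "NNS": "NP", "PRP": "NP",
    "MD": "VP", "VB": "VP", "VBP": "VP", "VBD": "VP", "VBZ": "VP",
    "IN": "PP",
}

def chunk(tagged):
    """Group adjacent tokens whose tags map to the same phrase type
    into non-overlapping (chunk_type, words) chunks."""
    chunks, prev = [], None
    for word, tag in tagged:
        ctype = CHUNK_TYPE.get(tag, "O")
        if ctype != "O" and ctype == prev:
            chunks[-1][1].append(word)      # extend the open chunk
        else:
            chunks.append((ctype, [word]))  # start a new chunk
        prev = ctype
    return chunks

# "I would eat red luscious apples on Sundays", POS-tagged by hand.
sent = [("I", "PRP"), ("would", "MD"), ("eat", "VB"),
        ("red", "JJ"), ("luscious", "JJ"), ("apples", "NNS"),
        ("on", "IN"), ("Sundays", "NNS")]
# chunk(sent) -> [I]NP [would eat]VP [red luscious apples]NP [on]PP [Sundays]NP
```

Note that base chunking conventionally leaves the preposition as a one-word PP chunk rather than absorbing the following noun phrase into it.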
2012
- https://cogcomp.cs.illinois.edu/page/software_view/Chunker
- QUOTE: A chunker, or "shallow parser", is a program that partitions plain text into sequences of semantically related words. The type of partition is also computed.
2009
- (Diab, 2009) ⇒ Mona T. Diab. (2009). “Second Generation AMIRA Tools for Arabic Processing: Fast and Robust Tokenization, POS Tagging, and Base Phrase Chunking.” In: Proceedings of 2nd International Conference on Arabic Language Resources and Tools.
- QUOTE: Base phrase chunking is the process by which a sequence of adjacent words are grouped together to form syntactic phrases such as NPs and VPs. An English example of base phrases would be [I]NP [would eat]VP [red luscious apples]NP [on Sundays]PP. BPC is the first step towards shallow syntactic parsing. Many high end applications such as information extraction and semantic role labeling in English have been proven to benefit tremendously from BPC at a relatively low loss in performance when compared to the use of deep syntactic parsing.
In the current version of AMIRA, the BPC module produces the longest possible base phrases with not much internal recursion. The internal recursion is done as a deterministic post process. We have modified the BPC rules to be more appropriate for the Arabic language (...)
2008
- (Milidiu et al., 2008) ⇒ Ruy Luiz Milidiu, Cicero Nogueira dos Santos, and Julio C. Duarte. (2008). “Phrase Chunking Using Entropy Guided Transformation Learning.” In: Proceedings of ACL-08: HLT (2008).
- QUOTE: Phrase Chunking is a Natural Language Processing (NLP) task that consists in dividing a text into syntactically correlated parts of words. These phrases are non-overlapping, i.e., a word can only be a member of one chunk (Sang and Buchholz, 2000). It provides a key feature that helps on more elaborated NLP tasks such as parsing and information extraction (...)
In this work, we apply Entropy Guided Transformation Learning (ETL) for phrase chunking. ETL is a new machine learning strategy that combines the advantages of Decision Trees (DT) and TBL (dos Santos and Milidiu, 2007a). The ETL key idea is to use decision tree induction to obtain feature combinations (templates) and then use the TBL algorithm to generate transformation rules. ETL produces transformation rules that are more effective than decision trees and also eliminates the need of a problem domain expert to build TBL templates.
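The non-overlapping chunks described above are commonly cast as per-token sequence labeling with an IOB encoding (Sang and Buchholz, 2000): B-X begins a chunk of type X, I-X continues it, and O marks tokens outside any chunk. A minimal sketch of the encoding; the (chunk_type, words) input representation is an illustrative assumption, not the format of any cited system:

```python
def to_iob(chunks):
    """Flatten (chunk_type, words) pairs into per-token IOB2 tags:
    B-X opens a chunk of type X, I-X continues it, O is outside."""
    tagged = []
    for ctype, words in chunks:
        for i, word in enumerate(words):
            if ctype is None or ctype == "O":
                tagged.append((word, "O"))
            else:
                tagged.append((word, ("B-" if i == 0 else "I-") + ctype))
    return tagged

# [I]NP [would eat]VP encoded token by token:
chunks = [("NP", ["I"]), ("VP", ["would", "eat"])]
# to_iob(chunks) -> [("I", "B-NP"), ("would", "B-VP"), ("eat", "I-VP")]
```

Because chunks are non-overlapping, this per-token tagging is lossless: the bracketed phrase structure can be reconstructed exactly from the tag sequence.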
2006
- (Wu et al., 2006) ⇒ Yu-Chieh Wu, Chia-Hui Chang, and Yue-Shi Lee. (2006). “A General and Multi-lingual Phrase Chunking Model based on Masking Method.” In: Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics. doi:10.1007/11671299_17
- QUOTE: In this paper, we present a novel chunking method to improve the chunking accuracy. The mask method we propose is designed to solve the “unknown word problem” as many chunking errors occur due to unknown words. Imagine the cases when unknown words occur in the testing data, all lexical-related features, for example, unigram, can not be properly represented, thus the chunk type has to be determined by other non-lexical features. To remedy this, we propose a mask method to collect unknown word examples from the original training data. These examples are derived from mapping variant incomplete lexical-related features. By including these instances, the chunker can handle testing data, which contains unknown words. In addition, we also combine a richer feature set to enhance the performance. Based on the two constituents, the mask method and richer feature sets, higher performance is obtained. In the two main chunking tasks, our method outperforms the other famous systems. Besides, this model is portable to other languages. In the Chinese base-chunking task, our chunking system achieves 92.19 in F rate. In terms of time efficiency, our model is satisfactory, and thus able to handle the real-time processes, for example, information retrieval and real-time web-page translation. In a 500K words document, the complete chunking time is about 50 seconds.
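The unknown-word idea above can be caricatured as data augmentation: derive extra training instances in which a word form is replaced by a placeholder, so the learner also sees purely non-lexical evidence. This is a hedged sketch of the general idea only, not Wu et al.'s exact feature-mapping procedure; the `<UNK>` placeholder and the one-mask-per-token scheme are assumptions:

```python
# Hedged sketch of masking lexical features to simulate unknown words.
# For each training sentence we also emit variants in which one token's
# word form is replaced by a placeholder, forcing the learner to rely
# on non-lexical features (e.g. the POS tag) for that position.
UNK = "<UNK>"

def mask_variants(tagged_sentence):
    """Yield the original (word, tag) sentence, then one variant per
    token with that token's word form masked out."""
    yield list(tagged_sentence)
    for i in range(len(tagged_sentence)):
        variant = list(tagged_sentence)
        _word, tag = variant[i]
        variant[i] = (UNK, tag)
        yield variant
```

For a sentence of n tokens this yields n + 1 training instances; a real system would instead mask at the feature level and only for rare word forms.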
2002
- (Zhang & Zhou, 2002) ⇒ Yuqi Zhang, and Qiang Zhou. (2002). “Chinese Base-phrases Chunking.” In: Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18. doi:10.3115/1118824.1118842.