Snowball System
A Snowball System is a Semi-Supervised Relation Recognition System for generating patterns and extracting tuples from plain-text documents.
- Context:
- It implements a Snowball Algorithm to solve a Snowball Task.
- It uses a Five-Tuple Lexically-based Relation Mention Recognition Model as its Relation Recognition Classifier.
- It uses Clustering to generalize Patterns.
- It uses Bootstrapping to benefit from Unlabeled Data.
- It uses Pattern Precision and a minimum precision Threshold to stop Pattern Generation.
- It is optimized for One-to-One Relations and One-to-Many Relations.
- It can be found at http://snowball.cs.columbia.edu/
- Example(s):
- Counter-Example(s):
- See: Relation Extraction System, Pattern Recognition System, Frequent Pattern, Word Sense Disambiguation, Self-Trained Binary Relation Classifier.
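The five-tuple representation and the clustering step listed above can be illustrated with a small sketch. It assumes each occurrence is stored as <left, tag1, middle, tag2, right>, where the three contexts are unit-length term-frequency vectors. Two tuples match only if their entity tags agree, and their similarity is then the sum of the three context dot products, following the general shape of the matching described by Agichtein & Gravano (2000). Occurrences are grouped by single-pass clustering, and each cluster centroid serves as a generalized pattern. The names `FiveTuple`, `make_tuple`, `match`, `centroid`, `single_pass_cluster` and the threshold `tau_sim` are hypothetical. Equal context weights and raw term frequencies are simplifying assumptions rather than the system's actual settings.

```python
import math
from collections import Counter
from dataclasses import dataclass


def normalize(counts):
    """Scale a term-weight mapping to unit length (raw term frequency is an assumption)."""
    norm = math.sqrt(sum(w * w for w in counts.values()))
    return {t: w / norm for t, w in counts.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse term-weight vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


@dataclass
class FiveTuple:
    """<left, tag1, middle, tag2, right>: three context vectors around two tagged entities."""
    left: dict
    tag1: str
    middle: dict
    tag2: str
    right: dict


def make_tuple(left, tag1, middle, tag2, right):
    """Build a five-tuple from token lists for the three contexts (hypothetical helper)."""
    return FiveTuple(normalize(Counter(left)), tag1,
                     normalize(Counter(middle)), tag2,
                     normalize(Counter(right)))


def match(p, s):
    """0 if the entity tags differ, else the summed dot products of the three contexts."""
    if (p.tag1, p.tag2) != (s.tag1, s.tag2):
        return 0.0
    return dot(p.left, s.left) + dot(p.middle, s.middle) + dot(p.right, s.right)


def centroid(members):
    """Add up each context over the cluster members, then renormalize to unit length."""
    def merge(vectors):
        total = Counter()
        for vec in vectors:
            total.update(vec)  # Counter.update with a mapping adds weights term by term
        return normalize(total)
    first = members[0]
    return FiveTuple(merge(m.left for m in members), first.tag1,
                     merge(m.middle for m in members), first.tag2,
                     merge(m.right for m in members))


def single_pass_cluster(tuples, tau_sim):
    """Add each tuple to its best-matching cluster if Match >= tau_sim, else open a new one.
    Returns the cluster centroids, which act as the generalized patterns."""
    clusters, centroids = [], []
    for t in tuples:
        best, best_score = None, 0.0
        for i, c in enumerate(centroids):
            score = match(c, t)
            if score > best_score:
                best, best_score = i, score
        if best is not None and best_score >= tau_sim:
            clusters[best].append(t)
            centroids[best] = centroid(clusters[best])
        else:
            clusters.append([t])
            centroids.append(t)
    return centroids
```

For example, ORG/LOC occurrences whose middle contexts are "'s headquarters in" and "headquarters in" are similar enough to share a cluster at `tau_sim = 0.6`. A PER/LOC occurrence never joins an ORG/LOC cluster, because `match` returns 0 whenever the tags differ.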
References
2007
- (Bach & Badaskar, 2007) ⇒ Nguyen Bach, and Sameer Badaskar (2007). "A Review Of Relation Extraction". Literature review for Language and Statistics II, 2.
- QUOTE: More recently, semi-supervised and bootstrapping approaches have gained special attention. In section 3, we will review DIPRE (Brin, 1998), and Snowball (Agichtein & Gravano, 2000) systems which only require a small set of tagged seed instances or a few hand-crafted extraction patterns per relation to launch the training process. They all use a semi-supervised approach similar to the Yarowsky’s algorithm in word sense disambiguation (Yarowsky, 1995). Also, KnowItAll (Etzioni et al., 2005) and TextRunner (Banko et al., 2007) propose large scale relation extraction systems which have a self-trained binary relation classifier.
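The seed-driven bootstrapping described in this quote can be sketched as a toy loop in the DIPRE/Snowball spirit. It starts from a few seed pairs and alternates two steps. First, it learns patterns from sentences that connect known pairs. Second, it applies those patterns to extract new pairs. The loop stops when nothing new appears. The sketch assumes the sentences are already entity-tagged (entity1, middle text, entity2) triples and reduces a "pattern" to an exact middle string. Both are simplifications, and the function name `bootstrap` is hypothetical. This loop has no confidence filtering, so one overly general pattern would let errors spread from iteration to iteration. The pattern- and tuple-evaluation step in Snowball is meant to limit exactly that kind of drift.

```python
def bootstrap(sentences, seeds, max_iterations=5):
    """Toy bootstrapping loop (hypothetical helper, not Snowball's actual code).

    sentences: (entity1, middle_text, entity2) triples, assumed already entity-tagged.
    seeds: small set of (entity1, entity2) pairs known to satisfy the relation.
    Patterns are simplified to exact middle strings.
    """
    known = set(seeds)
    patterns = set()
    for _ in range(max_iterations):
        # Learn: every middle context that links a currently known pair becomes a pattern.
        patterns |= {mid for e1, mid, e2 in sentences if (e1, e2) in known}
        # Extract: every sentence whose middle context is a known pattern yields a pair.
        found = {(e1, e2) for e1, mid, e2 in sentences if mid in patterns}
        if found <= known:  # nothing new was extracted, so the loop has converged
            break
        known |= found
    return known, patterns
```

Starting from the single seed ("Microsoft", "Redmond"), the loop first learns "'s headquarters in". That pattern yields ("Exxon", "Irving"), and through another Exxon sentence the loop then learns ", based in", which reaches ("Boeing", "Seattle").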
2006
- (Xia, 2006) ⇒ L. Xia. (2006). “Adaptive Relationship Extraction by Machine Learning.” Masters Thesis, Sheffield University.
2003
- (Yu & Agichtein, 2003) ⇒ H. Yu and Eugene Agichtein. (2003). “Extracting Synonymous Gene and Protein Terms from Biological Literature.” In: Proceedings of the 11th International Conference on Intelligent Systems for Molecular Biology (ISMB-2003). (paper.pdf)
2000
- (Agichtein & Gravano, 2000) ⇒ Eugene Agichtein and Luis Gravano. (2000). “Snowball: Extracting Relations from Large Plain-Text Collections.” In: Proceedings of the 5th ACM International Conference on Digital Libraries (DL-2000).
- QUOTE: In this section we present the Snowball system (Figure 2), which develops key components of the basic DIPRE method. More specifically, Snowball presents a novel technique to generate patterns and extract tuples from text documents (Sections 2.1 and 2.2). Also, Snowball introduces a strategy for evaluating the quality of the patterns and the tuples that are generated in each iteration of the extraction process (Section 2.3). Only those tuples and patterns that are regarded as being “sufficiently reliable” will be kept by Snowball for the following iterations of the system (Section 2.3). These new strategies for generation and filtering of patterns and tuples improve the quality of the extracted tables significantly, as the experimental evaluation in Section 5 will show.
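The pattern- and tuple-evaluation step mentioned in this quote can be sketched as follows. This version is modeled on the scheme in the Snowball paper but written from memory, so the exact definitions should be checked against its Section 2.3. A pattern's confidence is treated as a precision estimate against trusted tuples, with the first entity acting as a key. A tuple's confidence combines the evidence from every pattern that extracted it as 1 − ∏(1 − Conf(P_i) · Match(C_i, P_i)). Candidates below a threshold are dropped before the next iteration. The names `pattern_confidence`, `tuple_confidence`, `keep_reliable` and the threshold `tau_t` are hypothetical, and match scores are assumed to lie in [0, 1].

```python
def pattern_confidence(extractions, known):
    """Precision-style confidence of one pattern (hypothetical helper).

    extractions: (e1, e2) pairs the pattern produced.
    known: dict mapping e1 -> e2 for trusted tuples (e1 treated as a key).
    Known e1 with the same e2 counts as positive, known e1 with a different e2
    as negative; pairs whose e1 is unknown are ignored.
    """
    positive = sum(1 for e1, e2 in extractions if e1 in known and known[e1] == e2)
    negative = sum(1 for e1, e2 in extractions if e1 in known and known[e1] != e2)
    total = positive + negative
    return positive / total if total else 0.0


def tuple_confidence(evidence):
    """Combine evidence as 1 - prod(1 - Conf(P_i) * Match(C_i, P_i)).

    evidence: (pattern_conf, match_score) pairs, one per pattern that extracted
    the tuple; both values are assumed to lie in [0, 1].
    """
    miss = 1.0
    for conf, score in evidence:
        miss *= 1.0 - conf * score
    return 1.0 - miss


def keep_reliable(candidates, tau_t):
    """Keep only candidate tuples whose combined confidence reaches tau_t."""
    return {t for t, ev in candidates.items() if tuple_confidence(ev) >= tau_t}
```

The noisy-OR form lets several moderately confident patterns jointly support a tuple. With pattern confidences 0.5 and 0.8 and match scores 1.0 and 0.5, the combined confidence is 1 − (0.5 × 0.6) = 0.7. A single weak pattern, by contrast, yields a low score.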