1999 LearningDictsForIEbyBootstrapping
- (Riloff and Jones, 1999) ⇒ Ellen Riloff, Rosie Jones. (1999). “Learning Dictionaries for Information Extraction by Multi-level Bootstrapping.” In: Proceedings of AAAI Conference (AAAI 1999).
Subject Headings: Semi-Supervised Named Entity Recognition Algorithm.
Notes
- Presentation at http://www.cs.cmu.edu/~wcohen/10-707/ppts/Yi-Chia.ppt
- Presents a weakly supervised (seed-driven) learning algorithm for named entity (and nominal) detection and classification. It needs only unannotated text and a handful of seed words per category.
- Uses a wordlist for each category that contains the words and phrases belonging to that category.
- Uses a list of lexical patterns to specify contexts typically associated with a given category, e.g. “operates in x”.
- AutoSlog is used to generate the patterns.
- Commences with a "seed list" of words that are known to be in-category.
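- The algorithm then uses bootstrapping to grow the wordlist and the pattern list together. Each iteration:
- Finds the pattern that best matches the current wordlist.
- "Best match" is determined by a combination of precision (the share of the pattern's extractions already in the wordlist) and coverage (how many wordlist entries the pattern extracts).
- Adds the selected pattern to the pattern list.
- Adds the words and phrases extracted by the new pattern to the wordlist.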
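The loop described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that candidate patterns and their extracted noun phrases have already been produced (by AutoSlog in the paper) and are supplied as a plain dictionary. It scores patterns with RlogF, i.e. (F / N) * log2(F), where F is the number of a pattern's extractions already in the wordlist and N is the total number of its extractions. The function names `rlogf` and `mutual_bootstrap` and the toy location data are hypothetical.

```python
import math


def rlogf(extractions, lexicon):
    """Score a pattern as (F / N) * log2(F).

    F = number of the pattern's extractions already in the lexicon,
    N = total number of distinct extractions for the pattern.
    Patterns with no lexicon hits get -inf so they are never selected.
    """
    f = len(extractions & lexicon)
    if f == 0:
        return float("-inf")
    return (f / len(extractions)) * math.log2(f)


def mutual_bootstrap(pattern_extractions, seeds, iterations):
    """Alternately pick the best-scoring pattern and absorb its extractions."""
    lexicon = set(seeds)
    chosen = []
    for _ in range(iterations):
        candidates = [p for p in pattern_extractions if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: rlogf(pattern_extractions[p], lexicon))
        if rlogf(pattern_extractions[best], lexicon) == float("-inf"):
            break  # no remaining pattern overlaps the lexicon
        chosen.append(best)                   # update the pattern list
        lexicon |= pattern_extractions[best]  # update the wordlist
    return chosen, lexicon


# Toy input (hypothetical): pattern -> noun phrases it extracted from the corpus.
patterns = {
    "offices in <x>": {"australia", "canada", "japan", "london"},
    "operates in <x>": {"canada", "japan", "europe"},
    "owned by <x>": {"the company", "canada", "investors", "shareholders"},
}
seeds = {"australia", "canada", "japan"}

chosen, lexicon = mutual_bootstrap(patterns, seeds, iterations=2)
print(chosen)
print(sorted(lexicon))
```

Note that every extraction of the selected pattern enters the wordlist, including wrong ones, so a single noisy pattern can pull the process off course. The meta-bootstrapping level described in the abstract is meant to guard against this and is not shown here.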
Cited By
~443 http://scholar.google.com/scholar?cites=11190526739252407918
2005
- (Etzioni et al., 2005) ⇒ Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. (2005). “Unsupervised Named-Entity Extraction from the Web: An Experimental Study.” In: Artificial Intelligence, 165(1).
2000
- (Agichtein & Gravano, 2000) ⇒ Eugene Agichtein, and Luis Gravano. (2000). “Snowball: Extracting Relations from Large Plain-Text Collections.” In: Proceedings of the 5th ACM International Conference on Digital Libraries (DL 2000).
Quotes
Abstract
- Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multi-level bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual bootstrapping technique to alternately select the best extraction pattern for the category and bootstrap its extractions into the semantic lexicon, which is the basis for selecting the next extraction pattern. To make this approach more robust, we add a second level of bootstrapping (metabootstrapping) that retains only the most reliable lexicon entries produced by mutual bootstrapping and then restarts the process. We evaluated this multilevel bootstrapping technique on a collection of corporate web pages and a corpus of terrorism news articles. The algorithm produced high-quality dictionaries for several semantic categories.
References
- WordNet: An On-line Lexical Database - Miller - 1990
- Combining Labeled and Unlabeled Data with Co-Training - Blum, Mitchell - 1998
- Learning to Extract Symbolic Knowledge from the World Wide W.. - Craven, DiPasquo et al. - 1998
- Learning Information Extraction Rules for Semi-structured an.. - Soderland - 1999
- CRYSTAL: Inducing a conceptual dictionary - Soderland, Fisher et al. - 1995
- Automatically Constructing a Dictionary for Information Extr.. - Riloff - 1993
- Automatically Generating Extraction Patterns from Untagged T.. - Riloff
- Learning information extraction patterns from examples - Huffman - 1996
- An Empirical Study of Automated Dictionary Construction for .. - Riloff
- Relational Learning Techniques for Natural Language Informat.. - Califf - 1998
- Toward General-Purpose Learning for Information Extraction - Freitag - 1998
- Noun-phrase Cooccurrence Statistics for Semi-automatic Seman.. - Roark, Charniak - 1998
- A Corpus-based Approach for Building Semantic Lexicons - Riloff, Shepherd - 1997
- Acquisition of Semantic Patterns for Information Extraction .. - Berlin, Kim et al. - 1993
- MUC-4 Proceedings - 1992
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1999 LearningDictsForIEbyBootstrapping | Ellen Riloff, Rosie Jones | | | Learning Dictionaries for Information Extraction by Multi-level Bootstrapping | | Proceedings of AAAI Conference | http://www.cs.utah.edu/~riloff/pdfs/aaai99.pdf | | | 1999 |