TextRunner System
A TextRunner System is an Information Extraction System developed at the University of Washington that can solve a Web-based Open Information Extraction Task.
- Context:
- It can solve a TextRunner Task by implementing a TextRunner Algorithm.
- It is based on the Hypothesis that: “We can automatically discover high-quality instances of a large, diverse set of relationships from unstructured Web text using an amount of time and effort that is independent of the number of target relations.” (Banko, 2009).
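- System's Architecture: three key modules (Banko et al., 2007): a Self-Supervised Learner, a Single-Pass Extractor, and a Redundancy-Based Assessor.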
- Example(s):
- Counter-Example(s):
- See: TextRunner Algorithm, Open Information Extraction Task.
References
2009
- (Banko, 2009) ⇒ Michele Banko. (2009). “Open Information Extraction for the Web.” PhD Thesis, University of Washington.
2007
- (Banko et al., 2007) ⇒ Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. (2007). “Open Information Extraction from the Web.” In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-2007).
- QUOTE: TEXTRUNNER's sole input is a corpus and its output is a set of extractions that are efficiently indexed to support exploration via user queries. TEXTRUNNER consists of three key modules:
- 1. Self-Supervised Learner: Given a small corpus sample as input, the Learner outputs a classifier that labels candidate extractions as "trustworthy" or not. The Learner requires no hand-tagged data.
- 2. Single-Pass Extractor: The Extractor makes a single pass over the entire corpus to extract tuples for all possible relations. The Extractor does not utilize a parser. The Extractor generates one or more candidate tuples from each sentence, sends each candidate to the classifier, and retains the ones labeled as trustworthy.
- 3. Redundancy-Based Assessor: The Assessor assigns a probability to each retained tuple based on a probabilistic model of redundancy in text introduced in (Downey et al., 2005).
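The three modules quoted above can be illustrated with the following minimal Python sketch. It is not TextRunner's implementation. It assumes sentences arrive already POS-tagged with Penn Treebank tags, and the candidate generator (adjacent noun-phrase pairs), the feature set, and the self-supervision rule in heuristic_label are hypothetical simplifications. A small Naive Bayes classifier stands for the Learner's output classifier. The Assessor swaps in a simple noisy-or model, P = 1 - (1 - p)^k over k distinct supporting sentences with an assumed p = 0.5, in place of the redundancy model of Downey et al. (2005). All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

PUNCT_TAGS = {",", ":", "."}  # Penn Treebank tags for punctuation

def noun_phrases(sent):
    """Spans (start, end) of maximal runs of noun tags (NN, NNS, NNP, NNPS)."""
    spans, start = [], None
    for i, (_, tag) in enumerate(sent):
        if tag.startswith("NN"):
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(sent)))
    return spans

def candidates(sent):
    """Hypothetical generator: each pair of adjacent noun phrases becomes
    (arg1, relation tokens, arg2); the words between them form the relation."""
    nps = noun_phrases(sent)
    for (s1, e1), (s2, e2) in zip(nps, nps[1:]):
        arg1 = " ".join(w for w, _ in sent[s1:e1])
        arg2 = " ".join(w for w, _ in sent[s2:e2])
        yield arg1, sent[e1:s2], arg2

def features(rel):
    """Shallow, parser-free features of the relation tokens."""
    tags = [t for _, t in rel]
    return (
        any(t.startswith("VB") for t in tags),  # contains a verb
        len(rel) <= 4,                           # relation is short
        any(t in PUNCT_TAGS for t in tags),      # crosses punctuation
    )

def heuristic_label(rel):
    """Hypothetical self-supervision rule: no hand-tagged data is needed."""
    has_verb, short, crosses_punct = features(rel)
    return has_verb and short and not crosses_punct

def train_naive_bayes(examples):
    """Bernoulli Naive Bayes with Laplace smoothing over boolean feature tuples."""
    class_counts = Counter(label for _, label in examples)
    true_counts = defaultdict(Counter)  # label -> feature index -> count of True
    for feats, label in examples:
        for i, f in enumerate(feats):
            if f:
                true_counts[label][i] += 1

    def classify(feats):
        def log_score(label):
            n = class_counts[label]
            score = math.log(n / len(examples))
            for i, f in enumerate(feats):
                p = (true_counts[label][i] + 1) / (n + 2)
                score += math.log(p if f else 1 - p)
            return score
        return max(class_counts, key=log_score)  # most probable label
    return classify

def learn(sample):
    """Self-Supervised Learner: label a small sample automatically, then train."""
    examples = [(features(rel), heuristic_label(rel))
                for sent in sample for _, rel, _ in candidates(sent)]
    return train_naive_bayes(examples)

def extract(corpus, classify):
    """Single-Pass Extractor: one pass over the corpus, keep trusted candidates."""
    for sid, sent in enumerate(corpus):
        for arg1, rel, arg2 in candidates(sent):
            if classify(features(rel)):
                rel_text = " ".join(w for w, _ in rel)
                yield sid, (arg1.lower(), rel_text.lower(), arg2.lower())

def assess(extractions, p_single=0.5):
    """Noisy-or stand-in for the redundancy model: 1 - (1 - p_single)^k,
    where k is the number of distinct sentences supporting a tuple."""
    support = defaultdict(set)
    for sid, tup in extractions:
        support[tup].add(sid)
    return {tup: 1 - (1 - p_single) ** len(sids) for tup, sids in support.items()}

corpus = [
    [("Edison", "NNP"), ("invented", "VBD"), ("the", "DT"), ("phonograph", "NN"), (".", ".")],
    [("In", "IN"), ("1877", "CD"), (",", ","), ("Edison", "NNP"), ("invented", "VBD"),
     ("the", "DT"), ("phonograph", "NN"), (".", ".")],
    [("Paris", "NNP"), (",", ","), ("the", "DT"), ("capital", "NN"), ("of", "IN"),
     ("France", "NNP"), (",", ","), ("is", "VBZ"), ("large", "JJ"), (".", ".")],
]
classify = learn(corpus)
probs = assess(extract(corpus, classify))
print(probs)  # {('edison', 'invented the', 'phonograph'): 0.75}
```

On this toy corpus the two Edison sentences yield the same normalized tuple, so redundancy raises its probability from 0.5 to 0.75. The Paris sentence yields no candidate the classifier trusts.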
2005
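- (Downey et al., 2005) ⇒ Doug Downey, Oren Etzioni, and Stephen Soderland. (2005). “A Probabilistic Model of Redundancy in Information Extraction.” In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-2005).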