KnowItAll System
The KnowItAll System is a Web Information Extraction System.
- AKA: KnowItAll, KnowItAll Web IE System.
- Context:
- It implements the KnowItAll Algorithm, a Semi-Supervised Named Entity Recognition Algorithm.
- It is a Knowledge Base that populates its facts, concepts, and relationships by extracting information from the Web.
- It is intended to be scalable and to have high throughput in order to access the information on the Web.
- It avoids the use of deep parsing techniques.
- It is divided into four components:
- Extractor,
- Search Engine Interface,
- Assessor,
- Database.
- It requires a set of generic Extraction Pattern seeds (not Instance seeds).
- It includes a predefined set of topics (ontology) such as:
- cities, states, countries, actors and films.
- It is developed at the University of Washington.
- It is based on the KnowItAll Hypothesis that: "Extractions drawn more frequently from distinct sentences in a corpus are more likely to be correct."
- Counter-Example(s):
- See: Open Information Extraction, ReVerb System.
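The interplay between generic Extraction Pattern seeds and the KnowItAll Hypothesis can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the KnowItAll implementation: the two pattern templates, the capitalized-word regular expression standing in for noun-phrase detection, the function names, and the sample sentences. The sketch instantiates each generic pattern with a class label from the ontology (here, "cities"), applies it to a handful of sentences, and ranks each candidate by the number of distinct sentences that support it.

```python
import re
from collections import defaultdict

# Hypothetical generic pattern templates; {cls} is replaced by a class label.
# A capitalized word sequence is a crude stand-in for a proper noun phrase.
GENERIC_PATTERNS = [
    r"{cls} such as ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)",
    r"((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*) and other {cls}",
]


def instantiate(cls_plural):
    """Turn the generic templates into class-specific extraction rules."""
    return [re.compile(p.format(cls=re.escape(cls_plural))) for p in GENERIC_PATTERNS]


def extract(sentences, cls_plural):
    """Map each candidate instance to the set of distinct sentences yielding it."""
    rules = instantiate(cls_plural)
    support = defaultdict(set)
    for sentence in sentences:
        for rule in rules:
            for match in rule.finditer(sentence):
                # Using a set means a repeated sentence is counted only once.
                support[match.group(1)].add(sentence)
    return support


def rank_by_redundancy(support):
    """Order candidates by distinct-sentence count, highest first."""
    counts = ((name, len(sents)) for name, sents in support.items())
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample corpus; the last sentence duplicates the first on purpose.
SENTENCES = [
    "We visited cities such as Paris last year.",
    "Large cities such as Paris attract tourists.",
    "Paris and other cities are crowded in summer.",
    "He listed cities such as Atlantis in his novel.",
    "We visited cities such as Paris last year.",
]

if __name__ == "__main__":
    for name, count in rank_by_redundancy(extract(SENTENCES, "cities")):
        print(name, count)
```

In this toy corpus, "Paris" is supported by three distinct sentences and "Atlantis" by one, so "Paris" ranks first. The raw count is only a stand-in for a real assessment step; it shows the direction of the hypothesis, not a calibrated probability of correctness.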
References
2008
- (Downey, 2008) ⇒ Doug Downey. (2008). “Redundancy in Web-scale Information Extraction: Probabilistic Model and Experimental Results.” PhD Thesis, University of Washington.
2005
- (Etzioni et al., 2005) ⇒ Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. (2005). “Unsupervised Named-Entity Extraction from the Web: An Experimental Study.” In: Artificial Intelligence, 165(1).
- (Downey et al., 2005) ⇒ Doug Downey, Oren Etzioni, and Stephen Soderland. (2005). “A Probabilistic Model of Redundancy in Information Extraction.” In: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-2005).
2004
- (Etzioni et al., 2004) ⇒ Oren Etzioni, Michael J. Cafarella, Doug Downey, S. Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. (2004). “Web-scale Information Extraction in KnowItAll: (preliminary results).” In: Proceedings of the 13th International World Wide Web Conference (WWW 2004).