Information Extraction Algorithm
An Information Extraction Algorithm is a data processing algorithm that can be applied by an information extraction system (to solve an information extraction task).
- AKA: IE Algorithm, Information Extraction from Text Algorithm.
- Context:
- It can range from being an Information Extraction from Tables Algorithm to being an Information Extraction from Text Algorithm to being an Information Extraction from Images Algorithm.
- It can range from being a Heuristic IE Algorithm to being a Data-Driven IE Algorithm (such as an Unsupervised IE Algorithm, Semi-Supervised IE Algorithm, or Fully-Supervised IE Algorithm).
- It can be supported by:
- an Information Retrieval Algorithm.
- a Syntactic Analysis Algorithm.
- a Lexical Semantic Analysis Algorithm, such as an Entity Mention Recognition Algorithm, Entity Mention Coreference Resolution Algorithm, Entity Mention Normalization Algorithm, or Semantic Relation Mention Recognition Algorithm.
- a Semantic Relation Recognition Algorithm (e.g. a Semantic Relation Mention Recognition Algorithm).
- a Duplicate Record Detection Algorithm, to identify records with redundant information.
- a Record Canonicalization Algorithm, to create a single non-redundant record.
- Example(s):
- Information Extraction from Text Algorithms, such as: Snowball, AutoSlog, TextRunner Algorithm, KnowItAll Algorithm.
- any Terminology Extraction Algorithm.
- any unified IE Algorithm(?) as proposed by (McCallum & Jensen, 2003).
- …
- Counter-Example(s):
- See: Relation Recognition from Text Algorithm.
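The Context items above can be illustrated with a minimal sketch of a Heuristic IE Algorithm from text, followed by simple Duplicate Record Detection and Record Canonicalization steps. Everything in it is an assumption made for illustration only: the hand-written pattern, the function names (`extract_records`, `normalize`, `canonicalize`), the corporate-suffix list, and the sample sentences do not come from any of the cited systems.

```python
import re
from collections import defaultdict

# Hypothetical hand-written pattern (an assumption, not taken from any cited system):
# "<Organization> is|was headquartered|based in <Location>"
PATTERN = re.compile(
    r"(?P<org>[A-Z][\w&.]*(?: [A-Z][\w&.]*)*) (?:is|was) (?:headquartered|based) in "
    r"(?P<loc>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_records(text):
    """Heuristic extraction: return (organization, location) pairs matched by PATTERN."""
    return [(m.group("org"), m.group("loc")) for m in PATTERN.finditer(text)]


def normalize(name):
    """Crude duplicate-detection key: lowercase, strip punctuation and corporate suffixes."""
    key = re.sub(r"[^\w\s]", "", name.lower())
    key = re.sub(r"\b(inc|corp|corporation|ltd)\b", "", key)
    return " ".join(key.split())


def canonicalize(records):
    """Group records that share a normalized key; keep the longest surface form of each group."""
    groups = defaultdict(list)
    for org, loc in records:
        groups[(normalize(org), loc.lower())].append((org, loc))
    return sorted(max(group, key=lambda r: len(r[0])) for group in groups.values())


if __name__ == "__main__":
    sample = (
        "Microsoft Corp. is headquartered in Redmond. "
        "Later reports said Microsoft is based in Redmond. "
        "Exxon is based in Irving."
    )
    extracted = extract_records(sample)
    print(extracted)                 # raw records, including a redundant one
    print(canonicalize(extracted))   # one record per detected entity
```

A Data-Driven IE Algorithm would replace the fixed pattern with patterns or a model learned from data; the duplicate-detection and canonicalization steps could stay the same.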
References
2008
- (Sarawagi, 2008) ⇒ Sunita Sarawagi. (2008). “Information Extraction.” In: Foundations and Trends in Databases, 1(3). doi:10.1561/1900000003
2007
- (McCallum, 2007) ⇒ Andrew McCallum. (2007). “Information Extraction.” In: Introduction to Natural Language Processing, CMPSCI 585, Fall 2007.
2006
- (Chang et al., 2006) ⇒ C. H. Chang, M. Kayed, M. R. Girgis, and K. Shaalan. (2006). “A Survey of Web Information Extraction Systems.” In: IEEE Transactions On Knowledge and Data Engineering, 18(10).
2005
- (Agichtein, 2005) ⇒ Eugene Agichtein. (2005). “Scaling Information Extraction to Large Document Collections.” In: IEEE Data Eng. Bull., 28(4).
2003
- (McCallum & Jensen, 2003) ⇒ Andrew McCallum, and David Jensen. (2003). “A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models.” In: Proceedings of the IJCAI03 Workshop on Learning Statistical Models from Relational Data.
- 1) DM begins from a populated DB, unaware of where the data came from, or its inherent errors and uncertainties.
- 2) IE is unaware of emerging patterns and regularities in the DB.
2002
- (Laender et al., 2002) ⇒ Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, and Juliana S. Teixeira. (2002). “A Brief Survey of Web Data Extraction Tools.” In: SIGMOD Record, 31(2). doi:10.1145/565117.565137
2001
- (Yangarber, 2001) ⇒ R. Yangarber. (2001). “Scenario Customization for Information Extraction.” PhD Thesis, New York University.
1999
- (Soderland, 1999) ⇒ Stephen Soderland. (1999). “Learning Information Extraction Rules for Semi-structured and Free Text.” In: Machine Learning, 34(1-3):233–272.
1997
- (Kushmerick, 1997) ⇒ Nicholas Kushmerick. (1997). “Wrapper Induction for Information Extraction.” Ph.D. Thesis, Dept of Computer Science & Engineering, Univ of Washington. Technical Report UW-CSE-97-11-04.
1996
- (Strzalkowski & Wang, 1996) ⇒ Tomek Strzalkowski, and Jin Wang. (1996). “A self-learning universal concept spotter.” In: Proceedings of 16th International Conference on Computational Linguistics (COLING-96), Copenhagen, August 1996.
1993
- (Riloff, 1993) ⇒ Ellen Riloff. (1993). “Automatically Constructing a Dictionary for Information Extraction Tasks.” In: Proceedings of AAAI-93.
- (Cardie, 1993) ⇒ Claire Cardie. (1993). “A Case-based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis.” In: Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93).