Information Extraction Algorithm

AKA: IE Algorithm, Information Extraction from Text Algorithm.
Context:
- It can range from being an Information Extraction from Tables Algorithm to being an Information Extraction from Text Algorithm to being an Information Extraction from Images Algorithm.
- It can range from being a Heuristic IE Algorithm to being a Data-Driven IE Algorithm (such as an Unsupervised IE Algorithm, a Semi-Supervised IE Algorithm, or a Fully-Supervised IE Algorithm).
- It can be supported by:
  - an Information Retrieval Algorithm.
  - a Syntactic Analysis Algorithm.
  - a Lexical Semantic Analysis Algorithm, such as an Entity Mention Recognition Algorithm, Entity Mention Coreference Resolution Algorithm, Entity Mention Normalization Algorithm, or Semantic Relation Mention Recognition Algorithm.
  - a Semantic Relation Recognition Algorithm (e.g. a Semantic Relation Mention Recognition Algorithm).
  - a Duplicate Record Detection Algorithm, to identify records with redundant information.
  - a Record Canonicalization Algorithm, to create a single non-redundant record.
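
A minimal sketch of how the supporting steps listed above could fit together in a Heuristic IE Algorithm: pattern-based mention recognition, followed by duplicate record detection and record canonicalization. The pattern, the field names (`person`, `org`), and every function name are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
import re
from collections import Counter, defaultdict

# Hypothetical hand-written extraction pattern: "<Person> works at <Organization>".
# A real heuristic IE system would use many such rules; this is one toy rule.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) works at "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

FIELDS = ("person", "org")


def extract_records(text):
    """Heuristic extraction: emit one record per pattern match."""
    return [{f: m.group(f) for f in FIELDS} for m in PATTERN.finditer(text)]


def normalize(value):
    """Crude normalization used only to build a duplicate-detection key."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower()).strip()


def group_duplicates(records):
    """Duplicate record detection: group records that share a normalized key."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record[f]) for f in FIELDS)
        groups[key].append(record)
    return list(groups.values())


def canonicalize(group):
    """Record canonicalization: keep the most frequent surface form per field."""
    return {
        f: Counter(r[f] for r in group).most_common(1)[0][0]
        for f in FIELDS
    }


def run_pipeline(text):
    """Extract, group duplicates, and emit one canonical record per group."""
    return [canonicalize(g) for g in group_duplicates(extract_records(text))]


if __name__ == "__main__":
    sample = (
        "Ada Lovelace works at Analytical Engines. "
        "Later reports confirm that Ada Lovelace works at Analytical Engines. "
        "Alan Turing works at Bletchley Park."
    )
    for record in run_pipeline(sample):
        print(record)
```

The exact-key grouping here is the simplest possible duplicate test; a Data-Driven IE Algorithm would typically replace both the hand-written pattern and the key comparison with learned models.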
Example(s):
- Information Extraction from Text Algorithms, such as: Snowball, AutoSlog, TextRunner Algorithm, KnowItAll Algorithm.
- any Terminology Extraction Algorithm.
- any unified IE Algorithm(?) as proposed by (McCallum & Jensen, 2003).
- …
Counter-Example(s):
See: Relation Recognition from Text Algorithm.

References

(McCallum, 2007) ⇒ Andrew McCallum. (2007). “Information Extraction.” In: Introduction to Natural Language Processing, CMPSCI 585, Fall (2007).

(Chang et al., 2006) ⇒ C. H. Chang, M. Kayed, M. R. Girgis, and K. Shaalan. (2006). “A Survey of Web Information Extraction Systems.” In: IEEE Transactions on Knowledge and Data Engineering, 18(10).

(McCallum & Jensen, 2003) ⇒ Andrew McCallum, and David Jensen. (2003). “A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models.” In: Proceedings of the IJCAI03 Workshop on Learning Statistical Models from Relational Data.
- 1) Data mining (DM) begins from a populated database (DB), unaware of where the data came from or of its inherent errors and uncertainties.
- 2) Information extraction (IE) is unaware of emerging patterns and regularities in the DB.

(Laender et al., 2002) ⇒ Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, and Juliana S. Teixeira. (2002). “A Brief Survey of Web Data Extraction Tools.” In: SIGMOD Record, 31(2). doi:10.1145/565117.565137

(Soderland, 1999) ⇒ Stephen Soderland. (1999). “Learning Information Extraction Rules for Semi-structured and Free Text.” In: Machine Learning, 34(1-3):233–272.

(Kushmerick, 1997) ⇒ Nicholas Kushmerick. (1997). “Wrapper Induction for Information Extraction.” Ph.D. Thesis, Dept of Computer Science & Engineering, Univ of Washington. Technical Report UW-CSE-97-11-04.

(Strzalkowski & Wang, 1996) ⇒ Tomek Strzalkowski, and Jin Wang. (1996). “A self-learning universal concept spotter.” In: Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), Copenhagen, August 1996.