2003 TableExtractionUsingCRFs

(Pinto et al., 2003) ⇒ David Pinto, Andrew McCallum, Xing Wei, and W. Bruce Croft. (2003). “Table Extraction Using Conditional Random Fields.” In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2003). doi:10.1145/860435.860479

Subject Headings: Structured Data, Automated Information Extraction, Conditional Random Fields.

Notes

It proposes the use of Conditional Random Fields (CRFs) for the task of Information Extraction from Tables.
It integrates features from both content and layout.
It divides table extraction into six overlapping steps:
1. Locate the table.
2. Identify the row positions and types.
3. Identify the column positions and types.
4. Segment the table into cells.
5. Tag the cells as data or headers.
6. Associate data cells with their corresponding headers.
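
As an illustration of what combining content and layout features can look like for plain-text tables, the following is a minimal Python sketch that computes a few per-line features of the kind a line-labeling model (such as a CRF) could consume. The function name `line_features` and the specific features are assumptions chosen for illustration; they are not taken from the paper's actual feature set or implementation.

```python
import re

# Hypothetical pattern: a run of two or more spaces between non-space text,
# treated here as a possible column gap.
GAP = re.compile(r"(?<=\S) {2,}(?=\S)")


def line_features(line, prev_line=""):
    """Return illustrative layout and content features for one text line."""
    stripped = line.strip()
    n = max(len(line), 1)  # avoid division by zero on empty lines
    # Positions where a new column of text starts after a gap.
    starts = [m.end() for m in GAP.finditer(line)]
    prev_starts = {m.end() for m in GAP.finditer(prev_line)}
    return {
        # Layout features
        "frac_whitespace": sum(c.isspace() for c in line) / n,
        "num_gaps": len(starts),
        "gap_aligned_with_prev": any(s in prev_starts for s in starts),
        "is_separator": bool(stripped) and all(c in "-=_*. " for c in stripped),
        "is_blank": not stripped,
        # Content features
        "frac_digits": sum(c.isdigit() for c in line) / n,
        "frac_alpha": sum(c.isalpha() for c in line) / n,
    }


if __name__ == "__main__":
    header = "Year" + " " * 4 + "Cattle" + " " * 4 + "Hogs"
    row = "1999" + " " * 4 + "1,200" + " " * 5 + "340"
    print(line_features(row, prev_line=header))
```

Feature dictionaries like these would be computed for every line of a document and passed, as a sequence, to a sequence labeler that assigns each line a table-related category. Because the features overlap (for example, a line can be both digit-heavy and gap-aligned), they suit a conditionally trained model better than a generative one such as an HMM.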

Cited By

~229 http://scholar.google.com/scholar?cites=14554084008477956453

Quotes

Abstract

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content present difficulties for traditional language modeling techniques, however. This paper presents the use of conditional random fields (CRFs) for table extraction, and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result, they perform significantly better. We show experimental results on plain-text government statistical reports in which tables are located with 92% F1, and their constituent lines are classified into 12 table-related categories with 94% accuracy. We also discuss future work on undirected graphical models for segmenting columns, finding cells, and classifying them as data cells or label cells.

Title: Table Extraction Using Conditional Random Fields
Title URL: http://ciir.cs.umass.edu/pubfiles/ir-276.pdf
DOI: 10.1145/860435.860479
