2002 AMachineLearningBasedApproachForTableDetection

(Wang & Hu, 2002) ⇒ Yalin Wang, and Jianying Hu. (2002). “A Machine Learning Based Approach for Table Detection on the Web.” In: Proceedings of the Eleventh International World Wide Web Conference (WWW 2002). doi:10.1145/511446.511478

Subject Headings: Automated Information Extraction, Information Extraction from Tables Task.

Notes

It proposes a Machine Learning Algorithm for the Table Detection Task.
It proposes and tests the use of content type features and word group features.
It trains a Decision Tree Classifier and an SVM Classifier and uses them to classify each table as genuine or non-genuine.
It uses some unsupervised learning.
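
As a rough illustration of the kind of layout and content type features mentioned above, the following sketch computes a few per-table statistics from an HTML snippet. It is not the authors' feature set: the feature names, the content type categories, and the helper names (TableCellCollector, content_type, table_features) are assumptions made for this example, and the input is assumed to be a single leaf TABLE element with no nested tables.

```python
from html.parser import HTMLParser
from statistics import mean, pstdev

# Illustrative content type categories (an assumption, not the paper's list).
CONTENT_TYPES = ("image", "hyperlink", "empty", "digit", "alphabetical", "other")


class TableCellCollector(HTMLParser):
    """Collects the cells of each row of one leaf table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows; each row is a list of cell dicts
        self._cell = None   # cell currently being filled, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # tolerate cells that appear before any <tr>
                self.rows.append([])
            self._cell = {"text": "", "link": False, "image": False}
            self.rows[-1].append(self._cell)
        elif self._cell is not None and tag == "a":
            self._cell["link"] = True
        elif self._cell is not None and tag == "img":
            self._cell["image"] = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data


def content_type(cell):
    """Assign one coarse content type to a cell (hypothetical rules)."""
    text = cell["text"].strip()
    if cell["image"]:
        return "image"
    if cell["link"]:
        return "hyperlink"
    if not text:
        return "empty"
    if any(ch.isalpha() for ch in text):
        return "alphabetical"
    if any(ch.isdigit() for ch in text):
        return "digit"
    return "other"


def table_features(html):
    """Return a dict of layout and content type features, or None if no cells were found."""
    parser = TableCellCollector()
    parser.feed(html)
    rows = [row for row in parser.rows if row]
    cells = [cell for row in rows for cell in row]
    if not cells:
        return None
    cols_per_row = [len(row) for row in rows]
    cell_lengths = [len(cell["text"].strip()) for cell in cells]
    features = {
        "n_rows": len(rows),
        "avg_cols": mean(cols_per_row),
        "std_cols": pstdev(cols_per_row),
        "avg_cell_len": mean(cell_lengths),
        "std_cell_len": pstdev(cell_lengths),
    }
    # Content type histogram, normalized by the number of cells.
    for name in CONTENT_TYPES:
        features["frac_" + name] = sum(content_type(c) == name for c in cells) / len(cells)
    return features


sample = (
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Oslo</td><td>709000</td></tr>"
    "<tr><td>Bergen</td><td>291000</td></tr></table>"
)
features = table_features(sample)
```

A feature dictionary like this could then be turned into a fixed-length vector and passed to any classifier; a regular grid with consistent column counts and a mix of text and numeric cells is the sort of signal that may separate genuine tables from layout-only ones.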

Cited By

~91 http://scholar.google.com/scholar?q=%22A+Machine+Learning+Based+Approach+for+Table+Detection+on+the+Web%22+2002

Quotes

Abstract

Table is a commonly used presentation scheme, especially for describing relational information. However, table understanding remains an open problem. In this paper, we consider the problem of table detection in web documents. Its potential applications include web mining, knowledge management, and web content summarization and delivery to narrow-bandwidth devices. We describe a machine learning based approach to classify each given table entity as either genuine or non-genuine. Various features reflecting the layout as well as content characteristics of tables are studied. In order to facilitate the training and evaluation of our table classifier, we designed a novel web document table ground truthing protocol and used it to build a large table ground truth database. The database consists of 1,393 HTML files collected from hundreds of different web sites and contains 11,477 leaf TABLE elements, out of which 1,740 are genuine tables. Experiments were conducted using the cross validation method and an F-measure of 95.89% was achieved.
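
The abstract mentions evaluation by cross validation. A minimal sketch of how such a split over the 11,477 leaf TABLE elements might be generated is shown below; the number of folds (k=10), the shuffling seed, and the function name k_fold_indices are assumptions for illustration and are not taken from the abstract.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 11,477 is the table count from the abstract; k=10 is a hypothetical choice.
splits = list(k_fold_indices(11477, k=10))
```

Each table appears in exactly one test fold, so every table is scored once by a classifier that was not trained on it.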

Author: Yalin Wang, Jianying Hu
title: A Machine Learning Based Approach for Table Detection on the Web
titleUrl: http://www2002.org/CDROM/refereed/199/
doi: 10.1145/511446.511478
