2011 AutomaticWrapforLarScaWebExt

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Abstract

We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based techniques require, we are able to perform information extraction at web-scale, with accuracy unattained with existing unsupervised extraction techniques. Our system is used in production at Yahoo! and powers live applications.
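As a rough illustration of the idea in the abstract, the sketch below labels pages with a cheap, error-prone annotator (a dictionary compiled into a regular expression) and then picks the candidate wrapper that agrees best with those noisy labels. This is only a toy under stated assumptions and not the paper's actual model: the pages, the `DICTIONARY` contents, the path-as-wrapper representation, and the F1-based scoring are all hypothetical choices made for this example.

```python
import re

# Hypothetical toy site: each page is a list of (path, text) pairs, where
# `path` stands in for an XPath-like location of a text node.
PAGES = [
    [("/html/body/h1", "Blue Moon Cafe"),
     ("/html/body/div/span", "Call 555-123-4567"),
     ("/html/body/div/p", "Near Joe's Diner")],
    [("/html/body/h1", "Joe's Diner"),
     ("/html/body/div/span", "Open daily"),
     ("/html/body/div/p", "Great food")],
    [("/html/body/h1", "Unlisted Bistro"),
     ("/html/body/div/span", "Cash only"),
     ("/html/body/div/p", "Family owned")],
]

# Assumed, deliberately incomplete dictionary of known business names.
DICTIONARY = {"blue moon cafe", "joe's diner"}
NAME_RE = re.compile(
    "|".join(re.escape(name) for name in sorted(DICTIONARY)), re.IGNORECASE
)


def noisy_labels(pages):
    """Label every node whose text mentions a dictionary entry.

    The labels are noisy on purpose: "Near Joe's Diner" is a false positive
    and "Unlisted Bistro" is a false negative.
    """
    return {
        (p, n)
        for p, page in enumerate(pages)
        for n, (_, text) in enumerate(page)
        if NAME_RE.search(text)
    }


def extract(pages, path):
    """Apply a path-based wrapper: return ids of all nodes at `path`."""
    return {
        (p, n)
        for p, page in enumerate(pages)
        for n, (node_path, _) in enumerate(page)
        if node_path == path
    }


def f1(extracted, labels):
    """Score a wrapper by its agreement with the noisy labels."""
    true_pos = len(extracted & labels)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(extracted)
    recall = true_pos / len(labels)
    return 2 * precision * recall / (precision + recall)


def best_wrapper(pages):
    """Pick the candidate path that best matches the noisy labels."""
    labels = noisy_labels(pages)
    candidates = sorted({path for page in pages for path, _ in page})
    return max(candidates, key=lambda path: f1(extract(pages, path), labels))


best = best_wrapper(PAGES)
names = [PAGES[p][n][1] for p, n in sorted(extract(PAGES, best))]
```

In this toy setup the selected wrapper is the heading path, so the extraction recovers the name that the dictionary missed and ignores the spurious dictionary hit inside the paragraph text. No site-specific labels were supplied by hand.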

References



2011 AutomaticWrapforLarScaWebExt
Author: Ravi Kumar, Mohamed Soliman
Title: Automatic Wrappers for Large Scale Web Extraction
URL: http://vldb.org/pvldb/vol4/p219-dalvi.pdf