2011 AutomaticWrapforLarScaWebExt
- (Dalvi et al., 2011) ⇒ Nilesh Dalvi, Ravi Kumar, and Mohamed Soliman. (2011). “Automatic Wrappers for Large Scale Web Extraction.” In: Proceedings of the VLDB Endowment (VLDB 2011), (4:4).
Subject Headings:
Notes
Cited By
Quotes
Abstract
We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based techniques require, we are able to perform information extraction at web-scale, with accuracy unattained with existing unsupervised extraction techniques. Our system is used in production at Yahoo! and powers live applications.
References
 | Author | title | journal | titleUrl | year |
---|---|---|---|---|---|
2011 AutomaticWrapforLarScaWebExt | Nilesh Dalvi, Ravi Kumar, Mohamed Soliman | Automatic Wrappers for Large Scale Web Extraction | Proceedings of the VLDB Endowment | http://vldb.org/pvldb/vol4/p219-dalvi.pdf | 2011 |