2003 MiningDataRecordsInWebPages
- (Liu et al., 2003) ⇒ Bing Liu, Robert L. Grossman, and Yanhong Zhai. (2003). “Mining Data Records in Web Pages.” In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003). doi:10.1145/956750.956826
Subject Headings: Information Extraction from Tables Task.
Notes
- It proposes the Mining Data Records in Web Pages (MDR) Algorithm for the Information Extraction from Tables Task.
- It is based on the observation that Data Records describing a set of similar objects usually appear in a specific region of a page and are normally formatted with similar HTML tags.
- It can detect a group of data records placed in a specific region.
- It can work effectively for Contiguous Data Records and Non-Contiguous Data Records.
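The observation in the notes above can be illustrated with a minimal Python sketch. It is not the MDR algorithm from the paper, only a simplified illustration under stated assumptions: the page is parsed into a tag tree, each child subtree is flattened into a sequence of tag names, and adjacent siblings whose sequences are within a normalized edit distance of each other are grouped into a candidate region. The names `Node`, `TagTreeBuilder`, `similar_runs`, `find_candidate_regions`, the `VOID_TAGS` set, and the `0.3` threshold are all hypothetical choices made for this example.

```python
from html.parser import HTMLParser

# Assumption: a small set of void elements that never take children.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_string(self):
        """Flatten this subtree into a pre-order list of tag names."""
        tags = [self.tag]
        for child in self.children:
            tags.extend(child.tag_string())
        return tags


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def edit_distance(a, b):
    """Levenshtein distance between two sequences of tag names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    return edit_distance(a, b) / max(len(a), len(b))


def similar_runs(parent, threshold=0.3):
    """Return runs of two or more adjacent children with similar tag strings."""
    runs, run = [], []
    kids = parent.children
    for prev, cur in zip(kids, kids[1:]):
        if normalized_distance(prev.tag_string(), cur.tag_string()) <= threshold:
            if not run:
                run = [prev]
            run.append(cur)
        else:
            if run:
                runs.append(run)
            run = []
    if run:
        runs.append(run)
    return runs


def find_candidate_regions(node, threshold=0.3):
    """Collect (parent tag, run of similar siblings) pairs over the whole tree."""
    regions = [(node.tag, run) for run in similar_runs(node, threshold)]
    for child in node.children:
        regions.extend(find_candidate_regions(child, threshold))
    return regions


html = """
<div id="nav"><a>Home</a></div>
<ul>
  <li><b>Item A</b><span>$1</span></li>
  <li><b>Item B</b><span>$2</span></li>
  <li><b>Item C</b><span>$3</span><i>new</i></li>
</ul>
<p>footer</p>
"""
builder = TagTreeBuilder()
builder.feed(html)
regions = find_candidate_regions(builder.root)
for parent_tag, run in regions:
    print(parent_tag, [n.tag for n in run])
```

In this toy page the three `li` items are grouped under `ul` even though the third carries an extra `i` tag, because the comparison tolerates small differences in the tag sequences. The sketch only compares single adjacent siblings, so it does not cover non-contiguous data records.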
Cited By
Quotes
Abstract
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them to provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more effective technique to perform the task. The technique is based on two observations about data records on the Web and a string matching algorithm. The proposed technique is able to mine both contiguous and non-contiguous data records. Our experimental results show that the proposed technique outperforms existing techniques substantially.
| | Author | title | titleUrl | doi | year |
---|---|---|---|---|---|
2003 MiningDataRecordsInWebPages | Bing Liu, Robert L. Grossman, Yanhong Zhai | Mining Data Records in Web Pages | http://grossmanreport.com/dl/proc-075.pdf | 10.1145/956750.956826 | 2003 |