2014 LaSEWebAutomatingSearchStrategi
- (Polozov & Gulwani, 2014) ⇒ Oleksandr Polozov, and Sumit Gulwani. (2014). “LaSEWeb: Automating Search Strategies over Semi-structured Web Data.” In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2014). ISBN:978-1-4503-2956-9 doi:10.1145/2623330.2623761
Subject Headings:
Notes
Cited By
Quotes
Author Keywords
- Clustering; domain-specific languages; information filtering; query languages; question answering; search process; semi-structured data; structure extraction; web programming
Abstract
We show how to programmatically model processes that humans use when extracting answers to queries (e.g., "Who invented typewriter?", "List of Washington national parks") from semi-structured Web pages returned by a search engine. This modeling enables various applications including automating repetitive search tasks, and helping search engine developers design micro-segments of factoid questions.
We describe the design and implementation of a domain-specific language that enables extracting data from a webpage based on its structure, visual layout, and linguistic patterns. We also describe an algorithm to rank multiple answers extracted from multiple webpages.
On 100,000+ queries (across 7 micro-segments) obtained from Bing logs, our system LaSEWeb answered queries with an average recall of 71%. Also, the desired answer(s) were present in top-3 suggestions for 95%+ cases.
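The two figures quoted above (average recall and presence in the top-3 suggestions) can be illustrated with a minimal sketch. This is not the paper's evaluation code. The function names, the toy queries, and the reading of "present in top-3" as "at least one desired answer among the first three suggestions" are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def recall(predicted: List[str], gold: Set[str]) -> float:
    """Fraction of gold answers that appear anywhere in the predicted list."""
    if not gold:
        return 0.0
    return len(gold & set(predicted)) / len(gold)


def hit_in_top_k(predicted: List[str], gold: Set[str], k: int = 3) -> bool:
    """True if at least one gold answer is among the first k ranked suggestions.

    Assumption: "present in top-3" is read as "any desired answer in the top 3".
    """
    return any(answer in gold for answer in predicted[:k])


# Hypothetical toy data: query -> (ranked suggestions, desired answers).
# These values are invented placeholders, not results from the paper.
toy_results: Dict[str, Tuple[List[str], Set[str]]] = {
    "q1": (["a", "b", "c"], {"a"}),
    "q2": (["x", "y", "z", "w"], {"w", "y"}),
    "q3": (["m", "n", "o", "p"], {"p", "q"}),
}

# Average the per-query recall, and count the share of queries with a top-3 hit.
average_recall = sum(recall(p, g) for p, g in toy_results.values()) / len(toy_results)
top3_rate = sum(hit_in_top_k(p, g, k=3) for p, g in toy_results.values()) / len(toy_results)

print(f"average recall: {average_recall:.2f}")
print(f"top-3 hit rate: {top3_rate:.2f}")
```

Per-query recall is averaged over queries here (a macro average). Whether the paper averages per query or per micro-segment is not stated in the abstract.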
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2014 LaSEWebAutomatingSearchStrategi | Oleksandr Polozov, Sumit Gulwani | LaSEWeb: Automating Search Strategies over Semi-structured Web Data | 10.1145/2623330.2623761 | 2014 |