2008 EntityCategorizationoverLargeDo
- (Ganti et al., 2008) ⇒ Venkatesh Ganti, Arnd C. König, and Rares Vernica. (2008). “Entity Categorization over Large Document Collections.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401927
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Entity+categorization+over+large+document+collections%22+2008
- http://portal.acm.org/citation.cfm?doid=1401890.1401927&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Extracting entities (such as people, movies) from documents and identifying the categories (such as painter, writer) they belong to enable structured querying and data analysis over unstructured document collections. In this paper, we focus on the problem of categorizing extracted entities. Most prior approaches developed for this task only analyzed the local document context within which entities occur. In this paper, we significantly improve the accuracy of entity categorization by (i) considering an entity's context across multiple documents containing it, and (ii) exploiting existing large lists of related entities (e.g., lists of actors, directors, books). These approaches introduce computational challenges because (a) the context of entities has to be aggregated across several documents and (b) the lists of related entities may be very large. We develop techniques to address these challenges. We present a thorough experimental study on real data sets that demonstrates the increase in accuracy and the scalability of our approaches.
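The two ideas named in the abstract, (i) aggregating an entity's context across documents and (ii) using lists of related entities, can be illustrated with a toy sketch. This is not the paper's algorithm: the function names (`aggregate_context`, `categorize`), the fixed token window, the hand-written cue words, and the additive score are all assumptions made for illustration, and entities are restricted to single lowercase tokens to keep the example short.

```python
from collections import Counter, defaultdict


def aggregate_context(documents, entities, window=3):
    """Collect, per entity, the words seen near it across ALL documents.

    Hypothetical helper: a fixed token window stands in for whatever
    context definition a real system would use.
    """
    contexts = defaultdict(Counter)
    for doc in documents:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in entities:
                lo, hi = max(0, i - window), i + window + 1
                neighbours = tokens[lo:i] + tokens[i + 1:hi]
                contexts[tok].update(neighbours)
    return contexts


def categorize(entity, contexts, category_lists, category_cues):
    """Pick the category whose cue words and list members co-occur most.

    Assumed scoring: (count of cue words in the aggregated context)
    + (count of known list members in the aggregated context).
    """
    ctx = contexts[entity]
    scores = {}
    for cat, cues in category_cues.items():
        cue_hits = sum(ctx[w] for w in cues)
        list_hits = sum(ctx[m] for m in category_lists.get(cat, ()))
        scores[cat] = cue_hits + list_hits
    return max(scores, key=scores.get), scores


# Made-up toy data, not taken from the paper's data sets.
documents = [
    "picasso painted guernica in 1937",
    "the exhibition showed picasso next to matisse",
    "hemingway wrote novels",
]
entities = {"picasso", "hemingway"}
category_cues = {"painter": {"painted", "canvas"}, "writer": {"wrote", "novels"}}
category_lists = {"painter": {"matisse", "monet"}, "writer": {"faulkner"}}

contexts = aggregate_context(documents, entities)
label, scores = categorize("picasso", contexts, category_lists, category_cues)
print(label, scores)
```

In this toy data, "picasso" gets one cue-word hit from the first document and one list-member hit ("matisse") from the second, so neither document alone contributes both signals. A real system at the scale the abstract describes would need a far more efficient aggregation and list-matching strategy than this in-memory loop.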
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008 EntityCategorizationoverLargeDo | Venkatesh Ganti, Rares Vernica, Arnd C. König | | | Entity Categorization over Large Document Collections | | KDD-2008 Proceedings | | 10.1145/1401890.1401927 | | 2008 |