2008 EntityCategorizationoverLargeDo

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Extracting entities (such as people, movies) from documents and identifying the categories (such as painter, writer) they belong to enable structured querying and data analysis over unstructured document collections. In this paper, we focus on the problem of categorizing extracted entities. Most prior approaches developed for this task only analyzed the local document context within which entities occur. In this paper, we significantly improve the accuracy of entity categorization by (i) considering an entity's context across multiple documents containing it, and (ii) exploiting existing large lists of related entities (e.g., lists of actors, directors, books). These approaches introduce computational challenges because (a) the context of entities has to be aggregated across several documents and (b) the lists of related entities may be very large. We develop techniques to address these challenges. We present a thorough experimental study on real data sets that demonstrates the increase in accuracy and the scalability of our approaches.

References

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2008 EntityCategorizationoverLargeDoVenkatesh Ganti
Rares Vernica
Arnd C. König
Entity Categorization over Large Document CollectionsKDD-2008 Proceedings10.1145/1401890.14019272008