2008 DecodingWikipediaCatsForKnowAcq

From GM-RKB

Subject Headings: Wikipedia Category Network

Notes

Quotes

Abstract

  • This paper presents an approach to acquire knowledge from Wikipedia categories and the category network. Many Wikipedia categories have complex names which reflect human classification and organizing instances, and thus encode knowledge about class attributes, taxonomic and other semantic relations. We decode the names and refer back to the network to induce relations between concepts in Wikipedia represented through pages or categories. The category structure allows us to propagate a relation detected between constituents of a category name to numerous concept links. The results of the process are evaluated against ResearchCyc and a subset also by human judges. The results support the idea that Wikipedia category names are a rich source of useful and accurate knowledge.
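The decode-and-propagate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single hypothetical name pattern, "<class> <preposition> <entity>", and uses made-up relation labels (`isa`, plus the preposition itself) and placeholder page names. The paper's actual patterns and relation inventory are not given in this excerpt.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Assumed pattern (hypothetical): "<class> <preposition> <entity>",
# e.g. "Villages in Brandenburg". The preposition list is illustrative only.
NAME_PATTERN = re.compile(r"^(?P<head>.+?) (?P<prep>in|of|from) (?P<tail>.+)$")


def decode_category(name: str) -> Optional[Tuple[str, str, str]]:
    """Split a category name into (class, preposition, entity), or None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("head"), match.group("prep"), match.group("tail")


def propagate(category: str, pages: List[str]) -> List[Triple]:
    """Propagate the relation found in a category name to each member page."""
    decoded = decode_category(category)
    if decoded is None:
        return []
    head, prep, tail = decoded
    triples = []
    for page in pages:
        # Each member page is taken to be an instance of the class constituent ...
        triples.append(Triple(page, "isa", head))
        # ... and to stand in the decoded relation to the entity constituent.
        triples.append(Triple(page, prep, tail))
    return triples


if __name__ == "__main__":
    # Placeholder page names, not real data.
    for triple in propagate("Villages in Brandenburg", ["Page A", "Page B"]):
        print(triple)
```

One category name thus yields two relation instances per member page, which is how a single decoded name can be spread over many concept links.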

Conclusion

  • We have explored category names and category structure in Wikipedia as sources of relations between concepts. The analysis and experiments performed show a wealth of information that can be induced from these elements: instances of relations, relation types and class attributes. We will refine this work by testing other methods for determining the semantic relation between concept pairs, and expand the category name analysis to even finer category name constituents. In both statistical and semantic analysis tasks it is useful to be able to generalize a concept - to address the data sparseness issue, or to be able to cluster similar entities. Even when a taxonomy is available, finding the most appropriate level of generalization is not easy. We plan to explore in future work people's preferences for generalizations, as captured in the "by" categories.
  • This research has started from the observation that Wikipedia categories have complex names, which encode some form of human knowledge of organization and classification. Splitting category names into smaller strings, we retrieve concepts that are of interest in language processing, and salient relations between them. Our goal is to transform Wikipedia's category network into a network of concepts linked by a variety of semantic relations, ready to provide knowledge to higher end NLP applications such as coreference resolution, summarization and question answering.
  • Resource. The triples extracted with this method are available on our web page (http://www.eml-research.de/nlp/download/wikirelations.php).


Key: 2008 DecodingWikipediaCatsForKnowAcq
Author: Michael Strube, Vivi Nastase
Title: Decoding Wikipedia Categories for Knowledge Acquisition
URL: http://www.eml-research.de/nlp/papers/nastase08b.pdf