Semantically Enriched Wikipedia (SEW) Corpus
A Semantically Enriched Wikipedia (SEW) Corpus is a Word Sense Annotated Corpus that is automatically built from Wikipedia.
- Context:
- Website: http://lcl.uniroma1.it/sew/
- It is the corpus that SEW-EMBED is based on.
- It was initially built by Raganato et al. (2016).
- Example(s):
- Counter-Example(s):
- See: Annotated Text Corpus, Wikipedia, Wikipedia Corpus, Wikipedia Dataset, SemEval-2017 Task, Semantic Word Similarity Benchmark Task, Semantic Textual Similarity Benchmark Task, Semantic Similarity Modelling System, Semantic Similarity Measure, Semantic Relatedness Measure.
References
2021
- (SEW, 2021) ⇒ http://lcl.uniroma1.it/sew/ Retrieved: 2021-07-25.
- QUOTE: SEW (Semantically Enriched Wikipedia) is a sense-annotated corpus, automatically built from Wikipedia, in which the overall number of linked mentions has been more than tripled solely by exploiting the hyperlink structure of Wikipedia pages and categories, along with the wide-coverage sense inventory of BabelNet. As a result SEW constitutes both a large-scale Wikipedia-based semantic network and a sense-tagged dataset with more than 200 million annotations of over 4 million different concepts and named entities.
2017a
- (Camacho-Collados et al., 2017) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, and Roberto Navigli. (2017). “SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
- QUOTE: The global ranking for this subtask was computed by averaging the results of the six datasets on which each system performed best. The global rankings are displayed in Table 9. Luminoso was the only system outperforming the baseline, achieving the best overall results. OoO achieved the second best overall performance using an extension of the Bilingual Bag-of-Words without Alignments (BilBOWA) approach of Gouws et al. (2015) on the shared Europarl corpus. The third overall system was SEW, which leveraged Wikipedia-based concept vectors (Raganato et al., 2016) and pre-trained word embeddings for learning language-independent concept embeddings.
2017b
- (Bovi & Raganato, 2017) ⇒ Claudio Delli Bovi, and Alessandro Raganato. (2017). “Sew-Embed at SemEval-2017 Task 2: Language-Independent Concept Representations from a Semantically Enriched Wikipedia.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
- QUOTE: In this paper we propose SEW-EMBED, an embedded augmentation of SEW's original representations in which sparse vectors, defined in the high-dimensional space of Wikipedia pages, are mapped to continuous vector representations via a weighted average of embedded vectors from an arbitrary, pre-specified word (or sense) representation. Regardless of the particular representation used, the resulting vectors are still defined at the concept level, and hence immediately expendable in a multilingual and cross-lingual setting.
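The mapping described above — a sparse concept vector over Wikipedia pages collapsed into a dense embedding via a weighted average of per-page vectors — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based data structures, and the toy vectors are all assumptions.

```python
# Minimal sketch of a weighted-average concept embedding (hypothetical
# names and data structures; not the actual SEW-EMBED code).

def embed_concept(sparse_vector, embeddings):
    """Map a sparse concept vector {page_title: weight} to a dense
    vector by taking the weight-normalized average of the embeddings
    of the pages it mentions. Pages without an embedding are skipped."""
    acc = None
    total_weight = 0.0
    for page, weight in sparse_vector.items():
        vec = embeddings.get(page)
        if vec is None:
            continue  # no pre-trained embedding for this page
        if acc is None:
            acc = [0.0] * len(vec)
        for i, x in enumerate(vec):
            acc[i] += weight * x
        total_weight += weight
    if acc is None or total_weight == 0.0:
        return None  # no overlap between the concept and the embedding space
    return [x / total_weight for x in acc]

# Toy usage: two 2-dimensional page embeddings, one sparse concept vector.
page_embeddings = {"Paris": [1.0, 0.0], "France": [0.0, 1.0]}
dense = embed_concept({"Paris": 3.0, "France": 1.0}, page_embeddings)
```

Because the result lives in the same space regardless of which source language produced the sparse vector, concept vectors built this way are directly comparable across languages, which is what the quote above refers to as the multilingual and cross-lingual setting.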
2016
- (Raganato et al., 2016) ⇒ Alessandro Raganato, Claudio Delli Bovi, and Roberto Navigli. (2016). “Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia.” In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16).
- QUOTE: Our approach for building a Semantically Enriched Wikipedia (SEW) takes as input a Wikipedia dump and outputs a sense-annotated corpus, built upon the original Wikipedia text, where mentions are annotated according to the sense inventory of BabelNet (...)