NASARI (Novel Approach to a Semantically-Aware Representation of Items) System
A NASARI (Novel Approach to a Semantically-Aware Representation of Items) System is a Multilingual Semantic Vector Representation System for BabelNet synsets and Wikipedia pages.
- Context:
- Website: http://lcl.uniroma1.it/nasari/
- It was initially developed by Camacho-Collados et al. (2015, 2016).
- It was a baseline system for SemEval-2017 Task 2 (Multilingual and Cross-lingual Semantic Word Similarity), a participating system in the SemEval-2014 Task on Cross-Level Semantic Similarity, and has been evaluated on other SemEval tasks (SemEval-2007, SemEval-2013, SemEval-2015).
- It has been evaluated by NASARI Benchmark Task.
- It can range from being a Word-based NASARI System to being a Synset-based NASARI System.
- It can be used to solve NLP tasks such as Multilingual Semantic Similarity, Sense Clustering, Word Sense Disambiguation, and Domain Labeling.
- Example(s):
- Counter-Example(s):
- See: Multilingual and Cross-lingual Semantic Word Similarity System, Semantic Similarity System, SemEval-2017, Word Embedding System, Semantic Word Similarity Benchmark Task, Word Sense Disambiguation System, Sense Clustering System, Domain Labeling System.
References
2021
- (Nasari, 2021) ⇒ http://lcl.uniroma1.it/nasari/ Retrieved: 05-06-2021.
- QUOTE: NASARI semantic vector representations for BabelNet synsets[1] and Wikipedia pages in several languages. Three vector types are currently available: lexical, unified and embedded. NASARI provides a large coverage of concepts and named entities and has proved to be useful for many Natural Language Processing tasks such as multilingual semantic similarity, sense clustering or word sense disambiguation, tasks on which NASARI has contributed to achieving state-of-the-art results on standard benchmarks.
- ↑ Please note that BabelNet covers WordNet and Wikipedia among other resources, enabling our vectors to be applicable for representations of concepts and named entities in each of these resources.
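The vectors described above are distributed as plain-text files mapping a BabelNet synset (or Wikipedia page) identifier to a dense vector. As a minimal sketch of how such vectors can be consumed, the snippet below loads a whitespace-separated vector file and compares two synsets by cosine similarity; the file name, the exact line format, and the synset identifiers are illustrative assumptions rather than details taken from the NASARI distribution.

```python
import numpy as np

def load_nasari_vectors(path):
    """Parse a plain-text vector file into an {identifier: vector} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue
            # Assumed format: one identifier token followed by float components.
            key, values = parts[0], parts[1:]
            vectors[key] = np.array(values, dtype=np.float32)
    return vectors

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

if __name__ == "__main__":
    vecs = load_nasari_vectors("nasari_vectors.txt")   # hypothetical file name
    a, b = "bn:00005054n", "bn:00021494n"              # hypothetical synset ids
    if a in vecs and b in vecs:
        print(f"similarity({a}, {b}) = {cosine(vecs[a], vecs[b]):.3f}")
```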
2017
- (Camacho-Collados et al., 2017) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, and Roberto Navigli. (2017). “SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval ACL 2017).
- QUOTE: As the baseline system we included the results of the concept and entity embeddings of NASARI (Camacho-Collados et al., 2016). These embeddings were obtained by exploiting knowledge from Wikipedia and WordNet coupled with general domain corpus-based Word2Vec embeddings (Mikolov et al., 2013). We performed the evaluation with the 300-dimensional English embedded vectors (version 3.0)[1] and used them for all languages. For the comparison within and across languages NASARI relies on the lexicalizations provided by BabelNet (Navigli and Ponzetto, 2012) for the concepts and entities in each language.
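A common way to turn such synset-level vectors into word-level scores, within or across languages, is to compare the candidate senses of the two words and keep the highest similarity. The sketch below illustrates that strategy with toy data; the candidate-synset lookup is a hypothetical stand-in for BabelNet's lexicalizations, not an actual BabelNet API call, and the vectors and ids are invented for illustration.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity(word_a, word_b, candidates, synset_vectors):
    """Max cosine similarity over candidate synset pairs of two (word, lang) keys."""
    best = 0.0
    for sa in candidates.get(word_a, []):
        for sb in candidates.get(word_b, []):
            if sa in synset_vectors and sb in synset_vectors:
                best = max(best, cosine(synset_vectors[sa], synset_vectors[sb]))
    return best

# Toy data: synset ids, vectors, and the lexicalization table are made up.
synset_vectors = {"bn:A": np.array([1.0, 0.0]), "bn:B": np.array([0.8, 0.6])}
candidates = {("car", "EN"): ["bn:A"], ("coche", "ES"): ["bn:B"]}
print(word_similarity(("car", "EN"), ("coche", "ES"), candidates, synset_vectors))  # 0.8
```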
2016
- (Camacho-Collados et al., 2016) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. (2016). “Nasari: Integrating Explicit Knowledge And Corpus Statistics For A Multilingual Representation Of Concepts And Entities.” In: Elsevier - Artificial Intelligence, 240.
- QUOTE: (...) we proposed a method that exploits the structural knowledge derived from semantic networks, together with distributional statistics from text corpora, to produce effective representations of individual word senses or concepts. (...) . Firstly, it is multilingual, as it can be directly applied for the representation of concepts in dozens of languages. Secondly, each vector represents a concept, irrespective of its language, in a unified semantic space having concepts as its dimensions, permitting direct comparison of different representations across languages and hence enabling cross-lingual applications.
In this article, we improve our approach, referred to as Nasari (Novel Approach to a Semantically-Aware Representation of Items) henceforth, and extend their application to a wider range of tasks in lexical semantics.
(...)A brief overview of the evaluation benchmarks and the results across the four tasks follows:
- 1. Semantic similarity. Nasari proved to be highly reliable in the task of semantic similarity measurement, as it provides state-of-the-art performance on several datasets across different evaluation benchmarks:
- Mono-lingual word similarity on four standard word similarity datasets, namely, MC-30 (...), WS-Sim (...), SimLex-999 (...) and RG-65 (...)
- Cross-lingual word similarity on six different cross-lingual datasets on the basis of RG-65 (...)
- 2. Sense clustering. We constructed a highly competitive unsupervised system on the basis of the Nasari representations, outperforming state-of-the-art supervised systems on two manually-annotated Wikipedia sense clustering datasets (...).
- 3. Domain labeling. We used our system for annotating synsets of a large lexical semantic resource (BabelNet), and benchmarked our system against three automatic baselines on two gold standard datasets: (...)
- 4. Word Sense Disambiguation. We proposed a simple framework for a knowledge-rich unsupervised disambiguation system. Our system obtained state-of-the-art results on multilingual All-Words Word Sense Disambiguation using Wikipedia as sense inventory, evaluated on the SemEval-2013 dataset (...)
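As a rough illustration of the disambiguation setting described in point 4 above, the sketch below selects, for a target word, the candidate synset whose vector is closest to the centroid of the context word vectors. It assumes word and synset vectors live in one shared space (as with the embedded NASARI vectors coupled with Word2Vec) and is a simplification of the paper's actual framework; all names and vectors in the toy example are invented.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(context_words, candidate_synsets, word_vecs, synset_vecs):
    """Return the candidate synset whose vector best matches the context centroid."""
    context = [word_vecs[w] for w in context_words if w in word_vecs]
    if not context:
        return None
    centroid = np.mean(context, axis=0)
    scored = [(cosine(synset_vecs[s], centroid), s)
              for s in candidate_synsets if s in synset_vecs]
    return max(scored)[1] if scored else None

# Toy data: two invented senses of "bass" and a musical context.
word_vecs = {"strings": np.array([0.9, 0.1]), "orchestra": np.array([0.8, 0.2])}
synset_vecs = {"bn:bass_instrument": np.array([1.0, 0.0]),
               "bn:bass_fish": np.array([0.0, 1.0])}
print(disambiguate(["strings", "orchestra"],
                   ["bn:bass_instrument", "bn:bass_fish"],
                   word_vecs, synset_vecs))  # bn:bass_instrument
```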
2015
- (Camacho-Collados et al., 2015) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. (2015). “NASARI: A Novel Approach to a Semantically-Aware Representation of Items.” In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (NAACL-HLT 2015).
- QUOTE: In this paper we put forward a novel concept representation technique, called NASARI, which exploits the knowledge available in both types of resource in order to obtain effective representations of arbitrary concepts. The contributions of this paper are threefold. First, we propose a novel technique for rich semantic representation of arbitrary WordNet synsets or Wikipedia pages. Second, we provide improvements over the conventional tf-idf weighting scheme by applying lexical specificity (Lafon, 1980), a statistical measure mainly used for term extraction, to the task of computing vector weights in a vector representation. Third, we propose a semantically-aware dimensionality reduction technique that transforms a lexical item's representation from a semantic space of words to one of WordNet synsets, simultaneously providing an implicit disambiguation and a distribution smoothing. We demonstrate that our representation achieves state-of-the-art performance on two different tasks: (1) word similarity on multiple standard datasets: MC30, RG-65, and WordSim-353 similarity, and (2) Wikipedia sense clustering, in which our unsupervised system surpasses the performance of a state-of-the-art supervised technique that exploits knowledge available in Wikipedia in several languages.
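Lexical specificity, mentioned in the quote above as a replacement for tf-idf weighting, scores a term by how unlikely its observed frequency in a sub-corpus is under a hypergeometric model of the whole corpus. The sketch below shows only that computation, as a minimal illustration under stated assumptions; the function name, argument names, and the handling of numerical underflow are assumptions, and the rest of the NASARI vector-construction pipeline is omitted.

```python
import math
from scipy.stats import hypergeom

def lexical_specificity(f, F, t, T):
    """
    f: term frequency in the sub-corpus
    F: term frequency in the whole corpus
    t: sub-corpus size in tokens
    T: whole-corpus size in tokens
    Returns -log10 P(X >= f) for X ~ Hypergeometric(T, F, t).
    """
    tail = hypergeom(M=T, n=F, N=t).sf(f - 1)  # survival function: P(X >= f)
    return 0.0 if tail <= 0.0 else -math.log10(tail)

# Toy example: 30 occurrences in a 1,000-token sub-corpus versus 200 in a
# 1,000,000-token corpus yields a large (highly specific) weight.
print(lexical_specificity(f=30, F=200, t=1_000, T=1_000_000))
```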