Semantic Word Similarity Dataset

A Semantic Word Similarity Dataset is a Benchmark Dataset used in a Semantic Word Similarity Benchmark Task.

Example(s):
- a SemEval-2017 Task 2 Benchmark Dataset, such as:
  - a Semantic Similarity SART Dataset,
- …
Counter-Example(s):
See: Training Dataset, Semantic Word Similarity Measure, Semantic Word Similarity System, SemEval-2017 Task 2.

References

2021a

(Chandrasekaran & Mago, 2021) ⇒ Dhivya Chandrasekaran, and Vijay Mago. (2021). “Evolution of Semantic Similarity - A Survey.” In: ACM Computing Surveys, 54(2).
- QUOTE: Semantic similarity methods usually give a ranking or percentage of similarity between texts, rather than a binary decision as similar or not similar. Semantic similarity is often used synonymously with semantic relatedness. However, semantic relatedness not only accounts for the semantic similarity between texts but also considers a broader perspective analyzing the shared semantic properties of two words. For example, the words ‘coffee’ and ‘mug’ may be related to one another closely, but they are not considered semantically similar whereas the words ‘coffee’ and ‘tea’ are semantically similar. Thus, semantic similarity may be considered, as one of the aspects of semantic relatedness. The semantic relationship including similarity is measured in terms of semantic distance, which is inversely proportional to the relationship (...)

2021b

(Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Semantic_similarity Retrieved:2021-5-29.
- Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature.^[1] ^[2] The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations.
  For example, "car" is similar to "bus", but is also related to "road" and "driving".
  Computationally, semantic similarity can be estimated by defining a topological similarity, by using ontologies to define the distance between terms/concepts. For example, a naive metric for the comparison of concepts ordered in a partially ordered set and represented as nodes of a directed acyclic graph (e.g., a taxonomy), would be the shortest-path linking the two concept nodes. Based on text analyses, semantic relatedness between units of language (e.g., words, sentences) can also be estimated using statistical means such as a vector space model to correlate words and textual contexts from a suitable text corpus. The proposed semantic similarity / relatedness measures are evaluated in two main ways. The first is based on the use of datasets designed by experts and composed of word pairs with semantic similarity / relatedness degree estimation. The second is based on the integration of the measures inside specific applications such as information retrieval, recommender systems, natural language processing, etc.

[1] Harispe S.; Ranwez S.; Janaqi S.; Montmain J. (2015). “Semantic Similarity from Natural Language and Ontology Analysis”. Synthesis Lectures on Human Language Technologies. 8:1: 1–254.
[2] Feng Y.; Bagheri E.; Ensan F.; Jovanovic J. (2017). “The state of the art in semantic relatedness: a framework for comparison”. Knowledge Engineering Review. 32: 1–30. doi:10.1017/S0269888917000029.
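
The passage above mentions two ideas that a short example may make more concrete: a naive shortest-path measure over a taxonomy, and the evaluation of a measure against a dataset of word pairs with human similarity ratings. The following is a minimal sketch only. The taxonomy, the word pairs, and the gold scores are all invented for illustration and do not come from SemEval-2017 Task 2 or any real dataset. The conversion 1 / (1 + distance) and the use of Spearman rank correlation are assumed choices, not something the quoted sources prescribe.

```python
from collections import deque
from math import sqrt

# Hypothetical toy taxonomy of "is a" edges (child -> parents). Illustration only.
TAXONOMY = {
    "car": ["vehicle"], "bus": ["vehicle"], "vehicle": ["artifact"],
    "coffee": ["beverage"], "tea": ["beverage"], "beverage": ["substance"],
    "mug": ["container"], "container": ["artifact"],
    "artifact": ["entity"], "substance": ["entity"], "entity": [],
}

# Hypothetical gold ratings (word1, word2, score); not taken from any real dataset.
GOLD = [("coffee", "tea", 8.5), ("car", "bus", 7.9),
        ("coffee", "mug", 3.0), ("car", "tea", 0.8)]


def build_undirected(taxonomy):
    """Turn child -> parents edges into an undirected adjacency map."""
    graph = {node: set() for node in taxonomy}
    for child, parents in taxonomy.items():
        for parent in parents:
            graph[child].add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges between two nodes."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {source!r} and {target!r}")


def path_similarity(graph, a, b):
    """Assumed conversion: similarity decreases as path distance grows."""
    return 1.0 / (1.0 + shortest_path_length(graph, a, b))


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def evaluate(graph, gold):
    """Correlate the measure's scores with the gold ratings."""
    predicted = [path_similarity(graph, w1, w2) for w1, w2, _ in gold]
    return spearman(predicted, [score for _, _, score in gold])


GRAPH = build_undirected(TAXONOMY)

if __name__ == "__main__":
    print(f"Spearman correlation: {evaluate(GRAPH, GOLD):.3f}")
```

In this toy taxonomy, 'coffee' and 'tea' share a direct parent and so score higher than 'coffee' and 'mug', which mirrors the similarity versus relatedness distinction drawn in the quotes above. A real evaluation would substitute an actual benchmark dataset and ontology for the hypothetical ones here.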
