Sentence Meaning Similarity Measure
A Sentence Meaning Similarity Measure is a text-item meaning similarity measure (a meaning similarity measure) between two or more linguistic sentences.
- AKA: SentSim.
- Context:
- output: a Sentence Similarity Score.
- It can be created by a Sentence Similarity System (based on a Sentence Similarity Algorithm).
- It can support tasks such as: information retrieval, text summarization, and machine translation, by identifying or grouping sentences with similar meanings.
- It can be associated with an Identical Sentence Meaning Measure.
- ...
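A sentence similarity measure can be sketched as a function that maps a sentence pair to a score in [0, 1]. The following is a minimal, illustrative sketch using bag-of-words cosine similarity as a crude stand-in for a learned sentence embedding; a real SentSim system based on a Sentence Embedding Model would encode each sentence with a trained model (e.g., a Sentence Transformer) before comparing vectors, and would score the example pair below much higher.

```python
import math
from collections import Counter

def sentence_similarity(s1: str, s2: str) -> float:
    """Cosine similarity over bag-of-words count vectors.

    A crude stand-in for a learned sentence embedding: it captures
    lexical overlap only, not meaning.
    """
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

# Lexical overlap alone yields a low score for this paraphrase pair;
# a semantic measure scores it near 0.98.
score = sentence_similarity(
    "Roberto Mancini gets the boot from Man City",
    "Roberto Mancini has been sacked by Manchester City",
)
```

The gap between this lexical score and the semantic judgment illustrates why embedding-based measures are preferred for paraphrase-style pairs.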
- Example(s):
- SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT).
- SentSim("Roberto Mancini gets the boot from Man City", "Roberto Mancini has been sacked by Manchester City with the Blues saying") ⇒ 0.98
- one based on a Sentence Embedding Model.
- a Contract Sentence Similarity Measure.
- …
- Counter-Example(s):
- A Sentence Syntax Similarity Measure, which focuses on grammatical rather than semantic similarity.
- A Word Meaning Similarity Measure, which compares the similarity of individual words rather than whole sentences.
- A Passage Meaning Similarity Measure, which applies to larger text units like paragraphs or entire documents.
- A Document Meaning Similarity Measure, which evaluates the overall thematic or topic similarity across entire documents.
- See: Paraphrase, Subsumption, Sentence Transformer, Semantic Analysis.
References
2022
- (Sun et al., 2022) ⇒ X. Sun, Y. Meng, X. Ao, F. Wu, T. Zhang, J. Li, and others. (2022). “Sentence Similarity Based on Contexts.” In: Transactions of the Association for Computational Linguistics. MIT Press
- NOTE: It introduces a novel framework for measuring sentence similarity based on the context, suggesting that the meaning of a word is determined by its usage in sentences.
2019
- (Farouk, 2019) ⇒ M. Farouk. (2019). “Measuring Sentences Similarity: A Survey.” arXiv preprint arXiv:1910.03940.
- NOTE: It provides a comprehensive survey of methods for measuring sentence similarity, highlighting the growing interest and variety of approaches in this area.
2016
- (Agirre et al., 2016) ⇒ Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez Agirre, Rada Mihalcea, German Rigau Claramunt, and Janyce Wiebe. (2016). “Semeval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-lingual Evaluation.” In: SemEval-2016. 10th International Workshop on Semantic Evaluation; 2016 Jun 16-17; San Diego, CA. Stroudsburg (PA): ACL.
- ABSTRACT:
Semantic Textual Similarity (STS) seeks to measure the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to complete unrelatedness. Intermediate values capture specifically defined levels of partial similarity. While prior evaluations constrained themselves to just monolingual snippets of text, the 2016 shared task includes a pilot subtask on computing semantic similarity on cross-lingual text snippets. This year’s traditional monolingual subtask involves the evaluation of English text snippets from the following four domains: Plagiarism Detection, Post-Edited Machine Translations, Question-Answering and News Article Headlines. From the question-answering domain, we include both question-question and answer-answer pairs. The cross-lingual subtask provides paired Spanish-English text snippets drawn from the same sources as the English data as well as independently sampled news data. The English subtask attracted 43 participating teams producing 119 system submissions, while the cross-lingual Spanish-English pilot subtask attracted 10 teams resulting in 26 systems.
2015
- http://alt.qcri.org/semeval2015/task1/
- QUOTE: Given two sentences, the participants are asked to determine whether they express the same or very similar meaning and optionally a degree score between 0 and 1. Following the literature on paraphrase identification, system performance is primarily evaluated by the F-1 score and Accuracy against human judgments. Additional evaluations include Pearson correlation and PINC (Chen and Dolan, 2011), which measures lexical dissimilarity between sentence pairs.
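The F-1 and accuracy evaluation described in the quote above can be sketched as follows. This is a minimal illustration over binary paraphrase labels only; the official PIT evaluation also handles optional degree scores, Pearson correlation, and PINC.

```python
def evaluate(predictions, gold):
    """F-1 and accuracy of binary paraphrase labels against human
    judgments (1 = paraphrase, 0 = not a paraphrase)."""
    tp = sum(p == 1 and g == 1 for p, g in zip(predictions, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(predictions, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(predictions, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
    return f1, accuracy

# Toy example: 4 sentence pairs, one false positive.
f1, acc = evaluate([1, 0, 1, 1], [1, 0, 0, 1])  # f1 = 0.8, acc = 0.75
```

F-1 is preferred as the primary metric here because paraphrase pairs are typically the minority class, so accuracy alone can be inflated by always predicting "not a paraphrase".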
2008
- (Achananuparp et al., 2008) ⇒ P. Achananuparp, X. Hu, and X. Shen. (2008). “The Evaluation of Sentence Similarity Measures.” In: Proceedings of DaWaK 2008, Turin, Italy, September 2-5. Springer.
- NOTE: It discusses the evaluation of various sentence similarity measures, emphasizing the importance of accurate similarity judgments even when sentences do not share exact words or phrases.
2006
- (Li et al., 2006) ⇒ Y. Li, D. McLean, Z.A. Bandar, J.D. O'Shea, and K. Crockett. (2006). “Sentence Similarity Based on Semantic Nets and Corpus Statistics.” In: IEEE Transactions on Knowledge and Data Engineering.
- NOTE: It explores the application of semantic nets and corpus statistics to compute sentence similarity, addressing the limitations of existing measures.