Text-Items Meaning Similarity Measure
A Text-Items Meaning Similarity Measure is a meaning similarity measure between two or more text items (such as linguistic sentences).
- AKA: SentSim.
- Context:
- output: a Text-Items Similarity Score.
- It can be created by a Text-Items Similarity System (based on a text-items similarity algorithm), as sketched below.
- It can support tasks such as: information retrieval, text summarization, and machine translation.
- It can be associated with an Identical Text-Item Meaning Measure.
- ...
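A minimal sketch of one way such a measure can be realized: score two sentences by the cosine similarity of their sentence embeddings. The sketch assumes the sentence-transformers library and its pretrained all-MiniLM-L6-v2 model; both are illustrative choices, not ones prescribed by this page.

```python
# Minimal sketch of a text-items meaning similarity measure:
# score two sentences by the cosine similarity of their embeddings.
# Assumes the sentence-transformers library and the pretrained
# "all-MiniLM-L6-v2" model; any sentence encoder could be substituted.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def meaning_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity score in [-1, 1]; higher means closer in meaning."""
    emb_a, emb_b = model.encode([text_a, text_b], convert_to_tensor=True)
    return util.cos_sim(emb_a, emb_b).item()

# Semantically close pair should score much higher than an unrelated pair.
print(meaning_similarity("A man is playing a guitar.",
                         "Someone performs music on a stringed instrument."))
print(meaning_similarity("A man is playing a guitar.",
                         "The stock market fell sharply today."))
```

The raw cosine score lies in [-1, 1]; a deployment could rescale it onto a task-specific range, such as the 0-5 ordinal scale used in the SemEval STS evaluations cited below.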
- Example(s):
- a Semantic Textual Similarity (STS) Measure, such as the one evaluated in SemEval-2016 Task 1 (Agirre et al., 2016).
- a cosine similarity between Sentence Embeddings produced by a Sentence Transformer.
- ...
- Counter-Example(s):
- A Sentence Syntax Similarity Measure, which focuses on grammatical rather than semantic similarity.
- A Word Meaning Similarity Measure, which compares the similarity of individual words rather than whole sentences.
- A Passage Meaning Similarity Measure, which applies to larger text units such as paragraphs.
- A Document Meaning Similarity Measure, which evaluates the overall thematic or topic similarity across entire documents.
- See: Paraphrase, Subsumption, Sentence Transformer, Semantic Analysis.
References
2021
- (Chandrasekaran & Mago, 2021) ⇒ Dhivya Chandrasekaran, and Vijay Mago. (2021). “Evolution of Semantic Similarity — a Survey.” In: ACM Computing Surveys (CSUR), 54(2).
- ABSTRACT: Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.
2016
- (Agirre et al., 2016) ⇒ Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez Agirre, Rada Mihalcea, German Rigau Claramunt, and Janyce Wiebe. (2016). “Semeval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-lingual Evaluation.” In: SemEval-2016. 10th International Workshop on Semantic Evaluation.
- ABSTRACT: Semantic Textual Similarity (STS) seeks to measure the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to complete unrelatedness. Intermediate values capture specifically defined levels of partial similarity. While prior evaluations constrained themselves to just monolingual snippets of text, the 2016 shared task includes a pilot subtask on computing semantic similarity on cross-lingual text snippets. This year’s traditional monolingual subtask involves the evaluation of English text snippets from the following four domains: Plagiarism Detection, Post-Edited Machine Translations, Question-Answering and News Article Headlines. From the question-answering domain, we include both question-question and answer-answer pairs. The cross-lingual subtask provides paired Spanish-English text snippets drawn from the same sources as the English data as well as independently sampled news data. The English subtask attracted 43 participating teams producing 119 system submissions, while the cross-lingual Spanish-English pilot subtask attracted 10 teams resulting in 26 systems.