2021 EvolutionofSemanticSimilarityAS

From GM-RKB

Subject Headings: Semantic Similarity; Semantic Textual Similarity; Semantic Relatedness.

Notes

Cited By

Quotes

Author Keywords

Abstract

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principle as knowledge-based, corpus-based, deep neural network-based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.

1. Introduction

2. Datasets

3. Knowledge-Based Semantic-Similarity Methods

4. Corpus-Based Semantic-Similarity Methods

5. Deep Neural Network-Based Methods

6. Hybrid Methods

7. Analysis Of Survey

8. Conclusion

Measuring semantic similarity between two text snippets has been one of the most challenging tasks in the field of Natural Language Processing. Various methodologies have been proposed over the years to measure semantic similarity, and this survey discusses the evolution, advantages, and disadvantages of these methods. Knowledge-based methods take into consideration the actual meaning of text; however, they are not adaptable across different domains and languages. Corpus-based methods have a statistical background and can be implemented across languages, but they do not take into consideration the actual meaning of the text. Deep neural network-based methods show better performance, but they require high computational resources and lack interpretability. Hybrid methods are formed to take advantage of the benefits of different methods, compensating for the shortcomings of each other. It is clear from the survey that each method has its advantages and disadvantages, and it is difficult to choose one best model; however, most recent hybrid methods have shown promising results over other independent models. While the focus of recent research has shifted towards building more semantically aware word embeddings, and the transformer models have shown promising results, determining a balance between computational efficiency and performance is still a work in progress. Research gaps can also be seen in areas such as building domain-specific word embeddings and addressing the need for an ideal corpus. This survey would serve as a good foundation for researchers who intend to find new methods to measure semantic similarity.
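The point that purely statistical measures can miss the actual meaning of text can be illustrated with a minimal sketch. The code below is not a method taken from the survey: it is a simple word-overlap cosine similarity over term counts, chosen here as a stand-in for a statistical measure, and the function names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = term_counts(a), term_counts(b)
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Synonyms share no tokens, so the score ignores their common meaning.
    print(cosine_similarity("car", "automobile"))
    # Shared surface words raise the score regardless of meaning.
    print(cosine_similarity("the car is fast", "the car is slow"))
```

Under these assumptions, "car" and "automobile" score zero because they share no tokens, while two sentences with opposite meanings score high because most of their words match. Knowledge-based resources or learned embeddings are the kinds of components a hybrid method would add to close that gap.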

Acknowledgements

The authors would like to extend our gratitude to the research team in the DaTALab at Lakehead University for their support, in particular Abhijit Rao, Mohiuddin Qudar, Punardeep Sikka, and Andrew Heppner for their feedback and revisions on this publication. We would also like to thank Lakehead University, CASES, and the Ontario Council for Articulation and Transfer (ONCAT); without their support this research would not have been possible.

References

BibTex

@article{2021_EvolutionofSemanticSimilarityA,
  author    = {Dhivya Chandrasekaran and
               Vijay Mago},
  title     = {Evolution of Semantic Similarity - A Survey},
  journal   = {ACM Computing Surveys},
  volume    = {54},
  number    = {2},
  pages     = {41:1--41:37},
  year      = {2021},
  url       = {https://doi.org/10.1145/3440755},
  doi       = {10.1145/3440755},
}


Author: Dhivya Chandrasekaran, Vijay Mago
Title: Evolution of Semantic Similarity - A Survey
Year: 2021