Node-based Semantic Similarity Measure
A Node-based Semantic Similarity Measure is a Topological Semantic Similarity Measure that calculates the similarity between ontological concepts based on the information content of the nodes in a semantic network.
- AKA: Content-based Semantic Similarity Measure.
- Example(s):
- Alignment-based Similarity Measure (Pilehvar et al., 2013),
- Bodenreider-Aubry-Burgun Semantic Similarity Measure (Bodenreider et al., 2008),
- XOA Semantic Similarity Measure (Riensche et al., 2007),
- SimRel-FunSim (Schlicker et al., 2006),
- GraSM (Couto et al., 2005),
- Generalized Lin's Semantic Ontology Term Similarity Measure (Maguitman et al., 2005),
- Lin's Semantic Similarity Measure (Lin, 1998),
- Jiang-Conrath Node-based Semantic Similarity Measure (Jiang & Conrath, 1997),
- Resnik's Semantic Similarity Measure (Resnik, 1995),
- …
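Resnik's, Lin's, and the Jiang-Conrath measures all build on the information content (IC) of the most informative common ancestor (MICA) of two concepts. The sketch below is a minimal illustration that assumes a hypothetical toy is-a taxonomy and made-up annotation counts. It uses Resnik = IC(MICA), Lin = 2·IC(MICA) / (IC(c1) + IC(c2)), and the Jiang-Conrath *distance* IC(c1) + IC(c2) − 2·IC(MICA). How that distance is turned into a similarity varies between implementations, so the sketch leaves it as a distance.

```python
import math

# Hypothetical toy is-a taxonomy: term -> set of direct parents.
PARENTS = {
    "entity": set(),
    "animal": {"entity"},
    "plant": {"entity"},
    "dog": {"animal"},
    "cat": {"animal"},
    "oak": {"plant"},
}
ROOT = "entity"

# Hypothetical direct annotation counts per term.
DIRECT_COUNTS = {"entity": 0, "animal": 2, "plant": 1, "dog": 4, "cat": 3, "oak": 2}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(c) = -log p(c) = log(N_root / N_c), counting annotations of c's subtree.

    Assumes every term has at least one annotation in its subtree.
    """
    totals = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count  # an annotation to a term also annotates its ancestors
    return {t: math.log(totals[ROOT] / totals[t]) for t in PARENTS}


IC = information_content()


def mica_ic(c1, c2):
    """IC of the most informative common ancestor of c1 and c2."""
    return max(IC[a] for a in ancestors(c1) & ancestors(c2))


def resnik(c1, c2):
    return mica_ic(c1, c2)


def lin(c1, c2):
    denom = IC[c1] + IC[c2]
    return 2 * mica_ic(c1, c2) / denom if denom > 0 else 1.0


def jiang_conrath_distance(c1, c2):
    return IC[c1] + IC[c2] - 2 * mica_ic(c1, c2)
```

With these counts, `resnik("dog", "oak")` is 0 because the only common ancestor is the root, which carries no information. Lin normalizes the shared IC by the two terms' own IC, so `lin("dog", "dog")` is 1.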
- Counter-Example(s):
- See: Semantic Similarity Measure, Semantic Similarity Neural Network, Semantic Word Similarity Measure, Gene Semantic Similarity Measure, Semantic Relatedness Measure, Similarity Matrix, Generalized Cosine-Similarity Measure (GCSM).
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Semantic_similarity#Topological_similarity Retrieved: 2021-8-7.
- There are essentially two types of approaches that calculate topological similarity between ontological concepts:
- Edge-based: which use the edges and their types as the data source;
- Node-based: in which the main data sources are the nodes and their properties.
- Other measures calculate the similarity between ontological instances:
- Pairwise: measure functional similarity between two instances by combining the semantic similarities of the concepts they represent
- Groupwise: calculate the similarity directly not combining the semantic similarities of the concepts they represent
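The pairwise/groupwise distinction for ontological instances can be illustrated with a minimal sketch. It assumes a hypothetical toy is-a DAG and a deliberately simple stand-in concept similarity (Jaccard overlap of ancestor sets). The pairwise function combines concept-level scores with a best-match average (BMA), one common combination strategy. The groupwise function compares the two instances' ancestor subgraphs directly, in the spirit of simUI. Neither function is meant to reproduce the exact measure used by any particular tool.

```python
from statistics import mean

# Hypothetical toy is-a DAG: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "a1": {"a"},
    "a2": {"a"},
    "b1": {"b"},
}


def ancestors(term):
    """The term plus all of its ancestors (its induced subgraph)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_sim(t1, t2):
    """Stand-in concept-level similarity: Jaccard overlap of ancestor sets."""
    s1, s2 = ancestors(t1), ancestors(t2)
    return len(s1 & s2) / len(s1 | s2)


def pairwise_bma(terms_a, terms_b):
    """Pairwise: combine concept similarities with a best-match average."""
    best_a = [max(term_sim(t, u) for u in terms_b) for t in terms_a]
    best_b = [max(term_sim(u, t) for t in terms_a) for u in terms_b]
    return (mean(best_a) + mean(best_b)) / 2


def groupwise_ui(terms_a, terms_b):
    """Groupwise (simUI-style): Jaccard overlap of the two annotation subgraphs."""
    graph_a = set().union(*(ancestors(t) for t in terms_a))
    graph_b = set().union(*(ancestors(t) for t in terms_b))
    return len(graph_a & graph_b) / len(graph_a | graph_b)
```

For instances annotated with {a1, b1} and {a2}, the BMA score averages each term's best match in both directions. The groupwise score depends only on the union of ancestors: two shared terms ({a, root}) out of six in total.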
2013
- (Pilehvar et al., 2013) ⇒ Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. (2013). “Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity.” In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) Volume 1: Long Papers.
2010
- (Benabderrahmane et al., 2010) ⇒ Sidahmed Benabderrahmane, Malika Smail-Tabbone, Olivier Poch, Amedeo Napoli, and Marie-Dominique Devignes (2010). "IntelliGO: a New Vector-based Semantic Similarity Measure Including Annotation Origin." In: BMC Bioinformatics, 11(1), 1-16.
- QUOTE: Concerning the comparison between individual ontology terms, the two types of approaches reviewed by Pesquita et al. (2009) are similar to those proposed by Blanchard et al. (2008), namely the edge-based measures which rely on counting edges in the graph, and node-based measures which exploit information contained in the considered term, its descendants and its parents.
In most edge-based measures, the Shortest Path-Length (SPL) is used as a distance measure between two terms in a graph.
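For contrast with the node-based view, the edge-based shortest path length (SPL) mentioned in the quote can be sketched as a breadth-first search over the taxonomy treated as an undirected graph. The edge list is a hypothetical toy example. Turning SPL into a similarity, for example 1 / (1 + SPL), is only one simple option and not the formula of any specific cited measure.

```python
from collections import deque

# Hypothetical is-a edges (child, parent); direction is ignored for path length.
EDGES = [
    ("animal", "entity"),
    ("plant", "entity"),
    ("dog", "animal"),
    ("cat", "animal"),
    ("oak", "plant"),
]


def build_undirected(edges):
    """Adjacency sets in which every is-a edge can be walked both ways."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


GRAPH = build_undirected(EDGES)
```

Here "dog" and "cat" are two edges apart (through "animal"), while "dog" and "oak" are four edges apart (through the root). A node-based measure would instead score these pairs by the information content of their shared ancestors.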
2009
- (Pesquita et al., 2009) ⇒ Catia Pesquita, Daniel Faria, Andre O. Falcao, Phillip Lord, and Francisco M. Couto (2009). "Semantic Similarity in Biomedical Ontologies". In: PLoS Computational Biology 5(7): e1000443.
- QUOTE: Node-based approaches rely on comparing the properties of the terms involved, which can be related to the terms themselves, their ancestors, or their descendants. One concept commonly used in these approaches is information content (IC), which gives a measure of how specific and informative a term is. The IC of a term $c$ can be quantified as the negative log likelihood, $\mathrm{IC}(c) = -\log p(c)$, where $p(c)$ is the probability of occurrence of $c$ in a specific corpus (such as the UniProt Knowledgebase), being normally estimated by its frequency of annotation. Alternatively, the IC can also be calculated from the number of children a term has in the GO structure[1], although this approach is less commonly used.
- [1] Seco N, Veale T, Hayes J. An Intrinsic Information Content Metric for Semantic Similarity in WordNet. ECAI 2004, pp. 1089–1090.
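The two ways of estimating IC described in the quote can be compared on a hypothetical toy DAG with made-up annotation counts. `corpus_ic` estimates $p(c)$ from annotations to $c$ and its descendants, so that the root has $p = 1$ and IC 0. `intrinsic_ic` follows our reading of the Seco et al. (2004) formula, $\mathrm{IC}(c) = 1 - \log(\mathrm{hypo}(c) + 1) / \log N$, where $\mathrm{hypo}(c)$ is the number of hyponyms (descendants) of $c$ and $N$ is the number of concepts in the taxonomy. This is a sketch, not a reference implementation of either approach.

```python
import math

# Hypothetical toy is-a DAG: term -> set of direct children.
CHILDREN = {
    "root": {"a", "b"},
    "a": {"a1", "a2"},
    "b": set(),
    "a1": set(),
    "a2": set(),
}

# Hypothetical direct annotation counts (e.g. from an annotation corpus).
DIRECT_COUNTS = {"root": 0, "a": 1, "b": 2, "a1": 3, "a2": 2}


def descendants(term):
    """All terms strictly below `term`."""
    seen, stack = set(), [term]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def corpus_ic(term, root="root"):
    """-log p(c), with p(c) estimated from annotations to c and its descendants."""
    def freq(t):
        return DIRECT_COUNTS[t] + sum(DIRECT_COUNTS[d] for d in descendants(t))
    return math.log(freq(root) / freq(term))


def intrinsic_ic(term):
    """Seco-style intrinsic IC: 1 - log(hypo(c) + 1) / log(N), N = number of concepts."""
    return 1 - math.log(len(descendants(term)) + 1) / math.log(len(CHILDREN))
```

The intrinsic variant needs only the ontology structure: leaves get IC 1 and the root gets IC 0. The corpus-based variant depends on how annotations happen to be distributed.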
2008
- (Bodenreider et al., 2008) ⇒ Olivier Bodenreider, Marc Aubry, and Anita Burgun (2008). "Non-Lexical Approaches To Identifying Associative Relations In The Gene Ontology". In: Pac Symp Biocomputing.
2007
- (Riensche et al., 2007) ⇒ Roderick M. Riensche, Bob L. Baddeley, Antonio P. Sanfilippo, Christian Posse, and Banu Gopalan (2007). "XOA: Web-Enabled Cross-Ontological Analytics". In: 2007 IEEE Congress on Services.
2006
- (Schlicker et al., 2006) ⇒ Andreas Schlicker, Francisco S. Domingues, Jorg Rahnenfuhrer, and Thomas Lengauer (2006). "A New Measure for Functional Similarity of Gene Products Based on Gene Ontology". In: BMC Bioinformatics 7: 302.
2005a
- (Couto et al., 2005) ⇒ Francisco M. Couto, Mario J. Silva, and Pedro Coutinho (2005). "Semantic Similarity over the Gene Ontology: Family Correlation and Selecting Disjunctive Ancestors". In: Proceedings of the ACM Conference on Information and Knowledge Management (ACM-CIKM 2005).
2005b
- (Maguitman et al., 2005) ⇒ Ana G. Maguitman, Filippo Menczer, Heather Roinestad, and Alessandro Vespignani (2005, May). "Algorithmic Detection of Semantic Similarity". In: Proceedings of the 14th International Conference on World Wide Web (pp. 107-116).
1998
- (Lin, 1998) ⇒ Dekang Lin. (1998). “An Information-Theoretic Definition of Similarity.” In: Proceedings of the 15th International Conference on Machine Learning (ICML 1998).
1997
- (Jiang & Conrath, 1997) ⇒ Jay J. Jiang, and David W. Conrath. (1997). “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” In: Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X).
1995
- (Resnik, 1995) ⇒ Philip Resnik. (1995). “Using Information Content to Evaluate Semantic Similarity in a Taxonomy.” In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI 1995).