Edge-based Semantic Similarity Measure
An Edge-based Semantic Similarity Measure is a Topological Semantic Similarity Measure that calculates the similarity between ontological concepts by counting edges in a semantic network.
- AKA: Distance-based Semantic Similarity Measure, Conceptual Distance Measure, Semantic Distance Measure.
- Context:
- It usually uses a Shortest Path-Length (SPL) Algorithm to measure the distance between two nodes.
- Example(s):
- IntelliGO Semantic Similarity Measure (Benabderrahmane et al., 2010),
- Pozo-Pazos-Valencia Semantic Similarity Measure (Pozo et al., 2008),
- Wu-Zhu-Guo-Zhang-Lin Semantic Similarity Measure (Wu et al., 2006),
- Wu-Su-Mao-Olman-Xu Semantic Similarity Measure (Wu et al., 2005),
- GESTS (Yu et al., 2005),
- Cheng-Cline-Martin Semantic Similarity Measure (Cheng et al., 2004),
- Pekar-Staab Taxonomic Similarity Measure (Pekar & Staab, 2002),
- …
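- Counter-Example(s):
- Node-based Semantic Similarity Measure, which relies mainly on nodes and their properties rather than on edges.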
- See: Semantic Similarity Measure, Semantic Similarity Neural Network, Semantic Word Similarity Measure, Gene Semantic Similarity Measure, Semantic Relatedness Measure, Similarity Matrix, Generalized Cosine-Similarity Measure (GCSM), Path Distance Similarity Measure, Edge-based Gene Semantic Similarity Measure.
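Counting the edges on a shortest path is the core step shared by most edge-based measures. The following is a minimal sketch, assuming an unweighted, undirected semantic network stored as a Python dict of neighbour lists. The function name `shortest_path_length` and the toy `NETWORK` are hypothetical and do not come from any of the cited papers.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Return the number of edges on a shortest path from source to target.

    graph maps each node to its neighbours. Edges are assumed to be
    unweighted and listed in both directions (an undirected network).
    Returns None when target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])  # breadth-first search visits nodes in order of distance
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy network (not taken from any cited ontology).
NETWORK = {
    "vehicle": ["car", "bike", "artifact"],
    "car": ["vehicle"],
    "bike": ["vehicle"],
    "artifact": ["vehicle"],
}
```

For example, `shortest_path_length(NETWORK, "car", "bike")` gives 2 (car, vehicle, bike). Breadth-first search fits this case because every edge counts as 1. Weighted edges would call for Dijkstra's algorithm instead.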
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Semantic_similarity#Topological_similarity Retrieved:2021-8-7.
- There are essentially two types of approaches that calculate topological similarity between ontological concepts:
- Edge-based: which use the edges and their types as the data source;
- Node-based: in which the main data sources are the nodes and their properties.
- Other measures calculate the similarity between ontological instances:
- Pairwise: measure functional similarity between two instances by combining the semantic similarities of the concepts they represent
- Groupwise: calculate the similarity directly not combining the semantic similarities of the concepts they represent
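The pairwise/groupwise distinction can be illustrated with a short sketch. Both combination rules are assumptions chosen only for illustration and are not prescribed by the quoted text. The pairwise case uses a best-match average over a supplied term-to-term similarity `term_sim`. The groupwise case uses a Jaccard index computed directly over the two annotation sets. All names are hypothetical.

```python
def pairwise_bma(terms_a, terms_b, term_sim):
    """Pairwise: combine concept-to-concept similarities (best-match average).

    Each term is matched to its most similar term in the other instance,
    and the best scores are averaged. Both term lists must be non-empty.
    """
    best_a = [max(term_sim(t, u) for u in terms_b) for t in terms_a]
    best_b = [max(term_sim(t, u) for t in terms_a) for u in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def groupwise_jaccard(set_a, set_b):
    """Groupwise: compare the two annotation sets directly (Jaccard index)."""
    union = set_a | set_b
    if not union:
        return 1.0  # convention assumed here for two empty annotation sets
    return len(set_a & set_b) / len(union)


def exact_match(t, u):
    """Hypothetical term similarity: 1.0 for identical terms, else 0.0."""
    return 1.0 if t == u else 0.0
```

With instances annotated as `["x", "y"]` and `["x", "z"]`, the pairwise score under `exact_match` is 0.5, while the groupwise Jaccard score is 1/3. Only the pairwise variant depends on a concept-level similarity measure such as an edge-based one.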
2010
- (Benabderrahmane et al., 2010) ⇒ Sidahmed Benabderrahmane, Malika Smail-Tabbone, Olivier Poch, Amedeo Napoli, and Marie-Dominique Devignes (2010). "IntelliGO: a New Vector-based Semantic Similarity Measure Including Annotation Origin". In: BMC Bioinformatics, 11(1), 1-16.
- QUOTE: Concerning the comparison between individual ontology terms, the two types of approaches reviewed by Pesquita et al. (2009) are similar to those proposed by Blanchard et al. (2008), namely the edge-based measures which rely on counting edges in the graph, and node-based measures which exploit information contained in the considered term, its descendants and its parents.
In most edge-based measures, the Shortest Path-Length (SPL) is used as a distance measure between two terms in a graph.
2009
- (Pesquita et al., 2009) ⇒ Catia Pesquita, Daniel Faria, Andre O. Falcao, Phillip Lord, and Francisco M. Couto (2009). "Semantic Similarity in Biomedical Ontologies". In: PLoS Computational Biology 5(7): e1000443.
- QUOTE: Edge-based approaches are based mainly on counting the number of edges in the graph path between two terms[1]. The most common technique, distance, selects either the shortest path or the average of all paths, when more than one path exists. This technique yields a measure of the distance between two terms, which can be easily converted into a similarity measure. Alternatively, the common path technique calculates the similarity directly by the length of the path from the lowest common ancestor of the two terms to the root node[2].
- ↑ Rada R, Mili H, Bicknell E, Blettner M (1989). Development and application of a metric on semantic nets. In: IEEE Transactions on Systems, Man, and Cybernetics, 19, pp. 17–30.
- ↑ Wu Z, Palmer MS (1994). Verb semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994), pp. 133–138. URL http://dblp.uni-trier.de/db/conf/acl/acl94.html#WuP94.
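The edge-based techniques described by Pesquita et al. can be sketched on a toy is-a DAG in which one term has two parents, so that more than one path exists. This is a minimal sketch under several assumptions:

- The terms and graph are hypothetical.
- "All paths" is read as all simple (cycle-free) paths in the undirected graph.
- A term's depth is its shortest upward edge count to the root.
- `1 / (1 + d)` is only one possible distance-to-similarity conversion. The paper does not specify it.

```python
from collections import deque

# Hypothetical toy is-a DAG: each term maps to its parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # two parents, so several paths can link C's descendants to others
    "D": ["C"],
    "E": ["A"],
}


def undirected_adjacency(parents):
    """Treat every parent-child link as an undirected edge."""
    adj = {term: set() for term in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    return adj


def all_simple_path_lengths(adj, a, b):
    """Edge counts of every simple path from a to b (exhaustive DFS; toy graphs only)."""
    lengths = []

    def dfs(node, visited, dist):
        if node == b:
            lengths.append(dist)
            return
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, dist + 1)
                visited.remove(nxt)

    dfs(a, {a}, 0)
    return sorted(lengths)


def path_distance(adj, a, b, use_average=False):
    """'Distance' technique: the shortest path, or the average over all paths."""
    lengths = all_simple_path_lengths(adj, a, b)
    if not lengths:
        raise ValueError(f"no path between {a!r} and {b!r}")
    return sum(lengths) / len(lengths) if use_average else min(lengths)


def distance_to_similarity(d):
    """One simple monotone conversion (an assumption, not taken from the paper)."""
    return 1.0 / (1.0 + d)


def ancestors(parents, term):
    """The term itself plus every term reachable by following parent links."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def depth(parents, term, root="root"):
    """Edges on the shortest upward path from term to root (BFS)."""
    queue, seen = deque([(term, 0)]), {term}
    while queue:
        node, d = queue.popleft()
        if node == root:
            return d
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                queue.append((p, d + 1))
    raise ValueError(f"{term!r} is not connected to {root!r}")


def common_path_similarity(parents, a, b):
    """'Common path' technique: path length from the lowest common ancestor to the root."""
    shared = ancestors(parents, a) & ancestors(parents, b)
    return max(depth(parents, t) for t in shared)
```

Between `D` and `E` there are two simple paths, D–C–A–E (3 edges) and D–C–B–root–A–E (5 edges). The shortest-path distance is therefore 3 and the average distance is 4.0. The common-path score is 1, because the lowest common ancestor `A` is one edge below the root. Enumerating all simple paths grows exponentially with graph size, so this sketch is suitable only for small examples.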
2008
- (Pozo et al., 2008) ⇒ Angela del Pozo, Florencio Pazos, and Alfonso Valencia (2008). "Defining Functional Distances over Gene Ontology". In: BMC Bioinformatics, 9(1), 1-15.
- QUOTE: Here, we propose a new method to derive 'functional distances' between GO terms based on the co-occurrence of them in the same set of proteins. The simultaneous occurrence of terms in Interpro entries provides a natural biological link between the GO functions. The relationship between terms in the GO structure provides additional semantic information that helps to refine the metric model.
In this method, an initial profile is constructed for each GO term representing its association with a set of Interpro domains (after expanding the Interpro annotations with the parenthood relationships of the GO terms). These profiles are used to generate a matrix of co-occurrence between GO terms. A graph is constructed where the nodes are the GO terms and the edges are weighted according to the distances extracted from this co-occurrence matrix. Spectral clustering is applied to this graph in order to obtain an optimal number of groups of functionally similar GO terms. The distances derived in this way provide a hierarchical clustering of GO terms (functional tree) where the groups of terms with similar biological meaning tend to be close.
2006
- (Wu et al., 2006) ⇒ Xiaomei Wu, Lei Zhu, Jie Guo, Da-Yong Zhang, and Kui Lin (2006). "Prediction of yeast protein–protein interaction network: insights from the Gene Ontology and annotations". In: Nucleic Acids Research 34(7).
2005a
- (Wu et al., 2005) ⇒ Hongwei Wu, Zhengchang Su, Fenglou Mao, Victor Olman, and Ying Xu (2005). "Prediction of functional modules based on comparative genome analysis and Gene Ontology application". In: Nucleic Acids Research 33(9).
- QUOTE: In this paper, we define a similarity measure among GO terms to evaluate the functional relationship of genes.
Each of the three measures provides a different perspective about functional relationships among genes. Information derived through each of them is then combined using a Bayesian inference framework. Using this combined score, we predict whether two genes belong to the same functional module. We use a graph representation to describe such a functional relatedness relationship. That is, if two genes are predicted to belong to the same functional module, they will have an edge linking their representative nodes in this graph representation.
2005b
- (Yu et al., 2005) ⇒ Hui Yu, Lei Gao, Kang Tu, and Zheng Guo (2005). "Broadly predicting specific gene functions with expression similarity and taxonomy similarity". In: Gene 352(6), Elsevier.
2004
- (Cheng et al., 2004) ⇒ Jill Cheng, Melissa Cline, John Martin, David Finkelstein, Tarif Awad, David Kulp, and Michael A. Siani-Rose (2004). "A Knowledge-Based Clustering Algorithm Driven by Gene Ontology". In: Journal of biopharmaceutical statistics, 14(3), 687-700.
2002
- (Pekar & Staab, 2002) ⇒ Viktor Pekar, and Steffen Staab (2002). "Taxonomy Learning - Factoring the Structure of a Taxonomy into a Semantic Classification Decision". In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002).
- QUOTE: ... where $\delta(a,b)$ describes the number of edges on the shortest path between $a$ and $b$. The taxonomic similarity between $a$ and $b$ is then given by
$T(a, b)=\dfrac{\delta(\operatorname{root}, c)}{\delta(a, c)+\delta(b, c)+\delta(\operatorname{root}, c)} \qquad (2)$
- where $c = lcs(a,b)$. $T$ is such that $0\leq T \leq 1$, with 1 standing for the maximum taxonomic similarity.
$T$ is directly proportional to the number of edges from the least common super-concept to the root, which agrees with the intuition that a given number of edges between two concrete concepts signifies greater similarity than the same number of edges between two abstract concepts.
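Equation (2) can be sketched on a tree-shaped taxonomy, where the least common super-concept is the first shared ancestor and $\delta$ is an upward edge count. This is a minimal sketch under stated assumptions. The taxonomy, the single-parent `PARENT` map, and the helper names are hypothetical. Returning 1.0 for the undefined 0/0 case (both concepts are the root) is our own convention and is not taken from Pekar & Staab.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent (a tree).
PARENT = {
    "root": None,
    "animal": "root",
    "dog": "animal",
    "cat": "animal",
    "artifact": "root",
    "vehicle": "artifact",
    "car": "vehicle",
    "bike": "vehicle",
}


def path_to_root(concept):
    """[concept, parent, grandparent, ..., 'root']."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lcs(a, b):
    """Least common super-concept: the first ancestor of a that is also an ancestor of b."""
    b_ancestors = set(path_to_root(b))
    for node in path_to_root(a):
        if node in b_ancestors:
            return node
    raise ValueError("concepts share no ancestor")


def delta(x, ancestor):
    """Edges between x and one of its ancestors (the shortest path, since this is a tree)."""
    return path_to_root(x).index(ancestor)


def taxonomic_similarity(a, b):
    """T(a, b) = delta(root, c) / (delta(a, c) + delta(b, c) + delta(root, c)), c = lcs(a, b)."""
    c = lcs(a, b)
    depth_c = delta(c, "root")  # delta(root, c): edges from c up to the root
    denominator = delta(a, c) + delta(b, c) + depth_c
    if denominator == 0:  # a == b == root: 0/0, convention assumed here
        return 1.0
    return depth_c / denominator
```

The sketch reproduces the intuition stated above. Both `dog`/`cat` and `car`/`bike` are two edges apart. However, `taxonomic_similarity("dog", "cat")` is 1/3 (c = animal, one edge below the root), while `taxonomic_similarity("car", "bike")` is 0.5 (c = vehicle, two edges below the root). Concepts whose only shared super-concept is the root, such as `dog` and `car`, score 0.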