Edge-based Semantic Similarity Measure

AKA: Distance-based Semantic Similarity Measure, Conceptual Distance Measure, Semantic Distance Measure.
Context:
- It usually uses a Shortest Path-Length (SPL) Algorithm to measure the distance between two nodes.
Example(s):
Counter-Example(s):
See: Semantic Similarity Measure, Semantic Similarity Neural Network, Semantic Word Similarity Measure, Gene Semantic Similarity Measure, Semantic Relatedness Measure, Similarity Matrix, Generalized Cosine-Similarity Measure (GCSM), Path Distance Similarity Measure, Edge-based Gene Semantic Similarity Measure.

References

(Benabderrahmane et al., 2010) ⇒ Sidahmed Benabderrahmane, Malika Smail-Tabbone, Olivier Poch, Amedeo Napoli, and Marie-Dominique Devignes (2010). "IntelliGO: a New Vector-based Semantic Similarity Measure Including Annotation Origin". In: BMC Bioinformatics, 11(1), 1-16.
- QUOTE: Concerning the comparison between individual ontology terms, the two types of approaches reviewed by Pesquita et al. (2009) are similar to those proposed by Blanchard et al. (2008), namely the edge-based measures which rely on counting edges in the graph, and node-based measures which exploit information contained in the considered term, its descendants and its parents.
  In most edge-based measures, the Shortest Path-Length (SPL) is used as a distance measure between two terms in a graph.
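
  As a minimal sketch of the SPL idea (not the IntelliGO implementation), the following Python snippet counts the edges on the shortest path between two terms with a breadth-first search. The toy ontology, its term names, and the helper names `build_undirected` and `shortest_path_length` are hypothetical, and treating is-a edges as undirected is an assumption made for illustration.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its list of parents (is-a edges).
PARENTS = {
    "root": [],
    "animal": ["root"],
    "plant": ["root"],
    "dog": ["animal"],
    "cat": ["animal"],
    "oak": ["plant"],
}


def build_undirected(parents):
    """Turn the child -> parents map into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def shortest_path_length(adj, a, b):
    """Return the number of edges on a shortest path from a to b, or None."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # a and b are not connected


ADJ = build_undirected(PARENTS)
print(shortest_path_length(ADJ, "dog", "cat"))  # dog - animal - cat: 2 edges
print(shortest_path_length(ADJ, "dog", "oak"))  # path goes through root: 4 edges
```

  Breadth-first search is enough here because every edge counts as one step; weighted edges would need a different shortest-path algorithm.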

(Pesquita et al., 2009) ⇒ Catia Pesquita, Daniel Faria, Andre O. Falcao, Phillip Lord, and Francisco M. Couto (2009). "Semantic Similarity in Biomedical Ontologies". In: PLoS Computational Biology 5(7): e1000443.
- QUOTE: Edge-based approaches are based mainly on counting the number of edges in the graph path between two terms^[1]. The most common technique, distance, selects either the shortest path or the average of all paths, when more than one path exists. This technique yields a measure of the distance between two terms, which can be easily converted into a similarity measure. Alternatively, the common path technique calculates the similarity directly by the length of the path from the lowest common ancestor of the two terms to the root node^[2].

[1] Rada R, Mili H, Bicknell E, Blettner M. Development and application of a metric on semantic nets. In: IEEE Transactions on Systems, Man, and Cybernetics. 19. 1989. pp. 17–30.
[2] Wu Z, Palmer MS. Verb semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994). 1994. pp. 133–138. URL http://dblp.uni-trier.de/db/conf/acl/acl94.html#WuP94.
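
The quote does not say which conversion from distance to similarity is meant. Two commonly used choices, given here only as illustrations, are the inverse path length and a Leacock-Chodorow style log scaling. The symbols are introduced here and are not taken from the quote: $d(a,b)$ is the length of the shortest path between terms $a$ and $b$, and $D$ is the maximum depth of the taxonomy. Conventions differ on whether the path is counted in edges or in nodes, and the second form needs $d(a,b) > 0$.

$\operatorname{sim}_{path}(a, b)=\dfrac{1}{1+d(a, b)}$

$\operatorname{sim}_{LC}(a, b)=-\log \dfrac{d(a, b)}{2D}$

Both decrease as the path grows longer, so a short path maps to a high similarity.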

(Pozo et al., 2008) ⇒ Angela del Pozo, Florencio Pazos, and Alfonso Valencia (2008). "Defining Functional Distances over Gene Ontology". In: BMC Bioinformatics, 9(1), 1-15.
- QUOTE: Here, we propose a new method to derive 'functional distances' between GO terms based on the co-occurrence of them in the same set of proteins. The simultaneous occurrence of terms in Interpro entries provides a natural biological link between the GO functions. The relationship between terms in the GO structure provides additional semantic information that helps to refine the metric model.
  In this method, an initial profile is constructed for each GO term representing its association with a set of Interpro domains (after expanding the Interpro annotations with the parenthood relationships of the GO terms). These profiles are used to generate a matrix of co-occurrence between GO terms. A graph is constructed where the nodes are the GO terms and the edges are weighted according to the distances extracted from this co-occurrence matrix. Spectral clustering is applied to this graph in order to obtain an optimal number of groups of functionally similar GO terms. The distances derived in this way provide a hierarchical clustering of GO terms (functional tree) where the groups of terms with similar biological meaning tend to be close.

(Wu et al., 2005) ⇒ Hongwei Wu, Zhengchang Su, Fenglou Mao, Victor Olman, and Ying Xu (2005). "Prediction of functional modules based on comparative genome analysis and Gene Ontology application". In: Nucleic Acids Research 33(9).
- QUOTE: In this paper, we define a similarity measure among GO terms to evaluate the functional relationship of genes.
  Each of the three measures provides a different perspective about functional relationships among genes. Information derived through each of them is then combined using a Bayesian inference framework. Using this combined score, we predict whether two genes belong to the same functional module. We use a graph representation to describe such a functional relatedness relationship. That is, if two genes are predicted to belong to the same functional module, they will have an edge linking their representative nodes in this graph representation.

$T(a, b)=\dfrac{\delta(\operatorname{root}, c)}{\delta(a, c)+\delta(b, c)+\delta(\operatorname{root}, c)} \qquad (2)$

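where $c = \operatorname{lcs}(a,b)$ is the least common super-concept of $a$ and $b$, and $\delta(x, y)$ is the number of edges between $x$ and $y$. $T$ is such that $0\leq T \leq 1$, with 1 standing for the maximum taxonomic similarity.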

$T$ increases with the number of edges from the least common super-concept to the root, which agrees with the intuition that a given number of edges between two concrete concepts signifies greater similarity than the same number of edges between two abstract concepts.
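
The formula for $T$ can be tried out on a small example. The sketch below assumes a tree-shaped taxonomy (one parent per term), so that the least common super-concept is unique; the toy taxonomy and the function names `path_to_root`, `lcs`, `delta`, and `taxonomic_similarity` are hypothetical. Returning 0.0 when both arguments are the root is a convention chosen here, since the formula is 0/0 in that case.

```python
# Hypothetical toy taxonomy: each term maps to its single parent (None for root).
PARENT = {
    "root": None,
    "entity": "root",
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def path_to_root(term):
    """Return [term, parent, grandparent, ..., 'root']."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def lcs(a, b):
    """Least common super-concept: first ancestor of b that is also above a."""
    ancestors_a = set(path_to_root(a))
    for term in path_to_root(b):
        if term in ancestors_a:
            return term
    raise ValueError("terms do not share a root")


def delta(term, ancestor):
    """Number of edges from term up to one of its ancestors (or itself)."""
    return path_to_root(term).index(ancestor)


def taxonomic_similarity(a, b):
    """T(a, b) = d(root, c) / (d(a, c) + d(b, c) + d(root, c)), c = lcs(a, b)."""
    c = lcs(a, b)
    depth_c = delta(c, "root")  # edges from c up to the root
    denominator = delta(a, c) + delta(b, c) + depth_c
    # Assumed convention: T(root, root) = 0.0 because the formula gives 0/0.
    return depth_c / denominator if denominator else 0.0


print(taxonomic_similarity("dog", "cat"))       # 2 / (1 + 1 + 2) = 0.5
print(taxonomic_similarity("animal", "plant"))  # 1 / (1 + 1 + 1), about 0.333
print(taxonomic_similarity("dog", "dog"))       # 3 / (0 + 0 + 3) = 1.0
```

In this toy taxonomy, "dog"/"cat" and "animal"/"plant" are both two edges apart, but the deeper pair scores higher, which matches the intuition stated above.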