Graph-based Semantic Similarity Measure
A Graph-based Semantic Similarity Measure is a Semantic Similarity Measure that is based on the topology and properties of a directed acyclic graph that represents the relationships between concepts or ontology terms.
- Example(s):
- Counter-Example(s):
- See: Weighted Directed Graph, Semantic Similarity Neural Network, Topological Semantic Similarity Measure.
References
2018
- (Zhao & Wang, 2018) ⇒ Chenguang Zhao, and Zheng Wang (2018). "GOGO: An Improved Algorithm to Measure the Semantic Similarity between Gene Ontology Terms". In: Scientific Reports volume 8, Article number: 15107.
- QUOTE: Measuring the semantic similarity between Gene Ontology (GO) terms is an essential step in functional bioinformatics research. We implemented a software named GOGO for calculating the semantic similarity between GO terms. GOGO has the advantages of both information-content-based and hybrid methods, such as Resnik’s and Wang’s methods. Moreover, GOGO is relatively fast and does not need to calculate information content (IC) from a large gene annotation corpus but still has the advantage of using IC. This is achieved by considering the number of children nodes in the GO directed acyclic graphs when calculating the semantic contribution of an ancestor node giving to its descendent nodes. GOGO can calculate functional similarities between genes and then cluster genes based on their functional similarities. Evaluations performed on multiple pathways retrieved from the saccharomyces genome database (SGD) show that GOGO can accurately and robustly cluster genes based on functional similarities.
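The idea in the quote above, letting an ancestor's semantic contribution depend on how many children it has, can be illustrated with a minimal sketch. It assumes a Wang-style propagation in which a term contributes 1.0 to itself and each ancestor receives the best child contribution multiplied by an edge weight. The weight function `1 / (c + n_children) + d`, the constants `c` and `d`, and the toy ontology are illustrative assumptions and are not confirmed by the quoted text; the sketch also ignores the distinction between relationship types.

```python
def count_children(parents):
    """Count the direct children of every term in a child -> parents map."""
    counts = {term: 0 for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            counts[parent] = counts.get(parent, 0) + 1
    return counts


def s_values(term, parents, n_children, c=0.67, d=0.4):
    """Semantic contribution of `term` and each of its ancestors to `term`.

    Assumption: the weight of an edge into `parent` shrinks as the parent
    has more children. c and d are placeholder constants.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent in parents.get(child, ()):
            weight = 1.0 / (c + n_children[parent]) + d
            candidate = weight * s[child]
            # Keep the largest contribution over all paths to this ancestor.
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                stack.append(parent)
    return s


def term_similarity(term_a, term_b, parents, c=0.67, d=0.4):
    """Wang-style similarity: shared contributions over total contributions."""
    n_children = count_children(parents)
    s_a = s_values(term_a, parents, n_children, c, d)
    s_b = s_values(term_b, parents, n_children, c, d)
    common = s_a.keys() & s_b.keys()
    shared = sum(s_a[t] + s_b[t] for t in common)
    return shared / (sum(s_a.values()) + sum(s_b.values()))


# Hypothetical toy ontology (child -> direct parents), not real GO terms.
TOY_PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["d"],
}

print(term_similarity("c", "e", TOY_PARENTS))
```

Because the weights depend only on the graph structure, no annotation corpus is needed, which is the property the quote emphasizes.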
2010
- (Benabderrahmane et al., 2010) ⇒ Sidahmed Benabderrahmane, Malika Smail-Tabbone, Olivier Poch, Amedeo Napoli, and Marie-Dominique Devignes (2010). "IntelliGO: A New Vector-based Semantic Similarity Measure Including Annotation Origin". In: BMC Bioinformatics volume 11, Article number: 588.
- QUOTE: Graph-based similarity measures are currently implemented in the Bioconductor GOstats package (Gentleman, 2005). Each protein or gene can be associated with a graph which is induced by taking the most specific GO terms annotating the protein, and by finding all parents of those terms up to the root node. The union-intersection and longest shared path (SimUI) method can be used to calculate the between-graph similarity, for example. This method was tested by Guo et al. on human regulatory pathways (Guo et al., 2006). Recently, the SimGIC method was introduced to improve the SimUI method by weighting terms with their information content (Pesquita et al., 2008).
2009
- (Gentleman, 2009) ⇒ R. Gentleman (2009). "Visualizing and Distances Using GO".
- QUOTE: The relationships between different GO terms, within a specific ontology are represented in the form of a directed acyclic graph. The leaves of this graph represent the most specific terms and there are edges from a specific term (child) to all less specific terms (each is a parent). The induced GO graph is the graph that obtains from taking a set of GO terms and finding all parents of those terms, and so on until the root node has been obtained.
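The induced graph described in the quote above can be sketched as an upward traversal from the starting terms. This is a minimal illustration, assuming the ontology is available as a dictionary mapping each term to its direct parents; the function name `induced_graph` and the toy ontology are hypothetical and are not taken from the GOstats package.

```python
from collections import deque


def induced_graph(terms, parents):
    """Return the node set of the graph induced by `terms`.

    `parents` maps each term to its direct parents (child -> parents);
    the root maps to an empty list.
    """
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical toy ontology (child -> direct parents), not real GO terms.
TOY_PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["d"],
}

print(sorted(induced_graph({"e"}, TOY_PARENTS)))
# prints ['a', 'b', 'd', 'e', 'root']
```

The edges of the induced graph are the parent links among the returned nodes, so the node set is enough to reconstruct it from `parents`.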
2008
- (Pesquita et al., 2008) ⇒ Catia Pesquita, Daniel Faria, Hugo Bastos, Antonio EN Ferreira, Andre O Falcao, and Francisco M Couto (2008). "Metrics for GO based protein semantic similarity: a systematic evaluation". In: BMC Bioinformatics volume 9, Article number: S4.
- QUOTE: A total of fourteen semantic similarity measures were tested: Resnik's, Lin's, and Jiang and Conrath's term similarity measures, each with the average, maximum, best-match average (BMA), and BMA plus GraSM approaches; plus the graph-based simUI (Gentleman, 2005) and simGIC measures (Pesquita et al., 2007). We evaluated the influence of using electronic annotations by testing the measures on two distinct datasets: one with all annotations (full dataset) and one without electronic annotations (non-electronic dataset).
2007
- (Pesquita et al., 2007) ⇒ Catia Pesquita, Daniel Faria, Hugo Bastos, Antonio EN Ferreira, Andre O Falcao, and Francisco M Couto (2007). "Evaluating GO-based Semantic Similarity Measures". In: ISMB/ECCB 2007 SIG Meeting Program Materials, International Society for Computational Biology.
- QUOTE: We also use two graph-based similarity measures: simUI (Gentleman, 2005) and the novel simGIC (for Graph Information Content). simUI calculates similarity as the number of GO terms shared by two proteins divided by the number of GO terms they have together. simGIC is an expansion of simUI where instead of counting the terms we sum their IC.
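The two measures described in the quote above can be written as set ratios. The sketch below assumes each protein is already represented by its set of GO terms (typically the annotated terms plus their ancestors) and that an information content value is available for every term; the function names, term labels, and IC values are hypothetical.

```python
def sim_ui(terms_a, terms_b):
    """simUI: number of shared terms divided by number of terms in the union."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def sim_gic(terms_a, terms_b, ic):
    """simGIC: like simUI, but summing information content instead of counting."""
    a, b = set(terms_a), set(terms_b)
    total = sum(ic[t] for t in a | b)
    if total == 0:
        return 0.0
    return sum(ic[t] for t in a & b) / total


# Hypothetical term sets and information content values.
protein_1 = {"root", "a", "c"}
protein_2 = {"root", "a", "b", "d"}
toy_ic = {"root": 0.0, "a": 1.0, "b": 1.0, "c": 2.0, "d": 3.0}

print(sim_ui(protein_1, protein_2))           # 2 shared / 5 total = 0.4
print(sim_gic(protein_1, protein_2, toy_ic))  # 1.0 shared IC / 7.0 total IC
```

In this toy case simGIC is lower than simUI because the shared terms are general ones with low information content, while the unshared terms are more specific.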
2006
- (Guo et al., 2006) ⇒ Xiang Guo, Rongxiang Liu, Craig D. Shriver, Hai Hu, and Michael N. Liebman (2006). "Assessing Semantic Similarity Measures For The Characterization Of Human Regulatory Pathways". In: Bioinformatics, Volume 22, Issue 8.
- QUOTE: Graph similarity-based measures are estimated using GOstats package of Bioconductor (Gentleman, 2005). Each protein is associated with an induced graph that is obtained by taking the most specific GO terms annotated with the protein and by finding all parents of those terms until the root node has been obtained. Two methods, union-intersection (UI) and longest shared path (LP), are used to calculate the between-graph similarity. The first method uses the number of nodes two induced graphs share divided by the total number of nodes in two graphs. The resulting similarity values are bounded between 0 and 1 with more similar proteins having values near 1. The second method, LP, adopts the depth of the longest path shared by two induced graphs as the similarity score. The larger the depth the more similar two proteins are. If two proteins are both quite specific and similar, they should have long shared path and thus high similarity score.
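The longest shared path (LP) method in the quote above can be sketched as follows. The sketch assumes that depth is counted in edges from the root and that the shared part of two induced graphs is the intersection of their node sets; the function names and toy ontology are hypothetical, and the GOstats implementation may count depth differently.

```python
def induced_nodes(terms, parents):
    """Collect `terms` and all of their ancestors (child -> parents map)."""
    seen = set(terms)
    stack = list(terms)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sim_lp(terms_a, terms_b, parents):
    """Depth (in edges) of the longest path shared by two induced graphs."""
    shared = induced_nodes(terms_a, parents) & induced_nodes(terms_b, parents)
    depth = {}

    def longest(term):
        # Every parent of a shared node is itself shared, so the longest
        # path from the root to `term` lies inside the shared subgraph.
        if term not in depth:
            term_parents = parents.get(term, ())
            if term_parents:
                depth[term] = 1 + max(longest(p) for p in term_parents)
            else:
                depth[term] = 0
        return depth[term]

    return max((longest(t) for t in shared), default=0)


# Hypothetical toy ontology (child -> direct parents), not real GO terms.
TOY_PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["d"],
}

print(sim_lp({"e"}, {"c"}, TOY_PARENTS))  # shared nodes: a, root -> 1
print(sim_lp({"e"}, {"d"}, TOY_PARENTS))  # shared nodes: d, a, b, root -> 2
```

Unlike the union-intersection score, this value is not bounded by 1; it grows with the depth of the most specific ancestry the two proteins share.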