2003 InvestSemantSimMeasAcrossTheGeneOntoTheRelBetwSeqAndAnno
- (Lord, et al., 2003) ⇒ Phillip W. Lord, Robert D. Stevens, Andy Brass, and Carole A. Goble. (2003). “Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation.” In: Bioinformatics, (2003).
Subject Headings: Semantic Similarity Measure.
Notes
Cited By
Quotes
Abstract
Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses.
Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases.
References
- Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
- Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
- Blagosklonny,M.V. and Pardee,A.B. (2002). Unearthing the gems. Nature, 416, 373.
- Budanitsky,A. and Hirst,G. (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics. Pittsburgh.
- Camon,E., Magrane,M., Barrell,D., Binns,D., Fleischmann,W., Kersey,P., Mulder,N., Oinn,T. and Apweiler,R. (2002). The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL and InterPro. Genome Res., 13, 666–672.
- Chang,J., Raychaudhuri,S. and Altman,R. (2001) Including biological literature improves homology search. Pac. Symp. Biocomput., 6, 374–383.
- Fellbaum,C. (ed.) (1998) WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
- Jiang,J.J. and Conrath,D.W. (1998) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of International Conference on Research in Computational Linguistics. ROCLING X, Taiwan.
- Lin,D. (1998) An information-theoretic definition of similarity. In: Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296–304.
- Lord,P., Stevens,R., Brass,A. and Goble,C. (2003). Semantic similarity measures as tools for exploring the Gene Ontology. Pac. Symp. Biocomput., 8, 601–612.
- MacCallum,R.M., Kelley,L.A. and Sternberg,M.J. (2000) SAWTED: structure assignment with text description–enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics, 16, 125–129.
- Odell,J. (1998) Six different kinds of aggregation. In: Advanced Object-Oriented Analysis and Design Using UML. Cambridge University Press, pp. 139–149.
- Rada,R., Mili,H., Bicknell,E. and Blettner,M. (1989) Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 1, 17–30.
- Resnik,P. (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intelligence Res., 11, 95–130.
- Stevens,R., Goble,C. and Bechhofer,S. (2000) Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, 1, 398–416.
- The Gene Ontology Consortium (2001) Creating the Gene Ontology resource: design and implementation. Genome Res., 11, 1425–1433.
- Wilbur,W.J. and Yang,Y. (1996) An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts. Comput. Biol. Med., 26, 209–222.
- Winston,M., Chaffin,R. and Herrmann,D. (1987) A taxonomy of part-whole relations. Cognitive Science, 11, 417–444.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2003 InvestSemantSimMeasAcrossTheGeneOntoTheRelBetwSeqAndAnno | Phillip W. Lord, Robert D. Stevens, Andy Brass, Carole A. Goble | | | Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation | | Bioinformatics | http://www.cs.man.ac.uk/~stevensr/papers/bioinformatics-semantic-similarity.pdf | | | 2003 |