1997 SemanticSimBasedOnCorpusStats
- (Jiang & Conrath, 1997) ⇒ Jay J. Jiang, and David W. Conrath. (1997). “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” In: Proceedings on International Conference on Research in Computational Linguistics (ROCLING X).
Subject Headings: Jiang-Conrath Similarity Measure, Word Sense Disambiguation Algorithm, Lexical Semantic Similarity Function.
Notes
- It proposes a measure of semantic similarity between word pairs that combines statistical and lexical information (Turney, 2001).
Cited By
2001
- (Turney, 2001) ⇒ Peter D. Turney. (2001). “Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL.” In: Proceedings of the 12th European Conference on Machine Learning (ECML 2001). doi:10.1007/3-540-44795-4_42
- QUOTE: Various measures of semantic similarity between word pairs have been proposed, some using statistical (unsupervised learning from text) techniques [16, 17, 18], some using lexical databases (hand-built) [19, 20], and some hybrid approaches, combining statistics and lexical information [21, (Jiang & Conrath, 1997)]. Statistical techniques typically suffer from the sparse data problem: they perform poorly when the words are relatively rare, due to the scarcity of data. Hybrid approaches attempt to address this problem by supplementing sparse data with information from a lexical database [21, (Jiang & Conrath, 1997)].
Quotes
Abstract
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.
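The combination described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited simplified form of the Jiang-Conrath distance, dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2)), where IC is the information content of a concept and lcs is the lowest common subsumer in the taxonomy. The toy taxonomy, the corpus counts, and all function names below are hypothetical; the sketch omits any additional edge weighting the paper may apply and is not the authors' implementation.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "dog": "animal",
}

# Hypothetical corpus counts for words attached directly to each concept.
COUNTS = {"entity": 0, "vehicle": 10, "animal": 20,
          "car": 40, "bicycle": 10, "dog": 20}


def ancestors(concept):
    """Return the path from a concept up to the root, concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def concept_frequency(concept):
    """Count of a concept plus the counts of every concept it subsumes."""
    return sum(n for c, n in COUNTS.items() if concept in ancestors(c))


TOTAL = concept_frequency("entity")


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from the corpus counts."""
    return -math.log(concept_frequency(concept) / TOTAL)


def lowest_common_subsumer(c1, c2):
    """Most specific concept that is an ancestor of both c1 and c2."""
    seen = set(ancestors(c1))
    for candidate in ancestors(c2):  # walks from c2 upward to the root
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")


def jc_distance(c1, c2):
    """Simplified Jiang-Conrath distance (assumed form, see lead-in)."""
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))


if __name__ == "__main__":
    print(jc_distance("car", "bicycle"))
    print(jc_distance("car", "dog"))
```

In this toy setup, concepts that share a specific subsumer ("car" and "bicycle" under "vehicle") end up closer than concepts that only meet at the root ("car" and "dog"), which is the behaviour the taxonomy-plus-corpus combination is meant to capture.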
References
- E. Agirre and G. Rigau, 1995, “A Proposal for Word Sense Disambiguation Using Conceptual Distance”, Proceedings of the First International Conference on Recent Advances in NLP, Bulgaria.
- Kenneth W. Church and P. Hanks, 1989, “Word Association Norms, Mutual Information, and Lexicography”, Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, ACL27’89, 76-83.
- Grefenstette, G., 1992, “Use of Syntactic Context to Produce Term Association Lists for Text Retrieval”, Proceedings of the 15th Annual International Conference on Research and Development in Information Retrieval, SIGIR’92.
- Hindle, D., 1990, “Noun Classification from Predicate-Argument Structures”, Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, ACL28’90, 268-275.
- Kozima, H. and T. Furugori, 1993, “Similarity Between Words Computed by Spreading Activations on an English Dictionary”, Proceedings of the 5th Conference of the European Chapter of the Association for Computational Linguistics, EACL-93, 232-239.
- Lee, J.H., M.H. Kim, and Y.J. Lee, 1993, “Information Retrieval Based on Conceptual Distance in IS-A Hierarchies”, Journal of Documentation, Vol. 49, No. 2, 188-207.
- George A. Miller, 1990, “Nouns in WordNet: A Lexical Inheritance System”, International Journal of Lexicography, Vol. 3, No. 4, 245-264.
- George A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller, 1990, “Introduction to WordNet: An Online Lexical Database”, International Journal of Lexicography, Vol. 3, No. 4, 235-244.
- George A. Miller and W.G. Charles, 1991, “Contextual Correlates of Semantic Similarity”, Language and Cognitive Processes, Vol. 6, No. 1, 1-28.
- George A. Miller, C. Leacock, R. Tengi, and R.T. Bunker, 1993, “A Semantic Concordance”, Proceedings of ARPA Workshop on Human Language Technology, 303-308, March 1993.
- Morris, J. and Graeme Hirst, 1991, “Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text”, Computational Linguistics, Vol. 17, 21-48.
- Niwa, Y. and Y. Nitta, 1994, “Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries”, Proceedings of the 17th International Conference on Computational Linguistics, COLING’94, 304-309.
- Rada, R., H. Mili, E. Bicknell, and M. Blettner, 1989, “Development and Application of a Metric on Semantic Nets”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 1, 17-30.
- Philip Resnik, 1992, “WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery”, Proceedings of the AAAI Symposium on Probabilistic Approaches to Natural Language.
- Philip Resnik, 1995, “Using Information Content to Evaluate Semantic Similarity in a Taxonomy”, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol. 1, 448-453, Montreal, August 1995.
- Richardson, R. and A.F. Smeaton, 1995, “Using WordNet in a Knowledge-based Approach to Information Retrieval”, Working Paper, CA-0395, School of Computer Applications, Dublin City University, Ireland.
- Smeaton, A.F. and I. Quigley, 1996, “Experiments on Using Semantic Distance Between Words in Image Caption Retrieval”, Working Paper, CA-0196, School of Computer Applications, Dublin City University, Ireland.
- Strzalkowski, T. and B. Vauthey, 1992, “Information Retrieval Using Robust Natural Language Processing”, Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, ACL 1992.
- Sussna, M., 1993, “Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network”, Proceedings of the Second International Conference on Information and Knowledge Management, CIKM 1993.
| | Author | title | journal | year |
|---|---|---|---|---|
| 1997 SemanticSimBasedOnCorpusStats | Jay J. Jiang, David W. Conrath | Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy | Proceedings on International Conference on Research in Computational Linguistics | 1997 |