2007 ModelingInformationScentACompar
- (Budiu et al., 2007) ⇒ Raluca Budiu, Christiaan Royer, and Peter Pirolli. (2007). “Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora.” In: Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222007%22+Modeling+Information+Scent%3A+A+Comparison+of+LSA%2C+PMI+and+GLSA+Similarity+Measures+on+Common+Tests+and+Corpora
- http://dl.acm.org/citation.cfm?id=1931390.1931422&preflayout=flat#citedby
Quotes
Abstract
In this paper we describe a comparison among three systems that estimate semantic similarity between words: Latent Semantic Analysis (Landauer & Dumais, 1997), Pointwise Mutual Information (Turney, 2001), and Generalized Latent Semantic Analysis (Matveeva, Levow, Farahat, & Royer, 2005). We compare all these techniques on a unique corpus (TASA) and, for PMI and GLSA, we also report performance on a larger web-based corpus. The evaluation is carried out through two kinds of tests: (1) synonymy tests, and (2) comparison with human word similarity judgments. The results indicate that for large corpora PMI works best on word similarity tests, and GLSA on synonymy tests. For the smaller TASA corpus, GLSA produced the best performance on most tests. A large corpus improved the performance of PMI, but, in most cases, did not improve that of GLSA.
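As a rough illustration of one of the measures the abstract names, the sketch below estimates pointwise mutual information from document-level co-occurrence counts and uses it to answer a synonymy-style multiple-choice item. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy corpus, the document-level co-occurrence window, and the helper names `build_pmi` and `pick_synonym` are all hypothetical, and the paper's own PMI variant and corpora (TASA, the web-based corpus) are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(documents):
    """Return a pmi(w1, w2) function estimated from document-level co-occurrence.

    Assumption: two words "co-occur" if they appear in the same document.
    """
    n_docs = len(documents)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each unordered word pair
    for doc in documents:
        words = set(doc.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(w1, w2):
        joint = pair_df[frozenset((w1, w2))]
        if joint == 0:
            # Never observed together: treat as minimally associated.
            return float("-inf")
        p_joint = joint / n_docs
        p1 = word_df[w1] / n_docs
        p2 = word_df[w2] / n_docs
        return math.log2(p_joint / (p1 * p2))

    return pmi


def pick_synonym(pmi, target, choices):
    """Answer a synonymy-test item by choosing the highest-PMI candidate."""
    return max(choices, key=lambda c: pmi(target, c))


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "the big dog barked",
    "a large dog barked",
    "big large house",
    "the small cat slept",
    "big large tree",
]

pmi = build_pmi(toy_corpus)
answer = pick_synonym(pmi, "big", ["small", "large", "cat"])
print(answer)
```

Because PMI depends only on raw co-occurrence counts, a larger corpus gives it more word pairs with non-zero counts to work from, which is in line with the abstract's observation that a large corpus improved PMI's performance.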
References
- 1. Marilyn Hughes Blackmon, Muneo Kitajima, Peter G. Polson, Tool for Accurately Predicting Website Navigation Problems, Non-problems, Problem Severity, and Effectiveness of Repairs, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054978
- 2. Raluca Budiu, Peter Pirolli, Michael Fleetwood, Navigation in Degree of Interest Trees, Proceedings of the Working Conference on Advanced Visual Interfaces, May 23-26, 2006, Venezia, Italy doi:10.1145/1133265.1133358
- 3. Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman, Information Visualization, Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1999
- 4. Ed H. Chi, Adam Rosien, Gesara Supattanasiri, Amanda Williams, Christiaan Royer, Celia Chow, Erica Robles, Brinda Dalal, Julie Chen, Steve Cousins, The Bloodhound Project: Automating Discovery of Web Usability Issues Using the InfoScent™ Simulator, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 05-10, 2003, Ft. Lauderdale, Florida, USA doi:10.1145/642611.642699
- 5. Cho, J., Garcia-Molina, H., Haveliwala, T., Lam, W., Paepcke, A., Raghavan, S., & Wesley, G. (2004). Stanford WebBase Components and Applications (Tech. Rep.). Stanford University.
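- 6. Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, Eytan Ruppin, Placing Search in Context: The Concept Revisited, ACM Transactions on Information Systems (TOIS), v.20 n.1, p.116-131, January 2002 doi:10.1145/503104.503110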
- 7. Jarmasz, M., & Szpakowicz, S. (2003). Roget's Thesaurus and Semantic Similarity. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003) (p. 212--219). Borovets, Bulgaria.
- 8. Ishwinder Kaur, Anthony J. Hornof, A Comparison of LSA, WordNet and PMI-IR for Predicting User Click Behavior, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054980
- 9. Landauer, T. K., & Dumais, S. (1997). A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge. Psychological Review, 104, 211--240.
- 10. Landauer, T. K., Foltz, P., & Laham, D. (1998). An Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259--284.
- 11. Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999
- 12. Matveeva, I., Levow, G., Farahat, A., & Royer, C. (2005). Terms Representation with Generalized Latent Semantic Analysis. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2005).
- 13. Miller, G., & Charles, W. (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1), 1--28.
- 14. George A. Miller, WordNet: A Lexical Database for English, Communications of the ACM, v.38 n.11, p.39-41, Nov. 1995 doi:10.1145/219717.219748
- 15. Nakov, P., Valchanova, E., & Angelova, G. (2003). Towards Deeper Understanding of the LSA Performance. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2003) (p. 311--318). Borovets, Bulgaria.
- 16. Nelson, D. L., Dyrdal, G. M., & Goodmon, L. B. (2005). What is Preexisting Strength? Predicting Free Association Probabilities, Similarity Ratings, and Cued Recall Probabilities. Psychonomic Bulletin & Review, 12, 711--719.
- 17. Yoshiki Niwa, Yoshihiko Nitta, Co-occurrence Vectors from Corpora Vs. Distance Vectors from Dictionaries, Proceedings of the 15th Conference on Computational Linguistics, August 05-09, 1994, Kyoto, Japan doi:10.3115/991886.991938
- 18. Pirolli, P. (2005). Rational Analyses of Information Foraging on the Web. Cognitive Science, 29(3), 343--373.
- 19. Pirolli, P., & Card, S. (1999). Information Foraging. Psychological Review.
- 20. Peter Pirolli, Stuart K. Card, Mija M. Van Der Wege, The Effect of Information Scent on Searching Information: Visualizations of Large Tree Structures, Proceedings of the Working Conference on Advanced Visual Interfaces, p.161-172, May 2000, Palermo, Italy doi:10.1145/345513.345304
- 21. Philip Resnik, Using Information Content to Evaluate Semantic Similarity in a Taxonomy, Proceedings of the 14th International Joint Conference on Artificial Intelligence, p.448-453, August 20-25, 1995, Montreal, Quebec, Canada
- 22. Rohde, D., Gonnerman, L., & Plaut, D. (2006). An Improved Model of Semantic Similarity based on Lexical Co-occurrence. (Manuscript Submitted to Cognitive Science)
- 23. Herbert Rubenstein, John B. Goodenough, Contextual Correlates of Synonymy, Communications of the ACM, v.8 n.10, p.627-633, Oct. 1965 doi:10.1145/365628.365657
- 24. Spool, J., Perfetti, C., & Brittan, D. (2004). Designing for the Scent of Information. UI Engineering.
- 25. Egidio Terra, C. L. A. Clarke, Frequency Estimates for Statistical Word Similarity Measures, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, p.165-172, May 27-June 01, 2003, Edmonton, Canada doi:10.3115/1073445.1073477
- 26. Peter D. Turney, Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL, Proceedings of the 12th European Conference on Machine Learning, p.491-502, September 05-07, 2001
- 27. Zeno, S., Ivens, S., Millard, R., & Duvvuri, R. (1995). The Educator's Word Frequency Guide. Touchstone Applied Science Associates (TASA), Inc.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2007 ModelingInformationScentACompar | Raluca Budiu, Christiaan Royer, Peter Pirolli | | | Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora | | | | | | 2007 |