2007 ModelingInformationScentACompar

(Budiu et al., 2007) ⇒ Raluca Budiu, Christiaan Royer, and Peter Pirolli. (2007). “Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora.” In: Large Scale Semantic Access to Content (Text, Image, Video, and Sound).

Subject Headings:

Notes

Cited By

Quotes

Abstract

In this paper we describe a comparison among three systems that estimate semantic similarity between words: Latent Semantic Analysis (Landauer & Dumais, 1997), Pointwise Mutual Information (Turney, 2001), and Generalized Latent Semantic Analysis (Matveeva, Levow, Farahat, & Royer, 2005). We compare all these techniques on a unique corpus (TASA) and, for PMI and GLSA, we also report performance on a larger web-based corpus. The evaluation is carried out through two kinds of tests: (1) synonymy tests, and (2) comparison with human word similarity judgments. The results indicate that for large corpora PMI works best on word similarity tests, and GLSA on synonymy tests. For the smaller TASA corpus, GLSA produced the best performance on most tests. A large corpus improved the performance of PMI, but, in most cases, did not improve that of GLSA.
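As a rough illustration of one of the measures named in the abstract, the sketch below estimates pointwise mutual information, PMI(w1, w2) = log2( p(w1, w2) / (p(w1) p(w2)) ), from document-level co-occurrence counts and uses it to answer a synonymy-test item by picking the highest-scoring candidate. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the toy corpus, the whole-document co-occurrence window, and the helper names `pmi_table` and `answer_synonym_item` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return a pmi(w1, w2) function estimated from document co-occurrence.

    Assumption: two words "co-occur" when they appear in the same document.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            # The pair never co-occurs in this corpus.
            return float("-inf")
        p_joint = joint / n_docs
        p1 = word_counts[w1] / n_docs
        p2 = word_counts[w2] / n_docs
        return math.log2(p_joint / (p1 * p2))

    return pmi


def answer_synonym_item(pmi, target, candidates):
    """Pick the candidate with the highest PMI to the target word."""
    return max(candidates, key=lambda c: pmi(target, c))


# Hypothetical toy corpus, for illustration only.
docs = [
    "the big dog barked",
    "a large dog barked",
    "the big large house",
    "a small cat slept",
    "the small house",
]

pmi = pmi_table(docs)
best = answer_synonym_item(pmi, "big", ["large", "small", "cat"])
print(best, pmi("big", "large"))
```

On a corpus this small the scores are dominated by sparse counts; the abstract's observation that a larger corpus improved PMI is consistent with that sensitivity.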

References

1. Marilyn Hughes Blackmon, Muneo Kitajima, Peter G. Polson, Tool for Accurately Predicting Website Navigation Problems, Non-problems, Problem Severity, and Effectiveness of Repairs, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054978
2. Raluca Budiu, Peter Pirolli, Michael Fleetwood, Navigation in Degree of Interest Trees, Proceedings of the Working Conference on Advanced Visual Interfaces, May 23-26, 2006, Venezia, Italy doi:10.1145/1133265.1133358
3. Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman, Information Visualization, Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1999
4. Ed H. Chi, Adam Rosien, Gesara Supattanasiri, Amanda Williams, Christiaan Royer, Celia Chow, Erica Robles, Brinda Dalal, Julie Chen, Steve Cousins, The Bloodhound Project: Automating Discovery of Web Usability Issues Using the InfoScentπ Simulator, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 05-10, 2003, Ft. Lauderdale, Florida, USA doi:10.1145/642611.642699
5. Cho, J., Garcia-Molina, H., Haveliwala, T., Lam, W., Paepcke, A., Raghavan, S., & Wesley, G. (2004). Stanford WebBase Components and Applications (Tech. Rep.). Stanford University.
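6. Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, Eytan Ruppin, Placing Search in Context: The Concept Revisited, ACM Transactions on Information Systems (TOIS), v.20 n.1, p.116-131, January 2002 doi:10.1145/503104.503110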
7. Jarmasz, M., & Szpakowicz, S. (2003). Roget's Thesaurus and Semantic Similarity. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003) (p. 212--219). Borovets, Bulgaria.
8. Ishwinder Kaur, Anthony J. Hornof, A Comparison of LSA, WordNet and PMI-IR for Predicting User Click Behavior, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054980
9. Landauer, T. K., & Dumais, S. (1997). A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge. Psychological Review, 104, 211--240.
10. Landauer, T. K., Foltz, P., & Laham, D. (1998). An Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259--284.
11. Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999
12. Matveeva, I., Levow, G., Farahat, A., & Royer, C. (2005). Terms Representation with Generalized Latent Semantic Analysis. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2005).
13. Miller, G., & Charles, W. (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1), 1--28.
14. George A. Miller, WordNet: A Lexical Database for English, Communications of the ACM, v.38 n.11, p.39-41, Nov. 1995 doi:10.1145/219717.219748
15. Nakov, P., Valchanova, E., & Angelova, G. (2003). Towards Deeper Understanding of the LSA Performance. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2003) (p. 311--318). Borovets, Bulgaria.
16. Nelson, D. L., Dyrdal, G. M., & Goodmon, L. B. (2005). What is Preexisting Strength? Predicting Free Association Probabilities, Similarity Ratings, and Cued Recall Probabilities. Psychonomic Bulletin & Review, 12, 711--719.
17. Yoshiki Niwa, Yoshihiko Nitta, Co-occurrence Vectors from Corpora Vs. Distance Vectors from Dictionaries, Proceedings of the 15th Conference on Computational Linguistics, August 05-09, 1994, Kyoto, Japan doi:10.3115/991886.991938
18. Pirolli, P. (2005). Rational Analyses of Information Foraging on the Web. Cognitive Science, 29(3), 343--373.
19. Pirolli, P., & Card, S. (1999). Information Foraging. Psychological Review.
20. Peter Pirolli, Stuart K. Card, Mija M. Van Der Wege, The Effect of Information Scent on Searching Information: Visualizations of Large Tree Structures, Proceedings of the Working Conference on Advanced Visual Interfaces, p.161-172, May 2000, Palermo, Italy doi:10.1145/345513.345304
21. Philip Resnik, Using Information Content to Evaluate Semantic Similarity in a Taxonomy, Proceedings of the 14th International Joint Conference on Artificial Intelligence, p.448-453, August 20-25, 1995, Montreal, Quebec, Canada
22. Rohde, D., Gonnerman, L., & Plaut, D. (2006). An Improved Model of Semantic Similarity Based on Lexical Co-occurrence. (Manuscript Submitted to Cognitive Science)
23. Herbert Rubenstein, John B. Goodenough, Contextual Correlates of Synonymy, Communications of the ACM, v.8 n.10, p.627-633, Oct. 1965 doi:10.1145/365628.365657
24. Spool, J., Perfetti, C., & Brittan, D. (2004). Designing for the Scent of Information. UI Engineering.
25. Egidio Terra, C. L. A. Clarke, Frequency Estimates for Statistical Word Similarity Measures, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, p.165-172, May 27-June 01, 2003, Edmonton, Canada doi:10.3115/1073445.1073477
26. Peter D. Turney, Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL, Proceedings of the 12th European Conference on Machine Learning, p.491-502, September 05-07, 2001
27. Zeno, S., Ivens, S., Millard, R., & Duvvuri, R. (1995). The Educator's Word Frequency Guide. Touchstone Applied Science Associates (TASA), Inc.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 ModelingInformationScentACompar	Raluca Budiu Christiaan Royer Peter Pirolli			Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora						2007
