2007 ModelingInformationScentACompar

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Abstract

In this paper we describe a comparison among three systems that estimate semantic similarity between words: Latent Semantic Analysis (Landauer & Dumais, 1997), Pointwise Mutual Information (Turney, 2001), and Generalized Latent Semantic Analysis (Matveeva, Levow, Farahat, & Royer, 2005). We compare all these techniques on a unique corpus (TASA) and, for PMI and GLSA, we also report performance on a larger web-based corpus. The evaluation is carried out through two kinds of tests: (1) synonymy tests, and (2) comparison with human word similarity judgments. The results indicate that for large corpora PMI works best on word similarity tests, and GLSA on synonymy tests. For the smaller TASA corpus, GLSA produced the best performance on most tests. A large corpus improved the performance of PMI, but, in most cases, did not improve that of GLSA.
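As an illustration of the synonymy-test setup mentioned in the abstract, the following is a minimal sketch of a PMI-based scorer. It is not the authors' implementation: the document-level co-occurrence window, the toy corpus `TOY_DOCS`, and the helper names `pmi_table` and `answer_synonymy_item` are all assumptions made for this example. PMI is taken here as log2 of P(w1, w2) / (P(w1) P(w2)), with probabilities estimated from document frequencies.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build a pmi(w1, w2) function from document-level co-occurrence counts.

    Assumption: two words co-occur if they appear in the same document.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in documents:
        vocab = set(doc.lower().split())
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w1, w2):
        joint = pair_df[frozenset((w1, w2))]
        if joint == 0:
            # Words never seen together: treat the score as minus infinity.
            return float("-inf")
        p_joint = joint / n_docs
        p_w1 = word_df[w1] / n_docs
        p_w2 = word_df[w2] / n_docs
        return math.log2(p_joint / (p_w1 * p_w2))

    return pmi


def answer_synonymy_item(pmi, target, candidates):
    """Pick the candidate with the highest PMI score against the target."""
    return max(candidates, key=lambda c: pmi(target, c))


# Hypothetical toy corpus, standing in for TASA or a web-based corpus.
TOY_DOCS = [
    "the car drove down the road",
    "an automobile is a car with an engine",
    "the automobile and the car share a road",
    "a banana is a yellow fruit",
    "the fruit bowl held a banana",
]

pmi = pmi_table(TOY_DOCS)
best = answer_synonymy_item(pmi, "car", ["automobile", "banana", "fruit"])
print(best)
```

A synonymy test in this style scores each multiple-choice item by checking whether the highest-scoring candidate matches the keyed answer. The second kind of test, comparison with human judgments, would instead correlate such scores with human similarity ratings over a list of word pairs.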


References

  • 1. Marilyn Hughes Blackmon, Muneo Kitajima, Peter G. Polson, Tool for Accurately Predicting Website Navigation Problems, Non-problems, Problem Severity, and Effectiveness of Repairs, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054978
  • 2. Raluca Budiu, Peter Pirolli, Michael Fleetwood, Navigation in Degree of Interest Trees, Proceedings of the Working Conference on Advanced Visual Interfaces, May 23-26, 2006, Venezia, Italy doi:10.1145/1133265.1133358
  • 3. Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman, Information Visualization, Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1999
  • 4. Ed H. Chi, Adam Rosien, Gesara Supattanasiri, Amanda Williams, Christiaan Royer, Celia Chow, Erica Robles, Brinda Dalal, Julie Chen, Steve Cousins, The Bloodhound Project: Automating Discovery of Web Usability Issues Using the InfoScentπ Simulator, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 05-10, 2003, Ft. Lauderdale, Florida, USA doi:10.1145/642611.642699
  • 5. Cho, J., Garcia-Molina, H., Haveliwala, T., Lam, W., Paepcke, A., Raghavan, S., & Wesley, G. (2004). Stanford WebBase Components and Applications (Tech. Rep.). Stanford University.
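  • 6. Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, Eytan Ruppin, Placing Search in Context: The Concept Revisited, ACM Transactions on Information Systems (TOIS), v.20 n.1, p.116-131, January 2002 doi:10.1145/503104.503110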
  • 7. Jarmasz, M., & Szpakowicz, S. (2003). Roget's Thesaurus and Semantic Similarity. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003) (p. 212--219). Borovets, Bulgaria.
  • 8. Ishwinder Kaur, Anthony J. Hornof, A Comparison of LSA, WordNet and PMI-IR for Predicting User Click Behavior, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 02-07, 2005, Portland, Oregon, USA doi:10.1145/1054972.1054980
  • 9. Landauer, T. K., & Dumais, S. (1997). A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge. Psychological Review, 104, 211--240.
  • 10. Landauer, T. K., Foltz, P., & Laham, D. (1998). An Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259--284.
  • 11. Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999
  • 12. Matveeva, I., Levow, G., Farahat, A., & Royer, C. (2005). Terms Representation with Generalized Latent Semantic Analysis. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2005).
  • 13. Miller, G., & Charles, W. (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1), 1--28.
  • 14. George A. Miller, WordNet: A Lexical Database for English, Communications of the ACM, v.38 n.11, p.39-41, Nov. 1995 doi:10.1145/219717.219748
  • 15. Nakov, P., Valchanova, E., & Angelova, G. (2003). Towards Deeper Understanding of the LSA Performance. In Proceedings of the Recent Advances in Natural Language Processing Conference (RANLP 2003) (p. 311--318). Borovets, Bulgaria.
  • 16. Nelson, D. L., Dyrdal, G. M., & Goodmon, L. B. (2005). What is Preexisting Strength? Predicting Free Association Probabilities, Similarity Ratings, and Cued Recall Probabilities. Psychonomic Bulletin & Review, 12, 711--719.
  • 17. Yoshiki Niwa, Yoshihiko Nitta, Co-occurrence Vectors from Corpora Vs. Distance Vectors from Dictionaries, Proceedings of the 15th Conference on Computational Linguistics, August 05-09, 1994, Kyoto, Japan doi:10.3115/991886.991938
  • 18. Pirolli, P. (2005). Rational Analyses of Information Foraging on the Web. Cognitive Science, 29(3), 343--373.
  • 19. Pirolli, P., & Card, S. (1999). Information Foraging. Psychological Review.
  • 20. Peter Pirolli, Stuart K. Card, Mija M. Van Der Wege, The Effect of Information Scent on Searching Information: Visualizations of Large Tree Structures, Proceedings of the Working Conference on Advanced Visual Interfaces, p.161-172, May 2000, Palermo, Italy doi:10.1145/345513.345304
  • 21. Philip Resnik, Using Information Content to Evaluate Semantic Similarity in a Taxonomy, Proceedings of the 14th International Joint Conference on Artificial Intelligence, p.448-453, August 20-25, 1995, Montreal, Quebec, Canada
  • 22. Rohde, D., Gonnerman, L., & Plaut, D. (2006). An Improved Model of Semantic Similarity Based on Lexical Co-occurrence. (Manuscript Submitted to Cognitive Science)
  • 23. Herbert Rubenstein, John B. Goodenough, Contextual Correlates of Synonymy, Communications of the ACM, v.8 n.10, p.627-633, Oct. 1965 doi:10.1145/365628.365657
  • 24. Spool, J., Perfetti, C., & Brittan, D. (2004). Designing for the Scent of Information. UI Engineering.
  • 25. Egidio Terra, C. L. A. Clarke, Frequency Estimates for Statistical Word Similarity Measures, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, p.165-172, May 27-June 01, 2003, Edmonton, Canada doi:10.3115/1073445.1073477
  • 26. Peter D. Turney, Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL, Proceedings of the 12th European Conference on Machine Learning, p.491-502, September 05-07, 2001
  • 27. Zeno, S., Ivens, S., Millard, R., & Duvvuri, R. (1995). The Educator's Word Frequency Guide. Touchstone Applied Science Associates (TASA), Inc.


2007 ModelingInformationScentACompar
Author: Raluca Budiu, Christiaan Royer, Peter Pirolli
Title: Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora
Year: 2007