1992 ClassBasedNGramModelsOfNL
- (Brown et al., 1992) ⇒ Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. (1992). “Class-based n-gram Models of Natural Language.” In: Computational Linguistics, 18(4).
Subject Headings: Word N-gram Model, Brown Word-Hierarchy Cluster, Brown et al Clustering Algorithm.
Notes
Cited By
- Google Scholar: ~ 3,446 Citations.
- ACM DL: ~ 536 Citations.
- Microsoft Academic: ~ 3,468 Citations.
- CiteSeer: ~ 958 Citations.
- Semantic Scholar: ~ 1,873 Citations.
Quotes
Abstract
We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
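The class-based bigram idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i), where each word belongs to exactly one class. The function name `train_class_bigram`, the toy corpus, and the hand-assigned class map are all hypothetical: the classes here are not produced by the paper's clustering algorithms, and no smoothing is applied.

```python
from collections import Counter

def train_class_bigram(tokens, word_to_class):
    """Return a function estimating P(word | prev_word) through word classes.

    Maximum-likelihood counts only; a practical model would add smoothing.
    """
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)                       # count(w)
    class_counts = Counter(classes)                     # count(c)
    class_bigrams = Counter(zip(classes, classes[1:]))  # count(c_prev, c)
    class_context = Counter(classes[:-1])               # count(c_prev as a bigram's first element)

    def prob(prev_word, word):
        c_prev, c = word_to_class[prev_word], word_to_class[word]
        if class_context[c_prev] == 0:
            return 0.0
        p_class = class_bigrams[(c_prev, c)] / class_context[c_prev]  # P(c | c_prev)
        p_word = word_counts[word] / class_counts[c]                  # P(w | c)
        return p_class * p_word

    return prob

# Toy data (hypothetical): classes are assigned by hand for illustration.
tokens = "the cat sat on the mat the dog sat".split()
word_to_class = {"the": "DET", "cat": "NOUN", "mat": "NOUN", "dog": "NOUN",
                 "sat": "VERB", "on": "PREP"}
prob = train_class_bigram(tokens, word_to_class)

# "mat sat" never occurs as a word bigram, yet it receives probability mass
# because NOUN -> VERB transitions were observed.
p_unseen = prob("mat", "sat")
```

Because the estimate depends on the previous word only through its class, words that share a class share their statistics, which is what lets the model assign non-zero probability to word pairs absent from the training text.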