1992 ClassBasedNGramModelsOfNL

(Brown et al., 1992) ⇒ Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. (1992). “Class-based n-gram Models of Natural Language.” In: Computational Linguistics, 18(4).

Subject Headings: Word N-gram Model, Brown Word-Hierarchy Cluster, Brown et al Clustering Algorithm.

Notes

Cited By

Quotes

Abstract

We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
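As an illustration of the class-based idea in the abstract, the bigram case factors the word prediction through classes: Pr(w_i | w_{i-1}) = Pr(w_i | c_i) · Pr(c_i | c_{i-1}), where c_i is the class of word w_i. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `train_class_bigram`, the toy corpus, and the hand-written word-to-class map are all hypothetical, and the probabilities are plain relative frequencies with no smoothing. In the paper the classes are induced by statistical clustering rather than assigned by hand.

```python
from collections import Counter


def train_class_bigram(tokens, word_to_class):
    """Return a function estimating P(word | prev_word) via word classes.

    Assumes every word has exactly one class, given by the hypothetical
    hand-written mapping `word_to_class`. Estimates are unsmoothed
    relative frequencies, so unknown words raise KeyError and a class
    that never occurs as a left context would divide by zero.
    """
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    # How often each class occurs as the left element of a bigram.
    prev_counts = Counter(classes[:-1])
    # How often each (previous class, current class) pair occurs.
    pair_counts = Counter(zip(classes[:-1], classes[1:]))

    def prob(word, prev_word):
        c = word_to_class[word]
        c_prev = word_to_class[prev_word]
        p_word_given_class = word_counts[word] / class_counts[c]
        p_class_given_prev = pair_counts[(c_prev, c)] / prev_counts[c_prev]
        return p_word_given_class * p_class_given_prev

    return prob


# Toy data (illustrative only).
tokens = "the cat sat on the mat the dog sat on the rug".split()
word_to_class = {
    "the": "DET", "on": "PREP", "sat": "VERB",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
}
prob = train_class_bigram(tokens, word_to_class)

# "mat sat" never occurs in the toy corpus, yet it receives a nonzero
# probability because NOUN -> VERB transitions were observed.
print(prob("sat", "mat"))
```

The design point is that counts are shared across all words in a class, so a word pair that was never observed can still be scored as long as its class pair was.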

References

1. Averbuch, A.; Bahl, L.; Bakis, R.; Brown, P.; Cole, A.; Daggett, G.; Das, S.; Davies, K.; De Gennaro, S.; de Souza, P.; Epstein, E.; Fraleigh, D.; Jelinek, F.; Moorhead, J.; Lewis, B.; Mercer, R.; Nadas, A.; Nahamoo, D.; Picheny, M.; Shichman, G.; Spinelli, P.; Van Compernolle, D.; and Wilkens, H. (1987). “Experiments with the Tangora 20,000 word speech recognizer.” In: Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing. Dallas, Texas, 701--704.

2. Bahl, L. R.; Jelinek, F.; and Mercer, R. L. (1983). “A maximum likelihood approach to continuous speech recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(2), 179--190.

3. Baum, L. (1972). “An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process.” Inequalities, 3, 1--8.

4. Brown, P. F.; Cocke, J.; Della Pietra, S. A.; Della Pietra, V. J.; Jelinek, F.; Lafferty, J. D.; Mercer, R. L.; and Roossin, P. S. (1990). “A statistical approach to machine translation.” Computational Linguistics, 16(2), 79--85.

5. Dempster, A. P.; Laird, N.; and Rubin, D. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society, 39(B), 1--38.

6. Feller, W. (1950). An Introduction to Probability Theory and its Applications, Volume I. John Wiley & Sons, Inc.

7. Gallager, R. G. (1968). Information Theory and Reliable Communication. John Wiley & Sons, Inc., New York, NY.

8. Good, I. (1953). “The population frequencies of species and the estimation of population parameters.” Biometrika, 40(3--4), 237--264.

9. Jelinek, F., and Mercer, R. L. (1980). “Interpolated estimation of Markov source parameters from sparse data.” In: Proceedings, Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands, 381--397.

10. Kučera, H., and Francis, W. (1967). Computational Analysis of Present Day American English. Brown University Press.

11. Mays, E.; Damerau, F. J.; and Mercer, R. L. (1990). “Context-based spelling correction.” In: Proceedings, IBM Natural Language ITL. Paris, France, 517--522.


Author: Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, Jenifer C. Lai
Title: Class-based N-gram Models of Natural Language
Journal: Computational Linguistics (CL) Research Area
Title URL: http://acl.ldc.upenn.edu/J/J92/J92-4003.pdf
Year: 1992