1995 TheProblOfCompTheMostProbTreeInDOPParsing
- (Bod, 1995) ⇒ Rens Bod. (1995). “The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars.” In: Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics (EACL 1995). doi:10.3115/976973.976989
Subject Headings: Syntactic Parser, Data Oriented Parsing.
Notes
Cited By
Quotes
Abstract
We deal with the question as to whether there exists a polynomial time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.
We show that in STSG, in contrast with stochastic context-free grammar, the Viterbi algorithm cannot be used for computing a most probable tree of a string. We propose a simple modification of Viterbi which allows by means of a "select-random" search to estimate the most probable tree of a string in polynomial time.
Experiments with DOP on ATIS show that only in 68% of the cases, the most probable derivation of a string generates the most probable tree of that string. Therefore, the parse accuracy obtained by the most probable trees (96%) is dramatically higher than the parse accuracy obtained by the most probable derivations (65%).
It is still an open question whether the most probable tree of a string can be deterministically computed in polynomial time.
References
- M. van den Berg, R. Bod & R. Scha, (1994). “A Corpus-based Approach to Semantic Interpretation", Proceedings Ninth Amsterdam Colloquium, Amsterdam.
- E. Black, R. Garside and G. Leech, 1993. Statistically-Driven Computer Grammars o/English: The IBM/Lancaster Approach, Rodopi: Amsterdam- Atlanta.
- R. Bod, (1992). “A Computational Model of Language Performance: Data Oriented Parsing", Proceedings COLING'92, Nantes.
- R. Bod, 1993a. “Using an Annotated Corpus as a Stochastic Grammar", Proceedings European Chapter fo the ACL'93, Utrecht.
- R. Bod, 1993b. “Monte Carlo Parsing", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
- R. Bod, 1993c. “Data Oriented Parsing as a General Framework for Stochastic Language Processing", in:
- K.Sikkel & A. Nijholt (eds.), Parsing Natural Language, TWLT6, Twente University.
- R. Bod, (1995). Enriching Linguistics with Statistics: Performance Models of Natural Language. PhD-thesis, University of Amsterdam (forthcoming).
- T. Briscoe and J. Carroll, (1993). “Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-based Grammars", Computational Linguistics 19(1), 25-59.
- T. Fujisaki, F. Jelinek, J. Cocke, E. Black and T. Nishino, (1989). “A Probabilistic Method for Sentence Disambiguation", Proceedings 1st International Workshop on Parsing Technologies, Pittsburgh.
- J.M. Hammersley and D.C. Handscomb, 1964. Monte Carlo Methods, Chapman and Hall, London.
- C.T. Hemphill, J.J. Godfrey and G.R. Doddington, 1990. “The ATIS spoken language systems pilot corpus". Proceedings DARPA Speech and Natural Language Workshop, Hidden Valley, Morgan Kaufmann.
- F. Jelinek, J.D. Lafferty and R.L. Mercer, 1990. Basic Methods of Probabilistic Context Free Grammars, Technical Report IBM RC 16374 (#72684), Yorktown Heights.
- M. Kay, (1980). Algorithmic Schemata and Data Structures in Syntactic Processing. Report CSL-80-12, Xerox PARC, Palo Alto, Ca.
- D. Magerman and C. Weir, (1992). “Efficiency, Robustness and Accuracy in Picky Chart Parsing", Proceedings A CL'92, Newark, Delaware.
- M. Marcus, B. Santorini and M. Marcinkiewicz, 1993. “Building a Large Annotated Corpus of English: the Penn Treebank", Computational Linguistics 19(2).
- D. Rumelhart and J. McClelland, (1986). Parallel Distributed Processing, The MIT Press, Cambridge, Mass.
- R. Scha, (1990). “Language Theory and Language Technology; Competence and Performance" (in Dutch), in Q.A.M. de Kort & G.L.J. Leerdam (eds.), Computertoepassingen in de Neerlandistiek, Almere: Landelijke Vereniging van Neerlandici (LVVNjaarboek).
- Y. Schabes, (1992). “Stochastic Lexicalized Tree Adjoining Grammars", Proceedings COLING'92, Nantes.
- Y. Schabes and R, Waters, (1993). “Stochastic Lexicalized Context Free Grammars", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
- K. Sima'an, R. Bod, S. Krauwer and R. Scha, 1994. “Efficient Disambiguation by means of Stochastic Tree Substitution Grammars", Proceedings International Conference on New Methods in Language Processing, UMIST, Manchester.
- A. Viterbi, 1967. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Trans. Information Theory, IT-13, 260-269.
,
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
1995 TheProblOfCompTheMostProbTreeInDOPParsing | Rens Bod | The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars | Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics | http://acl.ldc.upenn.edu/E/E95/E95-1015.pdf | 1995 |