2003 KernelMethodsForRelationExtraction
- (Zelenko et al., 2003) ⇒ Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). “Kernel Methods for Relation Extraction.” In: Journal of Machine Learning Research, 3.
Subject Headings: Relation Mention Recognition Algorithm, Relational Data Kernel Function
Notes
Cited By
2006
- (Culotta et al., 2006) ⇒ Aron Culotta, Andrew McCallum, and Jonathan Betz. (2006). “Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.” In: Proceedings of HLT-NAACL Conference (HLT-NAACL 2006). doi:10.3115/1220835.1220873
- (Zhang et al., 2006) ⇒ M. Zhang, J. Zhang, and J. Su. (2006). “Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel.” In: Proceedings of HLT-2006.
- Zelenko et al. (2003) develop a tree kernel for relation extraction. Their tree kernel is recursively defined in a top-down manner, matching nodes from roots to leaf nodes. For each pair of matching nodes, a subsequence kernel on their child nodes is invoked, which matches either contiguous or sparse subsequences of nodes.
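The recursion described in the note above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` fields, the `matches` and `similarity` functions, and the `decay` parameter are hypothetical names chosen for illustration, and only the contiguous-subsequence variant is shown, computed naively rather than with the efficient algorithms the paper designs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # assumed node type, e.g. "Person"
    text: str                       # assumed surface text of the node
    children: list = field(default_factory=list)


def matches(a, b):
    # Assumed matching test: two nodes are comparable if their labels agree.
    return a.label == b.label


def similarity(a, b):
    # Assumed node-level similarity: 1 if the texts agree, else 0.
    return 1.0 if a.text == b.text else 0.0


def tree_kernel(a, b, decay=0.5):
    # Top-down recursion: non-matching nodes contribute nothing;
    # matching nodes add their own similarity plus a kernel on children.
    if not matches(a, b):
        return 0.0
    return similarity(a, b) + children_kernel(a.children, b.children, decay)


def children_kernel(xs, ys, decay):
    # Sum over all pairs of equal-length contiguous runs of children
    # whose nodes match position by position. Each run is weighted by
    # decay ** length and scores the sum of the paired tree kernels.
    total = 0.0
    for i in range(len(xs)):
        for j in range(len(ys)):
            run_sum = 0.0
            length = 0
            while (i + length < len(xs) and j + length < len(ys)
                   and matches(xs[i + length], ys[j + length])):
                run_sum += tree_kernel(xs[i + length], ys[j + length], decay)
                length += 1
                total += decay ** length * run_sum
    return total
```

A sparse-subsequence variant would also allow gaps between matched children and penalize them through the decay factor; that extension is omitted here to keep the sketch short.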
Quotes
Abstract
We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text. We experimentally evaluate the proposed methods and compare them with feature-based learning algorithms, with promising results.
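As a rough illustration of how a kernel can be plugged into one of the learning algorithms named in the abstract, the following is a minimal sketch of a voted perceptron in dual (kernel) form, following the general scheme of Freund and Schapire (1999). The function names, the `epochs` parameter, and the data layout are assumptions made for this example; the paper's own training setup is not reproduced here.

```python
def train_voted_perceptron(examples, kernel, epochs=5):
    # examples: list of (x, y) pairs with y in {-1, +1}.
    # mistakes[k-1] is the example that turned hypothesis k-1 into k.
    # counts[k] is how many examples hypothesis k classified correctly.
    mistakes = []
    counts = [0]
    for _ in range(epochs):
        for x, y in examples:
            # Dual-form score of the current hypothesis.
            score = sum(yi * kernel(xi, x) for xi, yi in mistakes)
            if y * score <= 0:
                # Mistake: start a new hypothesis with survival count 1.
                mistakes.append((x, y))
                counts.append(1)
            else:
                counts[-1] += 1
    return mistakes, counts


def predict_voted(mistakes, counts, kernel, x):
    # Each intermediate hypothesis votes with weight equal to its
    # survival count; the running score is updated incrementally.
    vote = 0.0
    score = 0.0
    for k in range(len(counts)):
        if k > 0:
            xi, yi = mistakes[k - 1]
            score += yi * kernel(xi, x)
        vote += counts[k] * (1 if score > 0 else -1)
    return 1 if vote > 0 else -1
```

Because the learner touches the inputs only through `kernel(xi, x)`, any symmetric similarity over structured objects, such as a kernel over shallow parse trees, could be passed in place of a vector dot product.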
References
- Steven P. Abney. Parsing by chunks. In Robert Berwick, Steven P. Abney, and Carol Tenny, editors, Principle-Based Parsing. Kluwer Academic Publishers, 1990.
- C. Aone, L. Halverson, T. Hampton, and M. Ramos-Santacruz. SRA: Description of the IE2 system used for MUC-7. In: Proceedings of MUC-7, 1998.
- C. Aone and M. Ramos-Santacruz. REES: A large-scale relation and event extraction system. In: Proceedings of the 6th Applied Natural Language Processing Conference, 2000.
- A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.
- D. M. Bikel, R. Schwartz, and R. M. Weischedel. An algorithm that learns what’s in a name. Machine Learning, 34(1-3):211–231, 1999.
- Michael Collins. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In: Proceedings of 40th Conference of the Association for Computational Linguistics, 2002.
- Michael Collins and N. Duffy. Convolution kernels for natural language. In: Proceedings of NIPS-2001, 2001.
- C. Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
- N. Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines (and Other Kernel-based Learning Methods). Cambridge University Press, 2000.
- R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley, New York, 1973.
- R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
- Dayne Freitag and Andrew McCallum. Information extraction with HMM structures learned by stochastic optimization. In: Proceedings of the 7th Conference on Artificial Intelligence (AAAI-00) and of the 12th Conference on Innovative Applications of Artificial Intelligence (IAAI-00), pages 584–589, Menlo Park, CA, July 30– 3 (2000). AAAI Press.
- Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
- T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression. Bioinformatics, 16, 2000.
- L. Goldfarb. A new approach to pattern recognition. In Progress in pattern recognition 2. North Holland, 1985.
- T. Graepel, R. Herbrich, and K. Obermayer. Classification on pairwise proximity data. In Advances in Neural Information Processing Systems 11, 1999.
- (Haussler, 1999) ⇒ D. Haussler. (1999). “Convolution Kernels on Discrete Structures”. Technical Report UCSC-CLR-99-10, University of California at Santa Cruz.
- R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
- F. Jelinek. Statistical Methods for Speech Recognition. The MIT Press, Cambridge, Massachusetts, 1997.
- Thorsten Joachims. Text categorization with support vector machines: learning with many relevant features. European Conference Mach. Learning, ECML98, April 1998.
- Thorsten Joachims. Learning Text Classifiers with Support Vector Machines. Kluwer Academic Publishers, Dordrecht, NL, 2002.
- John D. Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, San Francisco, CA, 2001.
- N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285, 1987.
- (Lodhi et al., 2002) ⇒ H. Lodhi, C. Saunders, John Shawe-Taylor, N. Cristianini, and C. Watkins. (2002). “Text classification using string kernels.” The Journal of Machine Learning Research, vol:2.
- Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of 17th International Conference on Machine Learning, pages 591–598. Morgan Kaufmann, San Francisco, CA, 2000.
- (MillerCFRSSW, 1998) ⇒ S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R. Weischedel, and the Annotation Group. (1998). “Algorithms that learn to extract information BBN: Description of the SIFT system as used for MUC-7.” In: Proceedings of MUC-7.
- M. Munoz, V. Punyakanok, Dan Roth, and D. Zimak. A learning approach to shallow parsing. Technical Report 2087, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1999.
- National Institute of Standards and Technology. Proceedings of the 7th Message Understanding Conference (MUC-7), 1998.
- E. Pekalska, P. Paclik, and R. Duin. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, 2, 2001.
- Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1990.
- Frank Rosenblatt. Principles of Neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan Books, Washington D.C., 1962.
- Dan Roth. Learning in natural language. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2), pages 898–904, S.F., July 31–August 6 (1999). Morgan Kaufmann Publishers.
- Dan Roth and W. Yih. Relational learning via propositional algorithms: An information extraction case study. In Bernhard Nebel, editor, Proceedings of the seventeenth International Conference on Artificial Intelligence (IJCAI-01), pages 1257–1263, San Francisco, CA, August 4–10 (2001). Morgan Kaufmann Publishers, Inc.
- D. Sankoff and J. Kruskal, editors. Time Warps, String Edits, and Macromolecules. CSLI Publications, 1999.
- C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.
- Vladimir N. Vapnik. Statistical Learning Theory. John Wiley, 1998.
- (Watkins, 2000) ⇒ C. Watkins. (2000). “Dynamic alignment kernels.” In: A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers.