1999 ConstrBioKBsByIE
- (Craven & Kumlien, 1999) ⇒ Mark Craven, and Johan Kumlien. (1999). “Constructing Biological Knowledge-bases by Extracting Information from Text Sources.” In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology.
Subject Headings: Relation Detection from Text Algorithm, PPLRE Project, Distant-Supervision Learning Algorithm.
Notes
- It is one of the seminal publications on the use of [[weakly labeled dataset]]s for training (the approach later known as a Distant-Supervision Learning Algorithm).
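The idea of a weakly labeled dataset can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weakly_label`, the substring-matching rule, and the example data are all assumptions made for illustration. A sentence is marked positive when it mentions both members of a pair already recorded in an existing knowledge base, so no sentence is labeled by hand.

```python
from typing import Iterable, List, Set, Tuple


def weakly_label(
    sentences: Iterable[str],
    known_pairs: Set[Tuple[str, str]],
) -> List[Tuple[str, bool]]:
    """Attach a weak (noisy) label to each sentence.

    A sentence is labeled positive if it contains both members of any
    pair in known_pairs. Matching is a case-insensitive substring test,
    which is a simplifying assumption of this sketch.
    """
    labeled = []
    for sentence in sentences:
        text = sentence.lower()
        positive = any(
            a.lower() in text and b.lower() in text for a, b in known_pairs
        )
        labeled.append((sentence, positive))
    return labeled


# Hypothetical knowledge-base entry and sentences, for illustration only.
known_pairs = {("actin", "cytoskeleton")}
sentences = [
    "Actin is a major component of the cytoskeleton.",
    "Actin was purified from rabbit muscle.",
]
labeled = weakly_label(sentences, known_pairs)
```

Labels produced this way are noisy: a sentence can mention both terms without asserting the relation, which is why such data are called weakly labeled.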
Cited By
- ~317 http://scholar.google.com/scholar?q=%22Constructing+Biological+Knowledge-bases+by+Extracting+Information+from+Text+Sources%22+1999
- ~71 http://dl.acm.org/citation.cfm?id=645634.663209&preflayout=flat#citedby
Quotes
Abstract
Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task (a statistical text classification method, and a relational learning method) and our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.
References
- Andrade, M. A., and Valencia, A. (1997). Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts. In: Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology, 25-32. Halkidiki, Greece: AAAI Press.
- Boland, M. V.; Markey, M. K.; and Murphy, R. F. (1996). Automated classification of protein localization patterns. Molecular Biology of the Cell 8(346a).
- Califf, M. E. (1998). Relational Learning Techniques for Natural Language Extraction. Ph.D. Dissertation, Computer Science Department, University of Texas, Austin, TX. AI Technical Report 98-276.
- Cardie, C. (1997). Empirical methods in information extraction. AI Magazine 18(4):65-80.
- Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In: Proceedings of the Ninth European Conference on Artificial Intelligence, 147-150. Stockholm, Sweden: Pitman.
- Cowie, J., and Lehnert, W. (1996). Information extraction. Communications of the ACM 39(1):80-91.
- Pedro Domingos, and Michael J. Pazzani (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29:103-130.
- Dayne Freitag (1998). Multistrategy learning for information extraction. In: Proceedings of the Fifteenth International Conference on Machine Learning, 161-169. Madison, WI: Morgan Kaufmann.
- Fukuda, K.; Tsunoda, T.; Tamura, A.; and Takagi, T. (1998). Toward information extraction: Identifying protein names from biological papers. In: Pacific Symposium on Biocomputing, 707-718.
- Genome Annotation Consortium. (1999). The genome channel. http://compbio.ornl.gov/tools/channel/.
- Hodges, P. E.; Payne, W. E.; and Garrels, J. I. (1998). Yeast protein database (YPD): A database for the complete proteome of Saccharomyces cerevisiae. Nucleic Acids Research 26:68-72.
- Karp, P.; Riley, M.; Paley, S.; and Pellegrini-Toole, A. (1997). EcoCyc: Electronic encyclopedia of E. coli genes and metabolism. Nucleic Acids Research 25(1).
- Lathrop, R. H.; Steffen, N. R.; Raphael, M. P.; Deeds-Rubin, S.; Pazzani, M. J.; Cimoch, P.; See, D. M.; and Tilles, J. G. (1998). Knowledge-based avoidance of drug-resistant HIV mutants. In: Proceedings of the Tenth Conference on Innovative Applications of Artificial Intelligence. Madison, WI: AAAI Press.
- Leek, T. (1997). Information extraction using hidden Markov models. Master's thesis, Department of Computer Science and Engineering, University of California, San Diego, CA.
- Lewis, D. D., and Ringuette, M. (1994). A comparison of two learning algorithms for text categorization. In: Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval, 81-93.
- Tom M. Mitchell (1997). Machine Learning. New York: McGraw-Hill.
- National Center for Biotechnology Information. (1999). Entrez. http://www.ncbi.nlm.nih.gov/Entrez/.
- National Library of Medicine. 1999a. Pubmed. http://www.ncbi.nlm.nih.gov/PubMed/.
- National Library of Medicine. 1999b. Unified medical language system. http://www.nlm.nih.gov/research/umls/umlsmain.html.
- Ohta, Y.; Yamamoto, Y.; Okazaki, T.; Uchiyama, I.; and Takagi, T. (1997). Automatic construction of knowledge base from biological papers. In: Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology, 218-225. Halkidiki, Greece: AAAI Press.
- Judea Pearl (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
- Porter, M. F. (1980). An algorithm for suffix stripping. Program 14(3):127-130.
- Provost, F., and Fawcett, T. (1998). Robust classification systems for imprecise environments. In: Proceedings of the Fifteenth National Conference on Artificial Intelligence, 706-713. Madison, WI: AAAI Press.
- J. Ross Quinlan (1990). Learning logical definitions from relations. Machine Learning 5:239-266.
- Richards, B. L., and Mooney, R. J. (1992). Learning relations by pathfinding. In: Proceedings of the Tenth National Conference on Artificial Intelligence, 50-55. San Jose, CA: AAAI/MIT Press.
- Ellen Riloff (1996). An empirical study of automated dictionary construction for information extraction in three domains. Artificial Intelligence 85:101-134.
- Ellen Riloff (1998). The sundance sentence analyzer. http://www.cs.utah.edu/projects/nlp/.
- Rost, B. (1996). PHD: Predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology 266:525-539.
- Slattery, S., and Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In: Proceedings of the Eighth International Conference on Inductive Logic Programming. Springer Verlag.
- Soderland, S. (1996). Learning Text Analysis Rules for Domain-specific Natural Language Processing. Ph.D. Dissertation, University of Massachusetts. Department of Computer Science Technical Report 96-087.
- Soderland, S. (1999). Learning information extraction rules for semi-structured and free text. Machine Learning.
- Swanson, D. R., and Smalheiser, N. R. (1997). An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence 91:183-203.
- Weeber, M., and Vos, R. (1998). Extracting expert medical knowledge from texts. In: Working Notes of the Intelligent Data Analysis in Medicine and Pharmacology Workshop, 23-28.
- Xu, Y.; Mural, R. J.; Einstein, J. R.; Shah, M. B.; and Uberbacher, E. C. (1996). GRAIL: A multi-agent neural network system for gene identification. Proceedings of the IEEE 84(10):1544-1552.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1999 ConstrBioKBsByIE | Mark Craven; Johan Kumlien | | | Constructing Biological Knowledge-bases by Extracting Information from Text Sources | | Proceedings of the International Conference on Intelligent Systems for Molecular Biology | http://www.biostat.wisc.edu/~craven/papers/ismb99.pdf | | | 1999 |