Berkeley BioText Project
A Berkeley BioText Project is a web application project developed at the University of California, Berkeley for text mining of bioscience research articles.
- See: BioText Search Engine.
References
2017a
- (BioText Project, 2017) ⇒ http://biotext.berkeley.edu/ Retrieved on 2017-05-21
- Project Goals: When the project began, new methods and tools were needed to improve how bioscience researchers search for and synthesize information from textual descriptions of bioscience research. This project built a flexible, efficient, platform-independent database system infrastructure specifically geared towards supporting the advanced and particular search needs of bioscience researchers. It used this infrastructure to support the development and deployment of statistical approaches to natural language processing, which was used to identify entities and relations between them in bioscience texts.
2017b
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Text_mining#Academic_applications Retrieved on 2017-05-21
- The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.
Academic institutions have also become involved in the text mining initiative:
- The National Centre for Text Mining (NaCTeM), is the first publicly funded text mining centre in the world. NaCTeM is operated by the University of Manchester in close collaboration with the Tsujii Lab, University of Tokyo. NaCTeM provides customised tools, research facilities and offers advice to the academic community. They are funded by the Joint Information Systems Committee (JISC) and two of the UK Research Councils (EPSRC & BBSRC). With an initial focus on text mining in the biological and biomedical sciences, research has since expanded into the areas of social sciences.
- In the United States, the School of Information at University of California, Berkeley is developing a program called BioText to assist biology researchers in text mining and analysis.
- The Text Analysis Portal for Research (TAPoR), currently housed at the University of Alberta, is a scholarly project to catalogue text analysis applications and create a gateway for researchers new to the practice.
2007
- (Hearst et al., 2007) ⇒ Marti A. Hearst, Anna Divoli, Harendra Guturu, Alex Ksikes, Preslav Nakov, Michael A. Wooldridge, and Jerry Ye. (2007). “BioText Search Engine: beyond abstract search.” In: Bioinformatics 23(16)
- ABSTRACT: The BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature. One novel feature is the ability to search and browse article figures and their captions. A grid view juxtaposes many different figures associated with the same keywords, providing new insight into the literature. An abstract/title search and list view shows at a glance many of the figures associated with each article. The interface is carefully designed according to usability principles and techniques. The search engine is a work in progress, and more functionality will be added over time. Availability: http://biosearch.berkeley.edu
2003
- (Bhalotia et al., 2003) ⇒ Gaurav Bhalotia, Preslav Nakov, Ariel S. Schwartz, and Marti A. Hearst. (2003). “BioText Team Report for the TREC 2003 Genomics Track.” In: Proceedings of TREC 2003.
- ABSTRACT: The BioText project team participated in both tasks of the TREC 2003 genomics track. Key to our approach in the primary task was the use of an organism-name recognition module, a module for recognizing gene name variants, and MeSH descriptors. Text classification improved the results slightly. In the secondary task, the key insight was casting it as a classification problem of choosing between the title and the last sentence of the abstract, although MeSH descriptors helped somewhat in this task as well. These approaches yielded results within the top three groups in both tasks.