BioText Search Engine
The BioText Search Engine is a Search Engine for text mining of research articles in BioSciences.
- Context:
- It is part of the Berkeley BioText Project.
- It uses the Lucene Indexing System.
- …
- Counter-Example(s):
- See: Metadata, Document Abstract, Keyword Search, Biomedicine, Ontology, Gene Ontology.
References
2017a
- (BioText, 2017) ⇒ "BioText Search Engine" http://biosearch.berkeley.edu/index.php?action=about Retrieved on 2017-05-27
- Developed as part of the BioText project at the University of California, Berkeley, the BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature.
- The interface has been carefully designed according to usability principles and techniques. The system uses Lucene for the underlying indexing, and users can use all the Lucene operators in their search queries.
- The search engine is a work in progress and more functionality is being added over time.
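Because the excerpt above says queries may use the Lucene operators, a query string can be assembled programmatically before it is pasted into the search box. The following is a minimal sketch, assuming the classic Lucene query syntax (Boolean operators, quoted phrases, backslash escaping). The helper names and the `caption` field are hypothetical illustrations; the source does not document which index fields, if any, the BioText Search Engine exposes.

```python
import re

# Characters with special meaning in the classic Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(text: str) -> str:
    """Backslash-escape Lucene syntax characters so text is matched literally."""
    return _LUCENE_SPECIAL.sub(r'\\\1', text)


def phrase(text: str) -> str:
    """Wrap text in double quotes so it is treated as an exact phrase."""
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'


def field(name: str, clause: str) -> str:
    """Restrict a clause to one index field (the field name is an assumption)."""
    return f"{name}:{clause}"


def all_of(*clauses: str) -> str:
    """Join clauses with AND inside one parenthesized group."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR inside one parenthesized group."""
    return "(" + " OR ".join(clauses) + ")"


# Hypothetical example: figures about western blots that mention either
# gene-name variant but do not mention mouse.
query = all_of(
    field("caption", phrase("western blot")),
    any_of(escape_term("p53"), escape_term("TP53")),
    "NOT " + escape_term("mouse"),
)
print(query)
```

Escaping matters for biomedical terms such as `IL-2`, where an unescaped hyphen could be read as an operator rather than as part of the name.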
2017b
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Text_mining#Academic_applications Retrieved:2017-5-21.
- The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.
Academic institutions have also become involved in the text mining initiative:
- The National Centre for Text Mining (NaCTeM) is the first publicly funded text mining centre in the world. NaCTeM is operated by the University of Manchester in close collaboration with the Tsujii Lab, University of Tokyo. NaCTeM provides customised tools and research facilities and offers advice to the academic community. It is funded by the Joint Information Systems Committee (JISC) and two of the UK Research Councils (EPSRC & BBSRC). With an initial focus on text mining in the biological and biomedical sciences, research has since expanded into the areas of social sciences.
- In the United States, the School of Information at the University of California, Berkeley is developing a program called BioText to assist biology researchers in text mining and analysis.
- The Text Analysis Portal for Research (TAPoR), currently housed at the University of Alberta, is a scholarly project to catalogue text analysis applications and create a gateway for researchers new to the practice.
2007
- (Hearst et al., 2007) ⇒ Marti A. Hearst, Anna Divoli, Harendra Guturu, Alex Ksikes, Preslav Nakov, Michael A. Wooldridge, and Jerry Ye. (2007). “BioText Search Engine: beyond abstract search.” In: Bioinformatics 23(16).
- ABSTRACT: The BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature. One novel feature is the ability to search and browse article figures and their captions. A grid view juxtaposes many different figures associated with the same keywords, providing new insight into the literature. An abstract/title search and list view shows at a glance many of the figures associated with each article. The interface is carefully designed according to usability principles and techniques. The search engine is a work in progress, and more functionality will be added over time. Availability: http://biosearch.berkeley.edu
2003
- (Bhalotia et al., 2003) ⇒ Gaurav Bhalotia, Preslav Nakov, Ariel S. Schwartz, and Marti A. Hearst. (2003). “BioText Team Report for the TREC 2003 Genomics Track.” In: Proceedings of TREC 2003.
- ABSTRACT: The BioText project team participated in both tasks of the TREC 2003 genomics track. Key to our approach in the primary task was the use of an organism-name recognition module, a module for recognizing gene name variants, and MeSH descriptors. Text classification improved the results slightly. In the secondary task, the key insight was casting it as a classification problem of choosing between the title and the last sentence of the abstract, although MeSH descriptors helped somewhat in this task as well. These approaches yielded results within the top three groups in both tasks.