2008 BigData

(Howe et al., 2008) ⇒ Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, Seung Yon Rhee. (2008). “Big Data: The future of biocuration.” In: Nature, 455. doi:10.1038/455047a

Subject Headings: Biocuration Task, Biocurator.

Notes

It points to the 1000 Genomes Project (http://www.1000genomes.org) as an example of the continued growth in data sources.
It recommends that Authors, Journals and Curators work together to facilitate the Data Exchange between Scientific Publications and Scientific Databases.
It recommends that Curators, Researchers, and Academic Institutions develop "an accepted recognition structure to facilitate community-based curation efforts".
It recommends that Curators, Researchers, Academic Institutions, and Funding Agencies increase the visibility and support of Scientific Curation as a professional career.

Cited By

(Cusick et al., 2009) ⇒ Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, and Marc Vidal. (2009). “Literature-Curated Protein Interaction Datasets.” In: Nature Methods, 6, 39-46.

Quotes

Abstract

To thrive, the field that links biologists and their data urgently needs structure, recognition and support.

The exponential growth in the amount of biological data means that revolutionary measures are needed for data management, analysis and accessibility. Online databases have become important avenues for publishing biological data. Biocuration, the activity of organizing, representing and making biological information accessible to both humans and computers, has become an essential part of biological discovery and biomedical research. But curation increasingly lags behind data generation in funding, development and recognition.

We propose three urgent actions to advance this key field. First, authors, journals and curators should immediately begin to work together to facilitate the exchange of data between journal publications and databases. Second, in the next five years, curators, researchers and university administrations should develop an accepted recognition structure to facilitate community-based curation efforts. Third, curators, researchers, academic institutions and funding agencies should, in the next ten years, increase the visibility and support of scientific curation as a professional career.

Failure to address these three issues will cause the available curated data to lag farther behind current biological knowledge. Researchers will observe an increasing occurrence of obvious gaps in knowledge. As these gaps expand, resources will become less effective for generating and testing hypotheses, and the usefulness of curated data will be seriously compromised.

When all the data produced or published are curated to a high standard and made accessible as soon as they become available, biological research will be conducted in a manner that is quite unlike the way it is done now. Researchers will be able to process massive amounts of complex data much more quickly. They will garner insight about the areas of their interest rapidly with the help of inference programs. Digesting information and generating hypotheses at the computer screen will be so much faster that researchers will get back to the bench quickly for more experiments. Experiments will be designed with more insight; this increased specificity will cause an exponential growth in knowledge, much as we are experiencing exponential growth in data today.

Data avalanche

Such data, produced at great effort and expense, are only as useful as researchers' ability to locate, integrate and access them. In recent years, this challenge has been met by a growing cadre of biologists — 'biocurators' — who manage raw biological data, extract information from published literature, develop structured vocabularies to tag data and make the information available online.

References

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. Nucl. Acid. Res. 36, D25–D30 (2008).
Wheeler, D. L. et al. Nucl. Acid. Res. 36, D13–D21 (2008).
Salimi, N. & Vita, R. PLoS Comput. Biol. 2, e125 (2006).
Brazma, A. et al. Nature Genet. 29, 365–371 (2001).
Deutsch, E. W. et al. Nature Biotechnol. 26, 305–312 (2008).
Field, D. et al. Nature Biotechnol. 26, 541–547 (2008).
Jenkins, H. et al. Nature Biotechnol. 22, 1601–1606 (2004).
Orchard, S. et al. Nature Biotechnol. 25, 894–898 (2007).
Taylor, C. F. et al. Nature Biotechnol. 25, 887–893 (2007).
Bourne, P. PLoS Comput. Biol. 1, 179–181 (2005).
Seringhaus, M. R. & Gerstein, M. B. BMC Bioinformatics 8, 17 (2007).
Seringhaus, M. & Gerstein, M. FEBS Lett. 582, 1170 (2008).
Ort, D. R. & Grennan, A. K. Plant Physiol. 146, 1022–1023 (2008).
Burkhardt, K., Schneider, B. & Ory, J. PLoS Comput. Biol. 2, e99 (2006).
Rhee, S. Y. Plant Physiol. 134, 543–547 (2004).
Mons, B. et al. Genome Biol. 9, R89 (2008).
Huss, J. W. et al. PLoS Biol. 6, e175 (2008).
Palmer, C. L., Heidorn, P. B., Wright, D. & Cragin, M. H. International J. Dig. Curation 2, 31–40 (2007).

