Scientific Data Mining Task
A Scientific Data Mining Task is a domain-specific data analytics task that is restricted to scientific data.
- Context:
- It can range from being a Scientific Structured Data Mining Task to being a Scientific Unstructured Data Mining Task (such as scientific text mining).
- It can require a Scientific Entity NER, such as: protein NER.
- It can support a Scientific Information Extraction Task.
- …
- Example(s):
- See: Commercial Data Mining Task, Scientific Knowledge Discovery Task.
References
2020
- (Jiang and Shang, 2020) ⇒ Meng Jiang and Jingbo Shang. (2020). “Scientific Text Mining and Knowledge Graphs.” Tutorial in the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2020).
2008
- (Cho et al., 2008) ⇒ Yong Ju Cho, Naren Ramakrishnan, and Yang Cao. (2008). “Reconstructing Chemical Reaction Networks: Data Mining Meets System Identification.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401912
2001a
- (Mann et al., 2001) ⇒ Bob Mann, Roy Williams, Malcolm Atkinson, Ken Brodlie, Amos Storkey, and Chris Williams. (2001). “Scientific Data Mining, Integration, and Visualization.” Report on the Workshop on Scientific Data Mining, Integration and Visualization (SDMIV).
- QUOTE: ... Much of the scientific data discussed at the workshop fell into three categories, and, while these do not represent an exhaustive list of scientific data types, much of the technology discussed in the meeting was directed to them. The three categories are:
- The datacube, or array, class - meaning an annotated block of data in one, two, or more dimensions. This includes time-series and spectra (one dimensional); images, frequency-time spectra, etc (two-dimensional); voxel datasets and hyperspectral images (three-dimensional), and so on. The highly-optimised chips of modern computers handle these data structures well.
- Records, or events, collected as a table. Also known as multi-parameter data. These datasets may come directly from an instrument (for example in a particle accelerator) or may be derived by picking features from a datacube (when stars are identified from an astronomical image). Relational databases hold these data effectively.
- Sequences of symbols, for example a biological gene is represented by ...
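The three categories quoted above can be illustrated with a minimal Python sketch. It uses only the standard library and toy values; the names `pick_features`, `threshold`, and the sample data are hypothetical and do not come from the workshop report.

```python
from collections import Counter

# 1. Datacube / array class: an annotated block of data.
#    A 1-D time series and a 2-D image, held as (nested) lists.
time_series = [0.0, 1.5, 3.0, 2.5]
image = [
    [0, 0, 9],
    [0, 8, 0],
    [0, 0, 0],
]


# 2. Records / events collected as a table. Here the records are derived
#    by picking features from the datacube: every pixel brighter than a
#    (hypothetical) threshold becomes one row.
def pick_features(img, threshold):
    return [
        {"row": r, "col": c, "brightness": value}
        for r, row in enumerate(img)
        for c, value in enumerate(row)
        if value > threshold
    ]


records = pick_features(image, threshold=5)

# 3. Sequences of symbols: a made-up string over a small alphabet,
#    summarised by counting how often each symbol occurs.
sequence = "GATTACA"
symbol_counts = Counter(sequence)
```

The step from `image` to `records` mirrors the report's example of identifying stars in an astronomical image, although a real pipeline would use a proper source-detection method rather than a fixed threshold.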
2001b
- (Grossman et al., 2001) ⇒ Robert L. Grossman, Chandrika Kamath, Philip Kegelmeyer, Vipin Kumar, and Raju R. Namburu, editors. (2001). “Data Mining for Scientific and Engineering Applications.” Springer, Volume 2 of Massive Computing. ISBN:1402000332.
- Advances in technology are making massive data sets common in many scientific disciplines, such as astronomy, medical imaging, bio-informatics, combinatorial chemistry, remote sensing, and physics. To find useful information in these data sets, scientists and engineers are turning to data mining techniques. This book is a collection of papers based on the first two in a series of workshops on mining scientific datasets. It illustrates the diversity of problems and application areas that can benefit from data mining, as well as the issues and challenges that differentiate scientific data mining from its commercial counterpart. While the focus of the book is on mining scientific data, the work is of broader interest as many of the techniques can be applied equally well to data arising in business and web applications. Audience: This work would be an excellent text for students and researchers who are familiar with the basic principles of data mining and want to learn more about the application of data mining to their problem in science or engineering.
2001c
- (Ramakrishnan & Grama, 2001) ⇒ Naren Ramakrishnan and Ananth Grama. (2001). “Mining Scientific Data.” In: Advances in Computers, 55.
- The past two decades have seen rapid advances in high performance computing and tools for data acquisition in a variety of scientific domains. Coupled with the availability of massive storage systems and fast networking technology to manage and assimilate data, these have given a significant impetus to data mining in the scientific domain. Data mining is now recognized as a key computational technology, supporting traditional analysis, visualization, and design tasks. Diverse applications in domains such as mineral prospecting, computer aided design, bioinformatics, and computational steering are now being viewed in the data mining framework. This has led to a very effective cross-fertilization of computational techniques from both continuous and discrete perspectives. In this chapter, we characterize the nature of scientific data mining activities and identify dominant recurring themes. We discuss algorithms, techniques, and methodologies for their effective application and present application studies that summarize the state-of-the-art in this emerging field. We conclude by identifying opportunities for future