Data Mining Task
A data mining task is a data science task that is a discovery task (one that requires discovering novel and useful patterns in large datasets to solve real-world problems).
- AKA: Knowledge Discovery from Databases, KDD Task.
- Context:
- input: Data Set.
- output: Knowledge Representation.
- performance measure: Accuracy, Error Rate, ...
- It can be the focus of a Data Mining Academic Discipline, which can be:
- analyzed by Data Mining Research and reported at a Data Mining Conference or in a Data Mining Journal.
- taught in Data Mining Education.
- applied in Data Mining Practice.
- It can be a part of either a Posthoc Analysis Task or an Experimental Analysis Task.
- It can range from being an Exploratory Data Analysis Task (that formulates hypotheses) to being a Confirmatory Data Analysis Task (that tests hypotheses) to being a Predictive Modeling Task (that exploits hypotheses).
- It can be described by a Data Mining Case Study.
- It can range from being a General Data Mining Task to being a Domain Specific Data Mining Task.
- It can be performed by a Data Miner.
- It can be proposed by a Data Mining Researcher.
- It can (typically) require Data Mining Knowledge (which can include Statistical Modeling Knowledge, Machine Learning Knowledge, Computer Science Knowledge, and Pattern Recognition Knowledge).
- It can range from being a Semantically Informed Data Mining Task to being a Semantically Uninformed Data Mining Task.
- …
- Example(s):
- a General Data Mining Task, such as Graph Mining, Text Mining, Association Rule Mining.
- a Domain-Specific Data Mining Task, such as Web mining, opinion mining, movie recommending, and business data mining.
- an Information Extraction Task, which might require identifying Semantic Relations on the web in order to impose structure on the data.
- …
- Counter-Example(s):
- a Parametric Modeling Task, because it cannot produce a Novel Pattern.
- a Machine Learning Task, such as reinforcement learning or inductive reasoning.
- a Character Recognition Task.
- a Game Playing Learning Task.
- an Autonomous Agent Learning Task.
- a Software Programming Task.
- a Website Designing Task.
- See: Knowledge Discovery.
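As a rough illustration of the input/output framing in the Context list (a Data Set in, a Knowledge Representation out), the sketch below applies it to the Association Rule Mining example: a list of transactions is the data set, and a list of association rules with their support and confidence is the output. This is a minimal sketch under stated assumptions: it is a brute-force enumeration of single-item rules {a} => {b}, not the Apriori algorithm, and the function name `mine_pair_rules`, the thresholds, and the toy basket data are all hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def mine_pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Rule]:
    """Hypothetical helper: enumerate every rule {a} => {b} whose pair support
    and rule confidence meet the given thresholds (brute force, not Apriori)."""
    if not transactions:
        return []
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset: Set[str]) -> float:
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules: List[Rule] = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield any rule
        # Try both directions: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((frozenset({lhs}), frozenset({rhs}), pair_support, confidence))
    return rules


# Toy basket data (hypothetical): plays the role of the "Data Set" input.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]

# The mined rules play the role of the "Knowledge Representation" output.
for lhs, rhs, s, c in mine_pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    print(set(lhs), "=>", set(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Brute-force enumeration keeps the example short; for larger item vocabularies a candidate-pruning method such as Apriori or FP-Growth would normally be used instead.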
References
2023
- chat
- Q: What is the difference between data science and data mining?
- A: Data science is a broad term that encompasses many aspects of processing and analyzing data. It includes data mining, but also includes machine learning, predictive analytics, statistics, and more. Data mining is a specific subdomain of data science that focuses on extracting data from large datasets. It is the process of finding patterns and trends from large datasets using algorithms and statistical models.
2013
- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Data_mining
- Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science,[1][2][3] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amount of data, not the extraction of data itself.[4] It also is a buzzword,[5] and is frequently also applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence, machine learning, and business intelligence. The popular book "Data mining: Practical machine learning tools and techniques with Java"[6] (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.[7] Often the more general terms "(large scale) data analysis", or “analytics” – or when referring to actual methods, artificial intelligence and machine learning – are more appropriate.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD process as additional steps.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.
- ↑ "Data Mining Curriculum". ACM SIGKDD. 2006-04-30. http://www.sigkdd.org/curriculum.php. Retrieved 2011-10-28.
- ↑ Clifton, Christopher (2010). "Encyclopædia Britannica: Definition of Data Mining". http://www.britannica.com/EBchecked/topic/1056150/data-mining. Retrieved 2010-12-09.
- ↑ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction". http://www-stat.stanford.edu/~tibs/ElemStatLearn/. Retrieved 2012-08-07.
- ↑ Han, Jiawei; Kamber, Micheline (2001). Data mining: concepts and techniques. Morgan Kaufmann. p. 5. ISBN 9781558604896. "Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long"
- ↑ See e.g. OKAIRP 2005 Fall Conference, Arizona State University, About.com: Datamining
- ↑ Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 978-0-12-374856-0.
- ↑ Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010). "WEKA Experiences with a Java open-source project". Journal of Machine Learning Research 11: 2533–2541. "the original title, "Practical machine learning", was changed ... The term "data mining" was [added] primarily for marketing reasons."
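The quoted passage names unusual records (anomaly detection) as one kind of pattern a data mining task extracts. A minimal sketch, assuming a single numeric attribute and using a simple z-score rule, which is only one of many possible anomaly detectors and is chosen here for brevity: a record is flagged when its distance from the mean exceeds `threshold` population standard deviations. The function `zscore_anomalies`, its default threshold, and the sample readings are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def zscore_anomalies(values: List[float], threshold: float = 2.0) -> List[int]:
    """Hypothetical helper: return the indices of values whose absolute
    z-score (using the population standard deviation) exceeds `threshold`."""
    if len(values) < 2:
        return []  # too few records to estimate a spread
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing is unusual
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obviously unusual record at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_anomalies(readings))  # → [7]
```

Because the population standard deviation is used, no value in a sample of n records can have an absolute z-score above sqrt(n - 1), so for very small samples the threshold must be set below that bound or nothing will ever be flagged.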
2000
- (Witten & Frank, 2000) ⇒ Ian H. Witten, and Eibe Frank. (2000). “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.” Morgan Kaufmann.
- The data used for mining has almost certainly not been gathered expressly for that purpose.
1998
- (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning 30(2-3).
- Data mining: The term data mining is somewhat overloaded. It sometimes refers to the whole process of knowledge discovery and sometimes to the specific machine learning phase.