Data Mining Subject Area
A data mining subject area is a data analysis subject area that centers on data mining tasks, concepts, algorithms, and systems.
- AKA: KDD Domain.
- Context:
- It can include:
- a Data Mining Academic Discipline (that includes data mining education and data mining research)
- a Data Mining Practice (that includes a data mining industry).
- a Data Mining Terminology.
- It can be related to a: Statistics Subject Area, Machine Learning Subject Area, Databases Subject Area, ...
- It can be represented by a Data Mining Ontology (of DM concepts and DM relations).
- …
- Counter-Example(s):
- See: Data Miner, Data Mining Researcher, Knowledge Discovery from Database Process, Predictive Modeling Subject Area.
References
2013
- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Data_mining
- Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science,[1][2][3] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
The term is a buzzword,[4] and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) but is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as "detecting something new". Even the popular book "Data mining: Practical machine learning tools and techniques with Java"[5] (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.[6] Often the more general terms "(large scale) data analysis", or “analytics” – or when referring to actual methods, artificial intelligence and machine learning – are more appropriate.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD process as additional steps.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.
Data mining uses information from past data to analyze the outcome of a particular problem or situation that may arise. Data mining works to analyze data stored in data warehouses that are used to store that data that is being analyzed. That particular data may come from all parts of business, from the production to the management. Managers also use data mining to decide upon marketing strategies for their product. They can use data to compare and contrast among competitors. Data mining interprets its data into real time analysis that can be used to increase sales, promote new product, or delete product that is not value-added to the company.
- ↑ "Data Mining Curriculum". ACM SIGKDD. 2006-04-30. http://www.sigkdd.org/curriculum.php. Retrieved 2011-10-28.
- ↑ Clifton, Christopher (2010). "Encyclopædia Britannica: Definition of Data Mining". http://www.britannica.com/EBchecked/topic/1056150/data-mining. Retrieved 2010-12-09.
- ↑ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction". http://www-stat.stanford.edu/~tibs/ElemStatLearn/. Retrieved 2012-08-07.
- ↑ See e.g. OKAIRP 2005 Fall Conference, Arizona State University, About.com: Datamining
- ↑ Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 978-0-12-374856-0.
- ↑ Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010). "WEKA Experiences with a Java open-source project". Journal of Machine Learning Research 11: 2533–2541. "the original title, "Practical machine learning", was changed ... The term "data mining" was [added] primarily for marketing reasons."
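The Wikipedia excerpt above lists anomaly detection (finding "unusual records") among the core data mining tasks. The following is a minimal sketch of that idea, not a method taken from the cited sources: it flags values whose z-score exceeds a threshold. The function name `find_unusual_records`, the threshold of 2.0, and the sample readings are all illustrative assumptions.

```python
from statistics import mean, stdev


def find_unusual_records(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Illustrative only: a z-score rule is one simple stand-in for
    anomaly detection, chosen here for brevity.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing is unusual
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
unusual = find_unusual_records(readings)
print(unusual)
```

A z-score rule assumes roughly bell-shaped data and is sensitive to the outliers it is trying to find; it is shown only because it fits in a few lines.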
2009
- Quotes of usage
- A working definition of data mining is the discovery of interesting, unexpected, or valuable structures in large datasets.
- data mining is the discovery of interesting, unexpected or valuable structures in large datasets.
- We examine how data mining is used and outline some of its methods.
- data mining is defined as the identification of interesting structure in data.
- On the data mining front we have ...
- The recession has boosted the importance of data mining as more businesses search for clues to increase revenues and decrease expenses.
- How Europeans Are Using Data Mining
- Using data mining to find out the most vulnerable ...
- This area of data mining is known as predictive analytics.
- … the basis of data mining is to compress the given data by ...
- The promise of data mining is compelling, and convinces many.
- The goal of data mining is to extract ...
- Much of data mining is about leveraging existing data to make useful predictions.
- The third family line of data mining is machine learning, which ...
- http://searchsqlserver.techtarget.com/sDefinition/0,,sid87_gci211901,00.html
- DEFINITION - Data mining is sorting through data to identify patterns and establish relationships.
- Data mining parameters include:
- Association - looking for patterns where one event is connected to another event
- Sequence or path analysis - looking for patterns where one event leads to another later event
- Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok)
- Clustering - finding and visually documenting groups of facts not previously known
- Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (This area of data mining is known as predictive analytics.)
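The "Association" parameter in the list above (patterns where one event is connected to another) is commonly quantified with support and confidence. The sketch below computes both over a small set of transactions. The function names and the shopping-basket data are hypothetical and chosen only for illustration; they do not come from the quoted definition.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(rule_support, rule_confidence)
```

Here the rule "bread implies butter" holds in two of the three baskets that contain bread, which is what the confidence value reflects.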
2000
- (Han & Kamber, 2000) ⇒ Jiawei Han, and Micheline Kamber. (2000). “Data Mining: Concepts and Techniques, 1st ed.” Morgan Kaufmann. ISBN:1558604898
1999
- (Sukumar, 1999) ⇒ Rajagopal Sukumar. (1999). “Data Mining.” Overview Presentation.
- (Zaiane, 1999) ⇒ Osmar Zaiane. (1999). “Glossary of Data Mining Terms.” University of Alberta, Computing Science CMPUT-690: Principles of Knowledge Discovery in Databases.
- QUOTE: Data Mining: The extraction of hidden predictive information, patterns and correlations from large databases.
1998
- (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning 30(2-3).
- Data mining: The term data mining is somewhat overloaded. It sometimes refers to the whole process of knowledge discovery and sometimes to the specific machine learning phase.
1996
- (Fayyad et al., 1996d) ⇒ Usama M. Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. (1996). “From Data Mining to Knowledge Discovery in Databases.” In: AI Magazine, 17(3).
- Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining has mostly been used by statisticians, data analysts, and the management information systems (MIS) communities. It has also gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first KDD workshop in 1989 (Piatetsky-Shapiro 1991) to emphasize that knowledge is the end product of a data-driven discovery. It has been popularized in the AI and machine-learning fields.
- In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The distinction between the KDD process and the data-mining step (within the process) is a central point of this article. The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data. Blind application of data-mining methods (rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of meaningless and invalid patterns.
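The distinction drawn above between the overall KDD process and the data-mining step within it can be sketched as a small pipeline. This is an illustrative assumption about how the steps might be arranged, not the authors' implementation: the function names, the sample records, and the use of frequency counting as a stand-in for a mining algorithm are all hypothetical.

```python
from collections import Counter


def select(records, field):
    """Data selection: keep only the field of interest."""
    return [r.get(field) for r in records]


def clean(values):
    """Data cleaning: drop missing or empty values and normalize text."""
    return [v.strip().lower() for v in values
            if isinstance(v, str) and v.strip()]


def mine(values):
    """Data-mining step: extract patterns (here, simple value frequencies)."""
    return Counter(values)


def interpret(patterns, min_count):
    """Interpretation: keep only patterns frequent enough to report."""
    return {k: c for k, c in patterns.items() if c >= min_count}


def kdd_process(records, field, min_count=2):
    """The overall process; mining is only one step in the chain."""
    return interpret(mine(clean(select(records, field))), min_count)


# Hypothetical raw records with inconsistent and missing values.
records = [
    {"city": "Paris "},
    {"city": "paris"},
    {"city": None},
    {"city": "Rome"},
    {"city": ""},
]
result = kdd_process(records, "city")
print(result)
```

Skipping `clean` and `interpret` and calling `mine` directly on the raw values would count "Paris " and "paris" separately and report every value, a small-scale version of the blind application the excerpt warns against.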