2001 DataMiningAtTheIntOfCompSciAndStats
- (Smyth, 2001) ⇒ Padhraic Smyth. (2001). “Data Mining at the Interface of Computer Science and Statistics.” In: (Grossman et al., 2001)
Subject Headings: Data Mining, statistics, pattern recognition, transaction data, correlation.
Notes
Cited By
~12 http://scholar.google.com/scholar?cites=2039945908716953323
Quotes
Abstract
- This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention both in the research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications.
2. Is Data Mining Different from Statistics?
- Is data mining as currently practiced substantially different from conventional applied statistics? Certainly if one looks at the published commercial applications of data mining, such as the case studies presented in [BL00], one sees a heavy reliance on techniques that have their lineage in applied statistics. For example, decision trees are perhaps the single most widely-used modeling technique in commercial predictive data mining applications [Joh99, Koh00]. They are particularly popular because of their ability to both deal with heterogenous data types (they can easily handle both categorical and real-valued variables) and to find relatively low-dimensional parsimonious predictors for high-dimensional problems.
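The two properties named in the quote (handling mixed categorical and real-valued variables, and picking out a small number of predictors) can be illustrated with a minimal sketch of a single tree split. This is not code from the chapter: the toy customer records, the feature names, and the helpers `gini`, `candidate_tests`, and `best_split` are all hypothetical, and Gini impurity is assumed as the split criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_tests(rows, feature):
    """Yield (description, predicate) pairs for one feature.

    Categorical (string) features get equality tests,
    real-valued features get threshold tests.
    """
    values = sorted({r[feature] for r in rows}, key=str)
    for v in values:
        if isinstance(v, str):
            yield (f"{feature} == {v!r}", lambda r, v=v: r[feature] == v)
        else:
            yield (f"{feature} <= {v}", lambda r, v=v: r[feature] <= v)

def best_split(rows, features, target):
    """Return (weighted impurity, description, predicate) of the best single split."""
    best = None
    for f in features:
        for desc, test in candidate_tests(rows, f):
            left = [r[target] for r in rows if test(r)]
            right = [r[target] for r in rows if not test(r)]
            if not left or not right:
                continue  # a split must separate the data into two non-empty parts
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, desc, test)
    return best

# Hypothetical toy data: two categorical and two real-valued features.
rows = [
    {"plan": "basic",   "region": "north", "age": 25, "monthly_spend": 30, "churn": "yes"},
    {"plan": "basic",   "region": "south", "age": 52, "monthly_spend": 80, "churn": "yes"},
    {"plan": "basic",   "region": "north", "age": 41, "monthly_spend": 45, "churn": "yes"},
    {"plan": "basic",   "region": "south", "age": 33, "monthly_spend": 95, "churn": "yes"},
    {"plan": "premium", "region": "north", "age": 29, "monthly_spend": 40, "churn": "no"},
    {"plan": "premium", "region": "south", "age": 47, "monthly_spend": 85, "churn": "no"},
    {"plan": "premium", "region": "north", "age": 38, "monthly_spend": 60, "churn": "no"},
    {"plan": "premium", "region": "south", "age": 56, "monthly_spend": 35, "churn": "no"},
]
features = ["age", "region", "monthly_spend", "plan"]

score, rule, _ = best_split(rows, features, "churn")
print(rule, score)
```

In this toy data the label depends only on `plan`, so the search settles on one categorical test and ignores the other three features. A full decision tree would apply the same search recursively to each side of the split.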
References
- [BL00] M. J. A. Berry and G. Linoff. (2000). “Mastering Data Mining: The Art and Science of Customer Relationship Management.” John Wiley and Sons.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2001 DataMiningAtTheIntOfCompSciAndStats | Padhraic Smyth | | | Data Mining at the Interface of Computer Science and Statistics | | | http://www.datalab.uci.edu/papers/dmchap.pdf | | | 2001 |