2011 ThePractitionersViewpointtoData
- (Boire, 2011) ⇒ Richard Boire. (2011). “The Practitioner's Viewpoint to Data Mining: Key Lessons Learned in the Trenches and Case Studies.” In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2011). doi:10.1145/2020408.2020543
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22The+practitioner's+viewpoint+to+data+mining%3A+key+lessons+learned+in+the+trenches+and+case+studies%22+2011
- http://portal.acm.org/citation.cfm?doid=2020408.2020543&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
In many data mining exercises, results appear on the surface to support a particular conclusion, but closer examination of the data reveals that they are misleading. In this session, we will examine such misleading results in three areas:
Statistical Issues
Statistical issues such as multicollinearity and outliers can change results dramatically. We will first outline how these issues can produce misleading results. We will then demonstrate how the data mining practitioner overcomes them through data analysis approaches that give the business community results that are more meaningful and not misleading.
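The abstract does not say which diagnostics the session uses. As a hedged illustration only, the sketch here shows two standard checks: the variance inflation factor (VIF) for multicollinearity, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors, and Tukey's interquartile-range (IQR) fences for flagging outliers. The function names `vif` and `iqr_outliers` and the default multiplier `k=1.5` are our own assumptions, not the author's.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (rows = records, columns = predictors).

    Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from an
    ordinary least squares regression of column j on all other columns plus an intercept.
    Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every predictor except column j
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # Perfect collinearity gives R^2 = 1 and an infinite VIF
        result[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result


def iqr_outliers(x, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Example: x2 is almost an exact multiple of x1, x3 is unrelated noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 very large, x3 close to 1
print(iqr_outliers([1, 2, 3, 4, 5, 100]))  # only the last value is flagged
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a predictor is largely redundant with the others; the thresholds are conventions, not something the abstract specifies.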
Overstated Results
From a business standpoint, we will also look at results that appear too good to be true, that is, cases where a data mining solution seems to overstate its results. First, we will discuss how to identify these situations. Second, we will outline what causes the overstatement and detail our approach to overcoming it.
Overfitting
Another topic for discussion is overfitting, which arises particularly when building predictive models. In this section of the seminar, we will define overfitting and explain why it is becoming increasingly important for the business community to understand. Once again, we will discuss analytical approaches for how best to handle this issue.
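The abstract names overfitting but not a specific remedy. A common way to make it visible, shown here as a hypothetical sketch rather than the author's method, is to compare error on the data used for fitting with error on a held-out sample. Polynomial degree stands in for model complexity; the helper `train_holdout_mse`, the simulated linear data, and the 15/25 split are illustrative assumptions.

```python
import numpy as np


def train_holdout_mse(x_train, y_train, x_hold, y_hold, degree):
    """Fit a polynomial of the given degree to the training data and return
    (training MSE, holdout MSE). Polynomial degree stands in for model complexity."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    hold_mse = np.mean((np.polyval(coefs, x_hold) - y_hold) ** 2)
    return train_mse, hold_mse


# Example: the true relationship is linear plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=40)
y = 2.0 * x + rng.normal(scale=0.2, size=40)
# x is drawn at random, so taking the first 15 points is already a random split
x_tr, y_tr, x_ho, y_ho = x[:15], y[:15], x[15:], y[15:]

for d in (1, 9):
    tr, ho = train_holdout_mse(x_tr, y_tr, x_ho, y_ho, d)
    print(f"degree {d}: train MSE {tr:.4f}, holdout MSE {ho:.4f}")
```

The more complex model always fits the training sample at least as well, because it nests the simpler one; a training error that keeps falling while holdout error stalls or rises is the usual symptom of overfitting.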
We also present two case studies that demonstrate how our principled four-step approach can be used to solve challenging data mining problems. The four steps are:
- How to identify the problem
- How we construct the right data environment in which to conduct our analytics
- What kinds of analytics are employed, including techniques such as correlation analysis, exploratory data analysis (EDA) reports, logistic regression, and gains charts. More importantly, we discuss how to interpret the output in terms of its actual business impact (i.e., increased response rate and, ultimately, increased ROI).
- How we apply the learning to a future initiative, and what the actual results were
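Gains charts are among the techniques named in the third step. The sketch here builds the underlying gains table, assuming a binary response flag (1 = responded) and a model score where higher means more likely to respond; `gains_table` and its column layout are hypothetical, not the author's report format. Records are ranked by score and split into equal-sized groups (deciles by default); for each group the table reports the response rate, the cumulative share of all responders captured so far, and the cumulative lift over the overall response rate.

```python
import numpy as np


def gains_table(scores, responded, n_bins=10):
    """Rank records by descending score, split them into n_bins groups, and return one
    row per group: (group, records, responders, response_rate, cum_pct_of_responders, cum_lift).

    Assumes len(scores) >= n_bins and at least one responder.
    """
    scores = np.asarray(scores, dtype=float)
    responded = np.asarray(responded, dtype=int)
    ranked = responded[np.argsort(-scores, kind="stable")]  # best-scored records first
    total_resp = ranked.sum()
    overall_rate = total_resp / len(ranked)
    rows, cum_n, cum_resp = [], 0, 0
    for i, group in enumerate(np.array_split(ranked, n_bins), start=1):
        n, resp = len(group), int(group.sum())
        cum_n += n
        cum_resp += resp
        rows.append((i, n, resp, resp / n, cum_resp / total_resp,
                     (cum_resp / cum_n) / overall_rate))
    return rows


# Example: 10 scored records, 3 responders, split into 5 groups of 2
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for row in gains_table(scores, responded, n_bins=5):
    print("group {}: n={} responders={} rate={:.2f} cum%={:.2f} lift={:.2f}".format(*row))
```

In this toy example, the top group of two records captures two of the three responders (about 67%), a cumulative lift of about 3.33 over the 30% overall response rate. This is the kind of figure that can be translated into expected response and ROI when a model is used to target a campaign.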
References
Page | Author | Title | DOI
---|---|---|---
2011 ThePractitionersViewpointtoData | Richard Boire | The Practitioner's Viewpoint to Data Mining: Key Lessons Learned in the Trenches and Case Studies | 10.1145/2020408.2020543