2013 DataScienceforBusinessWhatYouNe
- (Provost & Fawcett, 2013) ⇒ Foster Provost and Tom Fawcett. (2013). “Data Science for Business: What You Need to Know About Data Mining and Data-analytic Thinking.” O'Reilly Media. ISBN:1449374298
Subject Headings: Business Data Mining Task, Business Task, Data Mining Task.
Notes
Cited By
Quotes
Book Overview
Data Science for Business is a new book by Foster Provost and Tom Fawcett intended for those who need to understand data science/data mining, and those who want to develop their skill at data-analytic thinking. Data Science for Business is not a book about algorithms. Instead it presents a set of fundamental principles for extracting useful knowledge from data. These fundamental principles are the foundation for many algorithms and techniques for data mining, but also underlie the processes and methods for approaching business problems data-analytically, evaluating particular data science solutions, and evaluating general data science plans.
- Design
The book builds up the reader's understanding of data science by discussing the fundamental principles in the context of business examples, and then shows specifically how the principles can provide understanding of many of the most common methods and techniques used in data science. After reading the book, the reader should be able to (i) discuss data science intelligently with data scientists and with other stakeholders, (ii) better understand proposals for data science projects and data science investments, and (iii) participate integrally in data science projects.
As one example, a fundamental principle of data science is that solutions for extracting useful knowledge from data must carefully consider the problem from the business perspective. This may sound obvious at first, but the notion underlies many choices that must be made in the process of data analytics, including problem formulation, method choice, solution evaluation, and general strategy formulation. Another fundamental principle is that some data items can give us information about other data items. This principle manifests itself throughout data science: in the basic notion of finding “correlations” among variables, in the specific design of many particular data mining procedures, and more generally as the basis for all predictive analytics.
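The second principle above, that some data items carry information about others, can be illustrated with a minimal sketch. The example below is not taken from the book: the `pearson` helper and the toy `support_calls` and `churned` values are hypothetical, chosen only to show how a correlation between an observed attribute and an outcome can be measured.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (an assumption for illustration, not from the book):
# support calls made by each customer, and whether that customer later
# churned (1) or stayed (0).
support_calls = [0, 1, 1, 2, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

r = pearson(support_calls, churned)
print(f"correlation between support calls and churn: {r:.2f}")
```

A coefficient near 1 or -1 suggests that knowing one variable reduces uncertainty about the other, which is the intuition behind predictive modeling. Correlation only captures linear association; measures such as information gain serve the same purpose for categorical attributes.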
- Audience
Data Science for Business is intended for business people who will be managing or working with data scientists, for developers who will be implementing data science solutions, and for aspiring data scientists. By its very nature the material is somewhat technical: the goal is to really understand data science, not to give a high-level overview. However, the book does not presume a sophisticated mathematical background, relegating the few technical details to optional "starred" sections.
Table of Contents
Chapter 1 Introduction: Data-Analytic Thinking
- The Ubiquity of Data Opportunities
- Example: Hurricane Frances
- Example: Predicting Customer Churn
- Data Science, Engineering, and Data-Driven Decision Making
- Data Processing and “Big Data”
- From Big Data 1.0 to Big Data 2.0
- Data and Data Science Capability as a Strategic Asset
- Data-Analytic Thinking
- This Book
- Data Mining and Data Science, Revisited
- Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
- Summary

Chapter 2 Business Problems and Data Science Solutions
- From Business Problems to Data Mining Tasks
- Supervised Versus Unsupervised Methods
- Data Mining and Its Results
- The Data Mining Process
- Implications for Managing the Data Science Team
- Other Analytics Techniques and Technologies
- Summary

Chapter 3 Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
- Models, Induction, and Prediction
- Supervised Segmentation
- Visualizing Segmentations
- Trees as Sets of Rules
- Probability Estimation
- Example: Addressing the Churn Problem with Tree Induction
- Summary

Chapter 4 Fitting a Model to Data
- Classification via Mathematical Functions
- Regression via Mathematical Functions
- Class Probability Estimation and Logistic “Regression”
- Example: Logistic Regression versus Tree Induction
- Nonlinear Functions, Support Vector Machines, and Neural Networks
- Summary

Chapter 5 Overfitting and Its Avoidance
- Generalization
- Overfitting
- Overfitting Examined
- Example: Overfitting Linear Functions
- \* Example: Why Is Overfitting Bad?
- From Holdout Evaluation to Cross-Validation
- The Churn Dataset Revisited
- Learning Curves
- Overfitting Avoidance and Complexity Control
- Summary

Chapter 6 Similarity, Neighbors, and Clusters
- Similarity and Distance
- Nearest-Neighbor Reasoning
- Some Important Technical Details Relating to Similarities and Neighbors
- Clustering
- Stepping Back: Solving a Business Problem Versus Data Exploration
- Summary

Chapter 7 Decision Analytic Thinking I: What Is a Good Model?
- Evaluating Classifiers
- Generalizing Beyond Classification
- A Key Analytical Framework: Expected Value
- Evaluation, Baseline Performance, and Implications for Investments in Data
- Summary

Chapter 8 Visualizing Model Performance
- Ranking Instead of Classifying
- Profit Curves
- ROC Graphs and Curves
- The Area Under the ROC Curve (AUC)
- Cumulative Response and Lift Curves
- Example: Performance Analytics for Churn Modeling
- Summary

Chapter 9 Evidence and Probabilities
- Example: Targeting Online Consumers With Advertisements
- Combining Evidence Probabilistically
- Applying Bayes’ Rule to Data Science
- A Model of Evidence “Lift”
- Example: Evidence Lifts from Facebook "Likes"
- Summary

Chapter 10 Representing and Mining Text
- Why Text Is Important
- Why Text Is Difficult
- Representation
- Example: Jazz Musicians
- \* The Relationship of IDF to Entropy
- Beyond Bag of Words
- Example: Mining News Stories to Predict Stock Price Movement
- Summary

Chapter 11 Decision Analytic Thinking II: Toward Analytical Engineering
- Targeting the Best Prospects for a Charity Mailing
- Our Churn Example Revisited with Even More Sophistication

Chapter 12 Other Data Science Tasks and Techniques
- Co-occurrences and Associations: Finding Items That Go Together
- Profiling: Finding Typical Behavior
- Link Prediction and Social Recommendation
- Data Reduction, Latent Information, and Movie Recommendation
- Bias, Variance, and Ensemble Methods
- Data-Driven Causal Explanation and a Viral Marketing Example
- Summary

Chapter 13 Data Science and Business Strategy
- Thinking Data-Analytically, Redux
- Achieving Competitive Advantage with Data Science
- Sustaining Competitive Advantage with Data Science
- Attracting and Nurturing Data Scientists and Their Teams
- Examine Data Science Case Studies
- Be Ready to Accept Creative Ideas from Any Source
- Be Ready to Evaluate Proposals for Data Science Projects
- A Firm’s Data Science Maturity

Chapter 14 Conclusion
- The Fundamental Concepts of Data Science
- What Data Can’t Do: Humans in the Loop, Revisited
- Privacy, Ethics, and Mining Data About Individuals
- Is There More to Data Science?
- Final Example: From Crowd-Sourcing to Cloud-Sourcing
- Final Words
References
| | Author | title | year |
|---|---|---|---|
| 2013 DataScienceforBusinessWhatYouNe | Foster Provost, Tom Fawcett | Data Science for Business: What You Need to Know About Data Mining and Data-analytic Thinking | 2013 |