2021 ProductAnalyticsAppliedDataScie

  • (Rodrigues, 2021) ⇒ Joanne Rodrigues. (2021). “Product Analytics: Applied Data Science Techniques for Actionable Consumer Insights.” Addison-Wesley. ISBN:9780135258521

Subject Headings: Product Usage Analytics.

Notes

Cited By

Quotes

Book Overview

Product Analytics bridges the divide between high-value business insights and today’s best statistics and machine learning techniques, offering practical qualitative and quantitative techniques to generate actionable insight from customer behavior.

Experienced data scientist and enterprise manager Joanne Rodrigues-Craig presents statistical techniques to determine why things happen, and how to change what people do at scale. She complements these with the social sciences’ most useful qualitative techniques for creating better theories, designing better metrics, and driving more rapid and sustained behavior change. Students will learn through intuitive examples from both web products and “real life,” including numeric examples illuminating hypothesis testing, regression, matching, uplift modeling and other statistical techniques. Discover how to:

  • Think like a social scientist to contextualize individual behavior in social environments, explore how human behavior develops, and establish the conditions for change
  • Develop core metrics and effective KPIs for user analytics in any web product
  • Understand statistical inference, the differences between correlation and causation and when to apply each technique
  • Conduct more effective A/B tests
  • Build intuitive predictive models to capture user behavior in product
  • Use the latest quasi-experimental design techniques and statistical matching to tease out causal effects from observational data
  • Implement sophisticated targeting methods like uplift modeling for marketing campaigns
  • Project business costs/subgroup population changes by using advanced demographic projection methods
  • Do all this in R (sample code available in a separate code manual)
Theme 1
Qualitative versus Quantitative Techniques

The first subtheme goes to the heart of this text. The goal is not just to provide analytical tools, but also to provide the resources needed to apply these analytical tools and examples where they are best applied for web applications. Many books within the data science or machine learning realm simply cover the underlying algorithms. While algorithms do play an important role, the cliché “Garbage in, garbage out” comes to mind. Without appropriate data, the algorithms themselves are useless. Applying the wrong algorithm to the wrong problem can lead to a whole host of problems.

To properly apply an algorithm or design an experiment, we must go over the full process of theory building, conceptualization, operationalization, metric building, hypothesis testing, falsification, and more. A large number of qualitative tools are available that we can use to model human behavior and social processes accurately. If we fail to use these tools, we lose out on a great deal of information, nuance, and insight. We also might completely misunderstand “why,” “how,” or “what” users are doing in our web products. Chapters 1–3 examine the qualitative tools needed to understand and model behavior in web products.

Obtaining actionable insights requires understanding the context and the information stored in each variable. If we cannot connect broader conceptual ideas to analytical results, we’re not left with much of anything. A good friend who had a PhD in physics and who worked as a data scientist at a women’s clothing company illustrates this disconnect best. He loved physics and loved applying physics algorithms to any data set, but struggled to connect the results to the business context of interest. I would often ask him what insights he had derived about the women’s apparel business. He always answered that he had applied the latest “X” model with “some extremely complex tuning.” While applying complex, well-tuned algorithms to the right context is awesome, those algorithms can also be applied to the wrong set of data or used to hide the lack of true insight into a topic.

In practice, “actionable insights” do not rely on using the latest algorithm. Better algorithms generally will only slightly improve your results, but bad data will destroy any hope of gaining valuable insights. What is even more common than bad data is misinterpretation of accurate data—a surprisingly frequent occurrence in industry.

For this reason, it’s essential to have good qualitative methodologies in place before any data analysis begins, so we don’t end up with “garbage out.” Since raw data is often not well documented, it’s easy to misunderstand what a variable is measuring or counting. It’s imperative to understand exactly which steps users must take to get to a particular variable and what they have done to get a particular variable outcome. If you’re using a variable as a proxy for a conceptually complex idea, what pieces of that idea is this variable actually measuring? Having theories and good qualitative frameworks in place will allow for the most robust interpretation and actionable use of your data.

Theme 2
Causal Inference

The second theme of this book is the preference for causal inference over prediction. Many data science books are focused on predictive algorithms. This book provides a basic predictive toolkit consisting of the following algorithms: k-means, principal components analysis (PCA), linear regression, logistic regression, decision trees, support vector machines, and some time-series modeling techniques. The more advanced topics, such as difference-in-difference modeling, statistical matching, and uplift modeling, are related to causal inference.

The only exception is found in Chapter 9, which covers advanced predictive techniques from demography on population forecasting. In Chapter 9, we use predictive modeling techniques in a somewhat novel way to create better core user metrics (e.g., retention), understand subgroup population changes in our web product, and forecast future population. Generally, for the analysis of user behavior, causal inference is preferred to prediction.

Theme 3
Layman’s Explanations

This book was written because most books about data science, statistical causal inference, or demography are extremely academic and proof-laden. While that is necessary in some contexts, mathematically heavy texts are inaccessible to the common person. Most of these tools don’t need mathematically heavy explanations and can be extremely easy to apply with a minimal understanding of R. Statistical data science and causal inference tools are useful in many business contexts, but are rarely applied in those settings due to their inaccessibility.

The goal of this book is to make all of this information accessible to anyone who has completed high school–level mathematics and statistics. This is a little bit optimistic, since some of the topics—such as statistical matching, uplift modeling, and population forecasting—are extremely mathematically complex. The goal is to make them conceptually understandable first. Those readers with a minimal math background should get a general idea of how the algorithm works and when to apply it. After reading the book, readers should be able to find the right design and/or model to apply to their own specific use-cases. After determining the right setup and algorithm, they should be able to run their analysis in R. The core goal of the book is to teach readers how those algorithms generally work, in which situations they should apply particular algorithms in the user or web analytics context, and which tools in R they can apply to get the answers that they’re looking for.

In this book, we’ll use mathematical notation sparingly, as it turns away non-mathematically inclined readers. Chapters 1–6 will use as little mathematical notation as possible, and we’ll describe equations verbally. After Chapter 6, the material becomes too mathematically intensive to avoid notation, and later chapters will occasionally use it in the text.

Organization of the Book

The goal of this book is to better model, understand, and change user behavior in web and mobile products. The book is organized in the following way:

   Chapters 1–3 explain qualitative tools and theories to model user behavior.
   Chapters 4–6 cover introductory statistical methods in product analytics.
   Chapters 7–9 explore predictive modeling and forecasting methods.
   Chapters 10–13 cover causal inference methods for real-world data.
   Chapters 14–16 implement the methods explained in the quantitative chapters in R.

Table of Contents

   Part I: Qualitative Methodology
   Chapter 1: Data in Action: A Model of a Dinner Party
   Chapter 2: Building a Theory of the Universe–The Social Universe
   Chapter 3: The Coveted Goal Post: How to Change User Behavior
   Part II: Basic Statistical Methods
   Chapter 4: Distributions in User Analytics
   Chapter 5: Retained? Metric Creation and Interpretation
   Chapter 6: Why Are My Users Leaving? The Ins and Outs of A/B Testing
   Part III: Predictive Methods
   Chapter 7: Modeling the User Space: k-Means and PCA
   Chapter 8: Predicting User Behavior: Regression, Decision Trees, and Support Vector Machines
   Chapter 9: Forecasting Population Changes in Product: Demographic Projections
   Part IV: Causal Inference Methods
   Chapter 10: In Pursuit of the Experiment: Natural Experiments and the Difference-in-Difference Design
   Chapter 11: In Pursuit of the Experiment Continued: Regression Discontinuity, Time Series Modelling, and Interrupted Time Series Approaches
   Chapter 12: Developing Heuristics in Practice: Statistical Matching and Hill’s Causality Conditions
   Chapter 13: Uplift Modeling
   Part V: Basic, Predictive, and Causal Inference Methods in R
   Chapter 14: Metrics in R
   Chapter 15: A/B Testing, Predictive Modeling, and Population Projection in R
   Chapter 16: Regression Discontinuity, Matching, and Uplift in R
   Conclusion

Part I: Qualitative Methodology

Chapter 1: Data in Action: A Model of a Dinner Party

Chapter 1, “Data in Action: A Model of a Dinner Party,” is an introductory chapter, which uses the metaphor of a dinner party to showcase common pitfalls that hinder understanding of user behavior. These pitfalls include that social data is often viewed as a “process,” rather than a problem. Social data often has no clear outcomes, has rampant problems of incomplete information, has large numbers of variables that are strongly interconnected, is a system that can be easily perturbed, and prevents us from easily inferring causality.

Chapter 2: Building a Theory of the Universe–The Social Universe

Chapter 2, “Building a Theory of the Social Universe,” reviews the scientific method and walks you through sociological tools of quantifying human behavior. Exploring ideas of conceptualization forces us to spend time thinking about “quantifying”—both what that means and what is lost in the process. Today, everything is moving toward metrics. The difficulty with replacing complex qualitative metrics with a few quantitative measures is that these measures can rarely capture the level of sophistication of the original human heuristics or the sophistication that a human expert would expect. Practitioners rarely delve deeply into the shortcomings of their metrics, which leads to even more misguided strategies.

Chapter 3: The Coveted Goal Post: How to Change User Behavior

Chapter 3, “The Coveted Goalpost: How to Change Human Behavior,” is about human behavior change. User analytics has shifted from demographic profiling to sophisticated methods of targeting and altering user behavior in your web product. What features are most likely to change user behavior? This chapter explores current theories of behavior change, the factors that are most likely to cause change, and the magnitude of a given change.

Part II: Basic Statistical Methods

Chapter 4: Distributions in User Analytics

Chapter 4, “Distributions in User Analytics,” takes you through basic statistical tools to start working with user data.

Chapter 5: Retained? Metric Creation and Interpretation

In Chapter 5, “Retained? Metric Creation and Interpretation,” we explore the nitty-gritty of developing quantitative measures of key ideas. This chapter uses demographic ideas of period, age, and cohort to inform our metric development and expands our toolkit for measuring populations. In addition, Chapter 5 explores the benefits and shortfalls of working with commonly used metrics by working through examples from the four key areas in user analytics: acquisition, retention, engagement, and revenue.
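
As a rough illustration of how the demographic ideas of cohort and age can shape a retention metric, the sketch below computes, for each signup cohort, the share of users still active a given number of months after signup. The book's own code is in R; this sketch uses Python instead. The function name `cohort_retention`, the integer month indices, and the sample data are all hypothetical and are not taken from the book.

```python
from collections import defaultdict

def cohort_retention(signup_month, activity):
    """Return {cohort: {age: share of the cohort active at that age}}.

    signup_month: {user_id: month index of signup}      (hypothetical data)
    activity:     iterable of (user_id, month index) activity events
    """
    # Cohort size = number of users who signed up in that month.
    cohort_sizes = defaultdict(int)
    for month in signup_month.values():
        cohort_sizes[month] += 1

    # Distinct users active in each (cohort, age) cell; age = months since signup.
    active = defaultdict(set)
    for user, month in activity:
        cohort = signup_month[user]
        age = month - cohort
        if age >= 0:
            active[(cohort, age)].add(user)

    table = defaultdict(dict)
    for (cohort, age), users in active.items():
        table[cohort][age] = len(users) / cohort_sizes[cohort]
    return dict(table)

# Hypothetical data: three users sign up in month 0, one in month 1.
signups = {"a": 0, "b": 0, "c": 0, "d": 1}
events = [("a", 0), ("b", 0), ("c", 0), ("a", 1), ("b", 1),
          ("a", 2), ("d", 1), ("d", 2)]
retention = cohort_retention(signups, events)
```

Reading the table along a fixed age compares cohorts at the same point in their lifecycle, while reading it along a fixed calendar month gives the period view.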

Chapter 6: Why Are My Users Leaving? The Ins and Outs of A/B Testing

Chapter 6, “Why Are My Users Leaving? The Ins and Outs of A/B Testing,” is a practical how-to guide to A/B testing. What is an A/B test? How do you set one up? How do you analyze the results? This chapter also goes through statistical testing and simple power analysis. Finally, it explores the complexities of A/B testing, such as best courses of action for conflicting results between short- and long-run indicators.
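
To make the statistical testing and power analysis steps concrete, here is a minimal sketch of one common way to analyze a conversion-rate A/B test: a two-sided two-proportion z-test, plus a normal-approximation sample size estimate. This is not necessarily the procedure the book uses, and the book's own code is in R rather than Python. The function names and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def sample_size_per_arm(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_base -> p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)

# Hypothetical experiment: 10% vs. 12.5% conversion with 2,000 users per arm.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
n_needed = sample_size_per_arm(p_base=0.10, p_target=0.125)
```

Running the sample size calculation before the experiment starts helps avoid stopping a test too early on a result that the design could not reliably detect.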

Part III: Predictive Methods

Chapter 7: Modeling the User Space: k-Means and PCA

Chapter 7, “Modeling the User Space: k-Means and PCA,” and Chapter 8, “Predicting User Behavior: Regression, Decision Trees, and Support Vector Machines,” explore the basics of supervised and unsupervised learning. This introduction to pattern recognition focuses on graphical descriptions and examples to drive understanding. It’s a basic toolkit to help you with everyday explanatory or predictive analysis. It also underlies the more sophisticated statistical techniques in Chapters 10–13. Topics covered include k-means, PCA, linear regression, logistic regression, decision trees, and support vector machines.

Chapter 8: Predicting User Behavior: Regression, Decision Trees, and Support Vector Machines

Chapter 9: Forecasting Population Changes in Product: Demographic Projections

Chapter 9, “Forecasting Population Changes in Product: Demographic Projections,” covers ways to forecast general and subgroup population changes in your web product. It relies on tools of demographic population prediction to model user behavior in a multidimensional and unique way.

Part IV: Causal Inference Methods

Chapter 10: In Pursuit of the Experiment: Natural Experiments and the Difference-in-Difference Design

Chapter 11: In Pursuit of the Experiment Continued: Regression Discontinuity, Time Series Modelling, and Interrupted Time Series Approaches

Chapter 12: Developing Heuristics in Practice: Statistical Matching and Hill’s Causality Conditions

Chapter 13: Uplift Modeling

Part V: Basic, Predictive, and Causal Inference Methods in R

Chapter 14: Metrics in R

Chapter 15: A/B Testing, Predictive Modeling, and Population Projection in R

Chapter 16: Regression Discontinuity, Matching, and Uplift in R

Conclusion

Final Thoughts
This book provides an intermediate guide to user analytics and relies on both causal and predictive inference. After reading this book, you should be able to build theories about user behavior, test those theories, and generate actionable insights to improve your product. The tools and practical advice from this book can be used in almost any role, from marketing and project management to business analytics and entrepreneurship.

Most data produced is observational, meaning that we must tease out causal relationships. Chapter 10, “In Pursuit of the Experiment: Natural Experiments and the Difference-in-Difference Design,” and Chapter 11, “In Pursuit of the Experiment, Continued,” go through some elementary techniques for deriving causal insights from observational data. These techniques include natural experiments, the difference-in-difference design, and regression discontinuity, all of which can help us derive actionable insights from real-world data. Chapter 12, “Developing Heuristics in Practice,” explores statistical matching and situations where causal inference is not possible or is not easy.
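
The arithmetic behind the difference-in-difference design can be sketched in a few lines: the estimate is the before-to-after change in the treated group minus the same change in the control group. The sketch below is in Python rather than the book's R, uses made-up numbers, and the function name `diff_in_diff` is hypothetical. The estimate is only meaningful under the assumption that both groups would have followed parallel trends without the treatment.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly sessions per user, before and after a feature launch
# that reached only the treated group.
effect = diff_in_diff(
    treated_pre=[10, 12, 11], treated_post=[15, 17, 16],
    control_pre=[9, 10, 11], control_post=[11, 12, 13],
)
```

Here the treated group rises by 5 and the control group by 2, so 3 of the treated group's increase is attributed to the launch rather than to the shared background trend.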

Predictive modeling with A/B testing can be a powerful combination. Chapter 13, “Uplift Modeling,” explores uplift modeling, a technique that combines the two and leads to improved user targeting.
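
One simple way to see how uplift modeling combines experiment data with prediction is to estimate, per user segment, the conversion rate under treatment minus the conversion rate under control. The Python sketch below does this with plain per-segment rates standing in for the predictive models, which is a deliberate simplification and not the book's method (the book's code is in R). The function name `uplift_by_segment`, the segments, and the records are hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """records: iterable of (segment, treated: bool, converted: bool).

    Returns {segment: treated conversion rate - control conversion rate}.
    """
    # For each segment and arm, track [conversions, users].
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1

    uplift = {}
    for segment, arms in counts.items():
        (conv_t, n_t), (conv_c, n_c) = arms[True], arms[False]
        if n_t and n_c:  # need both arms to estimate an uplift
            uplift[segment] = conv_t / n_t - conv_c / n_c
    return uplift

# Hypothetical A/B test results for two segments, 100 users per arm each.
records = (
    [("new", True, True)] * 30 + [("new", True, False)] * 70
    + [("new", False, True)] * 10 + [("new", False, False)] * 90
    + [("loyal", True, True)] * 40 + [("loyal", True, False)] * 60
    + [("loyal", False, True)] * 40 + [("loyal", False, False)] * 60
)
uplift = uplift_by_segment(records)
ranked = sorted(uplift, key=uplift.get, reverse=True)  # best targets first
```

In this made-up data the loyal segment converts at a higher rate overall, but the campaign changes nothing for them. Targeting by uplift rather than by predicted conversion would direct the campaign to new users instead.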

The final section of the book implements all these techniques in R. Chapter 14, “Metrics in R,” runs through statistical distributions and metric calculation in R. Chapter 15, “A/B Testing, Predictive Modeling, and Population Projection in R,” discusses A/B testing, predictive modeling techniques, and population projection techniques in R. Chapter 16, “Regression Discontinuity, Matching, and Uplift in R,” introduces difference-in-difference modeling, statistical matching, and uplift modeling in R.
