2015 Estimating the Causal Impact of Recommendation Systems from Observational Data


Subject Headings: Recommender System, Product Recommender System, Causal Inference.

Notes

Cited By

2018

  • (Chaney et al., 2018) ⇒ Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt. (2018). “How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility.” In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 224-232.
    • QUOTE: ... Some work has also been done to understand the causal impact of these systems on behavior by finding natural experiments in observational data [53, 55] (approximating expensive controlled experiments [33]), but it is unclear how well these results generalize …

2016

  • (Su et al., 2016) ⇒ Jessica Su, Aneesh Sharma, and Sharad Goel. (2016). “The Effect of Recommendations on Network Structure.” In: Proceedings of the 25th International Conference on World Wide Web, pp. 1157-1167.
    • QUOTE: We further find the system increased triadic closure and promoted the formation of uni-directional network ties. By treating the recommender’s introduction as a “natural experiment” [10, 25], whose precise timing was largely unrelated to other significant events, we are able to estimate the causal impact of the recommender on network structure, side-stepping concerns that often plague traditional observational analysis. ...

      ... In this context, the effect of the recommender is typically measured by counting clicks on the “Follow” button in the “Who to Follow” module. However, as has been noted previously [25], such an estimation scheme can lead to spurious results. In particular, recommendations may encourage follow actions even in the absence of a click (e.g., by increasing an individual’s awareness of the candidate user, akin to brand advertising), leading one to underestimate the effect of the recommender. Conversely, when individuals click on the “Follow” button, they might have independently followed the recommended user even if they had not seen the recommendation, leading to an over-estimate. …

Quotes

Abstract

Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world's most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users on Amazon.com over a 9-month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation click-throughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion about the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.

1. INTRODUCTION

How much activity do recommendation systems cause? At first glance, answering this question may seem straightforward: given browsing data for a web site, simply count how many pageviews on the site come from clicks on recommendations and compare this to overall traffic. …
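
As a rough illustration of this naive accounting (a sketch of our own, not code from the paper; the log format and the 'source' field are hypothetical), the computation reduces to a simple share of traffic:

  # Naive (non-causal) accounting: the share of pageviews that arrive via
  # clicks on recommendations. The 'source' field is hypothetical; real
  # browsing logs encode referrer information differently.
  def naive_recommender_share(pageviews):
      rec = sum(1 for pv in pageviews if pv["source"] == "recommendation")
      return rec / len(pageviews) if pageviews else 0.0

  logs = [{"source": "search"}, {"source": "recommendation"},
          {"source": "direct"}, {"source": "recommendation"}]
  print(naive_recommender_share(logs))  # 0.5

As the paper goes on to argue, this share is at best an upper bound on the recommender's causal contribution, since some of these clicks would have occurred anyway via other means.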

The ideal natural experiment, therefore, is one in which we not only see an exogenous shock to demand for a particular “focal” product, but where we also know that demand for a corresponding recommended product is constant. In the language of causal inference, a shock to the focal product can be treated as an instrumental variable [Dunning 2012; Morgan and Winship 2007] to identify the causal effect of the recommendation. When the demand for the recommended product is known to be constant, any increase in click-throughs from the focal product can be attributed to the recommender, and hence we can estimate its causal effect simply by dividing the observed change in recommendation click-throughs during the shock by the exogenous change in traffic over the same period.
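
The ratio described here is a Wald-style instrumental-variable estimate. A minimal sketch under the stated assumptions (our own illustration with hypothetical counts, not the authors' code):

  # Causal click-through rate from the shock: the change in recommendation
  # click-throughs divided by the exogenous change in direct traffic to the
  # focal product. Valid only if direct demand for the recommended product
  # is constant over the same window. All counts below are hypothetical.
  def causal_ctr(direct_before, direct_during, rec_before, rec_during):
      delta_direct = direct_during - direct_before  # exogenous shock size
      delta_rec = rec_during - rec_before           # induced click-throughs
      if delta_direct <= 0:
          raise ValueError("no positive shock in direct traffic")
      return delta_rec / delta_direct

  # Example: direct traffic to the focal product jumps from 1,000 to 5,000
  # daily pageviews while recommendation click-throughs rise from 100 to 500.
  print(causal_ctr(1000, 5000, 100, 500))  # 0.1 clicks caused per extra visit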

2. RELATED WORK

There is an extensive body of work on recommender systems that seeks to evaluate such systems along various metrics including accuracy, diversity, utility, novelty and serendipity of the recommendations shown to users [Herlocker et al. 2004; McNee et al. 2006; Shani and Gunawardana 2011]. Among these many possible dimensions of recommender systems, we focus specifically on the role of recommendations in exposing users to items they would not have seen otherwise—a function that is closely related to the notion of serendipity, defined as recommending a “surprisingly interesting item a user might not have otherwise discovered” [Herlocker et al. 2004]—and thus causing an increase in the volume of traffic on a website. Although our somewhat narrow focus on increasing volume clearly overlooks other potentially important functions of recommenders, it greatly simplifies the methodological challenges associated with estimating causal effects, allowing us to make progress.

Focusing specifically on volume, therefore, previous work on estimating the impact of recommendation systems can be classified into two broad categories: experimental and non-experimental approaches. In the experimental category, Dias et al. [2008] tracked usage of a recommendation system on a Swiss online grocer over a two-year period following its introduction in May 2006, finding that both click-throughs and associated revenues increased over the study interval. Because they did not compare either total pageviews or revenue with a control condition (i.e., without recommendations), however, it is impossible to estimate how much of this increase was caused by the recommendation system itself versus some other source of demand. Subsequently, Jannach and Hegelich [2009] randomly assigned 155,000 customers of a mobile game platform to see either personalized or non-personalized recommendations, finding that personalized recommendations generated significantly more clicks and downloads than non-personalized recommendations. Compared with a prior no-recommendation condition, moreover, they estimated that personalized recommendations could have increased sales by as much as 3.6%. Finally, Belluf et al. [2012] conducted an experiment on a Latin American shopping website in which 600,000 users were randomly assigned to either receive or not receive recommendations for one month in 2012, finding that recommendations increased pageviews per user by 5-9%.

In the non-experimental category, Garfinkel et al. [2006] analyzed panel data comprising 156 books on Amazon.com and Barnes and Noble over a 52-day period. By conditioning on observable covariates, including previous day sales rank, they estimated that a single additional recommendation could improve the sales rank of a book by 3%. Although plausible in light of the results from experiments, this estimate is likely confounded by other sources of unobservable demand, hence it does not rule out that users would have arrived at the recommended books by some other means in the absence of recommendations. Oestreicher-Singer and Sundararajan [2012] and Lin et al. [2013] attempted to deal with this problem in a similar manner, studying books on Amazon and digital camera equipment on a Chinese e-commerce site respectively, by constructing sets of “complementary” products that were not recommended from the focal product but were likely to experience similar (unobserved) demand. Finally, Carmi et al. [2012] and Kummer [2013] also use sets of complementary products to establish conditional independence of demand for the focal and recommended products, but instead exploit exogenous shocks to identify causal effects of recommendations: Carmi et al. [2012] treat appearances on Oprah and in the New York Times Book Review as shocks to demand for books on Amazon, while Kummer [2013] treats natural disasters and listings on the front page of Wikipedia as shocks to the corresponding Wikipedia pages.

In general, the non-experimental papers find large effects of recommendations; for example, Oestreicher-Singer and Sundararajan estimated that a recommendation amplified demand covariance between otherwise complementary books by as much as three-fold. Although this effect seems large relative to the results from experiments, it is hard to compare with them in part because it is expressed in terms of covariance of demand instead of actual demand, and in part because the demand itself is estimated from sales rank using a model [Chevalier and Goolsbee 2003]. More importantly, the assumption that the complementary sets do indeed experience the same demand as the recommended sets is critical to their results but ultimately difficult to verify.

Our contribution clearly belongs to the non-experimental category; however, it differs from previous work in three important respects. First, in contrast with rank-based proxies for overall demand used in many of the above studies, pageview volume from browser logs provides a direct and easily interpretable measure of demand. Second, in contrast with identification strategies that attempt to establish independence of demand for focal and recommended products indirectly, either by conditioning on observable covariates or by comparing correlations with complementary products, our strategy simply controls for demand on recommended products by selecting shocks for which direct traffic to recommended products is known to be constant (and therefore uncorrelated with the focal product). Finally, whereas previous work selects exogenous shocks by first imagining plausible scenarios (e.g., an appearance on Oprah driving traffic to Amazon, or a natural disaster driving traffic to Wikipedia) and then checking for impact, we can measure impact directly from browsing logs, thereby increasing the number and diversity of natural experiments to be analyzed.
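
To make the selection criterion concrete, the following is a minimal sketch of such a shock filter (our own, not the authors' pipeline; the daily-count layout and threshold values are hypothetical assumptions):

  import statistics

  # Hypothetical filter: keep a (focal, recommended) product pair only if the
  # focal product's direct traffic spikes on the shock day while the
  # recommended product's direct traffic stays within its usual variation.
  def is_valid_natural_experiment(focal_direct, rec_direct, shock_idx,
                                  spike_factor=5.0, flat_z=2.0):
      base = focal_direct[:shock_idx]
      rec_base = rec_direct[:shock_idx]
      # Focal product: instantaneous jump relative to its baseline mean.
      spiked = focal_direct[shock_idx] >= spike_factor * statistics.mean(base)
      # Recommended product: direct traffic within flat_z baseline std devs.
      mu, sd = statistics.mean(rec_base), statistics.pstdev(rec_base)
      flat = abs(rec_direct[shock_idx] - mu) <= flat_z * max(sd, 1.0)
      return spiked and flat

  focal = [100, 110, 95, 105, 900]  # direct traffic spikes on the last day
  rec = [50, 55, 48, 52, 51]        # direct traffic stays flat
  print(is_valid_natural_experiment(focal, rec, shock_idx=4))  # True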

3. DATA

References

  • (Sharma et al., 2015) ⇒ Amit Sharma, Jake M. Hofman, and Duncan J. Watts. (2015). “Estimating the Causal Impact of Recommendation Systems from Observational Data.” In: Proceedings of the Sixteenth ACM Conference on Economics and Computation (EC 2015).