Contextual Bandit Algorithm
A Contextual Bandit Algorithm is a bandit algorithm that can be implemented by a contextual bandit system (to solve a contextual bandit task).
- …
- Counter-Example(s):
- See: Reinforcement Learning Algorithm.
References
2017
- (Chandrashekar et al., 2017) ⇒ Ashok Chandrashekar, Fernando Amat, Justin Basilico, and Tony Jebara. (2017). “Artwork Personalization at Netflix.” In: Netflix Blog
- QUOTE: … Briefly, contextual bandits are a class of online learning algorithms that trade off the cost of gathering training data required for learning an unbiased model on an ongoing basis with the benefits of applying the learned model to each member context. In our previous unpersonalized image selection work, we used non-contextual bandits where we found the winning image regardless of the context. For personalization, the member is the context as we expect different members to respond differently to the images.
A key property of contextual bandits is that they are designed to minimize regret. At a high level, the training data for a contextual bandit is obtained through the injection of controlled randomization in the learned model’s predictions. The randomization schemes can vary in complexity from simple epsilon-greedy formulations with uniform randomness to closed loop schemes that adaptively vary the degree of randomization as a function of model uncertainty. …
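The epsilon-greedy randomization scheme mentioned in the quote admits a compact illustration. The sketch below is a minimal, hypothetical implementation (the class and parameter names are illustrative, not from the Netflix system), assuming one linear reward model per arm updated by stochastic gradient descent:

```python
import numpy as np

class EpsilonGreedyContextualBandit:
    """Minimal epsilon-greedy contextual bandit with one linear model per arm."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.epsilon = epsilon                      # probability of exploring uniformly
        self.lr = lr                                # SGD step size for the reward models
        self.weights = np.zeros((n_arms, n_features))  # per-arm linear reward models
        self.rng = np.random.default_rng(seed)

    def select_arm(self, context):
        # With probability epsilon, inject uniform randomness (exploration);
        # otherwise exploit the arm with the highest predicted reward.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.weights)))
        return int(np.argmax(self.weights @ context))

    def update(self, arm, context, reward):
        # One SGD step on squared error for the chosen arm's reward model.
        prediction = self.weights[arm] @ context
        self.weights[arm] += self.lr * (reward - prediction) * context
```

The closed-loop schemes the quote alludes to would replace the fixed epsilon with a quantity derived from model uncertainty, adapting the degree of randomization over time.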
2011
- (Li et al., 2011) ⇒ Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. (2011). “Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms.” In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ISBN:978-1-4503-0493-1 doi:10.1145/1935826.1935878
- ABSTRACT: Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Different from simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
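The replay methodology described in this abstract is straightforward to sketch. The function below is an illustrative, minimal version (the names are hypothetical, not from the paper's code), assuming the logged data was collected by a uniformly random policy, which is the condition under which the replay estimate is provably unbiased:

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's average reward from logged (context, arm, reward) events.

    Only events where the candidate policy's choice matches the logged arm are
    kept; all other events are discarded. When the logging policy chose arms
    uniformly at random, the retained events are an unbiased sample of what the
    candidate policy would have observed online.
    """
    total_reward, matched = 0.0, 0
    for context, logged_arm, reward in logged_events:
        if policy(context) == logged_arm:   # event is "replayed" for this policy
            total_reward += reward
            matched += 1
    return total_reward / matched if matched else 0.0
```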
2010
- (Li et al., 2010) ⇒ Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. (2010). “A Contextual-bandit Approach to Personalized News Article Recommendation.” In: Proceedings of the 19th International Conference on World Wide Web. doi:10.1145/1772690.1772758
- QUOTE: Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. … we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.
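The general contextual bandit algorithm proposed in this paper is LinUCB. The sketch below is a minimal, illustrative rendering of its disjoint variant (the class name and default alpha are assumptions), in which each arm keeps its own ridge-regression model and is scored by an upper confidence bound on its predicted reward:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: per-arm ridge regression plus a confidence bonus."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # width of the upper confidence bound
        # A[a] accumulates the design matrix x x^T (identity gives the ridge term);
        # b[a] accumulates reward-weighted contexts r * x.
        self.A = np.stack([np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def select_arm(self, context):
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a                   # ridge-regression estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

The confidence bonus shrinks as an arm accumulates observations in a given region of context space, so exploration concentrates on arms whose predicted reward is still uncertain.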