2012 ThompsonSamplingAnAsymptoticall
Jump to navigation
Jump to search
- (Kaufmann et al., 2012) ⇒ Emilie Kaufmann, Nathaniel Korda, and Rémi Munos. (2012). “Thompson Sampling: An Asymptotically Optimal Finite-time Analysis.” In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory. ISBN:978-3-642-34105-2 doi:10.1007/978-3-642-34106-9_18
Subject Headings: Thompson Sampling.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222012%22+Thompson+Sampling%3A+An+Asymptotically+Optimal+Finite-time+Analysis
- http://dl.acm.org/citation.cfm?id=2413763.2413787&preflayout=flat#citedby
Quotes
Abstract
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2012 ThompsonSamplingAnAsymptoticall | Rémi Munos Emilie Kaufmann Nathaniel Korda | Thompson Sampling: An Asymptotically Optimal Finite-time Analysis | 10.1007/978-3-642-34106-9_18 | 2012 |