2012 ThompsonSamplingAnAsymptoticall
- (Kaufmann et al., 2012) ⇒ Emilie Kaufmann, Nathaniel Korda, and Rémi Munos. (2012). “Thompson Sampling: An Asymptotically Optimal Finite-time Analysis.” In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory. ISBN:978-3-642-34105-2 doi:10.1007/978-3-642-34106-9_18
Subject Headings: Thompson Sampling.
Notes
Cited By
Quotes
Abstract
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2012 ThompsonSamplingAnAsymptoticall | Rémi Munos Emilie Kaufmann Nathaniel Korda | Thompson Sampling: An Asymptotically Optimal Finite-time Analysis | 10.1007/978-3-642-34106-9_18 | 2012 |