2012 ThompsonSamplingAnAsymptoticall

(Kaufmann et al., 2012) ⇒ Emilie Kaufmann, Nathaniel Korda, and Rémi Munos. (2012). “Thompson Sampling: An Asymptotically Optimal Finite-time Analysis.” In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory. ISBN:978-3-642-34105-2 doi:10.1007/978-3-642-34106-9_18

Subject Headings: Thompson Sampling.

Notes

Cited By

Quotes

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.

References

;

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2012 ThompsonSamplingAnAsymptoticall	Rémi Munos Emilie Kaufmann Nathaniel Korda			Thompson Sampling: An Asymptotically Optimal Finite-time Analysis				10.1007/978-3-642-34106-9_18		2012

Retrieved from "http://www.gabormelli.com/RKB/index.php?title=2012_ThompsonSamplingAnAsymptoticall&oldid=850197"