k-Armed Bandit Algorithm

Context:
- It can range from being an Exact k-Armed Bandit Algorithm to being an Approximate k-Armed Bandit Algorithm.
- ...
Example(s):
- a Poker k-Armed Bandit Algorithm (Vermorel et al., 2005).
- a Weighted Majority Algorithm (Littlestone et al., 1989).
- an Upper Confidence Bound (UCB) Algorithm, ...
- an Epsilon-Greedy Algorithm, ...
- a Thompson Sampling Algorithm, ...
- ...
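The index-based and sampling-based strategies listed above can be compared on a simulated problem. What follows is a minimal Python sketch, not a reference implementation. It assumes Bernoulli (0/1) rewards and hypothetical arm success probabilities. The names `ucb1_select`, `thompson_select`, and `run` are illustrative and come from no library. The UCB variant is the common UCB1 index rule (sample mean plus sqrt(2 ln n / n_i)). The Thompson sampling variant assumes a Beta(1, 1) prior on each arm.

```python
import math
import random


def ucb1_select(counts, sums):
    """UCB1 index rule: play every arm once, then pick the arm that
    maximizes  mean_i + sqrt(2 * ln(n) / n_i),  n = total plays so far."""
    for arm, n_i in enumerate(counts):
        if n_i == 0:
            return arm
    n = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(n) / counts[a]),
    )


def thompson_select(successes, failures, rng):
    """Thompson sampling for Bernoulli arms with a Beta(1, 1) prior:
    draw one sample from each arm's Beta posterior, pick the largest."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])


def run(policy, probs, horizon, seed=0):
    """Simulate a Bernoulli k-armed bandit and return pulls per arm.
    `probs` holds hypothetical success probabilities, one per arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k  # pulls per arm
    sums = [0] * k    # total reward (= number of successes) per arm
    for _ in range(horizon):
        if policy == "ucb1":
            arm = ucb1_select(counts, sums)
        elif policy == "thompson":
            failures = [counts[a] - sums[a] for a in range(k)]
            arm = thompson_select(sums, failures, rng)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    probs = [0.1, 0.5, 0.9]  # hypothetical arms; arm 2 is the best
    print("UCB1:    ", run("ucb1", probs, 5000))
    print("Thompson:", run("thompson", probs, 5000))
```

Both policies should concentrate most pulls on the best arm. UCB1 does this deterministically through an optimism bonus that shrinks as an arm is played. Thompson sampling does it through randomized posterior draws.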
Counter-Example(s):
- a Deterministic Game.
See: Online Optimization Algorithm, Explore-Exploit Algorithm, Game of Chance.

References

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/multi-armed_bandit#Bandit_strategies Retrieved: 2015-11-22.
- QUOTE: A major breakthrough was the construction of optimal population selection strategies, or policies (that possess uniformly maximum convergence rate to the population with highest mean) in the work described below.
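The optimality notion in this quote is usually formalized through regret. For reference, the standard definitions and the asymptotic lower bound of Lai and Robbins (1985) are given below. They are stated from the general bandit literature, not from the quoted page. Here $\mu_j$ is arm $j$'s mean reward and $\mu^{*}=\max_j \mu_j$. $\Delta_j=\mu^{*}-\mu_j$ is the gap, and $T_j(n)$ is the number of plays of arm $j$ among the first $n$. $f_j$ and $f^{*}$ are the reward distributions of arm $j$ and of the best arm. A policy whose pull counts attain this bound roughly corresponds to what the quote calls optimal.

```latex
% Regret after n plays (second equality uses \sum_j T_j(n) = n)
R_n = n\,\mu^{*} - \sum_{j=1}^{k} \mu_j\,\mathbb{E}[T_j(n)]
    = \sum_{j=1}^{k} \Delta_j\,\mathbb{E}[T_j(n)]

% Lai and Robbins (1985): for any uniformly good policy and every arm j with \Delta_j > 0
\liminf_{n \to \infty} \frac{\mathbb{E}[T_j(n)]}{\ln n} \;\ge\; \frac{1}{D\!\left(f_j \,\|\, f^{*}\right)}
% D(. || .) is the Kullback-Leibler divergence
```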

(Vermorel et al., 2005) ⇒ Joannès Vermorel, and Mehryar Mohri. (2005). “Multi-armed Bandit Algorithms and Empirical Evaluation.” In: Proceedings of the 16th European Conference on Machine Learning. doi:10.1007/11564096_42
- QUOTE: Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.
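The ε-greedy strategy that the quote calls hard to beat fits in a few lines. What follows is a minimal Python sketch of the common fixed-ε form. It does not reproduce the paper's experimental setup, which the quote does not describe. It assumes Bernoulli rewards, a hypothetical ε = 0.1, and hypothetical arm probabilities. `epsilon_greedy` is an illustrative name.

```python
import random


def epsilon_greedy(probs, horizon, epsilon=0.1, seed=0):
    """Fixed-epsilon greedy on a simulated Bernoulli bandit.

    With probability epsilon pull a uniformly random arm (explore);
    otherwise pull the arm with the highest estimated mean (exploit).
    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            best = max(means)
            # break ties uniformly at random among the best-looking arms
            arm = rng.choice([a for a in range(k) if means[a] == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental sample-mean update: m <- m + (r - m) / n
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return counts, means, total


if __name__ == "__main__":
    counts, means, total = epsilon_greedy([0.1, 0.5, 0.9], 5000)
    print(counts, [round(m, 3) for m in means], total)
```

The design is deliberately naive. A constant fraction ε of plays is always spent on exploration, so regret grows linearly in the horizon. In practice, however, the fixed exploration rate is often enough to find and keep exploiting the best arm.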