Self-Play Game Learning Algorithm
A Self-Play Game Learning Algorithm is a game learning algorithm in which an agent improves by playing games against itself (or copies of itself), using the outcomes of these self-play games as training signal.
- Example(s): Self-Play Monte-Carlo Tree Search (Heinrich & Silver, 2014), the AlphaGo Zero training algorithm.
- See: Reinforcement Learning Algorithm, AlphaGo Zero, Deep Net Reinforcement Learning Algorithm.
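The core idea above can be illustrated with a minimal sketch: one shared value table plays both sides of a toy game and is updated from the final outcomes. This is a hypothetical illustration (the game, constants, and update rule are assumptions, not taken from any cited work); it uses self-play Monte-Carlo value updates on a tiny Nim variant where players alternately remove 1 or 2 stones and the player taking the last stone wins.

```python
import random

# Hypothetical toy game: Nim with 5 stones; players alternate removing
# 1 or 2 stones; whoever takes the last stone wins.
N_STONES = 5

def legal_actions(stones):
    return [a for a in (1, 2) if a <= stones]

def train_self_play(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Self-play with a single shared table: Q[(stones, action)] estimates
    the win probability for the player to move. Both players use the same
    epsilon-greedy policy, so the agent literally trains against itself."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        stones = N_STONES
        trajectory = []  # (state, action) pairs, players alternating
        while stones > 0:
            acts = legal_actions(stones)
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q.get((stones, x), 0.5))
            trajectory.append((stones, a))
            stones -= a
        # The player who made the last move wins. Propagate the outcome
        # back through the trajectory, flipping the reward each ply
        # because the players alternate.
        reward = 1.0
        for state, action in reversed(trajectory):
            old = Q.get((state, action), 0.5)
            Q[(state, action)] = old + alpha * (reward - old)
            reward = 1.0 - reward  # switch to the opponent's perspective
    return Q

Q = train_self_play()
# With 5 stones, taking 2 leaves 3 stones, a losing position for the
# opponent, so self-play should learn to prefer action 2.
best = max(legal_actions(N_STONES), key=lambda a: Q[(N_STONES, a)])
```

The key design point is the single shared table `Q`: because both players draw from it, every improvement the agent makes is immediately also an improvement in its opponent, which is what distinguishes self-play from training against a fixed opponent.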
References
2014
- (Heinrich & Silver, 2014) ⇒ Johannes Heinrich, and David Silver. (2014). “Self-play Monte-carlo Tree Search in Computer Poker.” In: Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence.
- QUOTE: Self-play reinforcement learning has proved to be successful in many perfect-information two-player games. However, research carrying over its theoretical guarantees and practical success to games of imperfect information has been lacking. In this paper, we evaluate self-play Monte-Carlo Tree Search (MCTS) in limit Texas Hold'em and Kuhn poker. We introduce a variant of the established UCB algorithm and provide first empirical results demonstrating its ability to find approximate Nash equilibria.

  Introduction: Reinforcement learning has traditionally focused on stationary single-agent environments. Its applicability to fully observable multi-agent Markov games has been explored by (Littman 1996). Backgammon and computer Go are two examples of fully observable two-player games where reinforcement learning methods have achieved outstanding performance (Tesauro 1992; Gelly et al. 2012). Computer poker provides a diversity of stochastic imperfect information games of different sizes and has proved to be a fruitful research domain for game theory and artificial intelligence (Sandholm 2010; Rubin and Watson 2011). ...
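The quote mentions "a variant of the established UCB algorithm" as the action-selection rule inside MCTS. The paper's specific variant is not reproduced here; for reference, a sketch of the standard UCB1 rule it builds on (formula and constant are the textbook defaults, an assumption rather than the paper's method):

```python
import math

def ucb1_select(child_values, child_visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 action selection: pick the child maximizing
    mean value + c * sqrt(ln(parent_visits) / child_visits).
    child_values[i] is the total return accumulated for child i."""
    scores = []
    for total, n in zip(child_values, child_visits):
        if n == 0:
            scores.append(float('inf'))  # visit every action at least once
        else:
            scores.append(total / n + c * math.sqrt(math.log(parent_visits) / n))
    return max(range(len(scores)), key=scores.__getitem__)
```

In self-play MCTS, both players run this selection rule over the same growing search tree, balancing exploitation of high-value moves against exploration of rarely tried ones.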