Self-Play Reinforcement Learning Algorithm

From GM-RKB



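A Self-Play Reinforcement Learning Algorithm is a reinforcement learning algorithm for game play that uses self-play methods: the agent generates its own training experience by playing against itself (or against current or past copies of its own policy).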

See: Expert-Play Reinforcement Learning.
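As a minimal illustration of the idea (not taken from either reference below), the following sketch learns a tabular action-value function for a simple subtraction game, assumed here for illustration: players alternately remove 1 to 3 stones, and whoever takes the last stone wins. One table is shared by both sides, so every game the learner plays is against itself. Because the game is zero-sum, it uses a negamax target, where the value of a move is the negative of the opponent's best value in the resulting position. The names `train_self_play` and `greedy_move` and all hyperparameter values are hypothetical.

```python
import random

MAX_TAKE = 3  # assumed rule: each move removes 1..MAX_TAKE stones


def legal_actions(pile):
    """Moves available with `pile` stones left."""
    return [a for a in range(1, MAX_TAKE + 1) if a <= pile]


def train_self_play(max_pile=20, episodes=20000, alpha=0.5, epsilon=0.2, seed=0):
    """Hypothetical tabular self-play learner (Q-learning with a negamax target)."""
    rng = random.Random(seed)
    # q[pile][action]: value of `action` for whichever player is to move.
    # Both sides read and write this same table; that is the self-play part.
    q = {n: {a: 0.0 for a in legal_actions(n)} for n in range(1, max_pile + 1)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)  # random start so every state is visited
        while pile > 0:
            actions = legal_actions(pile)
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[pile][x])
            nxt = pile - a
            if nxt == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # Zero-sum: the opponent's best value from `nxt` is our loss.
                target = -max(q[nxt].values())
            q[pile][a] += alpha * (target - q[pile][a])
            pile = nxt  # the "opponent" (the same learner) moves next
    return q


def greedy_move(q, pile):
    """Best move according to the learned table."""
    return max(q[pile], key=q[pile].get)
```

After training with the default seed, `greedy_move(q, n)` should recover the known winning strategy of leaving the opponent a multiple of four stones (for example, taking 2 from a pile of 10). The same principle, with a neural network in place of the table and tree search in place of epsilon-greedy moves, underlies the 2017 reference below.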

References

2017

(Silver et al., 2017) ⇒ David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. (2017). “Mastering the Game of Go Without Human Knowledge.” In: Nature, 550(7676).
- QUOTE: ... These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. ...
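The two prediction targets named in the quote correspond to the training loss given in the paper, reproduced here for reference. The network f_θ maps a position s to move probabilities p and a scalar value v; π is the vector of search probabilities produced by the tree search at s; z ∈ {−1, +1} is the final outcome of the self-play game from the perspective of the player to move; and c is an L2 regularization coefficient.

```latex
(\mathbf{p}, v) = f_\theta(s), \qquad
l = (z - v)^2 \;-\; \boldsymbol{\pi}^\top \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2
```

The squared-error term trains the value output to predict the winner of the self-play game. The cross-entropy term trains the policy output to match the search's own move selections.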

2016

(Heinrich & Silver, 2016) ⇒ Johannes Heinrich, and David Silver. (2016). “Deep Reinforcement Learning from Self-play in Imperfect-information Games.” In: Proceedings of NIPS Deep Reinforcement Learning Workshop.
- QUOTE: Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. ...
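For intuition about the "fictitious" part, the following sketch runs classical fictitious play (not the paper's deep-learning method) in self-play on rock-paper-scissors. Each round, both players run the same rule: best-respond to the opponent's empirical average strategy so far. The paper instead approximates the best response with deep reinforcement learning. The function name `fictitious_self_play`, the payoff table `RPS`, and the iteration count are illustrative assumptions.

```python
# Rock-paper-scissors payoff for the row player (order: rock, paper, scissors).
# The column player receives the negation (zero-sum).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_self_play(payoff, iterations):
    """Hypothetical sketch of classical fictitious play for a zero-sum matrix game.

    Returns the row and column players' empirical (average) strategies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Row player: payoff of each action against the column player's history.
        row_scores = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Column player: row payoff it concedes with each action (lower is better).
        col_scores = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Best responses; ties go to the lowest index (max/min return the first).
        i_best = max(range(n_rows), key=row_scores.__getitem__)
        j_best = min(range(n_cols), key=col_scores.__getitem__)
        row_counts[i_best] += 1
        col_counts[j_best] += 1
    return ([c / iterations for c in row_counts],
            [c / iterations for c in col_counts])
```

Calling `fictitious_self_play(RPS, 20000)` should return two average strategies close to the uniform equilibrium (1/3, 1/3, 1/3), even though the action played in each round is deterministic. This reflects the classical result (Robinson, 1951) that, in two-player zero-sum games, the empirical frequencies of fictitious play converge to a Nash equilibrium, though often slowly.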

Retrieved from "http://www.gabormelli.com/RKB/index.php?title=Self-Play_Reinforcement_Learning_Algorithm&oldid=888193"
