AlphaZero System
An AlphaZero System is an ML-based system that implements an AlphaZero algorithm for computer-based game playing.
- Context:
- It can autonomously develop highly innovative strategies and tactics.
- It can (typically) operate without handcrafted game knowledge, such as opening books or endgame tables.
- It can (typically) rely on its ability to learn and improve through self-play.
- It can (typically) combine Monte Carlo tree search (MCTS) with a deep neural network that evaluates board positions and outputs move probabilities.
- It can suggest the potential for AI to achieve expert-level performance in complex decision-making tasks beyond games.
- ...
- Example(s):
- ...
- Counter-Example(s):
- AlphaGo System.
- a traditional chess engine, such as Stockfish.
- See: Deep Reinforcement Learning, Neural Network Architecture, AI Search Algorithm, Strategic Decision Making, Advanced Chess Engine, AI Game Theory Applications, Self-Play Learning Method, Artificial General Intelligence, Reinforcement Learning from Self-Play.
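The combination of MCTS with a policy/value network noted in the Context items can be illustrated with the PUCT selection rule described for AlphaGo Zero-style search, in which each candidate move is scored as Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The sketch below is a minimal illustration under stated assumptions, not DeepMind's implementation: the names `Edge`, `puct_score`, `select_move`, and `backup` are hypothetical, the constant `c_puct=1.5` is an arbitrary choice, and the priors are hard-coded rather than produced by a neural network.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one move from a tree node (hypothetical structure)."""
    prior: float            # P(s, a): move probability from the policy output
    visits: int = 0         # N(s, a): number of simulations through this move
    value_sum: float = 0.0  # W(s, a): sum of backed-up value estimates

    @property
    def q(self) -> float:
        # Mean value Q(s, a); defined as 0 for unvisited moves.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploration bonus: large for high-prior, rarely visited moves.
    u = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + u


def select_move(edges: dict, c_puct: float = 1.5):
    # Pick the move that maximises Q + U among the node's children.
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits, c_puct))


def backup(edge: Edge, value: float) -> None:
    # Record one simulation result (value from the value output) on this edge.
    edge.visits += 1
    edge.value_sum += value


if __name__ == "__main__":
    # Assumed priors; in a real system these come from the network.
    edges = {"e2e4": Edge(prior=0.7, visits=10, value_sum=2.0),
             "d2d4": Edge(prior=0.3)}
    move = select_move(edges)
    print("selected:", move)
    backup(edges[move], 0.5)
    print("visits after backup:", edges[move].visits)
```

In this example the unvisited move is selected even though its prior is lower, because its exploration bonus is not yet discounted by any visits; as visit counts grow, the mean value Q increasingly dominates the choice.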
References
2024
- (Ruoss et al., 2024) ⇒ Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, and Tim Genewein. (2024). “Grandmaster-Level Chess Without Search.” doi:10.48550/arXiv.2402.04494
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/AlphaZero Retrieved:2024-2-21.
- AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.
On December 5, 2017, the DeepMind team released a preprint paper introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in these three games by defeating world-champion programs Stockfish, Elmo, and the three-day version of AlphaGo Zero. In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use. AlphaZero was trained solely via self-play using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws). The trained algorithm played on a single machine with four TPUs. DeepMind's paper on AlphaZero was published in the journal Science on 7 December 2018; however, the AlphaZero program itself has not been made available to the public. In 2019, DeepMind published a new paper detailing MuZero, a new algorithm able to generalise AlphaZero's work, playing both Atari and board games without knowledge of the rules or representations of the game.
2017
- (Silver, Hubert et al., 2017) ⇒ David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot et al. (2017). “Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm.” arXiv preprint arXiv:1712.01815
- ABSTRACT: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.