Deep Net Reinforcement Learning Algorithm
A Deep Net Reinforcement Learning Algorithm is an NNet reinforcement learning algorithm that is a deep neural net learning algorithm.
- Context:
- It can (typically) leverage multiple layers of artificial neurons to learn complex policies through trial-and-error interactions.
- It can (often) optimize policy or value functions using backpropagation and stochastic gradient descent.
- It can range from early techniques, such as a Policy Gradients Algorithm, to advanced methods, such as Proximal Policy Optimization (PPO) (a minimal policy-gradient sketch is given below the See list).
- ...
- It can be implemented by a Deep Net Reinforcement Learning System (to solve a deep net reinforcement learning task).
- ...
- Example(s):
- a Deep Q-Network (DQN) Algorithm (Mnih et al., 2013; Mnih et al., 2015).
- an Asynchronous Advantage Actor-Critic (A3C) Algorithm (Mnih et al., 2016).
- a Proximal Policy Optimization (PPO) Algorithm.
- a Neural Fictitious Self-Play (NFSP) Algorithm (Heinrich & Silver, 2016).
- ...
- Counter-Example(s):
- A Q-Learning Algorithm, which in its classic form uses a tabular value representation or a shallow function approximator rather than a deep neural network.
- A Shallow NNet Reinforcement Learning Algorithm, which lacks the depth of a deep neural network.
- See: Shallow NNet Reinforcement Learning Algorithm, Deep Reinforcement Learning, AlphaGo, RLHF, Noam Brown.
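To make the Context items above concrete, the following is a minimal policy-gradient (REINFORCE-style) sketch in PyTorch: a small deep policy network is improved by trial-and-error action sampling, backpropagation, and stochastic gradient descent. The two-action bandit environment, network sizes, and hyperparameters are illustrative assumptions, not taken from the cited papers.

```python
# Minimal, hypothetical REINFORCE-style sketch: a deep policy network
# optimized with backpropagation and SGD on a toy two-armed bandit.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Deep policy network: maps a (dummy) state to action logits via two layers.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)

def reward(action: int) -> float:
    # Hypothetical reward model: action 1 pays off more often than action 0.
    return 1.0 if torch.rand(()).item() < (0.8 if action == 1 else 0.2) else 0.0

for episode in range(500):
    state = torch.zeros(4)                      # fixed dummy state
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                      # trial-and-error action selection
    r = reward(int(action.item()))
    loss = -dist.log_prob(action) * r           # surrogate loss: -log pi(a|s) * return
    optimizer.zero_grad()
    loss.backward()                             # backpropagation through the deep net
    optimizer.step()                            # stochastic gradient step
```

The surrogate term -log π(a|s)·r is the standard choice whose gradient matches the REINFORCE estimator; swapping it for a clipped probability-ratio objective would yield a PPO-style update instead.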
References
2017
- (Barber, 2017) ⇒ David Barber. (2017). “Learning From Scratch by Thinking Fast and Slow with Deep Learning and Tree Search.” Blog post. https://davidbarber.github.io/blog/2017/11/07/Learning-From-Scratch-by-Thinking-Fast-and-Slow-with-Deep-Learning-and-Tree-Search/
- QUOTE: In current Deep Reinforcement Learning (RL) algorithms such as Policy Gradients and DQN, neural networks make action selections with no lookahead; this is analogous to System 1. Unlike human intuition, their training does not benefit from a ‘System 2’ to suggest strong policies.
2016a
- (Heinrich & Silver, 2016) ⇒ Johannes Heinrich, and David Silver. (2016). “Deep Reinforcement Learning from Self-play in Imperfect-information Games.” In: Proceedings of NIPS Deep Reinforcement Learning Workshop.
- QUOTE: In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged.
2016b
- (Mnih et al., 2016) ⇒ Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu. (2016). “Asynchronous Methods for Deep Reinforcement Learning.” In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48.
2015
- (Mnih et al., 2015) ⇒ Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. (2015). “Human-level Control through Deep Reinforcement Learning.” In: Nature, 518(7540).
2013
- (Mnih et al., 2013) ⇒ Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. (2013). “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602
1992
- (Williams, 1992) ⇒ Ronald J. Williams. (1992). “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.” In: Machine Learning, 8, 229-256. doi:10.1007/BF00992696
- NOTE: It introduces the REINFORCE family of policy gradient algorithms, establishing the principle of using gradients to optimize neural networks (connectionist function approximators) for sequential decision-making.
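As a reminder of that principle (stated in common modern notation rather than the paper's original symbols), the REINFORCE gradient estimator can be written as:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\pi_{\theta}}\big[\, \nabla_{\theta} \log \pi_{\theta}(a \mid s)\,(R - b) \,\big]
```

where π_θ is the policy represented by the neural network, R is the return, and b is a baseline used to reduce the variance of the estimate.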