Deep Net Reinforcement Learning Algorithm
A Deep Net Reinforcement Learning Algorithm is a NNet reinforcement learning algorithm that is also a deep neural net learning algorithm, i.e. it trains a deep neural network from reward signals.
- Context:
- It can (typically) leverage multiple layers of artificial neurons to learn complex policies through trial-and-error interactions.
- It can (often) optimize policy or value functions using backpropagation and stochastic gradient descent (a minimal illustrative sketch appears below, before the References).
- It can range from early techniques, such as a Policy Gradients Algorithm, to more advanced methods, such as Proximal Policy Optimization (PPO).
- ...
- It can be implemented by a Deep Net Reinforcement Learning System (to solve a deep net reinforcement learning task).
- ...
- Example(s):
- a Deep Q-Network (DQN) Algorithm (Mnih et al., 2013; 2015).
- an Asynchronous Advantage Actor-Critic (A3C) Algorithm (Mnih et al., 2016).
- a Proximal Policy Optimization (PPO) Algorithm.
- ...
- Counter-Example(s):
- A Q-Learning Algorithm, which typically uses a shallow network or tabular method without deep learning.
- A Shallow NNet Reinforcement Learning Algorithm, which lacks the depth of a deep neural network.
- See: Shallow NNet Reinforcement Learning Algorithm, Deep Reinforcement Learning, AlphaGo, RLHF, Noam Brown.
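The following is a minimal, self-contained sketch of a deep-network policy-gradient (REINFORCE-style) learner, intended only to illustrate the Context items above. The toy environment (ToyEnv), the two-hidden-layer architecture, the discount factor, and the learning rate are illustrative assumptions rather than details from the cited papers; PyTorch is used for the network and the stochastic gradient step.

```python
# Minimal REINFORCE-style sketch of a deep net reinforcement learning algorithm.
# ToyEnv, layer sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class ToyEnv:
    """Hypothetical 4-step episodic task: action 1 yields reward +1, else 0."""
    def reset(self):
        self.t = 0
        return torch.zeros(4)                    # fixed dummy observation
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return torch.zeros(4), reward, self.t >= 4

policy = nn.Sequential(                          # "deep" policy: two hidden layers
    nn.Linear(4, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),                            # logits over 2 discrete actions
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(200):
    env, log_probs, rewards = ToyEnv(), [], []
    obs, done = env.reset(), False
    while not done:                              # trial-and-error interaction
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        obs, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
    # Monte Carlo returns, then the REINFORCE loss: -sum(log pi(a_t|s_t) * G_t)
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()                              # backpropagation through the deep net
    optimizer.step()                             # stochastic gradient step
```

The same loop structure carries over to value-based deep RL methods such as DQN, where the network instead outputs Q-values and the loss is a temporal-difference error.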
References
2017
- https://davidbarber.github.io/blog/2017/11/07/Learning-From-Scratch-by-Thinking-Fast-and-Slow-with-Deep-Learning-and-Tree-Search/
- QUOTE: In current Deep Reinforcement Learning (RL) algorithms such as Policy Gradients and DQN, neural networks make action selections with no lookahead; this is analogous to System 1. Unlike human intuition, their training does not benefit from a ‘System 2’ to suggest strong policies.
2016a
- (Heinrich & Silver, 2016) ⇒ Johannes Heinrich, and David Silver. (2016). “Deep Reinforcement Learning from Self-play in Imperfect-information Games.” In: Proceedings of NIPS Deep Reinforcement Learning Workshop.
- QUOTE: In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged.
2016b
- (Mnih et al., 2016) ⇒ Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu. (2016). “Asynchronous Methods for Deep Reinforcement Learning.” In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48.
2015
- (Mnih et al., 2015) ⇒ Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. (2015). “Human-level Control through Deep Reinforcement Learning.” In: Nature, 518(7540).
2013
- (Mnih et al., 2013) ⇒ Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. (2013). “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602
1992
- (Williams, 1992) ⇒ Ronald J. Williams. (1992). “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.” In: Machine Learning, 8, Pages 229-256. doi:10.1007/BF00992696
- NOTE: It introduces the REINFORCE family of policy-gradient algorithms for training connectionist (neural network) controllers from reward signals.
- NOTE: It establishes the principle of using gradients to optimize neural networks for sequential decision-making.
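As a compact statement of that principle (written in modern notation rather than Williams' original), the REINFORCE policy-gradient estimator is:

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right],
\qquad G_t = \sum_{k \ge t} \gamma^{\,k-t} r_k
```

so the network parameters θ can be updated by stochastic gradient ascent on returns sampled from interaction with the environment.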