Reinforcement Learning (RL) Algorithm

From GM-RKB

A Reinforcement Learning (RL) Algorithm is an online learning algorithm that can be implemented in a reinforcement learning system to solve an online reward maximization task (that is, to maximize a cumulative reward metric).
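
The cumulative reward metric is often formalized as a (possibly discounted) return, the sum of rewards weighted by powers of a discount factor. A minimal sketch, assuming a finite episode of observed rewards and a discount factor `gamma` (both are assumptions, since the definition above does not fix a particular metric):

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
print(discounted_return([1.0, 1.0, 1.0]))             # undiscounted sum = 3.0
```

With `gamma = 1.0` the return is the plain cumulative reward; values below 1 weight near-term rewards more heavily, which keeps the sum finite for long or unbounded episodes.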



References

2024

  • (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Reinforcement_learning Retrieved:2024-4-10.
    • Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent ought to take actions in a dynamic environment in order to maximize the cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

      Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) with the goal of maximizing the long term reward, whose feedback might be incomplete or delayed.

      The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between the classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the Markov decision process and they target large Markov decision processes where exact methods become infeasible.[1]
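
The contrast drawn above can be illustrated with value iteration, a classical dynamic-programming method that needs the exact transition model of the MDP. A minimal sketch, assuming a hypothetical three-state MDP whose states, actions, transition probabilities, rewards, and discount factor `gamma` are chosen purely for illustration:

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP: MODEL[state][action] -> list of (probability, next_state, reward).
# Dynamic programming reads this model directly; state 2 is absorbing.
Model = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

MODEL: Model = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},                 # 0 = left, 1 = right
    1: {0: [(1.0, 0, 0.0)], 1: [(0.8, 2, 1.0), (0.2, 1, 0.0)]},  # "right" may slip
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(model: Model, values: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in model[s][a])


def value_iteration(model: Model, gamma: float = 0.9,
                    tol: float = 1e-10) -> Tuple[Dict[int, float], Dict[int, int]]:
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            # In-place Bellman optimality backup using the known model.
            best = max(q_value(model, values, s, a, gamma) for a in model[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (first action wins ties).
    policy = {s: max(model[s], key=lambda a: q_value(model, values, s, a, gamma))
              for s in model}
    return values, policy


values, policy = value_iteration(MODEL)
print(values, policy)
```

An RL algorithm facing the same problem would not be handed `MODEL`; it would have to estimate values from sampled transitions instead.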


2019a

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reinforcement_learning Retrieved:2019-5-12.
    • Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning is considered as one of three machine learning paradigms, alongside supervised learning and unsupervised learning.

      It differs from supervised learning in that labelled input/output pairs need not be presented, and sub-optimal actions need not be explicitly corrected. Instead the focus is finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). ...
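
The exploration/exploitation balance can be illustrated with an epsilon-greedy agent on a multi-armed bandit, a simplified RL setting with a single state. A minimal sketch, assuming Bernoulli reward arms with hypothetical success probabilities and an illustrative exploration rate `epsilon`:

```python
import random
from typing import List, Tuple


def epsilon_greedy_bandit(arm_means: List[float], steps: int = 5000, epsilon: float = 0.1,
                          seed: int = 0) -> Tuple[List[float], List[int], float]:
    """Run epsilon-greedy on Bernoulli arms; return (value estimates, pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts, total)
```

With `epsilon = 0` the agent can lock onto whichever arm first pays off; a small amount of random exploration lets it discover and then exploit the best arm.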


2019b

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Q-learning Retrieved:2019-5-12.
    • Q-learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.

      For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over any and all successive steps, starting from the current state.[1] Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy. "Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.[2]

  1. Melo, Francisco S. "Convergence of Q-learning: a simple proof".
  2. Matiisen, Tambet (December 19, 2015). "Demystifying Deep Reinforcement Learning". neuro.cs.ut.ee. Computational Neuroscience Lab. Retrieved 2018-04-06.
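
The Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], can be sketched as tabular code. A minimal sketch, assuming a hypothetical five-state chain environment, an epsilon-greedy ("partly-random") behaviour policy, and illustrative values for the step size `alpha`, discount `gamma`, and `epsilon`. The learner only samples `step()` and never reads a transition model, which is what makes it model-free:

```python
import random
from typing import List, Tuple

N_STATES = 5      # hypothetical chain environment: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical dynamics, hidden from the learner: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Partly random behaviour policy: epsilon-greedy, ties broken at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy TD target: max over next-state actions, no bootstrap at the goal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

Because the target uses the maximum over next-state actions rather than the action actually taken, the learned greedy policy can be optimal even though the behaviour policy keeps exploring.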

1998

  • (Sutton & Barto, 1998) ⇒ Richard S. Sutton, and Andrew G. Barto. (1998). “Reinforcement Learning: An Introduction.” MIT Press. ISBN:0262193981
    • BOOK OVERVIEW: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment....

1996

  • (Kaelbling et al., 1996) ⇒ L. P. Kaelbling, M. L. Littman, and A. W. Moore. (1996). “Reinforcement Learning: A Survey.” In: Journal of Artificial Intelligence Research, Vol. 4 (1996), pp. 237-285
    • ABSTRACT: This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word “reinforcement.” The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
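
One well-known way to construct an empirical model to accelerate learning is Dyna-Q (Sutton's Dyna architecture combined with Q-learning): each real transition is stored in a learned model, and extra "planning" updates replay remembered transitions from that model. A minimal sketch, offered as one such technique rather than as the survey's method, assuming a hypothetical deterministic chain environment (so the model stores a single outcome per state-action pair) and illustrative hyperparameters:

```python
import random
from typing import Dict, List, Tuple

N_STATES = 5      # hypothetical chain environment: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical deterministic dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def dyna_q(episodes: int = 50, planning_steps: int = 10, alpha: float = 0.1,
           gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    # Empirical model built from experience: (state, action) -> (reward, next_state, done).
    model: Dict[Tuple[int, int], Tuple[float, int, bool]] = {}

    def update(s: int, a: int, r: float, s2: int, done: bool) -> None:
        """One Q-learning backup, used for both real and simulated transitions."""
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            update(state, action, reward, next_state, done)      # direct RL from real experience
            model[(state, action)] = (reward, next_state, done)  # remember the observed outcome
            # Planning: replay randomly chosen remembered transitions from the model.
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = rng.choice(list(model.items()))
                update(s, a, r, s2, d)
            state = next_state
    return q


q = dyna_q()
print([max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)])
```

The planning loop lets each real step propagate value information several times, so fewer real episodes are needed than with plain Q-learning; `planning_steps = 0` reduces the sketch to ordinary tabular Q-learning.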
