Reinforcement Learning Task

A [[Reinforcement Learning Task]] is an [[online reward-maximization task]] that requires the use of a [[reinforcement learning algorithm]] (which involves an agent learning to make decisions through trial and error, aiming to maximize cumulative rewards over time by interacting with a [[dynamic environment]]).
* <B>AKA:</B> [[Reinforcement Learning]].
* <B>Context:</B>
** It can (often) involve the challenge of the [[Exploration/Exploitation Tradeoff]], requiring the agent to balance exploring the environment to discover new strategies against exploiting known strategies for maximum reward (see the sketch after this list).
** It can be solved by a [[Reinforcement Learning System]] that implements a [[Reinforcement Learning Algorithm]].
** It can range from being a [[Discrete-Space Reinforcement Learning Task]] to being a [[Continuous-Space Reinforcement Learning Task]].
* <B>Example(s):</B>
** an [[RL Benchmark Task]], such as http://www.rl-competition.org/.
** a [[k-Armed Bandit Task]].
** an [[RL-based Autonomous Helicopter Flight Task]], as presented in the paper "Robust Deep Reinforcement Learning for Quadcopter Control".
** an [[RL-based Robot Control Task]], as detailed in "Adaptive Gain Scheduling using Reinforcement Learning for Quadcopter Control".
** an [[RL-based Game Playing Task]], ...
** an [[RL-based Autonomous System Task]], ...
** an [[RL-based Adaptive User Interface Task]], ...
** an [[RL-based Dynamic Item Recommendation Task]], ...
** an [[RL-based Real-Time Traffic Light Control Task]], ...
** an [[RL-based Personalized Healthcare Decision Support Task]], ...
** an [[RL-based Adaptive Energy Management Task]], ...
** an [[RL-based LLM Model Finetuning Task]] (using [[RLHF]]).
** a [[Reward Shaping Task]].
** …
* <B>Counter-Example(s):</B>
** a [[Linear Regression]] task, where the goal is to fit a linear model to a dataset without any interactive decision-making process.
** a [[Clustering Task]], which groups objects so that objects in the same group are more similar to one another than to those in other groups, without rewards or an interactive environment.
** an [[Unsupervised Learning Task]].
** an [[i.i.d. Learning Task]].
* <B>See:</B> [[Model-Based Reinforcement Learning]], [[Model-Free Reinforcement Learning]], [[Value Function]], [[Policy Function]], [[Reward Function]], [[State Transition Function]], [[Active Learning]], [[Exploration/Exploitation Tradeoff]], [[Associative Reinforcement Learning]], [[Autonomous Helicopter Flight Using Reinforcement Learning]], [[Average-Reward Reinforcement Learning]], [[Bayesian Reinforcement Learning]], [[Dynamic Programming]], [[Efficient Exploration in Reinforcement Learning]], [[Gaussian Process Reinforcement Learning]], [[Hierarchical Reinforcement Learning]], [[Instance-Based Reinforcement Learning]], [[Inverse Reinforcement Learning]], [[Least Squares Reinforcement Learning Methods]], [[Policy Gradient Methods]], [[Q-Learning]], [[Relational Reinforcement Learning]], [[Reward Shaping]], [[Symbolic Dynamic Programming]], [[Temporal Difference Learning]], [[Value Function Approximation]].
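The [[Exploration/Exploitation Tradeoff]] noted in the context bullets above can be made concrete with a [[k-Armed Bandit Task]]. Below is a minimal, illustrative Python sketch (not taken from any referenced source) of an ε-greedy agent on a hypothetical 10-armed bandit; the arm payout probabilities, the ε value, and the step count are made-up parameters for the illustration.
<syntaxhighlight lang="python">
import random

# Hypothetical 10-armed bandit: each arm pays reward 1 with a fixed, unknown probability.
# These payout probabilities are illustration-only values.
TRUE_PAYOUT_PROBS = [0.10, 0.25, 0.40, 0.15, 0.80, 0.05, 0.30, 0.55, 0.20, 0.60]
K = len(TRUE_PAYOUT_PROBS)
EPSILON = 0.1        # assumed probability of exploring a random arm
NUM_STEPS = 10_000   # assumed interaction budget

estimates = [0.0] * K  # running estimate of each arm's expected reward
counts = [0] * K       # number of times each arm has been pulled
total_reward = 0.0

for _ in range(NUM_STEPS):
    # Exploration/exploitation tradeoff: usually exploit the best-looking arm,
    # but with probability EPSILON pull a random arm to keep learning.
    if random.random() < EPSILON:
        arm = random.randrange(K)
    else:
        arm = max(range(K), key=lambda a: estimates[a])

    # Interact with the environment: pull the arm and observe a scalar reward.
    reward = 1.0 if random.random() < TRUE_PAYOUT_PROBS[arm] else 0.0
    total_reward += reward

    # Incrementally update the sample-average estimate for the pulled arm.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

best_arm = max(range(K), key=lambda a: estimates[a])
print(f"estimated best arm: {best_arm}, average reward: {total_reward / NUM_STEPS:.3f}")
</syntaxhighlight>
With these illustrative settings the agent typically identifies arm 4 (payout probability 0.80) as the best arm while still spending a small fraction of pulls on exploration, which is the balance the tradeoff refers to.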
----
== References ==

=== 2009 ===
* (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Reinforcement_learning
** Inspired by related psychological theory, in [[computer science]], '''reinforcement learning''' is a sub-area of [[machine learning]] concerned with how an ''agent'' ought to take ''actions'' in an ''environment'' so as to maximize some notion of long-term ''reward''. Reinforcement learning algorithms attempt to find a ''policy'' that maps ''states'' of the world to the actions the agent ought to take in those states. In [[economics]] and [[game theory]], reinforcement learning is considered as a [[bounded rationality|boundedly rational]] interpretation of how equilibrium may arise.
** The environment is typically formulated as a finite-state [[Markov decision process]] (MDP), and reinforcement learning algorithms for this context are highly related to [[dynamic programming]] techniques. State transition probabilities and reward probabilities in the MDP are typically stochastic but stationary over the course of the problem.
** Reinforcement learning differs from the [[supervised learning]] problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been mostly studied through the [[multi-armed bandit]] problem.
** Formally, the basic reinforcement learning model, as applied to MDPs, consists of:
*** a set of environment states <math>S</math>;
*** a set of actions <math>A</math>; and
*** a set of scalar "rewards" in <math> \Bbb{R}</math>.
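As an illustration of this basic model (a set of states, a set of actions, and scalar rewards), the following is a minimal Python sketch, not drawn from the quoted article, of tabular [[Q-Learning]] on a hypothetical 5-state chain MDP; the chain layout, the terminal reward of 1.0, and the learning-rate/discount/exploration parameters are assumptions made for the example.
<syntaxhighlight lang="python">
import random

# Hypothetical chain MDP with 5 states: moving "right" eventually reaches the
# terminal state (index 4), which yields reward 1.0; every other step yields 0.0.
NUM_STATES = 5
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(state + 1, NUM_STATES - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == NUM_STATES - 1 else 0.0
    return next_state, reward, next_state == NUM_STATES - 1

# Tabular action-value function Q(s, a), initialized to zero.
Q = {(s, a): 0.0 for s in range(NUM_STATES) for a in ACTIONS}

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection (exploration vs. exploitation),
        # breaking ties between equally valued actions at random.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            best_value = max(Q[(state, a)] for a in ACTIONS)
            action = random.choice([a for a in ACTIONS if Q[(state, a)] == best_value])
        next_state, reward, done = step(state, action)
        # Q-learning update toward the one-step bootstrapped target.
        target = reward + (0.0 if done else GAMMA * max(Q[(next_state, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# The learned greedy policy maps each state to its highest-valued action.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(NUM_STATES)}
print(policy)  # expected to prefer "right" in the non-terminal states
</syntaxhighlight>
The policy here plays the role of the state-to-action mapping described above, and the ε-greedy rule embodies the exploration/exploitation balance mentioned in the quoted passage.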


=== 2021 ===
* ([[Patel et al., 2021]]) ⇒ [[Sahil Patel]], [[Ewoud Vos]], and [[Henk Wymeersch]]. ([[2021]]). “Robust Deep Reinforcement Learning for Quadcopter Control.” In: arXiv preprint arXiv:2111.03915. [https://ar5iv.org/abs/2111.03915 URL]
** NOTES: It introduces the use of Robust Markov Decision Processes (RMDP) and the Action Robust Deep Deterministic Policy Gradient (AR-DDPG) algorithm for robust drone control, demonstrating advanced RL techniques for handling uncertainties in quadcopter flight tasks.
=== 2022 ===
* ([[Timmerman et al., 2022]]) ⇒ [[Mike Timmerman]], [[Aryan Patel]], and [[Tim Reinhart]]. ([[2022]]). “Adaptive Gain Scheduling using Reinforcement Learning for Quadcopter Control.” In: arXiv preprint arXiv:2403.07216. [https://ar5iv.org/abs/2403.07216 URL]
** NOTES: It discusses applying reinforcement learning to dynamically adjust the gains of a quadcopter controller, showcasing how RL can optimize robot control systems for improved performance and adaptability.

=== 2011 ===
* ([[Peter Stone, 2011b]]) ⇒ Peter Stone. (2011). "Reinforcement Learning." In: ([[Sammut & Webb, 2011]]) p.849


=== 2013 ===
* (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/reinforcement_learning Retrieved:2013-12-4.
** '''Reinforcement learning''' is an area of [[machine learning]] inspired by [[Behaviorism|behaviorist psychology]], concerned with how [[software agent]]s ought to take ''actions'' in an ''environment'' so as to maximize some notion of cumulative ''reward''. The problem, due to its generality, is studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[statistics]], and [[genetic algorithm]]s. In the operations research and control literature, the field where reinforcement learning methods are studied is called ''approximate dynamic programming''. The problem has been studied in the [[optimal control theory|theory of optimal control]], though most studies there are concerned with existence of optimal solutions and their characterization, and not with the learning or approximation aspects. <P> In [[economics]] and [[game theory]], reinforcement learning may be used to explain how equilibrium may arise under [[bounded rationality]]. <P> In machine learning, the environment is typically formulated as a [[Markov decision process]] (MDP), and many reinforcement learning algorithms for this context are highly related to [[dynamic programming]] techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible. <P> Reinforcement learning differs from standard [[supervised learning]] in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the [[multi-armed bandit]] problem and in finite MDPs.
----
__NOTOC__
[[Category:Concept]]
[[Category:Quality Silver]]
