Reinforcement Learning Task
A Reinforcement Learning Task is an active learning task/sequential decision task with a cost function and constraint rules.
- AKA: Reinforcement Learning.
- Context:
- It can be solved by a Reinforcement Learning System that implements a Reinforcement Learning Algorithm.
- Example(s):
- an RL Benchmark Task, such as those from http://www.rl-competition.org/.
- a k-Armed Bandit Task (see the sketch after this list).
- Counter-Example(s):
- an Unsupervised Learning Task.
- an i.i.d. Learning Task.
- See: Active Learning, Exploration/Exploitation Tradeoff, Associative Reinforcement Learning, Autonomous Helicopter Flight Using Reinforcement Learning, Average-Reward Reinforcement Learning, Bayesian Reinforcement Learning, Dynamic Programming, Efficient Exploration in Reinforcement Learning, Gaussian Process Reinforcement Learning, Hierarchical Reinforcement Learning, Instance-Based Reinforcement Learning, Inverse Reinforcement Learning, Least Squares Reinforcement Learning Methods, Model-Based Reinforcement Learning, Policy Gradient Methods, Q-Learning, Relational Reinforcement Learning, Reward Shaping, Symbolic Dynamic Programming, Temporal Difference Learning, Value Function Approximation, Behaviorism, Software Agent, Game Theory, Control Theory, Operations Research, Information Theory, Simulation-Based Optimization, Genetic Algorithm, Optimal Control Theory, Bounded Rationality.
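To make the task structure concrete, here is a minimal sketch of the k-Armed Bandit Task named in the examples above. It is an illustrative assumption, not part of the source page: the agent repeatedly pulls an arm, observes a reward only for that arm, and balances exploration against exploitation with an epsilon-greedy rule. The arm count, step count, epsilon value, and Gaussian reward model are all arbitrary choices for the sketch.

```python
import random

def run_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed Gaussian bandit (illustrative parameters)."""
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]  # hidden per-arm reward means
    estimates = [0.0] * k   # sample-average action-value estimates
    counts = [0] * k
    total = 0.0
    for _ in range(steps):
        # Explore a random arm with probability epsilon; otherwise exploit
        # the arm with the highest current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # only the pulled arm's reward is observed
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total / steps

print(run_bandit())  # average reward per step under the learned policy
```

With epsilon set to 0 the agent can lock onto a poor arm early and never revisit the others; a small positive epsilon keeps re-estimating every arm, which is exactly the exploration/exploitation tradeoff linked above.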
References
2013
- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/reinforcement_learning Retrieved: 2013-12-04.
- Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies there are concerned with existence of optimal solutions and their characterization, and not with the learning or approximation aspects.
In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.
In machine learning, the environment is typically formulated as a Markov decision process (MDP), and many reinforcement learning algorithms for this context are highly related to dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.
Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
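As a hedged illustration of the model-free point in the excerpt above (this sketch is not from the source), tabular Q-learning improves its action-value estimates from sampled transitions alone, never reading the MDP's transition probabilities. The chain environment, reward scheme, and hyperparameters below are hypothetical.

```python
import random

# Hypothetical 1-D chain environment with states 0..N-1 and actions
# 0 (step left) / 1 (step right); reward 1.0 only on reaching state N-1.
# The agent never reads this transition model: it learns purely from
# sampled (state, action, reward, next_state) experience.
N = 6

def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N - 1, state + 1)
    reward = 1.0 if next_state == N - 1 else 0.0
    return next_state, reward, next_state == N - 1

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N)]        # Q-table: q[state][action]
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # illustrative hyperparameters

def choose_action(state):
    if rng.random() < epsilon:            # explore
        return rng.randrange(2)
    best = max(q[state])                  # exploit, breaking ties randomly
    return rng.choice([a for a in (0, 1) if q[state][a] == best])

for _ in range(200):                      # episodes
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = env_step(state, action)
        # Q-learning update: bootstrap from the best estimated next-state value.
        target = reward + (0.0 if done else gamma * max(q[next_state]))
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

print([round(max(row), 2) for row in q])  # learned values rise toward the goal state
```

A dynamic-programming method such as value iteration would instead sweep over the known transition model; Q-learning recovers comparable value estimates from experience, which is why it applies when the MDP is unknown or too large for exact methods.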
2011
- (Peter Stone, 2011b) ⇒ Peter Stone. (2011). "Reinforcement Learning." In: (Sammut & Webb, 2011) Encyclopedia of Machine Learning, p. 849.