Reinforcement Learning Task

Revision as of 20:40, 10 October 2012 by Gmelli

A Reinforcement Learning Task is an active learning task / sequential decision task in which an agent must learn, from reward (or cost function) feedback and subject to constraint rules, which actions to take in an environment.



References

2009

  • http://en.wikipedia.org/wiki/Reinforcement_learning
    • Inspired by related psychological theory, in computer science, reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. In economics and game theory, reinforcement learning is considered a boundedly rational interpretation of how equilibrium may arise.
    • The environment is typically formulated as a finite-state Markov decision process (MDP), and reinforcement learning algorithms for this context are highly related to dynamic programming techniques. State transition probabilities and reward probabilities in the MDP are typically stochastic but stationary over the course of the problem.
    • Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been mostly studied through the multi-armed bandit problem.
    • Formally, the basic reinforcement learning model, as applied to MDPs, consists of:
      • a set of environment states [math]\displaystyle{ S }[/math];
      • a set of actions [math]\displaystyle{ A }[/math]; and
      • a set of scalar "rewards" in [math]\displaystyle{ \Bbb{R} }[/math].
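The [math]\displaystyle{ (S, A, R) }[/math] model above, together with the exploration/exploitation balance mentioned earlier, can be sketched with a minimal tabular Q-learning loop. The two-state MDP, its reward values, and the hyperparameters below are illustrative assumptions for the sketch, not part of the source:

```python
import random

# Assumed toy MDP for illustration: states S = {0, 1}, actions A = {0, 1}.
# Action 1 moves to the other state; action 0 stays put.
# Being in state 1 after a step yields reward 1.0, otherwise 0.0.
def step(state, action):
    next_state = 1 - state if action == 1 else state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Tabular action-value function Q(s, a), initialized to zero.
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        state = 0
        for _ in range(10):  # fixed-length episodes
            # Epsilon-greedy action selection: explore with probability
            # epsilon, otherwise exploit the current value estimates.
            if rng.random() < epsilon:
                action = rng.choice((0, 1))
            else:
                action = max((0, 1), key=lambda a: Q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(Q[(next_state, a)] for a in (0, 1))
            # Temporal-difference update toward the Bellman target
            # r + gamma * max_a' Q(s', a').
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)]
            )
            state = next_state
    return Q

Q = q_learning()
# The learned greedy policy maps each state to its best action:
# move from state 0 to the rewarding state 1, then stay there.
policy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(policy)  # {0: 1, 1: 0}
```

Note that, consistent with the excerpt above, no correct input/output pairs are ever presented: the agent improves its policy purely from the scalar rewards returned by step.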


See: Associative Reinforcement Learning; Autonomous Helicopter Flight Using Reinforcement Learning; Average-Reward Reinforcement Learning; Bayesian Reinforcement Learning; Dynamic Programming; Efficient Exploration in Reinforcement Learning; Gaussian Process Reinforcement Learning; Hierarchical Reinforcement Learning; Instance-Based Reinforcement Learning; Inverse Reinforcement Learning; Least Squares Reinforcement Learning Methods; Model-Based Reinforcement Learning; Policy Gradient Methods; Q-Learning; Relational Reinforcement Learning; Reward Shaping; Symbolic Dynamic Programming; Temporal Difference Learning; Value Function Approximation


