Instance-based Reinforcement Learning (IBRL) System
An Instance-based Reinforcement Learning (IBRL) System is a Reinforcement Learning System that stores the values of a set of states (or state-action pairs) and uses an instance-based supervised learning system to interpolate these values to unstored states or state-action pairs.
- AKA: Kernel-Based Reinforcement Learning System.
- Example(s):
- Counter-Example(s):
- See: Curse of Dimensionality, Instance-Based Learning, Locally Weighted Learning, Value-Function Approximation.
References
2017
- (Smart, 2017) ⇒ William D. Smart (2017). "Instance-Based Reinforcement Learning". In: (Sammut & Webb, 2017). DOI:10.1007/978-1-4899-7687-1_410.
- QUOTE: Traditional reinforcement-learning (RL) algorithms operate on domains with discrete state spaces. They typically represent the value function in a table, indexed by states, or by state-action pairs. However, when applying RL to domains with continuous state, a tabular representation is no longer possible. In these cases, a common approach is to represent the value function by storing the values of a small set of states (or state-action pairs), and interpolating these values to other, unstored, states (or state-action pairs). This approach is known as instance-based reinforcement learning (IBRL). The instances are the explicitly stored values, and the interpolation is typically done using well-known instance-based supervised learning algorithms.
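The storage-and-interpolation step described in this quote can be illustrated with a short sketch (not taken from the cited entry; the function name, the inverse-distance weighting, and the array layout are assumptions):

```python
import numpy as np

def knn_q_estimate(query_sa, stored_sa, stored_q, k=5, eps=1e-8):
    """Distance-weighted k-NN estimate of the value of an unstored pair.

    query_sa  : 1-D array, the concatenated state-action vector to evaluate.
    stored_sa : 2-D array (n_instances x dim) of stored state-action pairs.
    stored_q  : 1-D array of the corresponding stored values.
    """
    dists = np.linalg.norm(stored_sa - query_sa, axis=1)
    nearest = np.argsort(dists)[:k]             # indices of the k closest instances
    weights = 1.0 / (dists[nearest] + eps)      # inverse-distance weighting
    return float(np.dot(weights, stored_q[nearest]) / weights.sum())
```

Here only the k nearest stored instances contribute to the estimate, so predictions stay local to the experience actually gathered, which is the defining property of instance-based interpolation.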
2002
- (Ormoneit & Sen, 2002) ⇒ Dirk Ormoneit, and Saunak Sen (2002). "Kernel-Based Reinforcement Learning". In: Machine Learning, 49(2-3), 161-178. DOI:10.1023/A:1017928328829.
- QUOTE: We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process.
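A minimal sketch of a kernel-averaged Bellman backup in the spirit of this abstract (illustrative only; the Gaussian kernel, the `transitions` data layout, and all names are assumptions rather than the authors' exact algorithm):

```python
import numpy as np

def gaussian_weights(query_s, centers, bandwidth=0.5):
    """Normalized Gaussian kernel weights of a query state against sampled states."""
    d2 = np.sum((np.asarray(centers) - np.asarray(query_s)) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def kernel_avi(transitions, gamma=0.9, sweeps=50, bandwidth=0.5):
    """Approximate value iteration with kernel-averaged Bellman backups.

    transitions: dict mapping each action to arrays (S, R, S_next) of sampled
    start states, rewards, and successor states observed under that action.
    Returns q(s, a), a value estimate usable at any continuous state s.
    """
    actions = list(transitions)
    # Value estimates maintained at each action's sampled successor states.
    V_next = {a: np.zeros(len(transitions[a][1])) for a in actions}

    def q(s, a):
        S, R, S_next = transitions[a]
        w = gaussian_weights(s, S, bandwidth)
        # Kernel-averaged one-step backup: weighted mean of r + gamma * V(s').
        return float(np.dot(w, R + gamma * V_next[a]))

    for _ in range(sweeps):  # iterate the averaging operator toward its fixed point
        V_next = {a: np.array([max(q(s2, b) for b in actions)
                               for s2 in transitions[a][2]])
                  for a in actions}

    return q
```

Because each sweep applies the same averaging operator to a fixed batch of samples, the iteration settles on a single fixed point regardless of how the values are initialized, which is the stability property the abstract emphasizes.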
2000
- (Smart & Kaelbling, 2000) ⇒ William D. Smart, and Leslie Pack Kaelbling (2000, June). "Practical Reinforcement Learning in Continuous Spaces". In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
- QUOTE: Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables. This can cause problems for traditional reinforcement learning algorithms which assume discrete states and actions. In this paper, we introduce an algorithm that safely approximates the value function for continuous state control tasks, and that learns quickly from a small amount of data.
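The idea of "safely" approximating the value function, refusing to extrapolate beyond the stored experience, can be sketched as follows (a simplified illustration, not the algorithm introduced in the paper; the bandwidth test and the default value are assumptions):

```python
import numpy as np

def safe_q_estimate(query_sa, stored_sa, stored_q, bandwidth=0.5, default_q=0.0):
    """Locally weighted value estimate that refuses to extrapolate.

    If no stored instance lies within one bandwidth of the query, the query
    is treated as lying outside the gathered experience and a conservative
    default is returned instead of an extrapolated value.
    """
    if len(stored_sa) == 0:
        return default_q
    dists = np.linalg.norm(np.asarray(stored_sa) - np.asarray(query_sa), axis=1)
    if dists.min() > bandwidth:              # query is outside the data's support
        return default_q
    w = np.exp(-(dists / bandwidth) ** 2)    # Gaussian kernel weights
    return float(np.dot(w, stored_q) / w.sum())
```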
1997
- (Kretchmar & Anderson, 1997) ⇒ R. Matthew Kretchmar, and Charles W. Anderson (1997, June). "Comparison of CMACs and Radial Basis Functions for Local Function Approximators in Reinforcement Learning". In: Proceedings of the International Conference on Neural Networks (ICNN'97).
- QUOTE: CMACs and Radial Basis Functions are often used in reinforcement learning to learn value function approximations having local generalization properties.
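A short sketch of the radial-basis-function side of this comparison (assumptions: Gaussian bases, a linear weight vector trained with TD(0), and illustrative names throughout):

```python
import numpy as np

class RBFValueFunction:
    """Linear value-function approximator over Gaussian radial basis features.

    Each basis function generalizes locally around its centre, so an update
    driven by one transition only shifts the value estimate nearby.
    """
    def __init__(self, centers, width=0.5, alpha=0.1):
        self.centers = np.asarray(centers)   # (n_basis, state_dim) RBF centres
        self.width = width                   # shared Gaussian width
        self.alpha = alpha                   # learning rate
        self.w = np.zeros(len(centers))      # linear weights

    def features(self, s):
        d2 = np.sum((self.centers - np.asarray(s)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def value(self, s):
        return float(self.features(s) @ self.w)

    def td_update(self, s, reward, s_next, gamma=0.99):
        """TD(0) update: move the weights toward the one-step bootstrapped target."""
        phi = self.features(s)
        td_error = reward + gamma * self.value(s_next) - self.value(s)
        self.w += self.alpha * td_error * phi
```

Because each Gaussian feature is nearly zero far from its centre, a single TD update changes the value estimate only in that transition's neighbourhood, which is the local-generalization property the paper compares against CMAC tilings.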