Adaptive Real-Time Dynamic Programming (ARTDP) Algorithm

AKA: ARTDP.
Context:
- It is based on Reinforcement Learning.
Example(s):
- …
Counter-Example(s):
- Bounded Real-Time Dynamic Programming (BRTDP) Algorithm,
- Brown-UMBC Reinforcement Learning and Planning (BURLAP) Algorithm.
See: Anytime Algorithm; Approximate Dynamic Programming; Reinforcement Learning; System Identification.

References

(Barto, 2017) ⇒ Barto A.G. (2017) Adaptive Real-Time Dynamic Programming. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: Adaptive Real-Time Dynamic Programming (ARTDP) is an algorithm that allows an agent to improve its behavior while interacting over time with an incompletely known dynamic environment. It can also be viewed as a heuristic search algorithm for finding shortest paths in incompletely known stochastic domains. ARTDP is based on Dynamic Programming (DP), but unlike conventional DP, which consists of off-line algorithms, ARTDP is an on-line algorithm because it uses agent behavior to guide its computation. ARTDP is adaptive because it does not need a complete and accurate model of the environment but learns a model from data collected during agent-environment interaction. When a good model is available, Real-Time Dynamic Programming (RTDP) is applicable, which is ARTDP without the model-learning component.
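- NOTE: The loop described in the quote (act, learn a model from observed transitions, and back up values only at the states the agent actually visits) can be illustrated with a minimal tabular sketch. Everything named here is an assumption made for illustration and not taken from the source: the function names `artdp` and `chain_step`, the toy chain environment, the unit step cost, the maximum-likelihood model built from transition counts, and the epsilon-greedy exploration rule. Setting `counts` to the true transition probabilities and skipping the count update would roughly correspond to plain RTDP.

```python
import random
from collections import defaultdict

def chain_step(s, a, rng, n_states=5, p_success=0.8):
    """Hypothetical toy environment: a noisy walk on a chain of states.
    Action 1 tries to move right, action 0 tries to move left."""
    if rng.random() >= p_success:
        return s  # the move fails and the agent stays where it is
    return min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)

def artdp(step, n_states, n_actions, start, goal,
          episodes=200, max_steps=200, epsilon=0.1, seed=0):
    """Sketch of an ARTDP-style loop for a shortest-path problem (cost 1 per step)."""
    rng = random.Random(seed)
    # Learned model: counts[(s, a)][s2] = times s2 was observed after (s, a).
    counts = defaultdict(lambda: defaultdict(int))
    # Cost-to-go estimates; zero is an optimistic start when all costs are >= 0.
    V = [0.0] * n_states

    def q(s, a):
        """Expected cost of (s, a) under the current estimated model."""
        seen = counts[(s, a)]
        total = sum(seen.values())
        if total == 0:
            return 0.0  # assumption: untried actions look free, so they get tried
        return 1.0 + sum(n / total * V[s2] for s2, n in seen.items())

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            # On-line DP: Bellman backup only at the state the agent occupies.
            qs = [q(s, a) for a in range(n_actions)]
            V[s] = min(qs)
            # Mostly greedy action choice, with occasional random exploration.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = min(range(n_actions), key=qs.__getitem__)
            # Act in the real environment, then update the learned model.
            s2 = step(s, a, rng)
            counts[(s, a)][s2] += 1
            s = s2
    return V, counts

V, counts = artdp(chain_step, n_states=5, n_actions=2, start=0, goal=4)
print([round(v, 2) for v in V])  # estimated cost-to-go per state
```

In this toy chain, always moving right costs 1/0.8 = 1.25 expected steps per remaining state under the true model, so the printed estimates should shrink toward the goal and land near that figure; the exact numbers depend on the seed and on how well the counts approximate the true probabilities.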