Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) Algorithm
A Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) Algorithm is a Multi-Agent Learning (MAL) Algorithm that can be implemented by a LoE-AIM System to solve a LoE-AIM Task.
- AKA: LoE-AIM Algorithm.
- Example(s):
- Counter-Example(s):
- Adapt When Everybody is Stationary Otherwise Move to Equilibrium (AWESOME) Algorithm,
- Enhanced Cooperative Multi-Agent Learning Algorithm (ECMLA),
- Replicator Dynamics with a Variable Learning Rate (ReDVaLeR) Algorithm,
- Weighted Policy Learner (WPL) Algorithm,
- Win or Learn Fast (WoLF) Algorithm.
- See: Game Theory, Machine Learning System, Q-Learning, Reinforcement Learning, Nash Equilibrium.
References
- (Chakraborty & Sons, 2008) ⇒ Doran Chakraborty, and Peter Stone. (y2008). “Online Multiagent Learning Against Memory Bounded Adversaries.” In: Proceedings of the 2008th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I. ISBN:3-540-87478-X, 978-3-540-87478-2 doi:10.1007/978-3-540-87479-9_32
- QUOTE: The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary which induces a Markov decision process in the state space of joint histories. LoE-AIM either explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play.
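The following is a minimal illustrative sketch in Python of the learn-or-exploit idea described in the quote, not the authors' implementation: against a memory-bounded adversary whose next action depends only on the last k joint actions, repeated play induces an MDP over bounded joint histories, so a model-based learner can explore under-visited history states and otherwise play the greedy best response to its partially learned model. The constants, the visit-count switch, and all function names below are illustrative assumptions, not details from Chakraborty & Stone (2008).

```python
import random
from collections import defaultdict

# Illustrative sketch only: against a memory-K adversary, the "state" is
# the last K joint actions, so repeated play becomes an MDP that a
# model-based learner can estimate and then exploit.

K = 2                 # assumed adversary memory bound (illustrative)
ACTIONS = [0, 1]      # our actions in a 2-action repeated game
GAMMA = 0.95          # discount factor (illustrative choice)

def loe_aim(opponent, payoff, episodes=5000, epsilon=0.1):
    """Learn-or-exploit loop: explore under-visited history states to
    refine the opponent model, otherwise play greedily against it."""
    q = defaultdict(float)     # Q-values over (history state, action)
    counts = defaultdict(int)  # visit counts drive the learn/exploit switch
    history = tuple()          # last K joint actions: (our action, theirs)
    for _ in range(episodes):
        state = history[-K:]
        counts[state] += 1
        if counts[state] < 1 / epsilon:
            # Learn phase: state still poorly sampled, explore uniformly.
            a = random.choice(ACTIONS)
        else:
            # Exploit phase: best response to the partially learned model.
            a = max(ACTIONS, key=lambda x: q[(state, x)])
        b = opponent(state)    # adversary reacts to the bounded history
        r = payoff(a, b)
        next_state = (history + ((a, b),))[-K:]
        # One-step Q-learning update on the adversary-induced MDP.
        best_next = max(q[(next_state, x)] for x in ACTIONS)
        q[(state, a)] += 0.1 * (r + GAMMA * best_next - q[(state, a)])
        history = next_state
    return q
```

For example, a memory-1 opponent such as tit-for-tat, `opponent = lambda h: h[-1][0] if h else 0`, can be plugged in together with any `payoff(a, b)` matrix to watch the learner first sample the joint-history states and then settle into exploiting the dynamics that opponent induces.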