Win or Learn Fast (WoLF) Algorithm

AKA: WoLF-Based Learning Algorithm.
Example(s):
- WoLF-IGA Algorithm,
- WoLF-GIGA Algorithm,
- WoLF-PHC Algorithm,
- …
Counter-Example(s):
See: Game Theory, Machine Learning System, Q-Learning, Reinforcement Learning, Nash Equilibrium.

References

(Bowling & Veloso, 2002) ⇒ Michael Bowling, and Manuela Veloso. (2002). “Multiagent Learning Using a Variable Learning Rate.” In: Artificial Intelligence Journal, 136(2). doi:10.1016/S0004-3702(02)00121-2
- QUOTE: In this article, we contribute a new learning technique: a variable learning rate. We introduce this concept and provide a specific principle to adjust the learning rate, namely the WoLF principle, standing for “Win or Learn Fast”. We successfully develop and apply the WoLF principle within different learning approaches. Given the novelty of the WoLF principle, we face the challenge of determining whether a WoLF-based learning algorithm is rational and convergent according to our own introduced properties of multiagent learning algorithms. We show the rationality property and we contribute a theoretical proof of the convergence of WoLF gradient ascent in a restricted class of iterated matrix games. We then show empirical results suggesting convergence of an extended WoLF algorithm and compare its performance in a variety of game situations used previously by other learning algorithms.
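The extended algorithm evaluated empirically in the 2002 paper is the WoLF-PHC variant listed under Examples. It avoids the need for a known equilibrium by declaring a player "winning" when its current policy scores higher under its own Q-values than its running average policy does. The sketch below assumes that formulation (Q-learning plus policy hill-climbing with the average-policy test) and restricts it to a single-state repeated matrix game, so the Q-learning target is just the immediate reward. The class name `WoLFPHC`, the step sizes, and the matching-pennies self-play loop are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


class WoLFPHC:
    """WoLF-PHC learner for a single-state (repeated matrix) game.

    Sketch only: step sizes are illustrative, and with a single state the
    Q-learning target is just the immediate reward (no discounted next-state term).
    """

    def __init__(self, n_actions, alpha=0.1, delta_w=0.01, delta_l=0.04, seed=0):
        self.n = n_actions
        self.alpha = alpha        # Q-learning step size
        self.delta_w = delta_w    # policy step when winning (small: learn cautiously)
        self.delta_l = delta_l    # policy step when losing (large: learn fast)
        self.q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)      # current mixed policy
        self.pi_avg = np.full(n_actions, 1.0 / n_actions)  # running average policy
        self.count = 0
        self.rng = np.random.default_rng(seed)

    def act(self):
        """Sample an action from the current mixed policy."""
        return int(self.rng.choice(self.n, p=self.pi))

    def policy_step(self):
        """WoLF test: 'winning' if the current policy scores higher under Q than the average policy."""
        winning = self.pi @ self.q > self.pi_avg @ self.q
        return self.delta_w if winning else self.delta_l

    def update(self, action, reward):
        # 1. Q-learning update (single state, so the target is the reward itself).
        self.q[action] += self.alpha * (reward - self.q[action])
        # 2. Incremental update of the average policy.
        self.count += 1
        self.pi_avg += (self.pi - self.pi_avg) / self.count
        # 3. Hill-climb toward the greedy action with the WoLF-selected step size.
        delta = self.policy_step()
        best = int(np.argmax(self.q))
        moved = np.minimum(self.pi, delta / (self.n - 1))  # never push a probability below 0
        moved[best] = 0.0
        self.pi -= moved
        self.pi[best] += moved.sum()


# Self-play in matching pennies (row player's payoffs; the column player gets the negation).
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
row, col = WoLFPHC(2, seed=1), WoLFPHC(2, seed=2)
for _ in range(10000):
    a, b = row.act(), col.act()
    row.update(a, R[a, b])
    col.update(b, -R[a, b])
```

The only place the WoLF principle enters is `policy_step`: the requirement `delta_l > delta_w` makes the learner move its policy faster when it is losing than when it is winning, while the rest is ordinary policy hill-climbing.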

(Bowling & Veloso, 2001) ⇒ Michael H. Bowling, and Manuela M. Veloso. (2001). “Convergence of Gradient Dynamics with a Variable Learning Rate.” In: Proceedings of the Eighteenth International Conference on Machine Learning. ISBN:1-55860-778-1.
- QUOTE: The specific method for varying the learning rate that we are contributing is the WoLF (“Win or Learn Fast”) principle. The essence of this method is to learn quickly when losing, and cautiously when winning. The intuition is that a learner should adapt quickly when it is doing more poorly than expected. When it is doing better than expected, it should be cautious since the other players are likely to change their policy. The heart of the algorithm is how to determine whether a player is winning or losing. For the analysis in this section each player will select a Nash equilibrium and compare their expected payoff with what they would receive if they played according to the selected equilibrium strategy.
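The equilibrium-based winning test in this quote can be sketched for a two-player, two-action game with gradient-ascent updates on each player's mixed strategy (the WoLF-IGA setting). A minimal sketch, assuming: each strategy is the probability of playing the first action, the step size `eta` and the rates `l_min < l_max` are illustrative choices, the caller supplies each player's selected equilibrium strategy, and the continuous gradient dynamics are approximated by small discrete steps projected onto [0, 1]. The function names are hypothetical.

```python
import numpy as np


def expected_payoff(M, p, q):
    """Expected payoff under matrix M when row plays action 0 w.p. p and column w.p. q."""
    x = np.array([p, 1.0 - p])
    y = np.array([q, 1.0 - q])
    return float(x @ M @ y)


def wolf_iga(R, C, alpha, beta, alpha_eq, beta_eq,
             eta=0.001, l_min=1.0, l_max=4.0, steps=20000):
    """Discrete-step WoLF gradient ascent; R and C are indexed [row action, column action]."""
    u_r = R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]
    u_c = C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]
    for _ in range(steps):
        # Gradient of each player's expected payoff w.r.t. its own strategy.
        grad_a = beta * u_r + (R[0, 1] - R[1, 1])
        grad_b = alpha * u_c + (C[1, 0] - C[1, 1])
        # "Winning" = doing better than the selected equilibrium strategy would
        # against the opponent's current strategy.
        row_winning = expected_payoff(R, alpha, beta) > expected_payoff(R, alpha_eq, beta)
        col_winning = expected_payoff(C, alpha, beta) > expected_payoff(C, alpha, beta_eq)
        l_r = l_min if row_winning else l_max  # learn cautiously when winning,
        l_c = l_min if col_winning else l_max  # fast when losing
        # Simultaneous step, projected back onto valid probabilities.
        alpha, beta = (
            float(np.clip(alpha + eta * l_r * grad_a, 0.0, 1.0)),
            float(np.clip(beta + eta * l_c * grad_b, 0.0, 1.0)),
        )
    return alpha, beta


# Matching pennies: the unique Nash equilibrium is (0.5, 0.5).
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R
print(wolf_iga(R, C, alpha=0.9, beta=0.2, alpha_eq=0.5, beta_eq=0.5))
```

Setting `l_min = l_max` removes the variable learning rate and recovers plain gradient ascent with a fixed rate; in matching pennies that version keeps circling the equilibrium (and, with discrete steps, drifts outward) instead of approaching it, which is the behavior the WoLF rate switch is meant to correct.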