Adapt When Everybody is Stationary Otherwise Move to Equilibrium (AWESOME) Algorithm
An Adapt When Everybody is Stationary Otherwise Move to Equilibrium (AWESOME) Algorithm is a Multi-Agent Learning Algorithm that adapts to play a best response when the other agents appear to be playing stationary strategies and otherwise retreats to a precomputed mixed equilibrium strategy.
- Context:
- It is (often) defined in a game-theoretic setting, such as repeated games or stochastic games.
- …
- Example(s):
- the AWESOME algorithm as described in (Conitzer & Sandholm, 2007).
- …
- Counter-Example(s):
- Enhanced Cooperative Multi-Agent Learning Algorithm (ECMLA) Algorithm,
- Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) Algorithm,
- Replicator Dynamics with a Variable Learning Rate (ReDVaLeR) Algorithm,
- Weighted Policy Learner (WPL) Algorithm,
- Win or Learn Fast (WoLF) Algorithm.
- See: Single-Agent Learning Algorithm, Game Theory, Machine Learning System, Q-Learning, Reinforcement Learning, Nash Equilibrium.
References
2017
- (Shoham & Powers, 2017) ⇒ Yoav Shoham, and Rob Powers (2017). “Multi-Agent Learning Algorithms”. In: (Sammut & Webb, 2017)
- QUOTE: Multi-agent learning (MAL) refers to settings in which multiple agents learn simultaneously. Usually defined in a game theoretic setting, specifically in repeated games or stochastic games, the key feature that distinguishes MAL from single-agent learning is that in the former the learning of one agent impacts the learning of others. As a result, neither the problem definition for multi-agent learning nor the algorithms offered follow in a straightforward way from the single-agent case. In this second of two entries on the subject, we focus on algorithms.
2007
- (Conitzer & Sandholm, 2007) ⇒ Vincent Conitzer, and Tuomas Sandholm. (2007). “AWESOME: A General Multiagent Learning Algorithm That Converges in Self-play and Learns a Best Response Against Stationary Opponents.” In: Machine Learning Journal, 67(1-2). doi:10.1007/s10994-006-0143-1
- QUOTE: Roughly, the idea of the algorithm is the following. When the others appear to be playing stationary strategies, AWESOME adapts to play the best response to those apparent strategies. When the others appear to be adapting their strategies, AWESOME retreats to an equilibrium strategy. (Hence, AWESOME stands for Adapt When Everybody is Stationary, Otherwise Move to Equilibrium.)
(...) While the basic idea is simple, we need a few more technical specifications to enable us to prove the desired properties.
- To make the algorithm well-specified, we need to specify which equilibrium strategy AWESOME retreats to. We let AWESOME compute an equilibrium in the beginning, and it will retreat to its strategy in that equilibrium every time it retreats. To obtain our guarantee of convergence in self-play, we also specify that each AWESOME agent computes the same equilibrium.[1] We observe that any equilibrium will work here (e.g., a social welfare maximizing one), but AWESOME might not converge to that equilibrium in self-play—that is, it may converge to another equilibrium.
- When retreating to the equilibrium strategy, AWESOME forgets everything it has learned. So, retreating to an equilibrium is a complete restart. (This may be wasteful in practice, but makes the analysis easier.)
- Best-responding to strategies that are close to the precomputed equilibrium strategies, but slightly different, can lead to rapid divergence from the equilibrium. To avoid this, AWESOME at various stages has a null hypothesis that the others are playing the precomputed equilibrium. AWESOME will not reject this hypothesis unless presented with significant evidence to the contrary.
- AWESOME rejects the equilibrium hypothesis also when its own actions, chosen according to its mixed equilibrium strategy, happen to appear to indicate a nonequilibrium strategy (even though the underlying mixed strategy is actually the equilibrium strategy). This will help in proving convergence in self-play by making the learning process synchronized across all AWESOME players. (Since the other AWESOME players will restart when they detect such nonstationarity, this agent restarts itself to stay synchronized with the others.)
- After AWESOME rejects the equilibrium hypothesis, it randomly picks an action and changes its strategy to always playing this action. At the end of an epoch, if another action would perform significantly better than this action against the strategies the others appeared to play in the last epoch, it switches to this action. (The significant difference is necessary to prevent the AWESOME player from switching back and forth between multiple best responses to the actual strategies played.)
- Because the others’ strategies are unobservable (only their actions are observable), we need to specify how an AWESOME agent can reject, based on others’ actions, the hypothesis that the others are playing the precomputed equilibrium strategies. Furthermore, we need to specify how an AWESOME agent can reject, based on others’ actions, the hypothesis that the others are drawing their actions according to stationary (mixed) strategies.
- ↑ This is at least somewhat reasonable since they share the same algorithm. If there are only finitely many equilibria, then one way to circumvent this assumption is to let each agent choose a random equilibrium after each restart, so that there is at least some probability that the computed equilibria coincide.
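Since the other agents' mixed strategies are unobservable, the rejection tests described in the quoted specification above must be based on observed action frequencies. The following is a minimal, hypothetical Python sketch of such tests for a two-player game; the function names, the total-variation distance, and the fixed thresholds eps_e and eps_s are illustrative assumptions (Conitzer & Sandholm use epoch-dependent threshold schedules), not the paper's exact procedure.

```python
from collections import Counter

def empirical_distribution(observed_actions, action_set):
    """Empirical frequency of each action over one epoch of observations."""
    counts = Counter(observed_actions)
    n = max(len(observed_actions), 1)
    return {a: counts[a] / n for a in action_set}

def distance(p, q, action_set):
    """Total-variation distance between two distributions over action_set."""
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in action_set)

def reject_equilibrium_hypothesis(observed_actions, equilibrium_strategy, action_set, eps_e):
    """Reject 'the other agent plays the precomputed equilibrium' when the
    observed frequencies deviate from it by more than eps_e."""
    freqs = empirical_distribution(observed_actions, action_set)
    return distance(freqs, equilibrium_strategy, action_set) > eps_e

def reject_stationarity_hypothesis(observed_now, observed_before, action_set, eps_s):
    """Reject 'the other agent is stationary' when its empirical frequencies
    shift by more than eps_s between two consecutive epochs."""
    return distance(empirical_distribution(observed_now, action_set),
                    empirical_distribution(observed_before, action_set),
                    action_set) > eps_s

# Example: with the matching-pennies equilibrium (0.5, 0.5), observing H,H,H,T
# gives frequencies (0.75, 0.25), which rejects the hypothesis at eps_e = 0.2.
print(reject_equilibrium_hypothesis(['H', 'H', 'H', 'T'],
                                    {'H': 0.5, 'T': 0.5}, ['H', 'T'], eps_e=0.2))  # True
```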
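Building on such tests, the adapt-or-retreat control flow described in the quote might look roughly like the following epoch-end update. This is again an illustrative simplification, not the paper's pseudocode: the hypothesis-test results are passed in as booleans, a restart is a literal reset of the agent's state, and switch_margin stands in for the paper's significant-difference condition on switching best responses.

```python
import random

def awesome_epoch_step(state, opponent_freqs, payoff, action_set,
                       equilibrium_rejected, stationarity_rejected,
                       switch_margin=0.05):
    """One epoch-end update of a simplified AWESOME agent.

    state: {'mode': 'equilibrium' or 'best_respond', 'current_action': ...};
    in 'equilibrium' mode the agent plays its precomputed equilibrium strategy,
    in 'best_respond' mode it plays the pure action 'current_action'.
    opponent_freqs: the opponent's apparent strategy over the last epoch.
    payoff(a, b): this agent's payoff for playing a against opponent action b.
    """
    if stationarity_rejected:
        # Others appear to be adapting: forget everything learned and retreat
        # to the precomputed equilibrium (a complete restart).
        return {'mode': 'equilibrium', 'current_action': None}

    if state['mode'] == 'equilibrium':
        if equilibrium_rejected:
            # Others look stationary but away from the equilibrium: start
            # adapting from a randomly chosen pure action.
            return {'mode': 'best_respond',
                    'current_action': random.choice(list(action_set))}
        return state

    # Best-respond mode: switch only if another action does *significantly*
    # better against the apparent strategy, to avoid oscillating between
    # near-equivalent best responses.
    def expected_payoff(a):
        return sum(opponent_freqs.get(b, 0.0) * payoff(a, b) for b in action_set)

    best = max(action_set, key=expected_payoff)
    if expected_payoff(best) > expected_payoff(state['current_action']) + switch_margin:
        return {'mode': 'best_respond', 'current_action': best}
    return state
```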