2009 MonteCarloSamplingforRegretMini
- (Lanctot et al., 2009) ⇒ Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. (2009). “Monte Carlo Sampling for Regret Minimization in Extensive Games.” In: Proceedings of the 22nd International Conference on Neural Information Processing Systems. ISBN:978-1-61567-911-9
Subject Headings: Counterfactual Regret Minimization (CFR).
Notes
Cited By
- http://scholar.google.com/scholar?q=%222009%22+Monte+Carlo+Sampling+for+Regret+Minimization+in+Extensive+Games
- http://dl.acm.org/citation.cfm?id=2984093.2984215&preflayout=flat#citedby
Quotes
Abstract
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2009 MonteCarloSamplingforRegretMini | Martin Zinkevich Marc Lanctot Michael Bowling Kevin Waugh | Monte Carlo Sampling for Regret Minimization in Extensive Games | 2009 |