2014 ParallelizingExplorationExploit
- (Desautels et al., 2014) ⇒ Thomas Desautels, Andreas Krause, and Joel W. Burdick. (2014). “Parallelizing Exploration-exploitation Tradeoffs in Gaussian Process Bandit Optimization.” In: The Journal of Machine Learning Research, 15(1).
Subject Headings: Gaussian Process Bandit Optimization, Upper Confidence Bound Algorithm, Batch Active Learning.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222014%22+Parallelizing+Exploration-exploitation+Tradeoffs+in+Gaussian+Process+Bandit+Optimization
- http://dl.acm.org/citation.cfm?id=2627435.2750368&preflayout=flat#citedby
Author Keywords
Gaussian process, upper confidence bound, batch, active learning, regret bound.
Abstract
How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics.
1. Introduction
Many problems require optimizing an unknown reward function, from which we can only obtain noisy observations. A central challenge is choosing actions that both explore (estimate) the function and exploit our knowledge about likely high reward regions in the function's domain. Carefully calibrating this exploration-exploitation tradeoff is especially important in cases where the experiments are costly, e.g., when each experiment takes a long time to perform and the time budget available for experiments is limited. In such settings, it may be desirable to execute several experiments in parallel. By parallelizing the experiments, substantially more information can be gathered in the same time-frame; however, future actions must be
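The core mechanism described in the abstract can be sketched in code: GP-BUCB selects each point in a batch by maximizing an upper confidence bound, then "hallucinates" the current posterior mean as the observation at that point, which leaves the posterior mean unchanged but shrinks the posterior variance before the next selection. The sketch below is a minimal illustration under stated assumptions (an RBF kernel with unit prior variance, a fixed confidence parameter `beta`, and a finite candidate grid); all function names and parameters here are illustrative, not the authors' reference implementation.

```python
# Minimal GP-BUCB batch-selection sketch (illustrative assumptions:
# RBF kernel, unit prior variance, fixed beta, finite candidate set).
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_cand, noise=0.1):
    """GP posterior mean and variance at the candidate points."""
    K = rbf_kernel(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    k = rbf_kernel(X_obs, X_cand)
    K_inv = np.linalg.inv(K)
    mu = k.T @ K_inv @ y_obs
    # Prior variance is 1 for the RBF kernel; subtract the explained part.
    var = 1.0 - np.einsum('ij,ik,kj->j', k, K_inv, k)
    return mu, np.maximum(var, 1e-12)

def gp_bucb_batch(X_obs, y_obs, X_cand, batch_size=3, beta=2.0, noise=0.1):
    """Select a batch of candidate indices via GP-BUCB.

    The posterior mean is frozen at the start of the batch; each selected
    point is fed back as a hallucinated observation equal to that mean,
    which only reduces the posterior variance for later selections.
    """
    mu, _ = gp_posterior(X_obs, y_obs, X_cand, noise)
    X_aug, y_aug = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(batch_size):
        _, var = gp_posterior(X_aug, y_aug, X_cand, noise)
        i = int(np.argmax(mu + beta * np.sqrt(var)))
        batch.append(i)
        # Hallucinated observation: zero residual, so the mean is unchanged.
        X_aug = np.vstack([X_aug, X_cand[i:i + 1]])
        y_aug = np.append(y_aug, mu[i])
    return batch
```

Because the hallucinated observation equals the current posterior mean, the residual in the GP update is zero and only the variance term changes; this is what allows the whole batch to be chosen before any real feedback arrives.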
References
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Thomas Desautels, Andreas Krause, Joel W. Burdick | 15(1) | 2014 | Parallelizing Exploration-exploitation Tradeoffs in Gaussian Process Bandit Optimization | | The Journal of Machine Learning Research | | | 2014 ParallelizingExplorationExploit | 2014