2015 TheSelfNormalizedEstimatorforCo

From GM-RKB

(Swaminathan & Joachims, 2015b) ⇒ Adith Swaminathan, and Thorsten Joachims. (2015). “The Self-normalized Estimator for Counterfactual Learning.” In: Proceedings of the 28th International Conference on Neural Information Processing Systems.

Subject Headings: Batch Learning from Logged Bandit Feedback (BLBF) Algorithms, Online System Training, Counterfactual Risk Minimization (CRM) Principle.

Notes

Cited By

Quotes

Abstract

This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle [1] offers a general recipe for designing BLBF algorithms. It requires a counterfactual risk estimator, and virtually all existing works on BLBF have focused on a particular unbiased estimator. We show that this conventional estimator suffers from a propensity overfitting problem when used for learning over complex hypothesis spaces. We propose to replace the risk estimator with a self-normalized estimator, showing that it neatly avoids this problem. This naturally gives rise to a new learning algorithm - Normalized Policy Optimizer for Exponential Models (Norm-POEM) - for structured output prediction using linear rules. We evaluate the empirical effectiveness of Norm-POEM on several multi-label classification problems, finding that it consistently outperforms the conventional estimator.
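The contrast the abstract draws between the conventional unbiased estimator and the self-normalized one can be illustrated with a minimal sketch. It assumes the conventional estimator is the standard inverse-propensity-scored (IPS) average and that the self-normalized variant divides the same weighted loss sum by the sum of the importance weights. The function names, argument names, and toy numbers below are hypothetical and are not taken from the paper or from Norm-POEM.

```python
def importance_weights(new_probs, logged_probs):
    """Ratio of the new policy's probability to the logging policy's propensity."""
    return [q / p for q, p in zip(new_probs, logged_probs)]


def ips_risk(losses, new_probs, logged_probs):
    """Conventional IPS estimate: average of loss times importance weight."""
    weights = importance_weights(new_probs, logged_probs)
    return sum(d * w for d, w in zip(losses, weights)) / len(losses)


def self_normalized_risk(losses, new_probs, logged_probs):
    """Self-normalized estimate: weighted loss sum divided by the weight sum."""
    weights = importance_weights(new_probs, logged_probs)
    return sum(d * w for d, w in zip(losses, weights)) / sum(weights)


# Hypothetical logged bandit feedback: one observed loss per logged action.
losses = [1.0, 0.0, 1.0, 0.0]
logged_probs = [0.5, 0.5, 0.25, 0.25]   # propensities of the historical policy
new_probs = [0.1, 0.9, 0.1, 0.8]        # probabilities under the policy being evaluated

# Shifting every loss by a constant should shift a sensible risk estimate
# by exactly that constant.
shift = 10.0
shifted = [d + shift for d in losses]

ips_change = ips_risk(shifted, new_probs, logged_probs) - ips_risk(losses, new_probs, logged_probs)
sn_change = (self_normalized_risk(shifted, new_probs, logged_probs)
             - self_normalized_risk(losses, new_probs, logged_probs))

print("IPS estimate changes by:", ips_change)
print("Self-normalized estimate changes by:", sn_change)
```

In this toy log the importance weights do not average to one, so the IPS estimate moves by more than the constant added to the losses, while the self-normalized estimate moves by exactly that constant. This sensitivity to the weight sum is one way to see why an optimizer over a rich hypothesis space could exploit the conventional estimator, which is the kind of behavior the abstract calls propensity overfitting.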

References

	Author	title	year
2015 TheSelfNormalizedEstimatorforCo	Thorsten Joachims, Adith Swaminathan	The Self-normalized Estimator for Counterfactual Learning	2015
