Weighted Approximately Ranked Pairwise (WARP) Ranking Loss
A Weighted Approximately Ranked Pairwise (WARP) Ranking Loss is a ranking loss function that casts ranking as pairwise classification, weighting each pairwise violation by the rank of the positive example so that errors near the top of the ranked list are penalized more heavily.
- See: BPR Algorithm, Loss Function.
References
2013
- http://github.com/Mendeley/mrec/blob/master/README.rst
- QUOTE: mrec is a Python package developed at Mendeley to support recommender systems development and evaluation. The package currently focuses on item similarity and other methods that work well on implicit feedback, and on experimental evaluation. Why another package when there are already some really good software projects implementing recommender systems?
mrec tries to fill two small gaps in the current landscape, firstly by supplying simple tools for consistent and reproducible evaluation, and secondly by offering examples of how to use IPython.parallel to run the same code either on the cores of a single machine or on a cluster. The combination of IPython and scientific Python libraries is very powerful, but there are still rather few examples around that show how to get it to work in practice.
- Highlights:
- a (relatively) efficient implementation of the SLIM item similarity method (Levy & Jack, 2013).
- an implementation of Hu, Koren & Volinsky's WRMF weighted matrix factorization for implicit feedback (Hu et al., 2008).
- a matrix factorization model that optimizes the Weighted Approximately Ranked Pairwise (WARP) ranking loss (Weston et al., 2010).
- a hybrid model optimizing the WARP loss for a ranking based jointly on a user-item matrix and on content features for each item.
- utilities to train models and make recommendations in parallel using IPython.
- utilities to prepare datasets and compute quality metrics.
2012
- http://www.hongliangjie.com/2012/08/24/weighted-approximately-ranked-pairwise-loss-warp/
- QUOTE: To focus more on the top of the ranked list, where the top k positions are those we care about using the precision at k measure, one can weigh the pairwise violations depending on their position in the ranked list. For the pairwise learning procedure, we construct a set of all positively labelled instances, denoted [math]\displaystyle{ C^+_u }[/math], and a set of negatively labelled instances, denoted [math]\displaystyle{ C^-_u }[/math]. The loss is defined as:
[math]\displaystyle{ err_{WARP}(x_i, y_i) = L[\operatorname{rank}(f(y_i|x_i))] \quad (1) }[/math]
where [math]\displaystyle{ \operatorname{rank}(f(y_i|x_i)) }[/math] is a function measuring how many negatively labelled instances are “wrongly” ranked higher than this positive example [math]\displaystyle{ x_i }[/math]:
[math]\displaystyle{ \operatorname{rank}(f(y_i|x_i)) = \sum_{(x',y') \in C^-_u} I[f(y'|x') \ge f(y_i|x_i)] }[/math],
where [math]\displaystyle{ I(\cdot) }[/math] is the indicator function, and [math]\displaystyle{ L(\cdot) }[/math] transforms this rank into a loss: [math]\displaystyle{ L(r) = \sum_{j=1}^{r} \tau_j }[/math], with [math]\displaystyle{ \tau_1 \ge \tau_2 \ge \cdots \ge 0 }[/math].
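The quoted definition can be sketched directly in code. The following is a minimal, hypothetical illustration (function and variable names are this sketch's own, not from any library): the rank is the number of negatives scored at least as high as the positive, and the non-increasing weights [math]\displaystyle{ \tau_j }[/math] convert that rank into a loss.

```python
import numpy as np

def warp_loss(score_pos, scores_neg, tau):
    """Sketch of the WARP loss for one positive example.

    score_pos  : model score f(y_i|x_i) for the positive instance
    scores_neg : scores f(y'|x') for every negative instance in C-_u
    tau        : non-increasing weights tau_1 >= tau_2 >= ... >= 0
    """
    # rank = number of negatives scored at least as high as the positive
    rank = int(np.sum(scores_neg >= score_pos))
    # L(r) = sum_{j=1}^{r} tau_j transforms the rank into a loss
    return float(np.sum(tau[:rank]))

# Example: harmonic weights tau_j = 1/j emphasize the top of the list.
tau = 1.0 / np.arange(1, 101)
loss = warp_loss(0.5, np.array([0.9, 0.6, 0.4, 0.1]), tau)
# two negatives (0.9, 0.6) outrank the positive: rank = 2, loss = 1 + 1/2
```

Choosing constant weights [math]\displaystyle{ \tau_j = C }[/math] recovers the plain pairwise loss, while rapidly decaying weights such as [math]\displaystyle{ \tau_j = 1/j }[/math] concentrate the penalty on violations at the top of the list.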
- QUOTE: To focus more on the top of the ranked list, where the top k positions are those we care about using the precision at k measure, one can weigh the pairwise violations depending on their position in the ranked list. For pair-wise learning procedure, we construct a set of all positive labelled instances, denoted as C+u and a set of negative labelled instances as C−u. The loss is defined as:
2010
- (Weston et al., 2010) ⇒ Jason Weston, Samy Bengio, and Nicolas Usunier. (2010). “Large Scale Image Annotation: Learning to Rank with Joint Word-image Embeddings.” In: Machine Learning Journal, 81(1). doi:10.1007/s10994-010-5198-3
- QUOTE: ... Unfortunately, such measures can be costly to train. To make training time efficient we propose the WARP loss (Weighted Approximate-Rank Pairwise loss). The WARP loss is related to the recently proposed Ordered Weighted Pairwise Classification (OWPC) loss (Usunier et al. 2009) which has been shown to be state-of-the-art on (small) text retrieval tasks. WARP uses stochastic gradient descent and a novel sampling trick to approximate ranks resulting in an efficient online optimization strategy which we show is superior to standard stochastic gradient descent applied to the same loss, enabling us to train on datasets that do not even fit in memory. Moreover, WARP can be applied to our embedding models (in fact, to arbitrary differentiable models) whereas the OWPC loss, which relies on SVM_struct cannot.
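The "sampling trick" mentioned in the quote avoids scoring every negative on each update. A rough sketch under assumed simplifications (the indicator condition from the loss definition, no margin term, and illustrative names of this sketch's own choosing): sample negatives uniformly until a violating one is found, and estimate the rank from the number of draws that took.

```python
import random

def estimate_rank(score_pos, scores_neg, rng=random):
    """Sketch of a sampled rank approximation: draw negatives uniformly
    until one scores at least as high as the positive, then approximate
    rank(f) ~= floor((|C-_u| - 1) / N), where N is the number of draws."""
    n_neg = len(scores_neg)
    for n_draws in range(1, n_neg + 1):
        neg = scores_neg[rng.randrange(n_neg)]
        if neg >= score_pos:              # violating negative found
            return (n_neg - 1) // n_draws
    return 0                              # no violation sampled: zero loss

# A high-ranked positive needs many draws to hit a violator, so the
# estimated rank (and hence the loss weight) stays small; a poorly ranked
# positive hits a violator quickly, yielding a large estimated rank.
```

This is what makes the loss trainable online with stochastic gradient descent: each update touches only the sampled pair rather than all of [math]\displaystyle{ C^-_u }[/math].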
2009
- (Usunier et al., 2009) ⇒ Nicolas Usunier, David Buffoni, and Patrick Gallinari. (2009). “Ranking with Ordered Weighted Pairwise Classification.” In: Proceedings of the 26th Annual International Conference on Machine Learning. ISBN:978-1-60558-516-1 doi:10.1145/1553374.1553509
- QUOTE: In ranking with the pairwise classification approach, the loss associated to a predicted ranked list is the mean of the pairwise classification losses. This loss is inadequate for tasks like information retrieval where we prefer ranked lists with high precision on the top of the list. We propose to optimize a larger class of loss functions for ranking, based on an ordered weighted average (OWA) (Yager, 1988) of the classification losses. Convex OWA aggregation operators range from the max to the mean depending on their weights, and can be used to focus on the top ranked elements as they give more weight to the largest losses. When aggregating hinge losses, the optimization problem is similar to the SVM for interdependent output spaces. Moreover, we show that OWA aggregates of margin-based classification losses have good generalization properties. Experiments on the Letor 3.0 benchmark dataset for information retrieval validate our approach.
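The ordered weighted average (OWA) aggregation described in the quote can be illustrated with a short sketch (the function name and the example weights are illustrative assumptions, not from the paper): sort the pairwise losses in decreasing order and take a weighted sum, so that larger weights on the largest losses focus the objective on the top of the ranked list.

```python
def owa(losses, weights):
    """Sketch of an Ordered Weighted Average (OWA, Yager 1988) aggregation:
    weights are applied to the losses sorted in decreasing order, so
    weight j attaches to the j-th largest loss rather than to a fixed pair."""
    assert len(weights) == len(losses)
    ordered = sorted(losses, reverse=True)
    return sum(w * l for w, l in zip(weights, ordered))

losses = [0.2, 0.9, 0.5]
owa(losses, [1.0, 0.0, 0.0])         # all weight on the largest loss: max
owa(losses, [1/3, 1/3, 1/3])         # uniform weights: the mean
owa(losses, [0.5, 0.3, 0.2])         # in between: top-heavy aggregation
```

As the quote notes, convex OWA operators interpolate between the max (all weight on the largest loss) and the mean (uniform weights), which is what lets them emphasize precision at the top of the list.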