2009 NonLinearMatrixFactorizationwit
- (Lawrence et al., 2009) ⇒ Neil D. Lawrence, and Raquel Urtasun. (2009). “Non-linear Matrix Factorization with Gaussian Processes.” In: Proceedings of the 26th Annual International Conference on Machine Learning. doi:10.1145/1553374.1553452
Subject Headings: Non-Linear Probabilistic Matrix Factorization, Collaborative Filtering Algorithm.
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Non-linear+matrix+factorization+with+Gaussian+processes%22+2009
- http://dl.acm.org/citation.cfm?id=1553374.1553452&preflayout=flat#citedby
Quotes
Abstract
A popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.
3 Non-Linear PMF via GP-LVMs
We have already highlighted the fact that probabilistic matrix factorization, with the parameters W marginalized is a Bayesian multi-output regression model in which we optimize with respect to the inputs to the regression. This type of model is equivalent to probabilistic PCA. However, it also belongs to a larger class of models called Gaussian process latent variable models (GP-LVM). Lawrence (2005) showed how the matrix C has an interpretation as a Gaussian process (GP) covariance matrix. The GP associated with the covariance function [math]\displaystyle{ \mathbf{C} = \alpha_w^{-1} \mathbf{X}\mathbf{X}^\top + \sigma^2 \mathbf{I} }[/math] is a linear model. However, by replacing the inner product matrix, [math]\displaystyle{ \mathbf{X}\mathbf{X}^\top }[/math], by a Mercer kernel the model becomes a non-linear GP model. Maximization of the log likelihood can no longer be done through an eigenvalue problem, but it is straightforward to apply stochastic gradient descent in the manner described above.
The regression model from (1) can be written as a product of univariate Gaussian distributions,
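- [math]\displaystyle{ p\left(\mathbf{Y} \mid \mathbf{W}, \mathbf{X}, \sigma^2\right) = \prod_{j=1}^{D} \prod_{i=1}^{N} \mathcal{N}\left(y_{i,j} \mid f_j(\mathbf{x}_{i,:}), \sigma^2\right), }[/math]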
where the mean of each Gaussian is given by the inner product [math]\displaystyle{ f_j(\mathbf{x}_{i,:}) = \mathbf{w}_{j,:}^\top \mathbf{x}_{i,:} }[/math]. Probabilistic PCA can be recovered by marginalizing either W or X. The GP-LVM is recovered by recognizing that we can place the prior distribution directly over the function [math]\displaystyle{ f(\cdot) }[/math] through a Gaussian process (Rasmussen & Williams, 2006).
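The factorized likelihood described above can be evaluated directly on a toy problem. The following is a minimal sketch using only the Python standard library; the function names (`gaussian_logpdf`, `linear_pmf_loglik`) and the toy matrices are illustrative assumptions, not taken from the paper.

```python
import math

def gaussian_logpdf(y, mean, variance):
    """Log density of a univariate Gaussian N(y | mean, variance)."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)

def linear_pmf_loglik(Y, W, X, sigma2):
    """Sum of log N(y_ij | w_j^T x_i, sigma2) over every entry of the N x D matrix Y."""
    total = 0.0
    for i, x_i in enumerate(X):        # row i of X is the latent vector x_{i,:}
        for j, w_j in enumerate(W):    # row j of W is the weight vector w_{j,:}
            f_ij = sum(w * x for w, x in zip(w_j, x_i))  # inner product mean
            total += gaussian_logpdf(Y[i][j], f_ij, sigma2)
    return total

# Hypothetical toy data: N = 3 rows, D = 2 columns, 2 latent dimensions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[2.0, -1.0], [0.5, 0.5]]
Y = [[2.0, 0.5], [-1.0, 0.5], [1.0, 1.0]]  # chosen so every y_ij equals its mean
loglik = linear_pmf_loglik(Y, W, X, sigma2=1.0)
```

Because every toy observation sits exactly on its mean, the log likelihood reduces to six copies of the Gaussian normalizing term, which makes the sketch easy to check by hand.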
A Gaussian process (GP) can be thought of as a probability distribution for generating functions. The GP is specified by a mean and a covariance function. For any given set of observations of the function, f, the joint distribution over those observations is Gaussian. Restricting ourselves to GPs with a zero mean function, they are distributed as [math]\displaystyle{ p(\mathbf{f} \mid \mathbf{X}) = \mathcal{N}(\mathbf{f} \mid \mathbf{0}, \mathbf{K}) }[/math], where K represents the covariance function. The covariance function is made up of elements, [math]\displaystyle{ k(\mathbf{x}_{i,:}, \mathbf{x}_{j,:}) }[/math], that encode the degree of correlation between two samples, [math]\displaystyle{ f_i }[/math], [math]\displaystyle{ f_j }[/math] from f as a function of the inputs associated with those samples, [math]\displaystyle{ \mathbf{x}_{i,:} }[/math] and [math]\displaystyle{ \mathbf{x}_{j,:} }[/math]. For a covariance function to be valid, it has to lead to a positive semi-definite matrix K for all valid inputs to the function. In practice that means that valid covariance functions have to be positive definite functions, i.e. the class of valid covariance functions is the same as the class of Mercer kernels (Schölkopf & Smola, 2001). A linear regression model is a GP in which the covariance function is taken to be [math]\displaystyle{ k(\mathbf{x}_{i,:}, \mathbf{x}_{j,:}) = \mathbf{x}_{i,:}^\top \mathbf{x}_{j,:} }[/math].
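To make the link between a covariance function and the matrix K concrete, the sketch below builds K from the linear covariance and checks one quadratic form for non-negativity, as positive semi-definiteness requires. The helper names (`linear_kernel`, `gram_matrix`, `quadratic_form`) and the inputs are assumptions made for illustration.

```python
def linear_kernel(x_a, x_b):
    """Linear covariance: the inner product of two input vectors."""
    return sum(a * b for a, b in zip(x_a, x_b))

def gram_matrix(kernel, X):
    """Covariance matrix K with entries K[a][b] = kernel(X[a], X[b])."""
    return [[kernel(x_a, x_b) for x_b in X] for x_a in X]

def quadratic_form(K, v):
    """Compute v^T K v, which is non-negative for every v when K is PSD."""
    n = len(v)
    return sum(v[a] * K[a][b] * v[b] for a in range(n) for b in range(n))

# Hypothetical inputs: three points in a two-dimensional latent space.
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
K = gram_matrix(linear_kernel, X)   # equals X X^T for the linear covariance
q = quadratic_form(K, [1.0, -1.0, 0.5])
```

Swapping `linear_kernel` for any other Mercer kernel in the call to `gram_matrix` changes the covariance while leaving the rest of the construction untouched, which is the substitution the text describes.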
A widely used covariance function that gives a prior over non-linear functions is known as the RBF covariance,
- [math]\displaystyle{ k(\mathbf{x}_{\ell,:}, \mathbf{x}_{i,:}) = \alpha_m \exp\left(-\frac{\gamma_m}{2} \left\| \mathbf{x}_{\ell,:} - \mathbf{x}_{i,:} \right\|^2\right). }[/math]
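A small sketch of an RBF covariance follows, assuming the common two-parameter form with a scale and an inverse-width parameter. The names `alpha_m` and `gamma_m` are labels chosen for this sketch, and the default values are arbitrary.

```python
import math

def rbf_kernel(x_a, x_b, alpha_m=1.0, gamma_m=1.0):
    """RBF covariance: alpha_m * exp(-0.5 * gamma_m * ||x_a - x_b||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_a, x_b))
    return alpha_m * math.exp(-0.5 * gamma_m * sq_dist)

# Hypothetical inputs: covariance is largest for identical points
# and decays smoothly as the points move apart.
k_same = rbf_kernel([0.0, 0.0], [0.0, 0.0], alpha_m=2.0, gamma_m=3.0)
k_near = rbf_kernel([0.0, 0.0], [0.1, 0.0], alpha_m=2.0, gamma_m=3.0)
k_far = rbf_kernel([0.0, 0.0], [2.0, 0.0], alpha_m=2.0, gamma_m=3.0)
```

Unlike the linear covariance, this function depends on the inputs only through their distance, so nearby latent points are given strongly correlated function values regardless of where they sit in the latent space.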
This covariance can be substituted directly for the linear covariance function in (2) giving the following probabilistic model,
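- [math]\displaystyle{ p\left(\mathbf{Y} \mid \mathbf{X}, \sigma^2, \boldsymbol{\theta}\right) = \prod_{j=1}^{D} \mathcal{N}\left(\mathbf{y}_{:,j} \mid \mathbf{0}, \mathbf{K} + \sigma^2 \mathbf{I}\right), }[/math]
where [math]\displaystyle{ \boldsymbol{\theta} }[/math] are the parameters of the covariance function. Alternative covariance functions can also be considered, but in this paper we focus only on the RBF and linear covariance functions.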
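The log likelihood of such a model, a product over the columns of Y of zero-mean Gaussians sharing the covariance K plus noise, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a hand-written Cholesky factorization to stay within the standard library, it assumes a fully observed Y, the RBF parameter names `alpha_m` and `gamma_m` are labels chosen here, and it only evaluates the objective rather than computing the gradients that stochastic gradient descent would need.

```python
import math

def rbf_kernel(x_a, x_b, alpha_m, gamma_m):
    """RBF covariance between two latent points (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_a, x_b))
    return alpha_m * math.exp(-0.5 * gamma_m * sq_dist)

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def zero_mean_gaussian_logpdf(y, C):
    """log N(y | 0, C), using the Cholesky factor for the solve and determinant."""
    n = len(y)
    L = cholesky(C)
    z = []  # forward substitution: solve L z = y, so that z^T z = y^T C^{-1} y
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def gplvm_loglik(Y, X, sigma2, alpha_m, gamma_m):
    """Sum over columns j of log N(y_{:,j} | 0, K + sigma2 * I)."""
    n = len(X)
    C = [[rbf_kernel(X[a], X[b], alpha_m, gamma_m) + (sigma2 if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    return sum(zero_mean_gaussian_logpdf(list(col), C) for col in zip(*Y))

# Hypothetical toy problem: N = 3 latent points in one dimension, D = 2 columns.
X = [[0.0], [0.5], [2.0]]
Y = [[1.0, -0.2], [0.8, 0.1], [-0.5, 0.3]]
loglik = gplvm_loglik(Y, X, sigma2=0.1, alpha_m=1.0, gamma_m=1.0)
```

Because the covariance is shared across columns, the factorization of `C` could be computed once and reused; the sketch recomputes it per column only to keep each function self-contained.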
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2009 NonLinearMatrixFactorizationwit | Neil D. Lawrence, Raquel Urtasun | | | Non-linear Matrix Factorization with Gaussian Processes | | | | 10.1145/1553374.1553452 | | 2009 |