2011 LinkPredictionviaMatrixFactoriz

From GM-RKB

Subject Headings: Supervised Matrix Factorization, Link Prediction.

Notes

Cited By

Quotes

Author Keywords

Link prediction, matrix factorization, side information, ranking loss.

Abstract

We propose to solve the link prediction problem in graphs using a supervised matrix factorization approach. The model learns latent features from the topological structure of a (possibly directed) graph, and is shown to make better predictions than popular unsupervised scores. We show how these latent features may be combined with optional explicit features for nodes or edges, which yields better performance than using either type of feature exclusively. Finally, we propose a novel approach to address the class imbalance problem which is common in link prediction by directly optimizing for a ranking loss. Our model is optimized with stochastic gradient descent and scales to large graphs. Results on several datasets show the efficacy of our approach.

1 The Link Prediction Problem

Link prediction is the problem of predicting the presence or absence of edges between nodes of a graph. There are two types of link prediction: (i) structural, where the input is a partially observed graph, and we wish to predict the status of edges for unobserved pairs of nodes, and (ii) temporal, where we have a sequence of fully observed graphs at various time steps as input, and our goal is to predict the graph state at the next time step. Both problems have important practical applications, such as predicting interactions between pairs of proteins and [[recommending friends in social networks]] respectively. This document will focus on the structural link prediction problem, and henceforth, we will use the term “link prediction” to refer to the structural version of the problem.

Link prediction is closely related to the problem of collaborative filtering, where the input is a partially observed matrix of (user, item) preference scores, and the goal is to recommend new items to a user. Collaborative filtering can be seen as a bipartite weighted link prediction problem, where users and items are represented by nodes, and edges between nodes are weighted according to the preference score. More generally, both problems are instances of dyadic prediction, which is the problem of predicting a label for the interaction between pairs of entities (or dyads) [17]. Despite this connection, there has been limited interaction between the link prediction and collaborative filtering literatures, aside from a few papers that propose to solve one problem using models from the other [7,23].

2 Existing Link Prediction Models

At a high level, existing link prediction models fall into two classes: unsupervised and supervised. Unsupervised models compute scores for pairs of nodes based on topological properties of the graph. For example, one such score is the number of common neighbours that two nodes share. Other popular scores are the Adamic-Adar [1] and Katz score [22]. These models use predefined scores that are invariant to the specific structure of the input graph, and thus do not involve any learning. Supervised models, on the other hand, attempt to be directly predictive of link behaviour by learning a parameter vector θ via :[math]\displaystyle{ \underset{\theta}{\operatorname{min}} \frac{1}{|\mathcal{O}|} \sum_{(i,j)\in \mathcal{O}} \ell(G_{ij}, \hat{G}_{ij}(\theta)) + \Omega(\theta), \ \ (1) }[/math] where [math]\displaystyle{ \hat{G}_{ij}(\theta) }[/math] is the model’s predicted score for the dyad [math]\displaystyle{ (i,j) }[/math], [math]\displaystyle{ \ell(\cdot,\cdot) }[/math] is a loss function, and [math]\displaystyle{ \Omega(\cdot) }[/math] is a regularization term that prevents overfitting. The choice of these terms depends on the type of model. We list some popular approaches:
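As an illustration of the unsupervised scores named above, the sketch below computes common-neighbour, Adamic-Adar, and truncated Katz scores from a dense adjacency matrix. The function names and the Katz truncation depth are our own choices; the paper does not prescribe an implementation.

```python
import numpy as np

def common_neighbours(A):
    """Number of shared neighbours for every node pair: (A @ A)[i, j]."""
    return A @ A

def adamic_adar(A):
    """Adamic-Adar score: sum over common neighbours z of 1 / log(deg(z))."""
    deg = A.sum(axis=0)
    # Weight each intermediate node by 1/log(degree); skip degree <= 1 nodes,
    # whose log-degree weight is undefined or infinite.
    w = np.where(deg > 1, 1.0 / np.log(np.maximum(deg, 2)), 0.0)
    return A @ np.diag(w) @ A

def katz(A, beta=0.05, max_len=5):
    """Truncated Katz score: sum over path lengths l of beta^l * (A^l)[i, j]."""
    S = np.zeros_like(A, dtype=float)
    P = np.eye(A.shape[0])
    for _ in range(max_len):
        P = P @ A
        S = beta * (P + S / beta) if False else S + (beta ** (_ + 1)) * P
    return S
```

Note that these scores involve only matrix products on the adjacency matrix, which is why (as Section 2.1 observes) they scale well to large graphs.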

1. Feature-based models. Suppose each node [math]\displaystyle{ i }[/math] in the graph has an associated feature vector [math]\displaystyle{ x_i \in \mathbb{R}^d }[/math]. Suppose further that each dyad [math]\displaystyle{ (i, j) }[/math] has a feature vector [math]\displaystyle{ z_{ij}\in \mathbb{R}^D }[/math]. Then, we may instantiate Equation 1 via :[math]\displaystyle{ \hat{G}_{ij}(w, v) = \mathcal{L}(f_D(z_{ij}; w) + f_M(x_i, x_j; v)) \ \ \ (2) }[/math] for appropriate [math]\displaystyle{ f_D(\cdot), f_M(\cdot, \cdot) }[/math] acting on dyadic and monadic features respectively, and a link function [math]\displaystyle{ \mathcal{L}(\cdot) }[/math]. Both linear [33] and nonlinear [14,5] choices of [math]\displaystyle{ f_D(\cdot), f_M(\cdot, \cdot) }[/math] have been considered. In the linear case, it is standard to let [math]\displaystyle{ f_D(z_{ij}; w) = w^T z_{ij} }[/math] and [math]\displaystyle{ f_M(x_i, x_j; v) = (v^{(1)})^T x_i + (v^{(2)})^T x_j }[/math], where [math]\displaystyle{ v^{(1)} = v^{(2)} }[/math] iff the graph is undirected. The loss is typically either square or log-loss, and the regularizer typically [math]\displaystyle{ \lambda_w ||w||^2 + \lambda_v ||v||^2 }[/math]. Note also that we can compute multiple unsupervised scores between pairs of nodes [math]\displaystyle{ (i, j) }[/math], and treat these as comprising a feature vector [math]\displaystyle{ z_{ij} }[/math].
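The linear instance of Equation 2 with a logistic link can be sketched in a few lines; the helper names below are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict_edge(z_ij, x_i, x_j, w, v1, v2):
    """Linear instance of Equation 2 with a logistic link function:
    G_hat_ij = sigma(w^T z_ij + v1^T x_i + v2^T x_j).
    For an undirected graph, tie v1 = v2 so the score is symmetric in (i, j)."""
    return sigmoid(w @ z_ij + v1 @ x_i + v2 @ x_j)
```

With all weights zero the predicted score is 0.5, and large positive weighted features push it toward 1, as expected of a logistic link.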

2. Graph regularization models. Here, we assume the existence of node features [math]\displaystyle{ x_i \in \mathbb{R}^d }[/math], based on which we construct a kernel [math]\displaystyle{ K_{ii'jj'} }[/math] that compares the node pairs [math]\displaystyle{ (i, j) }[/math] and [math]\displaystyle{ (i', j') }[/math]. We compute the predicted adjacency matrix [math]\displaystyle{ \hat{G} }[/math] by constraining that values in this matrix should vary smoothly according to [math]\displaystyle{ K }[/math]. Thus [math]\displaystyle{ K }[/math] acts as a graph regularizer, a popular approach in semi-supervised learning [39]. In the framework of Equation 1, we have :[math]\displaystyle{ \Omega(\hat{G}) = \lambda \sum_{(i,j),(i',j')} K_{ii'jj'} (\hat{G}_{ij} - \hat{G}_{i'j'})^2 + \mu \sum_{(i,j)\notin \mathcal{O}} \hat{G}_{ij}^2. }[/math] The above is called link propagation [19,26]. The performance of such methods depends on the choice of kernel [math]\displaystyle{ K }[/math], which is pre-specified and not learned from the data.
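A direct (dense, quadratic-cost) rendering of this regularizer can clarify its two terms: a smoothness penalty over pairs of dyads, plus a shrinkage penalty on unobserved dyads. The argument names and dyad-flattening convention below are our own assumptions.

```python
import numpy as np

def link_propagation_penalty(G_hat, K, observed_mask, lam=1.0, mu=0.1):
    """Graph-regularization term of link propagation:
      lam * sum over dyad pairs of K[(i,j),(i',j')] * (G_hat[i,j] - G_hat[i',j'])**2
      + mu * sum over unobserved dyads of G_hat[i,j]**2.
    K is an (n*n, n*n) kernel indexed by row-major flattened dyads."""
    g = G_hat.ravel()
    diff = g[:, None] - g[None, :]            # all pairwise dyad differences
    smooth = lam * np.sum(K * diff ** 2)      # smoothness under the kernel K
    shrink = mu * np.sum(G_hat[~observed_mask] ** 2)  # shrink unobserved dyads
    return smooth + shrink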

3. Latent class models. These models assign each node of the graph to a class, and use the classes to predict the link structure. [4] assumes that nodes interact solely through their class memberships. It is possible to extend this to allow nodes to have membership in multiple classes [3]. These models are largely Bayesian, and so are not directly expressible in the empirical loss framework of Equation 1. Nonetheless, they do learn a matrix of probabilities [math]\displaystyle{ P \in [0, 1]^{C\times C} }[/math], where C is the number of classes, and this is done by placing appropriate priors on P, which may be viewed as a form of regularization.
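The class-based prediction rule itself is a simple lookup: the score for dyad (i, j) depends only on the classes of i and j, read off the learned matrix P. A minimal sketch (names are illustrative):

```python
import numpy as np

def latent_class_scores(classes, P):
    """Latent class (stochastic-block-model-style) prediction: the probability
    of edge (i, j) is P[c_i, c_j], depending only on the nodes' classes."""
    return P[np.ix_(classes, classes)]
```

This makes the contrast with latent feature models concrete: here every node in a class shares the same interaction behaviour, whereas latent features give each node its own real-valued vector.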

4. Latent feature models. Here, we treat link prediction as a matrix completion problem, and factorize [math]\displaystyle{ G \approx \mathcal{L}(U\Lambda U^T) }[/math] for some [math]\displaystyle{ U \in \mathbb{R}^{n\times k} }[/math], [math]\displaystyle{ \Lambda \in \mathbb{R}^{k\times k} }[/math] and link function [math]\displaystyle{ \mathcal{L}(\cdot) }[/math]. Each node [math]\displaystyle{ i }[/math] thus has a corresponding latent vector [math]\displaystyle{ u_i \in \mathbb{R}^k }[/math], where [math]\displaystyle{ k }[/math] is the number of latent features. In the setup of Equation 1, we have :[math]\displaystyle{ \hat{G}_{ij}(U, \Lambda) = \mathcal{L}(u_i^T \Lambda u_j). }[/math] The regularizer is usually [math]\displaystyle{ \Omega(U,\Lambda) = \frac{\lambda_U}{2}||U||_F^2 + \frac{\lambda_\Lambda}{2}||\Lambda||_F^2 }[/math]. Such models have been successful in other dyadic prediction tasks such as collaborative filtering [20]. This approach has not been studied as extensively in the link prediction literature as it has in the collaborative filtering literature. Bayesian versions of these methods using a sigmoidal link function have been studied in statistics [16] and political science [34], where they are sometimes referred to as [[bi-linear random effects models]]. These fields have focussed more on qualitative analysis of link behaviour than predictive performance and scalability. In the machine learning literature, computationally efficient frequentist versions of these latent feature models have been studied in [23,37], and Bayesian models have also been extended to allow for an infinite number of latent features [24,25].
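A minimal stochastic gradient descent sketch of this factorization, with a sigmoidal link, log-loss, and Frobenius regularization, is given below. The function name, hyperparameters, and update schedule are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def factorize_links(G, observed, k=5, lr=0.05, lam=0.01, epochs=200, seed=0):
    """Fit G ~ sigma(U Lambda U^T) on observed dyads by SGD on regularized
    log-loss: a minimal sketch of the latent feature approach."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    U = 0.1 * rng.standard_normal((n, k))
    Lam = np.eye(k)
    pairs = np.argwhere(observed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            s = U[i] @ Lam @ U[j]
            p = 1.0 / (1.0 + np.exp(-s))       # predicted edge probability
            g = p - G[i, j]                    # gradient of log-loss w.r.t. s
            gi = g * (Lam @ U[j]) + lam * U[i]
            gj = g * (Lam.T @ U[i]) + lam * U[j]
            gL = g * np.outer(U[i], U[j]) + lam * Lam
            U[i] -= lr * gi
            U[j] -= lr * gj
            Lam -= lr * gL
    return U, Lam
```

On a toy two-block graph this learns latent vectors that score within-block dyads above cross-block ones, which is the behaviour the paper builds on.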

2.1 Do Existing Methods Meet the Challenges in Link Prediction?

We recall the three challenges in link prediction from Section 1.1, and study how they are handled by current models.

  • Incorporating topological and side information. Existing models typically use a nonlinear classifier such as a kernel SVM [14] or decision tree [21] to combine topological and explicit features. However, the topological structure is exploited by just learning weights on unsupervised scores. We will show that this is potentially limiting, since unsupervised scores have limited expressivity. Latent feature methods like [16] do incorporate side information, but we will subsequently describe a limitation of this approach.
  • Overcoming imbalance. Relatively little attention has been paid to modifying models so as to account for the class imbalance. One solution is undersampling the set of training dyads [14,21], but this has the disadvantage of necessarily throwing away information. [9] addresses imbalance by casting the problem as being cost-sensitive with unknown costs, and tunes the costs via cross-validation.
  • Scaling to large graphs. Methods based on topological scores generally scale to large graphs, by virtue of mostly requiring only simple operations on the adjacency matrix. Some scores that look at long-range relationships between node pairs, such as the Katz measure, can be approximated in order to run on large graphs [10]. For methods based on explicit features, undersampling to overcome imbalance also reduces the number of training examples [21]. Latent class and latent feature methods based on Bayesian models do not scale to large graphs due to the cost of MCMC sampling.
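On the imbalance challenge, one way to "directly optimize for a ranking loss," as the abstract proposes, is a smooth pairwise AUC surrogate that compares scores of observed dyads against unobserved ones. The sketch below is our own illustration of that idea, not the paper's exact objective.

```python
import numpy as np

def pairwise_ranking_loss(scores_pos, scores_neg):
    """Smooth AUC surrogate: logistic loss on score differences between
    observed (positive) and unobserved (negative) dyads. Optimizing this
    targets ranking quality directly, so the heavy majority of negatives
    cannot dominate the objective the way it does per-dyad losses."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(np.log1p(np.exp(-diff)))
```

When positives are scored far above negatives the loss approaches 0; when the scores are indistinguishable it sits at log 2, mirroring a 0.5 AUC.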

We summarize representative link prediction methods in Table 1, and note whether they meet the three challenges. We would like a model that has the same expressiveness as existing methods, while directly addressing the above challenges. Our proposal is to extend matrix factorization to this end. The next section proposes the model, and argues why it is a suitable foundation for a link prediction model.

References

* ([[2011 LinkPredictionviaMatrixFactoriz|Menon & Elkan, 2011]]) ⇒ Aditya Krishna Menon, and Charles P. Elkan. (2011). “Link Prediction via Matrix Factorization.” In: Proceedings of ECML PKDD 2011.