PageRank Algorithm

A PageRank Algorithm is a Directed Graph Node Ranking Algorithm that efficiently calculates a directed graph node's PageRank Value (random arrival likelihood score).

Context:
- It was proposed in (Page et al., 1998)
- It was originally proposed for a Webpage Retrieval System.
- It requires the calculation of a PageRank Vector.
- It can determine the fraction of time that the random walker (e.g. web surfer) would spend at the page if the random process were iterated ad infinitum.
Counter-Exsample(s):
- HITS Algorithm.
- SALSA Algorithm.
See: Random Walk, Stochastic Matrix, Eigenvector Centrality, Markov Process, MCMC Algorithm, Personalized PageRank.

References

2015

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/PageRank#Algorithm Retrieved:2015-3-12.
- The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.
  A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a "50% chance" of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.

2012

(Wikipedia - PageRank, 2011-Jun-11) ⇒ http://en.wikipedia.org/wiki/PageRank
- PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element $E$ is referred to as the PageRank of E and denoted by [math]\displaystyle{ PR(E). }[/math] The name "PageRank" is a trademark of Google, and the PageRank process has been patented (Template:US patent).

2011

(Franceschet, 2011) ⇒ Massimo Franceschet. (2011). “PageRank: standing on the shoulders of giants.” In: Communications of the ACM, 54(6) doi:10.1145/1953122.1953146
- QUOTE: We start by providing an intuitive interpretation of PageRank in terms of random walks on graphs. The Web is viewed as a directed graph of pages connected by hyperlinks. A random surfer starts from an arbitrary page and simply keeps clicking on successive links at random, bouncing from page to page. The PageRank value of a page corresponds to the relative frequency the random surfer visits that page, assuming that the surfer goes on infinitely. The more time spent by the random surfer on a page, the higher the PageRank importance of the page. ...

2009

(Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN:1420089641

2006

(Langville & Meyer, 2006) ⇒ Amy N. Langville, and Carl Dean Meyer. (2006). “Page Rank and Beyond." Princeton University Press. ISBN:0691122024

2005

(Getoor & Diehl, 2005) ⇒ Lise Getoor, Christopher P. Diehl. (2005). “Link Mining: A survey.” In: SIGKDD Explorations, 7(2). doi:10.1145/1117454.1117456
- QUOTE: In the context of web information retrieval, the PageRank [91] and HITS [64] algorithms are the most notable approaches to LBR. PageRank models web surfing as a random walk where the surfer randomly selects and follows links and occasionally jumps to a new web page to start another traversal of the link structure. The rank of a given web page in this context is the fraction of time that the random web surfer would spend at the page if the random process were iterated ad infinitum. This can be determined by computing the steady-state distribution of the random process. HITS assumes a slightly more complex process, modeling the web as being composed of two types of web pages: hubs and authorities. Hubs are web pages that link to many authoritative pages. Authorities are web pages that are linked to by many hubs. Each page in the web is assigned hub and authority scores. These scores are computed by an iterative algorithm that updates the scores of a page based on the scores of pages in its immediate neighborhood. This approach bears a relation to PageRank with two separate random walks - one with hub transitions and one with authority transitions - on a corresponding bipartite graph of hubs and authorities [73; 95; 84]. The hub and authority scores are the steady-state distributions of the respective random processes.

PageRank Algorithm

References

2015

2012

2011

2009

2006

2005

2002

2001

1998

Navigation menu

Search