HITS Algorithm
(Redirected from HITS)
Jump to navigation
Jump to search
A HITS Algorithm is a Link-based Object Ranking Algorithm proposed in (Kleinberg, 1999) that models a Graph has HITS Hub Node and HITS Authority Node.
- AKA: HITS, Hyperlink-Induced Topic Search.
- Context:
- It was originally proposed for the Webpage Retrieval Task.
- See: PageRank Algorithm.
References
2011
- (Wikipedia - HITS algorithm, 2011-Jun-11) ⇒ http://en.wikipedia.org/wiki/HITS_algorithm
- Hyperlink-Induced Topic Search (HITS) (also known as Hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. It was a precursor to PageRank. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that it held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs. The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages.
2005
- (Getoor & Diehl, 2005) ⇒ Lise Getoor, Christopher P. Diehl. (2005). “Link Mining: A survey.” In: SIGKDD Explorations, 7(2). doi:10.1145/1117454.1117456
- QUOTE: In the context of web information retrieval, the PageRank [91] and HITS [64] algorithms are the most notable approaches to LBR. PageRank models web surfing as a random walk where the surfer randomly selects and follows links and occasionally jumps to a new web page to start another traversal of the link structure. The rank of a given web page in this context is the fraction of time that the random web surfer would spend at the page if the random process were iterated ad infinitum. This can be determined by computing the steady-state distribution of the random process. HITS assumes a slightly more complex process, modeling the web as being composed of two types of web pages: hubs and authorities. Hubs are web pages that link to many authoritative pages. Authorities are web pages that are linked to by many hubs. Each page in the web is assigned hub and authority scores. These scores are computed by an iterative algorithm that updates the scores of a page based on the scores of pages in its immediate neighborhood. This approach bears a relation to PageRank with two separate random walks - one with hub transitions and one with authority transitions - on a corresponding bipartite graph of hubs and authorities [73; 95; 84]. The hub and authority scores are the steady-state distributions of the respective random processes.
2001
- (Ng et al., 2001) ⇒ Andrew Y. Ng, Alice X. Zheng, Michael I. Jordan. (2001). “Stable Algorithms for Link Analysis.” In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2001). doi:10.1162/153244302760185243
- QUOTE: The Kleinberg HITS and the Google PageRank algorithms are eigenvector methods for identifying “authoritative or “influential articles, given hyperlink or citation information.
1999
- (Kleinberg, 1999) ⇒ Jon Kleinberg. (1999). “Authoritative Sources in a Hyperlinked Environment.” In: Journal of the ACM (JACM), 46(5) doi:10.1145/324133.324140