Rocchio Relevance Feedback Algorithm
A Rocchio Relevance Feedback Algorithm is a Relevance Feedback Algorithm that is used for automatic document processing and text categorization by Information Retrieval Systems.
- AKA: Rocchio Algorithm.
- Context:
- It was initially developed by Rocchio (1971).
- It has been implemented by SMART Information Retrieval Systems.
- Example(s):
- Counter-Example(s):
- See: Rocchio Classification Algorithm, Information Retrieval, SMART Information Retrieval System, Vector Space Model, Algorithm, Information Retrieval Relevance, Search Engine, Information Retrieval Recall, Automatic Thesaurus Generation Algorithm, Query Expansion Algorithm.
References
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Rocchio_algorithm Retrieved:2019-10-31.
- The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems which stemmed from the SMART Information Retrieval System which was developed 1960-1964. Like many other retrieval systems, the Rocchio feedback approach was developed using the Vector Space Model. The algorithm is based on the assumption that most users have a general conception of which documents should be denoted as relevant or non-relevant.[1] Therefore, the user's search query is revised to include an arbitrary percentage of relevant and non-relevant documents as a means of increasing the search engine's recall, and possibly the precision as well. The number of relevant and non-relevant documents allowed to enter a query is dictated by the weights of the a, b, c variables listed below in the Algorithm section.
- ↑ Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: An Introduction to Information Retrieval, page 163-167. Cambridge University Press, 2009.
2009a
- (Manning et al., 2009) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schutze (2009). "The Rocchio algorithm for relevance feedback". In: Introduction to Information Retrieval (HTML Edition).
- QUOTE: The Rocchio Algorithm is the classic algorithm for implementing relevance feedback. It models a way of incorporating relevance feedback information into the vector space model of Section 6.3 .
Figure 9.3: The Rocchio optimal query for separating relevant and nonrelevant documents.
- QUOTE: The Rocchio Algorithm is the classic algorithm for implementing relevance feedback. It models a way of incorporating relevance feedback information into the vector space model of Section 6.3 .
2009b
- (Manning et al., 2009) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schutze (2009). "The Rocchio (1971) algorithm.". In: Introduction to Information Retrieval (HTML Edition).
- QUOTE: An application of Rocchio's algorithm. Some documents have been labeled as relevant and nonrelevant and the initial query vector is moved in response to this feedback.
This was the relevance feedback mechanism introduced in and popularized by Salton's SMART system around 1970. In a real IR query context, we have a user query and partial knowledge of known relevant and nonrelevant documents. The algorithm proposes using the modified query [math]\displaystyle{ \vec{q}_m }[/math]:
[math]\displaystyle{ \vec{q}_m = \alpha \vec{q}_0 + \beta\dfrac{1}{\vert D_r\vert}\displaystyle \sum_{\vec{d}_j \in D_{r}} \vec{d}_j -\gamma \dfrac{1}{\vert D_{nr}\vert}\displaystyle\sum_{\vec{d}_j \in D_{nr}} \vec{d}_j \quad\quad (49) }[/math]where [math]\displaystyle{ q_0 }[/math] is the original query vector, [math]\displaystyle{ D_r }[/math] and [math]\displaystyle{ D_{nr} }[/math] are the set of known relevant and nonrelevant documents respectively, and [math]\displaystyle{ \alpha }[/math], [math]\displaystyle{ \beta }[/math], and [math]\displaystyle{ \gamma }[/math] are weights attached to each term. These control the balance between trusting the judged document set versus the query: if we have a lot of judged documents, we would like a higher [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \gamma }[/math]. Starting from [math]\displaystyle{ q_0 }[/math], the new query moves you some distance toward the centroid of the relevant documents and some distance away from the centroid of the nonrelevant documents. This new query can be used for retrieval in the standard vector space model (see Section 6.3 ). We can easily leave the positive quadrant of the vector space by subtracting off a nonrelevant document's vector. In the Rocchio algorithm, negative term weights are ignored. That is, the term weight is set to 0. Figure 9.4 shows the effect of applying relevance feedback.
=== 2005 ===
- QUOTE: An application of Rocchio's algorithm. Some documents have been labeled as relevant and nonrelevant and the initial query vector is moved in response to this feedback.
- (Chen & Fu, 2005) ⇒ Zhixiang Chen, and Bin Fu (2005, August). "A quadratic lower bound for Rocchio’s similarity-based relevance feedback algorithm". In International Computing and Combinatorics Conference (pp. 955-964). Springer, Berlin, Heidelberg.
1997
- (Joachims, 1997a) ⇒ Thorsten Joachims. (1997). “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization.” In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997). ISBN:1-55860-486-3
- The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval.
1988
- (Salton & Buckley, 1988) ⇒ Gerard Salton, and Christopher Buckley (1988). "Term-Weighting Approaches In Automatic Text Retrieval". Information processing & management, 24(5), 513-523.
1971
- (Rocchio, 197) ⇒ J. J. Rocchio (1971). “Relevance Feedback in Information Retrieval.” In: Gerard M. Salton, editor. “The SMART Retrieval System: Experiments in Automatic Document Processing." Prentice-Hall.