Okapi BM25 Ranking Function

An Okapi BM25 Ranking Function is a text ranking function that estimates text-item relevance to a given search query by combining term frequency (TF), inverse document frequency (IDF), and document length normalization.

Context:
- It can (typically) be described as an evolution of the TF-IDF Weighting Scheme, incorporating probabilistic models of document retrieval.
- It can (typically) use two parameters, \(k_1\) and \(b\), to control term frequency scaling and document length normalization, respectively.
- It can (often) be adapted for different languages and types of documents, showing its versatility across various information retrieval tasks.
Example(s):
- BM25F, which considers document structure.
- BM25+, which adds a lower bound to the term-frequency component so that very long documents are not over-penalized.
- ...
Counter-Example(s):
- A Latent Semantic Analysis (LSA) model that ranks documents based on the concepts derived from term-document matrices.
- A Neural Network-based ranking model that uses deep learning techniques to understand the semantic relevance of documents to queries.
See: TF-IDF, Information Retrieval, Ranking Function, Search Engine, Relevance (Information Retrieval), Probabilistic Relevance Model.
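
Illustration: the roles of \(k_1\) and \(b\) described under Context can be seen in one commonly cited form of the scoring function. Here \(f(q_i, D)\) is the frequency of query term \(q_i\) in document \(D\), \(|D|\) is the document length in tokens, and \(\mathrm{avgdl}\) is the average document length in the collection:

\[ \mathrm{score}(D, Q) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)} \]

The Python sketch below is a minimal, non-authoritative rendering of that form. It makes several assumptions that are not stated in this entry: documents and queries are already tokenized into lists of strings, the corpus is non-empty, the defaults \(k_1 = 1.2\) and \(b = 0.75\) are conventional starting values rather than prescribed ones, and the IDF uses a smoothed variant (adding 1 inside the logarithm) that keeps weights non-negative. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Assumes `docs` is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # b = 0 disables length normalization; b = 1 applies it fully.
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant (an assumption); stays non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 controls how quickly repeated occurrences saturate.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "the okapi system ranks documents".split(),
    "bm25 ranks documents by relevance to a query".split(),
    "an okapi is an animal".split(),
]
scores = bm25_scores("bm25 relevance".split(), docs)
```

In this toy corpus only the second document contains a query term, so it is the only one that receives a non-zero score. Production systems typically precompute document frequencies in an inverted index rather than rescanning the corpus per query.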

References

2016

(Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/Okapi_BM25 Retrieved:2016-12-8.
- In information retrieval, Okapi BM25 (BM stands for Best Matching) is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.
  The name of the actual ranking function is BM25. To set the right context, however, it is usually referred to as "Okapi BM25", since the Okapi information retrieval system, implemented at London's City University in the 1980s and 1990s, was the first system to implement this function.
  BM25, and its newer variants, e.g. BM25F (a version of BM25 that can take document structure and anchor text into account), represent state-of-the-art TF-IDF-like retrieval functions used in document retrieval.

2010

(Li et al., 2010) ⇒ Yuefeng Li, Abdulmohsen Algarni, and Ning Zhong. (2010). “Mining Positive and Negative Patterns for Relevance Feature Discovery.” In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2010). doi:10.1145/1835804.1835900
- ABSTRACT: It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.
