Normalized Discounted Cumulative Gain (NDCG) Measure
A Normalized Discounted Cumulative Gain (NDCG) Measure is a normalized version of a discounted cumulative gain measure that can account for differently sized output ranked lists.
- Context:
- It can (typically) be used as a Ranking Performance Measure, such as an IR Performance Measure.
- It can be calculated by an NDCG Calculation System.
- …
- Counter-Example(s):
- See: Cumulative Gain, Ranked-Output Performance Measure.
References
2015
- https://www.kaggle.com/wiki/NormalizedDiscountedCumulativeGain
- QUOTE: Normalized discounted cumulative gain (NDCG) measures the performance of a recommendation system based on the graded relevance of the recommended entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal ranking of the entities. This metric is commonly used in information retrieval and to evaluate the performance of web search engines.
- Parameters: k - The maximum number of entities that can be recommended
- Detailed Description:
- [math]\displaystyle{ \mathrm{DCG}_k = \sum_{i=1}^{k} \frac{2^{rel_i}-1}{\log_2(i+1)} }[/math]
- [math]\displaystyle{ \mathrm{IDCG}_k }[/math]: the maximum possible (ideal) DCG for a given set of queries, documents, and relevances.
- [math]\displaystyle{ \mathrm{nDCG}_k = \frac{\mathrm{DCG}_k}{\mathrm{IDCG}_k} }[/math]
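The [math]\displaystyle{ \mathrm{DCG}_k }[/math], [math]\displaystyle{ \mathrm{IDCG}_k }[/math], and [math]\displaystyle{ \mathrm{nDCG}_k }[/math] definitions above translate directly into code. The following is a minimal Python sketch of those definitions; the function names dcg_at_k and ndcg_at_k are illustrative, not from any particular library:
```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with the 2^rel - 1 gain and log2(i + 1) discount;
    positions i are 1-based, as in the formula above."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """nDCG@k = DCG@k / IDCG@k, where IDCG@k is the DCG of the same
    relevances sorted into the ideal (descending) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: graded relevances of a ranked list, evaluated at k = 5.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))  # ≈ 0.8756
```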
2012
- http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG
- QUOTE: Search result lists vary in length depending on the query. Comparing a search engine's performance from one query to the next cannot be consistently achieved using DCG alone, so the cumulative gain at each position for a chosen value of [math]\displaystyle{ p }[/math] should be normalized across queries. This is done by sorting documents of a result list by relevance, producing the maximum possible DCG till position [math]\displaystyle{ p }[/math], also called Ideal DCG till that position. For a query, the normalized discounted cumulative gain, or nDCG, is computed as: [math]\displaystyle{ \mathrm{nDCG_{p}} = \frac{DCG_{p}}{IDCG_{p}} }[/math] The nDCG values for all queries can be averaged to obtain a measure of the average performance of a search engine's ranking algorithm. Note that in a perfect ranking algorithm, the [math]\displaystyle{ DCG_p }[/math] will be the same as the [math]\displaystyle{ IDCG_p }[/math] producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval 0.0 to 1.0 and so are cross-query comparable.
The main difficulty encountered in using nDCG is the unavailability of an ideal ordering of results when only partial relevance feedback is available.
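A short worked example (constructed here for illustration, not taken from the quoted source) shows the normalization for a single query at [math]\displaystyle{ p=4 }[/math], for a ranked list with graded relevances [math]\displaystyle{ (3, 2, 3, 0) }[/math], using the [math]\displaystyle{ 2^{rel}-1 }[/math] gain form quoted elsewhere on this page:
- [math]\displaystyle{ DCG_4 = \frac{7}{\log_2 2} + \frac{3}{\log_2 3} + \frac{7}{\log_2 4} + \frac{0}{\log_2 5} \approx 7 + 1.893 + 3.5 + 0 = 12.393 }[/math]
- The ideal ordering is [math]\displaystyle{ (3, 3, 2, 0) }[/math], so [math]\displaystyle{ IDCG_4 = 7 + \frac{7}{\log_2 3} + \frac{3}{2} + 0 \approx 12.916 }[/math]
- [math]\displaystyle{ nDCG_4 = 12.393 / 12.916 \approx 0.96 }[/math]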
2010
- (Wang, Jin, & Chang, 2010) ⇒ Fengxia Wang, Huixia Jin, and Xiao Chang. (2010). “Relevance Vector Ranking for Information Retrieval.” In: Journal of Convergence Information Technology, 5(9).
- QUOTE: The normalized discounted cumulative gain (NDCG) (Kekäläinen, 2005) is adopted as a performance measure, which takes the form [math]\displaystyle{ NDCG@k = N(k) \bullet \sum^{k}_{j=1}\frac{2^{r(j)}-1}{\log(1+j)} }[/math] where [math]\displaystyle{ N(k) }[/math] is the NDCG at [math]\displaystyle{ k }[/math] of the ideal ranking list. It is used as a normalization factor of the NDCG at [math]\displaystyle{ k }[/math] in the ranking list of prediction results.
2008
- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval." Cambridge University Press. ISBN:0521865719.
- QUOTE: A final approach that has seen increasing adoption, especially when employed with machine learning approaches to ranking, is measures of cumulative gain, and in particular normalized discounted cumulative gain (NDCG). NDCG is designed for situations of non-binary notions of relevance (cf. Section 8.5.1). Like precision at $k$, it is evaluated over some number $k$ of top search results. For a set of queries $Q$, let $R(j,d)$ be the relevance score assessors gave to document $d$ for query $j$. Then, [math]\displaystyle{ \mbox{NDCG}(Q, k) = \frac{1}{\vert Q\vert} \sum_{j=1}^{\vert Q\vert} Z_{kj} \sum_{m=1}^{k} \frac{2^{R(j,m)}-1}{\log_2(1+m)} }[/math] (44) where [math]\displaystyle{ Z_{kj} }[/math] is a normalization factor calculated to make it so that a perfect ranking's NDCG at $k$ for query $j$ is 1. For queries for which $k' < k$ documents are retrieved, the last summation is done up to $k'$.
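Equation (44) above averages per-query NDCG over a query set. The following is a minimal Python sketch of that averaging under the quoted definitions; the function name mean_ndcg and the list-of-lists input format are assumptions, not from the book:
```python
import math

def mean_ndcg(relevance_lists, k):
    """Mean NDCG at k over a set of queries: relevance_lists[j] holds
    the graded relevance R(j, m) of the m-th ranked result for query j.
    Lists shorter than k are summed up to their own length k',
    as the quote describes."""
    total = 0.0
    for rels in relevance_lists:
        gains = [(2 ** r - 1) / math.log2(1 + m)
                 for m, r in enumerate(rels[:k], start=1)]
        ideal_gains = [(2 ** r - 1) / math.log2(1 + m)
                       for m, r in enumerate(sorted(rels, reverse=True)[:k],
                                             start=1)]
        # Z_kj normalizes so that a perfect ranking for query j scores 1.
        z_kj = 1.0 / sum(ideal_gains) if sum(ideal_gains) > 0 else 0.0
        total += z_kj * sum(gains)
    return total / len(relevance_lists)

# Two queries with graded relevance judgments over their ranked results.
print(mean_ndcg([[3, 2, 3, 0], [1, 0, 2]], k=4))
```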
2008
- (Chakrabarti et al., 2008) ⇒ Soumen Chakrabarti, Rajiv Khanna, Uma Sawant, and Chiru Bhattacharyya. (2008). “Structured Learning for Non-smooth Ranking Losses.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401906
- QUOTE: Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain) in the max-margin structured learning framework.
2005
- (Kekäläinen, 2005) ⇒ Jaana Kekäläinen. (2005). “Binary and graded relevance in IR evaluations-Comparison of the effects on ranking of IR systems.” In: Inf. Process. Management, 41(5). doi:10.1016/j.ipm.2005.01.004
2002
- (Järvelin et al., 2002) ⇒ Kalervo Järvelin, and Jaana Kekäläinen. (2002). “Cumulated Gain-based Evaluation of IR Techniques.” In: ACM Transactions on Information Systems (ACM TOIS), 20(4). doi:10.1145/582415.582418
- QUOTE: Furthermore, the normalized nCG and nDCG measures support evaluation.
* They represent performance as relative to the ideal based on a known (possibly large) recall base of graded relevance judgments.
* The performance differences between IR techniques are also normalized in relation to the ideal thereby supporting the analysis of performance differences.