Item Recommendations System Performance Measure

From GM-RKB

An Item Recommendations System Performance Measure is a ranked subset prediction task performance measure for an item recommendations task.



References

2020

2018

Precision at K is the accuracy of the predicted recommendations with respect to the actual purchases: [math]\displaystyle{ Precision@K = \frac{1}{C} \sum_{c=0}^{C-1} \frac{\mid \{Rec_c\} \cap \{T_c\} \mid}{K}, \quad (1) }[/math] where K is the position/rank of a recommendation, c is the customer index, [math]\displaystyle{ Rec_c }[/math] is the set of top K recommended items for customer c, [math]\displaystyle{ T_c }[/math] is the actual consumption for customer c, represented as the set of items the customer purchased in the evaluation period (where an interaction can be a purchase, a watch, or a listen), [math]\displaystyle{ \mid Rec \mid }[/math] is the number of items in set Rec, [math]\displaystyle{ Rec \cap T }[/math] is the intersection between sets Rec and T, and C is the number of customers.
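A minimal Python sketch of equation (1) may make the averaging explicit. The function name `precision_at_k` and the dictionary-based inputs are illustrative assumptions, not part of the quoted source; customers with no recorded purchases are assumed to contribute zero hits.

```python
from typing import Dict, Hashable, List, Set


def precision_at_k(
    recs: Dict[Hashable, List[Hashable]],
    actual: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """Average over customers of |top-K recommendations ∩ purchases| / K."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not recs:
        return 0.0
    total = 0.0
    for customer, ranked_items in recs.items():
        top_k = set(ranked_items[:k])             # Rec_c
        purchased = actual.get(customer, set())   # T_c (assumed empty if missing)
        total += len(top_k & purchased) / k
    return total / len(recs)                      # divide by C


# Hypothetical toy data: two customers, ranked recommendation lists.
example_recs = {"c0": ["a", "b", "c"], "c1": ["a", "d", "e"]}
example_actual = {"c0": {"a", "c"}, "c1": {"z"}}
example_precision = precision_at_k(example_recs, example_actual, k=2)
```

With K = 2, customer `c0` has one hit out of two recommendations and `c1` has none, so the average over the two customers is 0.25.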

While having high precision is necessary, it is not sufficient. A personalized recommender should also recommend a diverse set of items (Adomavicius & Kwon, 2012). For example, if precision is high with no diversity, then the recommendations look like a hall of mirrors showing only products in a single topic. Therefore, to guarantee the diversity of recommendations, we use products converted coverage at K. It captures the number of unique products that are recommended in the top K and also purchased: [math]\displaystyle{ PCC@K = \frac{1}{P} \mid \bigcup_{c=0}^{C-1} (\{Rec_c\} \cap \{T_c\}) \mid , \quad (2) }[/math] where [math]\displaystyle{ \bigcup_{c=0}^{C-1} (X_c) }[/math] represents the union of sets [math]\displaystyle{ X_0, X_1, \ldots, X_{C-1} }[/math], and P is the total number of products.
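Equation (2) can be sketched in Python in the same spirit. The name `pcc_at_k` and the `num_products` parameter (standing in for P) are illustrative assumptions rather than anything defined in the quoted source.

```python
from typing import Dict, Hashable, List, Set


def pcc_at_k(
    recs: Dict[Hashable, List[Hashable]],
    actual: Dict[Hashable, Set[Hashable]],
    k: int,
    num_products: int,
) -> float:
    """Fraction of the catalog that was both recommended in the top K and purchased."""
    if k <= 0 or num_products <= 0:
        raise ValueError("k and num_products must be positive integers")
    converted: Set[Hashable] = set()
    for customer, ranked_items in recs.items():
        top_k = set(ranked_items[:k])             # Rec_c
        purchased = actual.get(customer, set())   # T_c (assumed empty if missing)
        converted |= top_k & purchased            # union over customers
    return len(converted) / num_products          # divide by P


# Hypothetical toy data: three customers and a catalog of 10 products.
example_recs = {"c0": ["a", "b"], "c1": ["a", "c"], "c2": ["d", "a"]}
example_actual = {"c0": {"a"}, "c1": {"a", "c"}, "c2": {"b"}}
example_pcc = pcc_at_k(example_recs, example_actual, k=2, num_products=10)
```

Here only products `a` and `c` are both recommended and purchased, so the coverage is 2 / 10 = 0.2. Product `a` converts for two customers but is counted once, which is what distinguishes this measure from precision.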

Using held-out labels to measure a recommender’s efficacy leaks future purchase information (Covington et al., 2016). Consequently, there is a risk of inconsistent performance between offline and online evaluation. In order to reduce this gap and emulate a real production environment, the test metrics in this paper are measured on future purchases instead of held-out data.
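One way to read "measured on future purchases" is a time-based split: everything before a cutoff date is available for training, and purchases on or after the cutoff form the evaluation sets. The sketch below assumes that reading. The quoted source does not specify its exact split procedure, and the `temporal_split` helper and the `(customer, item, date)` event format are hypothetical.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, date]  # (customer, item, purchase date), an assumed format


def temporal_split(
    events: List[Event], cutoff: date
) -> Tuple[List[Event], Dict[str, Set[str]]]:
    """Split purchase events into training history and future purchases per customer."""
    history: List[Event] = []
    future: Dict[str, Set[str]] = {}
    for customer, item, day in events:
        if day < cutoff:
            history.append((customer, item, day))          # visible to the model
        else:
            future.setdefault(customer, set()).add(item)   # evaluation target
    return history, future


# Hypothetical toy log with a cutoff of 1 February 2018.
example_events = [
    ("c0", "a", date(2018, 1, 5)),
    ("c0", "b", date(2018, 2, 1)),
    ("c1", "a", date(2018, 2, 3)),
]
example_history, example_future = temporal_split(example_events, date(2018, 2, 1))
```

Unlike a random held-out sample, no purchase dated on or after the cutoff can reach the training history, so the evaluation sets play the role of the future purchases described above.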

2006