Apriori Algorithm
Jump to navigation
Jump to search
An Apriori Algorithm is an iterative association rule learning algorithm that computes the frequent itemsets where iteration [math]\displaystyle{ i }[/math] computes all frequent i-itemsets (itemsets with [math]\displaystyle{ i }[/math] elements) and each iteration has a candidate generation step, candidate counting step and candidate selection step.
- Context:
- Its Apriori Candidate Generation Step generates all i-itemsets.
- It generates large numbers of subsets
- Its Apriori Candidate Counting Step counts the support of all i-itemsets.
- It uses a breadth-first search and a tree structure.
- Its Apriori Candidate Selection Step selects itemsets above required threshold.
- Its Apriori Candidate Generation Step generates all i-itemsets.
- See: Association Rule Learning Task; Frequent-Itemset Mining Algorithm; Association Rule; Basket Analysis; Constraint-Based Mining; Frequent Itemset; Frequent Pattern.
References
2011
- (Toivonen, 2011a) ⇒ Hannu Toivonen. (2011). “Apriori Algorithm.” In: (Sammut & Webb, 2011) p.40
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Apriori_algorithm
- In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
- As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
- Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length [math]\displaystyle{ k }[/math] from item sets of length [math]\displaystyle{ k-1 }[/math]. Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent [math]\displaystyle{ k }[/math]-length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates.
- Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all [math]\displaystyle{ 2^{|S|}-1 }[/math] of its proper subsets.
- (Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN:1420089641
1994
- Rakesh Agrawal, and Rakesh Srikant. (1994). “Fast Algorithms for Mining Association Rules.” In: Proceedings of VLDB 1994. ISBN 1-55860-153-8.
- Heikki Mannila, H. Toivonen, and A. I. Verkamo. (1994). “Efficient Algorithms for Discovering Association Rules.” In: AAAI Workshop on Knowledge Discovery in Databases (SIGKDD).
1993
- (Agrawal et al., 1993) ⇒ Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. (1993). “Mining Association Rules Between Sets of Items in Large Databases.” In: Proceedings of ACM SIGMOD Conference (SIGMOD 1993). do>10.1145/170035.170072.