Support Measure
A Support Measure is an association rule measure of significance that estimates how often an itemset appears in a dataset.
- AKA: Support, Support (Association Rule Measure).
- Example(s):
- Counter-Example(s):
- See: Association Rule Learning Task, Mean Absolute Error, Mean Squared Error, ROC Analysis, Negative Predictive Value, Positive Predictive Value, Accuracy, Precision, Recall, Sensitivity, Specificity.
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Association_rule_learning#Support Retrieved:2018-10-7.
- Support is an indication of how frequently the itemset appears in the dataset.
The support of [math]\displaystyle{ X }[/math] with respect to [math]\displaystyle{ T }[/math] is defined as the proportion of transactions [math]\displaystyle{ t }[/math] in the dataset which contain the itemset [math]\displaystyle{ X }[/math].
[math]\displaystyle{ \mathrm{supp}(X) = \frac{|\{t \in T; X \subseteq t\}|}{|T|} }[/math]
In the example dataset, the itemset [math]\displaystyle{ X=\{\mathrm{beer, diapers}\} }[/math] has a support of [math]\displaystyle{ 1/5=0.2 }[/math] since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of [math]\displaystyle{ \mathrm{supp}() }[/math] is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).
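The definition above can be sketched in a few lines of Python. The five transactions below are a hypothetical dataset, chosen only so that {beer, diapers} occurs in 1 of 5 transactions; the function name `support` and the variable names are likewise illustrative assumptions, not part of the quoted source.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Return the fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must be non-empty")
    target = frozenset(itemset)
    # A transaction t counts as a hit when target is a subset of t.
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


# Hypothetical dataset (an assumption made for illustration).
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]

supp_beer_diapers = support({"beer", "diapers"}, TRANSACTIONS)  # 1 of 5 -> 0.2
supp_bread = support({"bread"}, TRANSACTIONS)                   # 3 of 5
supp_milk_bread = support({"milk", "bread"}, TRANSACTIONS)      # 2 of 5

# Adding items to the argument can only keep or lower the support.
print(supp_beer_diapers, supp_bread, supp_milk_bread)
```

Comparing `supp_bread` with `supp_milk_bread` illustrates the last sentence of the quote: a larger itemset is a stricter precondition, so its support is never higher than that of any of its subsets.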
2011
- (Han, Pei & Kamber, 2011) ⇒ Jiawei Han, Jian Pei, and Micheline Kamber (2011). "Data mining: concepts and techniques" (PDF). Elsevier. ISBN 978-0-12-381479-1
- QUOTE: Let [math]\displaystyle{ I = \{I_1 , I_2 , \cdots , I_m\} }[/math] be an itemset. Let [math]\displaystyle{ D }[/math], the task-relevant data, be a set of database transactions where each transaction [math]\displaystyle{ T }[/math] is a nonempty itemset such that [math]\displaystyle{ T \subseteq I }[/math]. Each transaction is associated with an identifier, called a TID. Let [math]\displaystyle{ A }[/math] be a set of items. A transaction [math]\displaystyle{ T }[/math] is said to contain [math]\displaystyle{ A }[/math] if [math]\displaystyle{ A \subseteq T }[/math]. An association rule is an implication of the form [math]\displaystyle{ A \Rightarrow B }[/math], where [math]\displaystyle{ A \subset I,\; B \subset I,\; A \neq \emptyset,\; B \neq \emptyset }[/math], and [math]\displaystyle{ A \cap B = \emptyset }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] holds in the transaction set [math]\displaystyle{ D }[/math] with support [math]\displaystyle{ s }[/math], where [math]\displaystyle{ s }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] that contain [math]\displaystyle{ A \cup B }[/math] (i.e., the union of sets [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] say, or, both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math]). This is taken to be the probability, [math]\displaystyle{ P(A \cup B) }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] has confidence [math]\displaystyle{ c }[/math] in the transaction set [math]\displaystyle{ D }[/math], where [math]\displaystyle{ c }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] containing [math]\displaystyle{ A }[/math] that also contain [math]\displaystyle{ B }[/math]. This is taken to be the conditional probability, [math]\displaystyle{ P(B|A) }[/math]. That is, [math]\displaystyle{ \mathrm{support}(A \Rightarrow B) = P(A \cup B) \quad\quad }[/math] (6.2).
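As a rough illustration of the quoted definitions, the sketch below estimates rule support and confidence as relative frequencies. Note that [math]\displaystyle{ P(A \cup B) }[/math] here means the probability that a transaction contains all items of both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], not the probability of "A or B". The dataset `D` and the helper names `count_containing`, `rule_support`, and `rule_confidence` are assumptions introduced for this example only.

```python
from typing import FrozenSet, List, Set


def count_containing(items: Set[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def rule_support(a: Set[str], b: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions containing all items of both A and B."""
    return count_containing(a | b, transactions) / len(transactions)


def rule_confidence(a: Set[str], b: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Among transactions containing A, the fraction that also contain B."""
    n_a = count_containing(a, transactions)
    if n_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_containing(a | b, transactions) / n_a


# Hypothetical transaction set D (an assumption made for illustration).
D = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]

s = rule_support({"milk"}, {"bread"}, D)             # 2 of 5 transactions -> 0.4
c_forward = rule_confidence({"milk"}, {"bread"}, D)  # 2 of 2 milk transactions -> 1.0
c_reverse = rule_confidence({"bread"}, {"milk"}, D)  # 2 of 3 bread transactions
print(s, c_forward, c_reverse)
```

In this toy data the two directions of the rule share the same support but differ in confidence, which is why the two measures are usually reported together.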
2007
- (Hahsler & Hornik, 2007) ⇒ Michael Hahsler, and Kurt Hornik (2007). "New probabilistic interest measures for association rules". Intelligent Data Analysis, 11(5), 437-455. arXiv:0803.0966
- QUOTE: An association rule is a rule of the form [math]\displaystyle{ X \Rightarrow Y }[/math] , where [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are two disjoint sets of items (itemsets). The rule means that if we find all items in [math]\displaystyle{ X }[/math] in a transaction it is likely that the transaction also contains the items in [math]\displaystyle{ Y }[/math].
Association rules are selected from the set of all possible rules using measures of significance and interestingness. Support, the primary measure of significance, is defined as the fraction of transactions in the database which contain all items in a specific rule. That is,
[math]\displaystyle{ supp(X \Rightarrow Y ) = supp(X \cup Y ) = \dfrac{c_{XY}}{m}\quad\quad }[/math], (1)
- where [math]\displaystyle{ c_{XY} }[/math] represents the number of transactions which contain all items in [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] , and [math]\displaystyle{ m }[/math] is the number of transactions in the database.
2005
- (Hahsler et al., 2005) ⇒ Michael Hahsler, Bettina Grün, and Kurt Hornik (2005). "A computational environment for mining association rules and frequent item sets".
- QUOTE: Support is defined on an itemset as the proportion of transactions in the data set which contain the itemset. All itemsets which have a support above a set minimum support threshold are called frequent itemsets (...)
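A minimal brute-force sketch of how a minimum support threshold selects frequent itemsets is given below. It is not the algorithm of the cited paper; the function name `frequent_itemsets`, the dataset, the threshold value, and the use of `>=` for the comparison are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(
    transactions: List[FrozenSet[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Enumerate itemsets by size and keep those meeting the threshold."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], float] = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            c = frozenset(candidate)
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:  # assumed convention: threshold is inclusive
                result[c] = supp
                found_any = True
        if not found_any:
            # Support never increases when items are added, so if no itemset
            # of this size is frequent, no larger itemset can be frequent.
            break
    return result


# Hypothetical dataset and threshold (assumptions made for illustration).
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]

frequent = frequent_itemsets(TRANSACTIONS, min_support=0.4)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

Exhaustive enumeration is exponential in the number of items and is only workable for toy data; the early exit merely hints at the pruning idea that practical frequent-itemset miners rely on.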
1993
- (Agrawal et al., 1993) ⇒ Rakesh Agrawal, Tomasz Imielinski, and Arun Swami (1993, June). "Mining association rules between sets of items in large databases". In ACM SIGMOD Record (Vol. 22, No. 2, pp. 207-216). ACM.