Confidence (Association Rule Measure)
A Confidence is an association rule performance measure that estimates the conditional probability of the rule's consequent given its antecedent, and so indicates how often the rule has been found to be true.
- AKA: Confidence Measure, Confidence.
- Context:
- It can be calculated as: [math]\displaystyle{ conf(X\Rightarrow Y) = P(Y|X) = \dfrac{supp (X \cup Y)} {supp (X)} }[/math].
- Example(s):
- Counter-Example(s):
- See: Association Rule Learning Task, Mean Absolute Error, Mean Squared Error, ROC Analysis, Negative Predictive Value, Positive Predictive Value, Accuracy, Precision, Recall, Sensitivity, Specificity.
References
2018a
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Association_rule_learning#Confidence Retrieved:2018-10-7.
- Confidence is an indication of how often the rule has been found to be true.
The confidence value of a rule, [math]\displaystyle{ X \Rightarrow Y }[/math] , with respect to a set of transactions [math]\displaystyle{ T }[/math] , is the proportion of the transactions that contains [math]\displaystyle{ X }[/math] which also contains [math]\displaystyle{ Y }[/math] .
Confidence is defined as: [math]\displaystyle{ \mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X) }[/math] For example, the rule [math]\displaystyle{ \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\} }[/math] has a confidence of [math]\displaystyle{ 0.2/0.2=1.0 }[/math] in the database, which means that for 100% of the transactions containing butter and bread the rule is correct (100% of the times a customer buys butter and bread, milk is bought as well).
Note that [math]\displaystyle{ \mathrm{supp}(X \cup Y) }[/math] means the support of the union of the items in X and Y. This is somewhat confusing since we normally think in terms of probabilities of events and not sets of items. We can rewrite [math]\displaystyle{ \mathrm{supp}(X \cup Y) }[/math] as the probability [math]\displaystyle{ P(E_X \cap E_Y) }[/math] , where [math]\displaystyle{ E_X }[/math] and [math]\displaystyle{ E_Y }[/math] are the events that a transaction contains itemset [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] , respectively.[1]
Thus confidence can be interpreted as an estimate of the conditional probability [math]\displaystyle{ P(E_Y | E_X) }[/math], the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.[2] [3]
- [1] Hahsler, Michael (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. http://michael.hahsler.net/research/association_rules/measures.html
- [2] Hahsler, Michael (2005). "Introduction to arules – A computational environment for mining association rules and frequent item sets" (PDF). Journal of Statistical Software.
- [3] Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. (2000). "Algorithms for association rule mining – a general survey and comparison". ACM SIGKDD Explorations Newsletter. 2: 58. doi:10.1145/360402.360421.
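The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the five-transaction database is a toy assumption, chosen so that the supports of {butter, bread} and {butter, bread, milk} both equal 0.2 as in the example, and the function names `support` and `confidence` are hypothetical helpers rather than any library's API.

```python
from fractions import Fraction

# Hypothetical toy database (an assumption for illustration only).
TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """Estimate P(E_Y | E_X) as supp(X ∪ Y) / supp(X)."""
    lhs, rhs = set(lhs), set(rhs)
    supp_lhs = support(lhs, transactions)
    if supp_lhs == 0:
        # The ratio is undefined when no transaction contains the LHS.
        raise ValueError("confidence is undefined when the LHS never occurs")
    return support(lhs | rhs, transactions) / supp_lhs


# {butter, bread} => {milk}: (1/5) / (1/5)
print(confidence({"butter", "bread"}, {"milk"}, TRANSACTIONS))  # 1
# Confidence is not symmetric: compare both directions of bread/milk.
print(confidence({"bread"}, {"milk"}, TRANSACTIONS))  # 2/3
print(confidence({"milk"}, {"bread"}, TRANSACTIONS))  # 1
```

Exact fractions are used instead of floats so the ratios are not affected by rounding. The last two lines show that swapping the LHS and RHS generally changes the confidence, because the denominator is the support of the LHS only.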
2011
- (Han, Pei & Kamber, 2011) ⇒ Jiawei Han, Jian Pei, and Micheline Kamber (2011). "Data mining: concepts and techniques" (PDF). Elsevier. p. 266. ISBN 978-0-12-381479-1.
- QUOTE: Let [math]\displaystyle{ I = \{I_1 , I_2 , \cdots , I_m\} }[/math] be an itemset. Let [math]\displaystyle{ D }[/math], the task-relevant data, be a set of database transactions where each transaction [math]\displaystyle{ T }[/math] is a nonempty itemset such that [math]\displaystyle{ T \subseteq I }[/math]. Each transaction is associated with an identifier, called a TID. Let [math]\displaystyle{ A }[/math] be a set of items. A transaction [math]\displaystyle{ T }[/math] is said to contain [math]\displaystyle{ A }[/math] if [math]\displaystyle{ A \subseteq T }[/math]. An association rule is an implication of the form [math]\displaystyle{ A \Rightarrow B }[/math], where [math]\displaystyle{ A \subset I,\; B \subset I,\; A \neq \emptyset,\; B \neq \emptyset }[/math], and [math]\displaystyle{ A \cap B = \emptyset }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] holds in the transaction set [math]\displaystyle{ D }[/math] with support [math]\displaystyle{ s }[/math], where [math]\displaystyle{ s }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] that contain [math]\displaystyle{ A \cup B }[/math] (i.e., the union of sets [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] say, or, both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math]). This is taken to be the probability, [math]\displaystyle{ P(A \cup B) }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] has confidence [math]\displaystyle{ c }[/math] in the transaction set [math]\displaystyle{ D }[/math], where [math]\displaystyle{ c }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] containing [math]\displaystyle{ A }[/math] that also contain [math]\displaystyle{ B }[/math]. This is taken to be the conditional probability, [math]\displaystyle{ P(B|A) }[/math]. That is,
[math]\displaystyle{ support (A\Rightarrow B) = P(A \cup B) \quad\quad }[/math] (6.2)
[math]\displaystyle{ confidence (A\Rightarrow B) =P(B|A)\quad\quad }[/math](6.3)
(...) From Eq. (6.3), we have
[math]\displaystyle{ confidence (A\Rightarrow B) = P(B|A) = \dfrac{support (A \cup B)} {support (A)} = .... }[/math]
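As an illustrative aside (not part of the quoted text): if support is measured as a relative frequency over the [math]\displaystyle{ |D| }[/math] transactions, the ratio in the last equation reduces to a ratio of absolute counts, since the common factor [math]\displaystyle{ |D| }[/math] cancels. The symbol [math]\displaystyle{ count(\cdot) }[/math], the number of transactions in [math]\displaystyle{ D }[/math] containing an itemset, is notation assumed here for the sketch:
[math]\displaystyle{ \dfrac{support (A \cup B)} {support (A)} = \dfrac{count (A \cup B) / |D|} {count (A) / |D|} = \dfrac{count (A \cup B)} {count (A)} }[/math]
For instance, if 3 transactions contain [math]\displaystyle{ A }[/math] and 2 of those also contain [math]\displaystyle{ B }[/math], the confidence is [math]\displaystyle{ 2/3 }[/math] regardless of how many transactions [math]\displaystyle{ D }[/math] holds in total.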
2007
- (Hahsler & Hornik, 2007) ⇒ Michael Hahsler, and Kurt Hornik (2007). "New probabilistic interest measures for association rules". Intelligent Data Analysis, 11(5), 437-455. arXiv:0803.0966
- QUOTE: Confidence is defined by Agrawal et al. as
[math]\displaystyle{ conf(X \Rightarrow Y ) = \dfrac{supp(X \cup Y )}{supp(X)}\quad\quad }[/math], (5)
where [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are two disjoint itemsets. Often confidence is understood as an estimate of the conditional probability [math]\displaystyle{ P(E_Y |E_X) }[/math], where [math]\displaystyle{ E_X (E_Y ) }[/math] is the event that [math]\displaystyle{ X (Y ) }[/math] occurs in a transaction …
1993
- (Agrawal et al., 1993) ⇒ Rakesh Agrawal, Tomasz Imielinski, and Arun Swami (1993, June). "Mining association rules between sets of items in large databases". In ACM SIGMOD Record (Vol. 22, No. 2, pp. 207-216). ACM.