Frequent Data Pattern
A Frequent Data Pattern is a data pattern that occurs in a dataset at least as often as a given minimum frequency threshold.
- Context:
- It can be discovered by a Frequent Pattern Mining System that can implement a Frequent Pattern Mining Algorithm to solve a Frequent Pattern Mining Task.
- It can be associated with a Frequent Pattern Rule, such as an association rule.
- It can range from being an Unconstrained Frequent Pattern to being a Constrained Frequent Pattern.
- It can range from being a Frequent Itemset, to being a Frequent Structural Pattern, to being a Frequent Sequential Pattern.
- …
- Example(s):
- a Frequent Itemset, for itemset patterns in item datasets,
- a Frequent n-Gram, for n-gram patterns in string datasets,
- a Frequent Structural Pattern such as:
- a Frequent Tree Pattern,
- a Frequent Subtree, for subtree patterns in tree datasets,
- a Frequent Subgraph, for subgraph patterns in graph datasets.
- …
- Counter-Example(s):
- an Infrequent Data Pattern,
- an Outlier.
- See: Co-Occurrence Statistic, Basket Analysis; Constraint-Based Mining; Graph Mining; Tree Mining.
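To make the Frequent n-Gram example above concrete, the following is a minimal sketch, not a reference implementation. It assumes that a pattern's frequency is the number of strings in which it occurs at least once, and the names `frequent_ngrams` and `min_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def frequent_ngrams(strings, n, min_count):
    """Return the character n-grams that occur in at least `min_count` strings.

    Assumption: frequency is counted per string (each string contributes
    at most once per n-gram), not per occurrence.
    """
    counts = Counter()
    for s in strings:
        # A set ensures each string is counted once per distinct n-gram.
        grams = {s[i:i + n] for i in range(len(s) - n + 1)}
        counts.update(grams)
    return {gram: c for gram, c in counts.items() if c >= min_count}


strings = ["abcab", "bcab", "cabd"]
result = frequent_ngrams(strings, n=2, min_count=3)
print(result)
```

With this toy dataset, only the bigrams `ab` and `ca` occur in all three strings, so they are the only frequent bigrams at `min_count=3`.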
References
2017
- (Toivonen, 2017) ⇒ Hannu Toivonen. (2017). "Frequent Pattern". In: (Sammut & Webb, 2017) DOI: 10.1007/978-1-4899-7687-1_318
- QUOTE: Given a set [math]\displaystyle{ \mathcal{D} }[/math] of examples, a language [math]\displaystyle{ \mathcal{L} }[/math] of possible patterns, and a minimum frequency [math]\displaystyle{ \mathit{min\_fr} }[/math], every pattern [math]\displaystyle{ \theta \in \mathcal{L} }[/math] that occurs at least in the minimum number of examples, i.e., [math]\displaystyle{ |\{ e \in \mathcal{D} \mid \theta \text{ occurs in } e \}| \geq \mathit{min\_fr} }[/math], is a frequent pattern. Discovery of all frequent patterns is a common data mining task. In its most typical form, the patterns are frequent itemsets. A more general formulation of the problem is constraint-based mining.
(...) Frequent patterns are often used as components in larger data mining or machine learning tasks. In particular, discovery of frequent itemsets was actually first introduced as an intermediate step in association rule mining (Agrawal et al. 1993) (“frequent itemsets” were then called “large”). The frequency and confidence of every valid association rule [math]\displaystyle{ X \to Y }[/math] are obtained simply as the frequency of [math]\displaystyle{ X \cup Y }[/math] and the ratio of frequencies of [math]\displaystyle{ X \cup Y }[/math] and [math]\displaystyle{ X }[/math], respectively.
Frequent patterns can be useful as features for further learning tasks. They may capture shared properties of examples better than individual original features, while the frequency threshold gives some guarantee that the constructed features are not so likely just noise. However, other criteria besides frequency are often used to choose a good set of candidate patterns.
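The definition quoted above, together with the frequency and confidence of an association rule, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: examples are represented as sets of items, "occurs in" is taken to mean set containment, and the function names (`frequency`, `is_frequent`, `rule_stats`) and the toy transactions are hypothetical.

```python
def frequency(pattern, examples):
    """Count the examples (item sets) that contain every item of `pattern`."""
    pattern = frozenset(pattern)
    return sum(1 for e in examples if pattern <= e)


def is_frequent(pattern, examples, min_fr):
    """A pattern is frequent if it occurs in at least `min_fr` examples."""
    return frequency(pattern, examples) >= min_fr


def rule_stats(x, y, examples):
    """Return (frequency, confidence) of the association rule X -> Y.

    frequency  = frequency of X union Y
    confidence = frequency of X union Y divided by frequency of X
    """
    x, y = frozenset(x), frozenset(y)
    freq_xy = frequency(x | y, examples)
    freq_x = frequency(x, examples)
    confidence = freq_xy / freq_x if freq_x else 0.0
    return freq_xy, confidence


# Toy data, invented for illustration only.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk"}),
    frozenset({"bread", "eggs"}),
]

freq, conf = rule_stats({"milk"}, {"bread"}, transactions)
print(is_frequent({"milk", "bread"}, transactions, min_fr=2), freq, conf)
```

Here the itemset {milk, bread} occurs in two of the four transactions and {milk} in three, so the rule milk → bread has frequency 2 and confidence 2/3.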
2016
- (Yan, 2016) ⇒ Xifeng Yan (2016). "Frequent Pattern Mining". In: KDD Topics 2016.
- QUOTE: Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set, is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently in a graph database, it is called a (frequent) structural pattern. Finding frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data indexing, classification, clustering, and other data mining tasks as well. Frequent pattern mining is an important data mining task and a focused theme in data mining research. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications [1]. A few text books are available on this topic, e.g., [2].
- [1] J. Han, H. Cheng, D. Xin, and X. Yan. Frequent Pattern Mining: Current Status and Future Directions. Data Mining and Knowledge Discovery, Vol. 15, Issue 1, pp. 55-86, 2007.
- [2] Charu Aggarwal and Jiawei Han (Eds.). Frequent Pattern Mining. Springer, 2014.
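The sequential pattern in the quote above (a PC, then a digital camera, then a memory card) can be checked with a simple in-order containment test. The sketch below is illustrative only: it assumes each element of a sequence is a single item and that gaps between matched items are allowed, whereas general sequential pattern mining often uses itemsets as elements. The helper names and the shopping histories are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must come after earlier matches.
    return all(item in it for item in pattern)


def sequence_support(pattern, sequences):
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(1 for s in sequences if is_subsequence(pattern, s))


# Toy shopping histories, invented for illustration only.
histories = [
    ["PC", "digital camera", "memory card"],
    ["PC", "printer", "digital camera", "memory card"],
    ["digital camera", "PC", "memory card"],
    ["PC", "memory card"],
]

pattern = ["PC", "digital camera", "memory card"]
support = sequence_support(pattern, histories)
print(support)
```

With a user-specified threshold of 2, this pattern would count as a frequent sequential pattern in the toy data, since it is contained in the first two histories but not in the last two.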
2010
- (Batal, 2010) ⇒ Iyad Batal (2010). "What Is Frequent Pattern Analysis?"
- QUOTE: A Frequent pattern is a pattern (a set of items, subsequences, subgraphs, etc.) that occurs frequently in a data set.
2006
- (Yin, Han & Yu, 2006) ⇒ Xiaoxin Yin, Jiawei Han, and Philip S. Yu. (2006). “LinkClus: Efficient Clustering via Heterogeneous Semantic Links.” In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006).
- QUOTE: The problem of finding groups of nodes with high tightness can be reduced to the problem of finding frequent patterns (Agrawal et al., 1993). A tight group is a set of nodes that are co-linked with many objects of other types, just like a frequent pattern is a set of items that co-appear in many transactions. …
2004
- (Han et al., 2004) ⇒ Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. (2004). “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach.” In: Data Mining and Knowledge Discovery, 8(1). doi:10.1023/B:DAMI.0000005258.31418.83