Sequential Data Outlier Detection Task
A Sequential Data Outlier Detection Task is an outlier detection task that is restricted to detecting anomalous sequence events in sequential data.
- AKA: Sequential Anomaly Detection.
- Context:
- It can range from being a Discrete Sequential Data Outlier Detection Task to being a Continuous Sequential Data Outlier Detection Task.
- It can range from being a Within-Sequence Outlier Detection Task to being an Entire-Sequence Outlier Detection Task.
- It can be solved by a Sequential Anomaly Detection System (that implements a sequential anomaly detection algorithm).
- Example(s):
- a Temporal Data Outlier Detection Task, such as:
- DNA Gene-Sequence Outlier Detection.
- Spatial Outlier Detection.
- a Black Swan Prediction Task, such as predicting the likelihood of a stock market crash in the next year.
- …
- Counter-Example(s):
- See: Detection Task, Classification Task, Outlier.
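The within-sequence versus entire-sequence distinction in the Context items can be illustrated with a minimal sketch for continuous data. It is an illustration under stated assumptions, not a reference algorithm: the function names, the trailing-window z-score, the per-sequence mean summary, and the thresholds are all hypothetical choices made for this example.

```python
from statistics import mean, stdev


def within_sequence_outliers(seq, window=5, threshold=3.0):
    """Flag indices whose value is far from the trailing window (assumed z-score rule)."""
    flagged = []
    for i in range(window, len(seq)):
        history = seq[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip constant windows, where a z-score is undefined.
        if sigma > 0 and abs(seq[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def entire_sequence_outliers(sequences, threshold=3.0):
    """Flag whole sequences whose mean is far from the means of the other sequences.

    Needs at least three sequences, because the leave-one-out standard
    deviation is computed from the remaining ones.
    """
    summaries = [mean(s) for s in sequences]
    flagged = []
    for i, value in enumerate(summaries):
        others = summaries[:i] + summaries[i + 1:]
        mu, sigma = mean(others), stdev(others)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical data: one spike inside a sequence, and one sequence unlike the rest.
series = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8, 10.2]
sequences = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 2.0, 2.0],
    [2.0, 2.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 3.0, 2.0, 1.0],
    [9.0, 8.0, 9.0, 9.0, 8.0],
]
print(within_sequence_outliers(series))     # [6]
print(entire_sequence_outliers(sequences))  # [4]
```

The first function returns positions inside one sequence, while the second returns indices of whole sequences; this is the only difference the sketch is meant to show.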
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/anomaly_detection Retrieved:2015-10-5.
- In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
In particular in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.
Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then testing the likelihood of a test instance to be generated by the learnt model.
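The semi-supervised category described above (fit a model of normal behavior, then test how likely a new instance is under it) can be sketched for discrete sequences as follows. This is a minimal illustration, not a method taken from the quoted source: the first-order Markov chain, the add-one smoothing, the function names, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_markov_model(normal_sequences, alphabet, smoothing=1.0):
    """Estimate smoothed transition probabilities from normal-only training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in normal_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + smoothing * len(alphabet)
        model[prev] = {
            cur: (counts[prev][cur] + smoothing) / total for cur in alphabet
        }
    return model


def avg_log_likelihood(model, seq):
    """Average log-probability per transition; lower values are more anomalous.

    Assumes len(seq) >= 2 and that every symbol belongs to the training alphabet.
    """
    transitions = list(zip(seq, seq[1:]))
    return sum(math.log(model[p][c]) for p, c in transitions) / len(transitions)


# Hypothetical training data: normal behavior cycles a -> b -> c -> a.
normal = ["abcabcabc", "bcabcabca", "cabcabcab"]
model = fit_markov_model(normal, alphabet="abc")

print(avg_log_likelihood(model, "abcabc"))  # close to 0: consistent with the model
print(avg_log_likelihood(model, "acbacb"))  # much lower: every transition is unusual
```

A threshold on the score would turn it into a detector; how to pick that threshold depends on the application and is left open here.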
2013
- (Hauskrecht et al., 2013) ⇒ Milos Hauskrecht, Iyad Batal, Michal Valko, Shyam Visweswaran, Gregory F Cooper, and Gilles Clermont. (2013). “Outlier Detection for Patient Monitoring and Alerting.” In: Journal of Biomedical Informatics, 46(1). doi:10.1016/j.jbi.2012.08.004
- QUOTE: Anomaly detection is an active area of current machine learning and data mining research. An outlier (or a deviation or an anomaly) is an observation or a pattern in the data that appears to deviate significantly from other observations or patterns in the same data [8] and [9]. Anomaly detection methods have been applied to problems as diverse as monitoring of credit card transactions, detection of network intrusions, and detection of technical system failures.
2012
- (Chandola et al., 2012) ⇒ Varun Chandola, Arindam Banerjee, and Vipin Kumar. (2012). “Anomaly Detection for Discrete Sequences: A Survey.” In: IEEE Transactions on Knowledge and Data Engineering Journal, 24(5). doi:10.1109/TKDE.2010.235
- QUOTE: Sequence data is found in a wide variety of application domains such as intrusion detection, bio-informatics, weather prediction, system health management, etc. Hence anomaly detection for sequence data is an important topic of research. There is extensive work on anomaly detection techniques [1–3] that look for individual objects that are different from normal objects. These techniques do not take the sequence structure of the data into consideration.
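The point made in this quote, that techniques looking at individual objects ignore sequence structure, can be illustrated with a small window-based sketch. It is not code from the survey: the fixed-length window lookup, the function names, and the toy event log are assumptions chosen for this example. The anomalous log below contains only symbols that also occur in normal data, so a per-symbol check would not flag it, while the window lookup does.

```python
def ngrams(seq, n):
    """Return all contiguous windows of length n as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def fit_normal_windows(normal_sequences, n=3):
    """Collect every length-n window observed in the normal sequences."""
    seen = set()
    for seq in normal_sequences:
        seen.update(ngrams(seq, n))
    return seen


def unseen_window_fraction(seq, seen, n=3):
    """Anomaly score: fraction of windows in seq never observed in normal data."""
    windows = ngrams(seq, n)
    if not windows:
        return 0.0
    return sum(w not in seen for w in windows) / len(windows)


# Hypothetical event logs.
normal_logs = [["open", "read", "write", "close"] * 3]
seen = fit_normal_windows(normal_logs, n=3)

normal_test = ["open", "read", "write", "close", "open", "read", "write", "close"]
anomalous_test = ["open", "write", "read", "close", "open", "read", "write", "close"]

# Every individual symbol of the anomalous log is a known, normal symbol.
known_symbols = {s for log in normal_logs for s in log}
print(set(anomalous_test) <= known_symbols)          # True
print(unseen_window_fraction(normal_test, seen))     # 0.0
print(unseen_window_fraction(anomalous_test, seen))  # 0.5
```

The window length n trades sensitivity against the amount of normal data needed to cover all legitimate windows; 3 is an arbitrary choice here.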
2009
- (Chandola et al., 2009) ⇒ Varun Chandola, Arindam Banerjee, and Vipin Kumar. (2009). “Anomaly Detection: A Survey.” In: ACM Computing Surveys, 41(3). doi:10.1145/1541880.1541882
- QUOTE: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection.
2000
- (Maxion & Tan, 2000) ⇒ Roy A. Maxion, and Kymie M. C. Tan. (2000). “Benchmarking Anomaly-Based Detection Systems.” In: Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8). ISBN:0-7695-0707-7