Concept Name |
Concept Definition |
Synonyms
|
Accuracy Metric |
An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions to its Total Predictions on Unseen Labeled Testing Records.
|
Accuracy Estimation |
An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample.
|
Association Learning Task |
An Association Learning Task is a Learning Task that requires the discovery of Associations.
|
Categorical Set |
A Categorical Set is an Unordered Set that is a Finite Set.
|
Classification Function |
A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values). |
Classifier
|
Confounding Variable |
A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable. |
Confounder
|
Confusion Matrix |
A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set.
|
Cost-Benefit Function |
A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice.
|
Data Cleaning Task |
A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records. |
|
Data Mining Activity |
A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task. |
|
Data Mining Discipline |
A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems. |
|
Data Mining Task |
A Data Mining Task requires automated Discovery of Patterns typically to support human Decision making. |
|
Data Mining Practice |
A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks. |
|
Data Record Attribute |
A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single property of a Data Record.
|
Data Record Set |
A Data Record Set is a set of Data Records that share the same Data Record Schema. |
Dataset
|
Eager Learning Algorithm |
An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function). |
|
Error Rate Metric |
An Error Rate Metric is the Complement of an Accuracy Metric (i.e. Error Rate = 1 - Accuracy).
|
Exploratory Data Analysis Task |
An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses. |
|
False Negative Rate |
A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction.
|
False Positive Rate |
A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction). |
FPR, Type 1 Error Rate
|
Feature Vector |
See Vectorized Learning Record.
|
Finite Ordered Set |
A Finite Ordered Set is an Ordered Set that is a Finite Set. |
Ordinal Set
|
IID Sample |
An IID Random Variable Set is a Random Variable Set where all random variables are in a Statistical Independence Relation and in an Identical Distribution Relation. |
|
Information Extraction Task |
An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts. |
|
Information Retrieval Task |
An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query.
|
Instance-based Learning Algorithm |
An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in a Formal Language more general than the one used to describe the instances themselves. |
|
Lazy Learning Algorithm |
A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase.
|
Learning Record Attribute |
A Learning Record Attribute is a Data Attribute of a Learning Record. |
Feature
|
Learning Record |
A Learning Record is a Data Record that can be used as Input to a Learning Task. |
Example, Instance
|
Machine Learning Research |
Machine Learning Research is a Research Domain that investigates Machines improving Performance over time (such as via Reasoning with Inductive Logic). |
|
Missing Data Value |
A Missing Data Value is a Data Record Attribute with no Data Value. |
|
Model-based Learning Algorithm |
A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data. |
|
Numeric Interval |
A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence.
|
OLAP Task |
An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior.
|
Optimization Task |
A Cost Function Optimization Task is a General Task Type where an Optimal Solution must be provided (that optimizes a Cost Function).
|
Posthoc Analysis Task |
A Posthoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis. |
|
Precision Metric |
A Precision Metric is a Performance Metric of the Probability that a given Classification Model's Positive Prediction is a Correct Prediction. |
|
Predictive Function |
A Predictive Function is a Function that can Map a Learning Record to a Target Value. |
Model, Target Function
|
Randomized Controlled Experiment |
A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group. |
|
Recall Metric |
A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for a True Test Instance). |
Sensitivity, True Positive Rate
|
Regression Algorithm |
A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task. |
Regressor
|
Sequence |
A Sequence is a Multiset of Sequence Members in a Total Order Relation.
|
Semi-Supervised Learning Task |
A Semi-Supervised Learning Task is a Supervised Learning Task with access to Unlabeled Training Records. |
|
Set |
A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members.
|
Statistical Hypothesis Test |
A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis. |
Confirmatory Data Analysis
|
Supervised Learning Task |
A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided.
|
Target Attribute |
A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task.
|
Testing Record |
A Testing Record is a Data Record with a Target Class that is available during a Learning Task's Testing Phase.
|
Text Mining Task |
A Text Mining Task is a Data Mining Task whose input largely involves Text Data. |
Text Analysis
|
Training Record |
A Training Record is a Data Record that is available during a Learning Task's Training Phase. |
Case, Exemplar, Example
|
True Negative Rate |
A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction. |
Specificity
|
Tuple |
A Tuple is a Finite Sequence of Fixed Sequence Length.
|
Unsupervised Learning Task |
An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided. |
|
Vector |
A Vector is a Number Tuple that Represents a point in some Vector Space. |
|
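
The performance metrics defined in this glossary (Accuracy Metric, Error Rate Metric, Precision Metric, Recall Metric, False Positive Rate, False Negative Rate, True Negative Rate) can all be derived from the four cells of a binary Confusion Matrix. The following is a minimal sketch under stated assumptions, not a reference implementation: it assumes a binary task whose Actuals and Predictions are given as Python booleans, and the function names `confusion_counts`, `ratio`, and `performance_metrics` are hypothetical names chosen for illustration.

```python
from collections import Counter


def confusion_counts(actuals, predictions):
    """Count the four Confusion Matrix cells for boolean labels (assumed binary task)."""
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for actual, predicted in zip(actuals, predictions):
        if predicted and actual:
            counts["tp"] += 1      # True Test Instance, Positive Prediction
        elif predicted and not actual:
            counts["fp"] += 1      # False Test Instance, Positive Prediction
        elif not predicted and not actual:
            counts["tn"] += 1      # False Test Instance, Negative Prediction
        else:
            counts["fn"] += 1      # True Test Instance, Negative Prediction
    return counts


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def performance_metrics(actuals, predictions):
    """Derive the glossary's metrics from the Confusion Matrix counts."""
    c = confusion_counts(actuals, predictions)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "error_rate": None if accuracy is None else 1 - accuracy,
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),               # True Positive Rate
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "true_negative_rate": ratio(tn, tn + fp),   # Specificity
    }


# Illustrative (made-up) Testing Records: 4 True and 4 False Test Instances.
actuals = [True, True, True, False, False, False, False, True]
predictions = [True, True, False, False, False, True, False, True]
result = performance_metrics(actuals, predictions)
```

On these illustrative labels the counts are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so Accuracy is 0.75 and Error Rate is 0.25. Returning `None` for an empty denominator is a design choice of this sketch; other conventions (such as returning 0 or raising an error) are equally plausible.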