Machine Learning (ML) Dataset
A Machine Learning (ML) Dataset is a dataset used in a machine learning task.
- Context:
- Task Input: Raw Data Records, Data Schema
- Task Output: Processed Dataset, Feature Vectors
- Task Performance Measure: Dataset Quality Metrics such as:
- ...
- It can (typically) contain learning records for model training and evaluation.
- It can (typically) serve as a task input to a machine learning algorithm.
- It can (typically) be processed by a data preparation task before use.
- ...
- It can range from being a Numerical ML Dataset to being a Categorical ML Dataset, depending on its data type.
- It can range from being an Unlabeled ML Dataset to being a Labeled ML Dataset, depending on its label availability.
- It can range from being a Single-Predictor Learning Dataset to being a Multi-Predictor Learning Dataset, depending on its feature dimensionality.
- It can range from being a Training Dataset to being an Evaluation Dataset, depending on its usage purpose.
- It can range from being a Static ML Dataset to being a Streaming ML Dataset, depending on its data collection method.
- It can range from being a Balanced ML Dataset to being an Imbalanced ML Dataset, depending on its class distribution.
- It can range from being a Small Scale Dataset to being a Large Scale Dataset, depending on its data volume.
- It can range from being a Clean ML Dataset to being a Noisy ML Dataset, depending on its data quality.
- ...
- Examples:
- Standard ML Datasets, such as:
- Classification Datasets, such as:
- Regression Datasets, such as:
- Special Purpose Datasets, such as:
- Domain-Specific Datasets, such as:
- ...
- Counter-Examples:
- User Profile Dataset, which is for user management rather than learning.
- System Log Dataset, which is for monitoring rather than learning.
- Reference Dataset, which is for lookup rather than learning.
- Backup Dataset, which is for data preservation rather than learning.
- See: Data Record Set, User-Interaction Data, Feature Engineering, Data Preprocessing, Model Training, Machine Learning System, Dataset Quality Assessment.
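Several of the ranges listed in the Context above (label availability, feature dimensionality, class distribution) can be checked mechanically. The following is a minimal sketch, assuming a dataset is represented as a list of (feature tuple, label) pairs in which a label of None marks an unlabeled record. The toy records and the helper names `is_labeled`, `feature_dimensionality`, and `imbalance_ratio` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy dataset: each record is (feature tuple, label).
# A label of None would mark an unlabeled record (an assumption of this sketch).
records = [
    ((5.1, 3.5), "a"),
    ((4.9, 3.0), "a"),
    ((6.2, 2.9), "a"),
    ((5.9, 3.2), "b"),
]


def is_labeled(dataset):
    """Return True if every record carries a label."""
    return all(label is not None for _, label in dataset)


def feature_dimensionality(dataset):
    """Return the number of features per record; all records must agree."""
    dims = {len(features) for features, _ in dataset}
    if len(dims) != 1:
        raise ValueError("dataset is empty or has inconsistent feature counts")
    return dims.pop()


def imbalance_ratio(dataset):
    """Return the largest class count divided by the smallest class count.

    A ratio of 1.0 means the labeled records are perfectly balanced.
    """
    counts = Counter(label for _, label in dataset if label is not None)
    if not counts:
        raise ValueError("dataset has no labeled records")
    return max(counts.values()) / min(counts.values())


labeled = is_labeled(records)
n_features = feature_dimensionality(records)
ratio = imbalance_ratio(records)
```

Here the toy dataset is a labeled, multi-predictor dataset whose class counts are 3 and 1. Where to place the threshold between "balanced" and "imbalanced" is a task-specific choice that this sketch does not fix.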
References
2017
- (Sammut & Webb, 2017) ⇒ (2017) "Data Set". In: Sammut & Webb, 2017.
- QUOTE: A data set is a collection of data used for some specific machine learning purpose. A training set is a data set that is used as input to a learning system, which analyzes it to learn a model. A test set or evaluation set is a data set containing data that are used to evaluate the model learned by a learning system. A training set may be divided further into a growing set and a pruning set. Where the training set and the test set contain disjoint sets of data, the test set is known as a holdout set.
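The partitioning described in the quote can be sketched as follows. This is a minimal illustration using only the Python standard library, not the method of any particular learning system. The function name `holdout_split`, its parameters, the fractions, and the integer stand-in records are all assumptions made for the example.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into two disjoint parts.

    Returns (larger_part, held_out_part). Disjointness is by position, so it
    holds for the record values only if the input contains no duplicates.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_held_out = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Hypothetical records: the integers 0..19 stand in for data records.
records = list(range(20))

# Training set vs. test set; because the two are disjoint, the test set
# is a holdout set in the sense of the quote.
training_set, test_set = holdout_split(records, test_fraction=0.25)

# The training set may be divided further into a growing set and a pruning set.
growing_set, pruning_set = holdout_split(training_set, test_fraction=0.2, seed=1)
```

Reusing the same helper for the second split keeps the growing and pruning sets disjoint from each other and from the test set, since they are drawn only from the training set.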