Human-Labeled Dataset
A Human-Labeled Dataset is a labeled dataset that has been annotated by human annotators to provide ground truth data.
- Context:
- It can (typically) include a variety of data types, such as Text, Image, Audio, and Video.
- It can (often) be used to assess the performance of ML Systems (see the evaluation sketch after this list).
- It can range from being a Small-Scale Human-Labeled Dataset to being a Large-Scale Human-Labeled Dataset.
- It can be created through processes such as crowdsourcing or professional data annotation services.
- ...
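The following is a minimal sketch (in Python) of the evaluation use case noted above: human-assigned labels serve as the ground truth against which an ML System's predictions are scored. The dataset contents and the prediction function are hypothetical placeholders, not drawn from any specific library or dataset.

```python
from typing import Callable, List, Tuple

# Hypothetical human-labeled dataset: (input text, human-assigned label) pairs.
human_labeled_data: List[Tuple[str, str]] = [
    ("The Eiffel Tower is in Paris.", "true"),
    ("Cats are a type of reptile.", "false"),
]

def evaluate(predict: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> float:
    """Score a model against human ground-truth labels (simple accuracy)."""
    correct = sum(1 for text, label in dataset if predict(text) == label)
    return correct / len(dataset)

# Usage (with any predict function mapping text -> label):
# accuracy = evaluate(my_model.predict, human_labeled_data)
```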
- Example(s):
- the ImageNet dataset, which is used extensively in Computer Vision to train and evaluate image recognition models.
- the CommonClaim dataset, which contains 20,000 statements labeled by humans as common-knowledge-true, common-knowledge-false, or neither, and is used for testing language models (see the sketch after this list).
- ...
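As a concrete illustration of a CommonClaim-style labeling scheme (three human-assigned categories per statement), the sketch below shows one possible way such records could be represented and aggregated from several annotators by majority vote. The field names and the aggregation rule are illustrative assumptions, not the dataset's published format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# The three label categories described for CommonClaim.
LABELS = {"common-knowledge-true", "common-knowledge-false", "neither"}

@dataclass
class AnnotatedStatement:
    statement: str
    annotator_labels: List[str]  # one label per human annotator

    def consensus_label(self) -> str:
        """Aggregate annotator labels by majority vote (ties fall back to 'neither')."""
        counts = Counter(self.annotator_labels)
        label, count = counts.most_common(1)[0]
        return label if count > len(self.annotator_labels) / 2 else "neither"

# Example record with three human annotators:
record = AnnotatedStatement(
    statement="Water boils at 100 degrees Celsius at sea level.",
    annotator_labels=["common-knowledge-true", "common-knowledge-true", "neither"],
)
print(record.consensus_label())  # -> "common-knowledge-true"
```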
- Counter-Example(s):
- AI-Labeled Dataset/Automatically Labeled Data.
- Synthetic Dataset.
- Unlabeled Datasets, which do not have associated labels and are often used in unsupervised learning tasks.
- ...
- See: Data Annotation, Supervised Learning, Ground Truth, Manual Labeling.
References
2023
- (Casper et al., 2023) ⇒ Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, and Dylan Hadfield-Menell. (2023). "Explore, Establish, Exploit: Red Teaming Language Models from Scratch." In: arXiv:2306.09442. DOI:10.48550/arXiv.2306.09442
- NOTE: It introduces the CommonClaim dataset, which provides human-labeled data for evaluating the truthfulness of statements generated by language models.
2018
- (Radford et al., 2018) ⇒ Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. (2018). “Improving Language Understanding by Generative Pre-Training.”
- NOTE: It demonstrates the effectiveness of human-labeled datasets in refining the performance of language models through discriminative fine-tuning, which relies on supervised learning from precisely annotated examples.
- NOTE: It emphasizes the value of human-labeled data in transitioning from unsupervised pre-training to supervised fine-tuning stages, enhancing the model's ability to generalize from broad linguistic inputs to specific task-oriented outputs.