Biased Dataset
A Biased Dataset is a dataset that is not representative of the population or environment it is intended to describe.
- See: Biased Sample.
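The following is a minimal sketch (not from the quoted sources) of one way to quantify sample/selection bias: comparing a dataset's group composition against assumed reference population shares. The group labels and reference proportions are hypothetical placeholders.

```python
from collections import Counter

def representativeness_gap(samples, reference_shares):
    """Return the total variation distance between observed group shares and
    the reference population shares (0 = representative, 1 = maximally skewed)."""
    counts = Counter(samples)
    total = sum(counts.values())
    observed = {g: counts.get(g, 0) / total for g in reference_shares}
    return 0.5 * sum(abs(observed[g] - reference_shares[g]) for g in reference_shares)

# Hypothetical example: a dataset drawn predominantly from one demographic group.
dataset_groups = ["A"] * 800 + ["B"] * 150 + ["C"] * 50
reference = {"A": 0.5, "B": 0.3, "C": 0.2}   # assumed population shares
print(representativeness_gap(dataset_groups, reference))  # ~0.30, a noticeable skew
```

A gap near zero suggests the dataset roughly mirrors the reference population on that attribute; larger values indicate the kind of sample bias discussed in the reference below.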
References
2021
- https://www.fastcompany.com/90636406/aspiring-to-zero-bias-in-ai
- QUOTE: Understanding data bias is the first step in preventing faulty or compromised judgments that will be amplified and have unintended consequences. Good training data—the human-labeled data sets underpinning AI—is vital. But unconscious bias is pervasive. Here are a few ways it manifests:
- Sample bias: Sample bias—or selection bias—is a data set that doesn’t reflect the diversity of the environment in which the machine-learning model is going to be run. An example is when a facial-recognition system data set draws predominantly from white men. An algorithm trained from this data set will struggle to recognize women and people of different ethnicities.
- Exclusion bias: Exclusion bias often happens in the pre-processing stage when valuable data is deleted because it’s thought to be irrelevant.
- Measurement bias: This bias stems from inconsistency and distortion. For example, the training data for facial recognition may vary from camera to camera. The difference in measuring techniques could skew the results. A measurement bias can also occur when data is inconsistently labeled.
- Recall bias: A subset of measurement bias, recall bias occurs when there is a misunderstanding of labels. Consider a series of objects labeled as damaged, partially damaged, or undamaged. There could be a difference in perspective of what counts as damaged versus partially damaged.
- Observer bias or confirmation bias: This bias is the byproduct of seeing what we expect or want to see in data. In a human-centric process like data annotating, the bias arises when the labelers’ subjective thoughts dictate how they annotate.
- Racial bias: A large part of Buolamwini’s mission has been to tackle racial bias, a process where data skews in favor of particular demographics. Speech and facial recognition have been criticized for their inability to recognize people of color as accurately as they do white people.
- Association bias: A key driver of gender bias, association bias happens when a training model multiplies a cultural bias. Consider data that shows only men working in construction and women working as nurses. In a job-finding algorithm, this data could end up not identifying construction jobs for women or nursing jobs for men.