Generative Data Augmentation Framework
A Generative Data Augmentation Framework is a data augmentation framework that is also a generative AI framework and that can be used to create a synthetic data augmentation system (to solve synthetic data augmentation tasks).
- Context:
- It can generate entirely new data points, not just modifications to the existing training data.
- It can (typically) aim to increase the size and diversity of the training data to improve model performance, especially in scenarios with limited data.
- It can (typically) include components such as Data Generation Models, Data Corruption Strategies, and Model Training Schemes.
- …
- Example(s):
- DALE Framework, as proposed in (Ghosh et al., 2023), for low-resource legal NLP tasks.
- The GAN-based framework proposed in (Naaz et al., 2021), for battery condition evaluation.
- The deep convolutional GAN-based framework proposed in (Sajjad et al., 2021), for Alzheimer's disease classification using PET and synthetic data augmentation …
- …
- See: Synthetic Data Generation, Generative AI Framework, Data Augmentation Framework, Synthetic Data Augmentation System.
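The components listed under Context can be illustrated with a deliberately tiny sketch. It is not taken from any of the cited frameworks: the per-label Gaussian "generation model", the nearest-class-mean "training scheme", and all function names below are assumptions chosen only to show how generated points (rather than modified copies of existing ones) are added to the training data. A corruption strategy is omitted for brevity.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (one numeric feature, integer label)


def fit_gaussian_generator(data: List[Example], seed: int = 0) -> Callable[[int], List[Example]]:
    """Toy generation model (assumption): fit one Gaussian per label, then sample from it."""
    rng = random.Random(seed)
    by_label: Dict[int, List[float]] = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    # Mean and population standard deviation of the feature for each label.
    params = {y: (statistics.mean(xs), statistics.pstdev(xs)) for y, xs in by_label.items()}

    def generate(n_per_label: int) -> List[Example]:
        out: List[Example] = []
        for y, (mu, sigma) in sorted(params.items()):
            # Each sample is a new data point, not an edit of an existing one.
            out.extend((rng.gauss(mu, sigma), y) for _ in range(n_per_label))
        return out

    return generate


def augment(data: List[Example], n_per_label: int, seed: int = 0) -> List[Example]:
    """Return the real examples followed by n_per_label synthetic examples per label."""
    generate = fit_gaussian_generator(data, seed)
    return list(data) + generate(n_per_label)


def train_nearest_mean(data: List[Example]) -> Callable[[float], int]:
    """Toy training scheme (assumption): classify by the nearest class mean."""
    by_label: Dict[int, List[float]] = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    means = {y: statistics.mean(xs) for y, xs in by_label.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


real = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1)]
augmented = augment(real, n_per_label=5)
predict = train_nearest_mean(augmented)
```

In a real framework the Gaussian sampler would be replaced by a learned generative model (for example a GAN or a sequence-to-sequence language model), and the downstream model would be whatever the target task requires.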
References
2023
- Claude2
- A generative data augmentation framework is a system that automatically generates synthetic data to augment the training data for machine learning models. The key aspects are:
- Generative: The synthetic data is generated by the system rather than produced by modifying the existing training data. This allows the creation of entirely new data points.
- Data Augmentation: The goal is to increase the size and diversity of the training data to improve model performance, especially in low-resource scenarios with limited data.
- Framework: It consists of components such as generation models, corruption strategies, and training schemes that work together to enable controlled generation of augmented data.
2023
- (Ghosh et al., 2023) ⇒ Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S. Ramaneswaran, S. Sakshi, Utkarsh Tyagi, and Dinesh Manocha. (2023). “DALE: Generative Data Augmentation for Low-Resource Legal NLP.” doi:10.48550/arXiv.2310.15799
- NOTES:
- It presents DALE, a novel generative data augmentation framework for low-resource legal NLP tasks.
- It proposes a novel unsupervised text denoising objective for DALE based on selective masking of co-located spans in legal documents. This helps acquire knowledge about legal concepts, principles, and language usage.
- QUOTE: ... We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentence. ...
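The idea of a denoising objective built on span masking can be sketched generically. The snippet below is not DALE's implementation and does not reproduce its span-selection criterion: the phrase list, the `<mask>` token, and the function names are all hypothetical, chosen only to show how a (corrupted input, original target) training pair might be formed by masking selected spans.

```python
from typing import List, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask token


def find_spans(tokens: Sequence[str], phrases: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return sorted (start, end) token spans (end exclusive) where any phrase occurs."""
    spans: List[Tuple[int, int]] = []
    for phrase in phrases:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(phrase):
                spans.append((i, i + n))
    return sorted(spans)


def mask_spans(tokens: Sequence[str], spans: Sequence[Tuple[int, int]], mask_token: str = MASK) -> List[str]:
    """Replace each selected span with a single mask token, skipping overlapping spans."""
    out: List[str] = []
    i = 0
    for start, end in sorted(spans):
        if start < i:
            continue  # overlaps a span that was already masked
        out.extend(tokens[i:start])
        out.append(mask_token)
        i = end
    out.extend(tokens[i:])
    return out


# Build one denoising pair: the model would be trained to map `corrupted` back to `target`.
target = "the court held that the defendant breached the contract".split()
phrases = [("the", "court"), ("breached", "the", "contract")]  # hypothetical selected spans
spans = find_spans(target, phrases)
corrupted = mask_spans(target, spans)
```

Masking whole spans rather than single tokens forces the model to reconstruct multi-word expressions from context, which is the general motivation for span-level denoising objectives.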
2021
- (Naaz et al., 2021) ⇒ F Naaz, A Herle, J Channegowda, A Raj. (2021). “A generative adversarial network‐based synthetic data augmentation technique for battery condition evaluation.” In: International Journal of Energy. [DOI Not Provided]
- NOTE: It discusses the use of a generative adversarial network-based synthetic data augmentation framework for evaluating battery conditions.
2021
- (Sajjad et al., 2021) ⇒ M Sajjad, F Ramzan, MUG Khan. (2021). “Deep convolutional generative adversarial network for Alzheimer's disease classification using positron emission tomography (PET) and synthetic data augmentation.” In: Microscopy Research and Technique. [DOI Not Provided]
- NOTE: It describes the application of deep convolutional generative adversarial networks to Alzheimer's disease classification, using synthetic data augmentation of positron emission tomography (PET) data.
2020
- (Li et al., 2020) ⇒ X Li, J Luo, R Younes. (2020). “ActivityGAN: Generative adversarial networks for data augmentation in sensor-based human activity recognition.” In: Adjunct Proceedings of the 2020 ACM Conference.
- NOTE: It introduces ActivityGAN, a synthetic data augmentation framework for sensor-based human activity recognition.