Generative Data Augmentation Framework

Context:
- It can generate entirely new data points, not just modifications to the existing training data.
- It can (typically) aim to increase the size and diversity of the training data to improve model performance, especially in scenarios with limited data.
- It can (typically) include components such as Data Generation Models, Data Corruption Strategies, and Model Training Schemes.
- …
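The following is a minimal sketch of the first two context points. It fits a simple generative model to the training data and samples entirely new points instead of perturbing existing ones. It assumes numeric feature vectors and uses a per-class, per-feature Gaussian as a stand-in for a real Data Generation Model. The function names `fit_class_gaussians` and `generative_augment` are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def fit_class_gaussians(X, y):
    """Toy generation model (hypothetical): per-class, per-feature (mean, stdev)."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    params = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # transpose rows into feature columns
        params[label] = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return params


def generative_augment(X, y, n_per_class, seed=0):
    """Return the real data plus n_per_class newly sampled points per class."""
    rng = random.Random(seed)
    params = fit_class_gaussians(X, y)
    X_aug, y_aug = list(X), list(y)  # keep the original training data
    for label, feature_params in params.items():
        for _ in range(n_per_class):
            # Each synthetic point is drawn from the fitted distribution,
            # so it is new rather than an edited copy of a real row.
            X_aug.append([rng.gauss(mu, sigma) for mu, sigma in feature_params])
            y_aug.append(label)
    return X_aug, y_aug
```

For example, `generative_augment([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]], ["a", "a", "b", "b"], n_per_class=3)` returns 10 labelled rows: the 4 originals followed by 3 sampled rows per class. A realistic framework would replace the Gaussian with a stronger generator such as a GAN or a language model.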
Example(s):
- DALE Framework, as proposed in (Ghosh et al., 2023), for low-resource legal NLP tasks.
- A generative adversarial network (GAN)-based synthetic data augmentation framework, as proposed in (Naaz et al., 2021), for battery condition evaluation.
- A deep convolutional GAN-based framework, as proposed in (Sajjad et al., 2021), for Alzheimer's disease classification using positron emission tomography (PET) and synthetic data augmentation.
- …
See: Synthetic Data Generation, Generative AI Framework, Data Augmentation Framework, Synthetic Data Augmentation System.

References

Claude2
- A generative data augmentation framework is a system that automatically generates synthetic data to augment the training data for machine learning models. The key aspects are:
  - Generative: The synthetic data is generated by the system rather than produced only by modifying the existing training data. This allows the system to create entirely new data points.
  - Data Augmentation: The goal is to increase the size and diversity of the training data to improve model performance, especially in low-resource scenarios with limited data.
  - Framework: It consists of components, such as generation models, corruption strategies, and training schemes, that work together to enable controlled generation of augmented data.
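The three components listed above can be wired together as in the following sketch. It assumes tokenized text and uses random token masking as the corruption strategy. A toy bigram model stands in as a placeholder for a real generation model, such as a pretrained sequence-to-sequence model, and the training scheme simply trains on real plus synthetic examples. All class and function names are hypothetical and do not describe any particular published framework.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

MASK = "<mask>"


def random_mask(tokens, rate, rng):
    """Corruption strategy (hypothetical): mask each token with probability `rate`."""
    return [MASK if rng.random() < rate else t for t in tokens]


class BigramFiller:
    """Toy generation model: fills each MASK with a word seen after the previous token."""

    def __init__(self, corpus):
        self.next_words = defaultdict(list)
        self.vocab = sorted({t for sent in corpus for t in sent})
        for sent in corpus:
            for prev, nxt in zip(sent, sent[1:]):
                self.next_words[prev].append(nxt)

    def fill(self, tokens, rng):
        out = []
        for t in tokens:
            if t == MASK:
                prev = out[-1] if out else None
                # Fall back to the whole vocabulary when no continuation is known.
                t = rng.choice(self.next_words.get(prev) or self.vocab)
            out.append(t)
        return out


@dataclass
class GenerativeAugmenter:
    corpus: list                  # list of token lists
    mask_rate: float = 0.3
    copies_per_example: int = 2
    seed: int = 0

    def augment(self, labels):
        rng = random.Random(self.seed)
        model = BigramFiller(self.corpus)
        new_x, new_y = [], []
        for sent, label in zip(self.corpus, labels):
            for _ in range(self.copies_per_example):
                corrupted = random_mask(sent, self.mask_rate, rng)
                new_x.append(model.fill(corrupted, rng))
                new_y.append(label)  # synthetic examples inherit the source label
        # Training scheme: train downstream models on real + synthetic data.
        return list(self.corpus) + new_x, list(labels) + new_y
```

Keeping each component behind its own function or class lets a corruption strategy or generator be swapped out without changing the rest of the pipeline. This is the kind of modularity the "Framework" aspect describes.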

(Ghosh et al., 2023) ⇒ Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S. Ramaneswaran, S. Sakshi, Utkarsh Tyagi, and Dinesh Manocha. (2023). “DALE: Generative Data Augmentation for Low-Resource Legal NLP.” doi:10.48550/arXiv.2310.15799
- NOTES:
  - It presents DALE, a novel generative data augmentation framework for low-resource legal NLP tasks.
  - It proposes a novel unsupervised text denoising objective for DALE based on selective masking of co-located spans in legal documents. This helps acquire knowledge about legal concepts, principles, and language usage.
- QUOTE: ... We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentence. ...
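The selective-masking idea in the note above can be illustrated with the simplified sketch below. It assumes that "co-located spans" can be approximated by word n-grams recurring across several documents, a typical trait of templated legal text. It only builds the (corrupted input, original text) pairs that a sequence-to-sequence denoiser would be trained on. The function names and the span-selection rule are hypothetical and are not DALE's actual procedure.

```python
from collections import Counter

MASK = "<mask>"


def recurring_spans(docs, n=2, min_docs=2):
    """Hypothetical span selector: n-grams appearing in at least `min_docs` documents."""
    doc_freq = Counter()
    for tokens in docs:
        # A set, so each n-gram counts at most once per document.
        doc_freq.update({tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})
    return {gram for gram, df in doc_freq.items() if df >= min_docs}


def selective_mask(tokens, spans, n=2):
    """Replace each occurrence of a selected span with a single MASK token."""
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) in spans:
            out.append(MASK)
            i += n  # skip the whole masked span
        else:
            out.append(tokens[i])
            i += 1
    return out


def denoising_pairs(docs, n=2, min_docs=2):
    """(corrupted input, original text) pairs for a denoising objective."""
    spans = recurring_spans(docs, n, min_docs)
    return [(" ".join(selective_mask(d, spans, n)), " ".join(d)) for d in docs]
```

For example, given the two clauses "the party of the first part shall pay" and "the party of the second part shall deliver", the shared bigrams are masked. The first clause becomes `<mask> <mask> first <mask> pay`, so the model is trained to regenerate the recurring legal phrasing.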

(Naaz et al., 2021) ⇒ F Naaz, A Herle, J Channegowda, A Raj. (2021). “A generative adversarial network-based synthetic data augmentation technique for battery condition evaluation.” In: International Journal of Energy. [DOI Not Provided]
- NOTE: It discusses the use of a generative adversarial network-based synthetic data augmentation framework for evaluating battery conditions.

(Sajjad et al., 2021) ⇒ M Sajjad, F Ramzan, MUG Khan. (2021). “Deep convolutional generative adversarial network for Alzheimer's disease classification using positron emission tomography (PET) and synthetic data augmentation.” In: Microscopy Research and Technique. [DOI Not Provided]
- NOTE: It describes the application of deep convolutional generative adversarial networks to Alzheimer's disease classification, using synthetic data augmentation of positron emission tomography (PET) images.

(Li et al., 2020) ⇒ X Li, J Luo, R Younes. (2020). “ActivityGAN: Generative adversarial networks for data augmentation in sensor-based human activity recognition.” In: Adjunct Proceedings of the 2020 ACM Conference.
- NOTE: It presents ActivityGAN, a generative adversarial network-based synthetic data augmentation framework for sensor-based human activity recognition.