Catastrophic Forgetting Scenario

A Catastrophic Forgetting Scenario is a Neural Network Behavior that involves the complete or substantial forgetting of previously learned information when a neural network is trained on new tasks.

Context:
- It can (often) occur in contexts where DNN Models undergo constant updates or need to learn in real-time environments.
- It can (often) be mitigated using strategies like Elastic Weight Consolidation (EWC), Progressive Neural Networks, Replay Techniques, and DNN Meta-learning.
- It can limit the development and deployment of AI systems that must adapt to new situations, because each update risks degrading previously learned capabilities.
- It can be a subject of ongoing research that aims to develop more robust and versatile AI systems capable of dynamic, continual learning.
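- It can be detected with DNN Memory Evaluation, such as re-measuring performance on earlier tasks after the network has been trained on later ones.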
- ...
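
The detection and replay items above can be illustrated with a minimal sketch. It assumes a two-weight logistic model and two hand-made toy tasks; the data, the function names (`train`, `accuracy`), and the hyperparameters are hypothetical choices for illustration, not taken from any of the cited studies. It trains on Task A, then on Task B alone (sequential training), then on Task B mixed with Task A samples (a simple form of replay), and reports forgetting as the drop in Task A accuracy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, lr=0.5, steps=200):
    """Full-batch gradient descent on the logistic loss; returns new weights."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def accuracy(w, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((X @ w > 0).astype(int) == y))

# Hypothetical toy tasks that share the same two weights.
# Both tasks are jointly solvable (for example by w = [1, -3]).
XA = np.array([[1.0, 0.0], [-1.0, 0.0]])
yA = np.array([1, 0])
XB = np.array([[2.0, 1.0], [-2.0, -1.0]])
yB = np.array([0, 1])

# 1) Learn Task A.
w_a = train(np.zeros(2), XA, yA)
acc_a_before = accuracy(w_a, XA, yA)

# 2) Sequential training: continue on Task B only.
w_seq = train(w_a, XB, yB)
acc_a_after = accuracy(w_seq, XA, yA)

# 3) Replay: continue on Task B mixed with stored Task A samples.
w_replay = train(w_a, np.vstack([XA, XB]), np.concatenate([yA, yB]))
acc_a_replay = accuracy(w_replay, XA, yA)

# Forgetting measured as the drop in Task A accuracy.
forgetting_seq = acc_a_before - acc_a_after
forgetting_replay = acc_a_before - acc_a_replay

print("Task A accuracy before Task B:", acc_a_before)
print("Task A accuracy after sequential training:", acc_a_after)
print("Task A accuracy after replay training:", acc_a_replay)
print("Task B accuracy (sequential, replay):",
      accuracy(w_seq, XB, yB), accuracy(w_replay, XB, yB))
```

In this toy setup, sequential training should overwrite the weight that Task A depends on, while replay should keep both tasks solved. Real networks and benchmarks are far larger, so the numbers here only illustrate the measurement procedure.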
Example(s):
- The experiments by McCloskey and Cohen in 1989, which demonstrated catastrophic forgetting in backpropagation neural network models during tasks like single-digit addition.
- Ratcliff's 1990 studies using backpropagation models in standard recognition memory procedures, which also showed catastrophic forgetting as new information was learned.
- ...
Counter-Example(s):
- A Stable Learning Scenario, in which a neural network retains its performance on previously learned tasks after being trained on new ones.
- A Neural Network Generalization, where a neural network applies learned information to new but similar tasks without forgetting.
See: Neural Network Behavior, Elastic Weight Consolidation, Progressive Neural Networks, Replay Techniques, Meta-learning.

References

(Luo et al., 2023) ⇒ Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, and Yue Zhang. (2023). “An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning.” arXiv preprint arXiv:2308.08747.
- ABSTRACT: Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information as it learns new information. As large language models (LLMs) have shown excellent performance, it is interesting to uncover whether CF exists in the continual fine-tuning of LLMs. In this study, we empirically evaluate the forgetting phenomenon in LLMs' knowledge, from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments demonstrate that catastrophic forgetting is generally observed in LLMs ranging from 1b to 7b. Furthermore, as the scale increases, the severity of forgetting also intensifies. Comparing the decoder-only model BLOOMZ with the encoder-decoder model mT0, BLOOMZ suffers less forgetting and maintains more knowledge. We also observe that LLMs can mitigate language bias (e.g. gender bias) during continual fine-tuning. Moreover, we find that ALPACA can maintain more knowledge and capacity compared with LLAMA during the continual fine-tuning, which implies that general instruction tuning can help mitigate the forgetting phenomenon of LLMs in the further fine-tuning process.

(Kemker et al., 2018) ⇒ R. Kemker, M. McClure, A. Abitino, T. Hayes, ... (2018). “Measuring catastrophic forgetting in neural networks.” In: Proceedings of the AAAI Conference on Artificial Intelligence.
- NOTE: It establishes new benchmarks and novel metrics for measuring catastrophic forgetting in neural networks.

(Kirkpatrick et al., 2017) ⇒ J. Kirkpatrick, R. Pascanu, ... (2017). “Overcoming catastrophic forgetting in neural networks.” In: Proceedings of the National Academy of Sciences.
- NOTE: It introduces Elastic Weight Consolidation (EWC), which slows learning on the weights most important to earlier tasks, and it suggests that catastrophic forgetting is not an inevitable feature of connectionist models.
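- NOTE: The quadratic penalty at the core of EWC can be sketched as follows. This is a minimal illustration, assuming a flat parameter vector and a precomputed diagonal Fisher information estimate; the function names `ewc_penalty` and `ewc_penalty_grad` and the argument names are hypothetical, not the authors' code. The penalty is (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2 and is added to the loss of the new task.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Quadratic EWC penalty around the old-task solution theta_star.

    fisher_diag holds one importance value per parameter (assumed to be
    estimated beforehand on the old task); lam sets the penalty strength.
    """
    diff = theta - theta_star
    return 0.5 * lam * float(np.sum(fisher_diag * diff ** 2))

def ewc_penalty_grad(theta, theta_star, fisher_diag, lam=1.0):
    """Gradient of the penalty with respect to theta."""
    return lam * fisher_diag * (theta - theta_star)

# Usage: the first parameter matters for the old task, the second does not.
theta_star = np.array([0.0, 0.0])   # weights after the old task
theta = np.array([1.0, 2.0])        # candidate weights for the new task
fisher = np.array([1.0, 0.0])       # hypothetical importance estimates

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
grad = ewc_penalty_grad(theta, theta_star, fisher, lam=2.0)
```

Only movement along parameters with nonzero importance is penalized, so the second parameter remains free to change for the new task.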

(French, 1999) ⇒ R.M. French. (1999). “Catastrophic forgetting in connectionist networks.” In: Trends in Cognitive Sciences.
- NOTE: It discusses the causes, consequences, and various solutions to catastrophic forgetting in neural networks.

(Robins, 1995) ⇒ A. Robins. (1995). “Catastrophic forgetting, rehearsal and pseudorehearsal.” In: Connection Science.
- NOTE: It reviews the problem of catastrophic forgetting, reports 'sweep rehearsal' as an effective method to minimize it, and describes 'pseudorehearsal', which provides the benefits of rehearsal without access to the previously learned items.