Exploding Gradient Problem

An Exploding Gradient Problem is a Neural Network Training Algorithm problem that arises when using gradient descent and backpropagation.

Context:
- It can be solved by using a Gradient Clipping Algorithm and Weight Regularization Algorithm.
- …
Example(s):
- Figure 3 in Grosse (2017).
- …
Counter-Example(s):
- Vanishing Gradient Problem.
See: Recurrent Neural Network, Deep Neural Network Training Algorithm.

References

2021

(DeepAI, 2021) ⇒ https://deepai.org/machine-learning-glossary-and-terms/exploding-gradient-problem Retrieved:2021-01-23.
- QUOTE: In machine learning, the exploding gradient problem is an issue found in training artificial neural networks with gradient-based learning methods and backpropagation. An artificial neural network is a learning algorithm, also called neural network or neural net, that uses a network of functions to understand and translate data input into a specific output. This type of learning algorithm is designed to mimic the way neurons function in the human brain. Exploding gradients are a problem when large error gradients accumulate and result in very large updates to neural network model weights during training. Gradients are used during training to update the network weights, but when the typically this process works best when these updates are small and controlled. When the magnitudes of the gradients accumulate, an unstable network is likely to occur, which can cause poor predicition results or even a model that reports nothing useful what so ever. There are methods to fix exploding gradients, which include gradient clipping and weight regularization, among others.
  (...)
  
  Exploding gradients can cause problems in the training of artificial neural networks. When there are exploding gradients, an unstable network can result and the learning cannot be completed. The values of the weights can also become so large as to overflow and result in something called NaN values. NaN values, which stands for not a number, are values that represent an undefined or unrepresentable values. It is useful to know how to identify exploding gradients in order to correct the training.

2017

(Grosse, 2017) ⇒ Roger Grosse (2017). "Lecture 15: Exploding and Vanishing Gradients". In: CSC321 Course, University of Toronto.
- QUOTE: To make this story more concrete, consider the following RNN, which uses the tanh activation function:

Figure 3 shows the function computed at each time step, as well as the function computed by the network as a whole. From this figure, you can see which regions have exploding or vanishing gradients.

**Figure 3:** (left) The function computed by the RNN at each time step, (right) the function computed by the network.

Exploding Gradient Problem

References

2021

2017

Navigation menu

Search