Rectified Linear Unit
A Rectified Linear Unit is an artificial neuron whose activation function is a rectified linear activation function.
- AKA: ReLU.
- Context:
- It can range from being a Noisy ReLU (from a Noisy ReLU Network) to being a ...
- It can range from being a Leaky ReLU (from a Leaky ReLU Network) to being a ...
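The variants named above can be sketched in a few lines of NumPy; the function names and the slope/noise parameters (`alpha`, `sigma`) are illustrative assumptions, not definitions from this article:

```python
import numpy as np

def relu(x):
    # Standard rectifier: f(x) = max(0, x), applied elementwise.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky variant: a small slope alpha on negative inputs instead of a hard zero.
    return np.where(x > 0, x, alpha * x)

def noisy_relu(x, sigma=0.1, rng=None):
    # Noisy variant: Gaussian noise added to the pre-activation before rectification.
    rng = np.random.default_rng() if rng is None else rng
    return np.maximum(0.0, x + rng.normal(0.0, sigma, size=np.shape(x)))
```

The leaky variant keeps a nonzero gradient for negative inputs, which is one motivation for preferring it over the plain rectifier in some networks.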
- Example(s):
- Counter-Example(s):
- a tanh Activation Function (w/ tanh).
- a sigmoid Activation Function (w/ Sigmoid).
- a softmax Activation Function (w/ Softmax).
- See: Rectifier-based Neural Network, Convolutional Network.
References
2016
- http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
- QUOTE: The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function f(x)=max(0,x), which is simply thresholded at zero.
There are several pros and cons to using the ReLUs:
- (Pros) Compared to sigmoid/tanh neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero. Meanwhile, ReLUs do not suffer from saturation.
- (Pros) It was found to greatly accelerate (e.g., a factor of 6 in [1]) the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that this is due to its linear, non-saturating form.
- (Cons) Unfortunately, ReLU units can be fragile during training and can “die”. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, the ReLU units can irreversibly die during training since they can get knocked off the data manifold. For example, you may find that as much as 40% of your network can be “dead” (i.e., neurons that never activate across the entire training dataset) if the learning rate is set too high. With a proper setting of the learning rate this is less frequently an issue.
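The thresholding described in the quote, and the zero gradient behind the "dying ReLU" problem, can be made concrete with a minimal NumPy sketch (the function names here are illustrative):

```python
import numpy as np

def relu_forward(x):
    # ReLU is elementwise thresholding at zero: f(x) = max(0, x).
    return np.maximum(0.0, x)

def relu_backward(x, grad_out):
    # The gradient passes through only where the pre-activation was positive.
    # A unit whose pre-activation is negative on every datapoint receives
    # zero gradient from then on -- the "dead ReLU" case described above.
    return grad_out * (x > 0)

x = np.array([-3.0, -0.5, 0.0, 2.0])
y = relu_forward(x)                    # [0., 0., 0., 2.]
g = relu_backward(x, np.ones_like(x))  # [0., 0., 0., 1.]
```

Because `relu_backward` is zero wherever the input was non-positive, a large weight update that pushes a unit's pre-activations negative for all inputs leaves it with no gradient signal to recover, which is why a lower learning rate reduces the problem.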
2010
- (Nair & Hinton, 2010) ⇒ Vinod Nair, and Geoffrey E. Hinton. (2010). “Rectified Linear Units Improve Restricted Boltzmann Machines.” In: Proceedings of the 27th International Conference on Machine Learning (ICML-10).