Learning Rate

From GM-RKB

A Learning Rate is a hyperparameter that controls the step size of parameter updates during the training of a model, such as an artificial neural network.
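As a minimal sketch of the idea (an illustrative example, not from the source), consider gradient descent on the one-dimensional function f(w) = (w - 3)², whose gradient is 2(w - 3). The learning rate scales every update: a moderate value converges to the minimum, while a value that is too large makes the iterates diverge.

```python
def gradient_descent(lr, steps=100, w=0.0):
    """Minimize f(w) = (w - 3)^2 by gradient descent with learning rate lr."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # gradient of f at the current w
        w = w - lr * grad        # the learning rate scales each update
    return w

# With lr = 0.1 the iterates approach the minimizer w = 3;
# with lr = 1.1 each step overshoots and the iterates diverge.
```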



References


2018c

  • (Wikipedia, 2018) ⇒ https://www.wikiwand.com/en/Stochastic_gradient_descent#/Background Retrieved: 2018-04-22.
    • QUOTE: The sum-minimization problem also arises for empirical risk minimization. In this case, [math]\displaystyle{ Q_i(w) }[/math] is the value of the loss function at [math]\displaystyle{ i }[/math]-th example, and [math]\displaystyle{ Q(w) }[/math] is the empirical risk.

      When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations:

      [math]\displaystyle{ w := w - \eta \nabla Q(w) = w - \eta \sum_{i=1}^n \nabla Q_i(w)/n, }[/math]

      where [math]\displaystyle{ \eta }[/math] is a step size (sometimes called the learning rate in machine learning).

      In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum gradient. For example, in statistics, one-parameter exponential families allow economical function-evaluations and gradient-evaluations.

      However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients from all summand functions. When the training set is enormous and no simple formulas exist, evaluating the sums of gradients becomes very expensive, because evaluating the gradient requires evaluating all the summand functions' gradients. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective in the case of large-scale machine learning problems[1].
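The batch and stochastic updates described above can be sketched as follows (an illustrative example with assumed names and data, not from the quoted text), minimizing the empirical risk [math]\displaystyle{ Q(w) = \frac{1}{n}\sum_i Q_i(w) }[/math] with [math]\displaystyle{ Q_i(w) = (w - x_i)^2/2 }[/math], so that [math]\displaystyle{ \nabla Q_i(w) = w - x_i }[/math]:

```python
import random

def batch_step(w, xs, eta):
    # w := w - eta * (1/n) * sum_i grad Q_i(w): touches every example.
    grad = sum(w - x for x in xs) / len(xs)
    return w - eta * grad

def sgd_step(w, xs, eta, rng):
    # Stochastic variant: estimate the full gradient from one sampled example.
    x = rng.choice(xs)
    return w - eta * (w - x)

xs = [1.0, 2.0, 3.0, 4.0]
w_batch, w_sgd = 0.0, 0.0
rng = random.Random(0)
for _ in range(200):
    w_batch = batch_step(w_batch, xs, eta=0.1)
    w_sgd = sgd_step(w_sgd, xs, eta=0.1, rng=rng)
# Both drift toward the minimizer of Q (the mean of xs, 2.5), but the
# stochastic iterate only fluctuates around it, since each step uses a
# noisy one-example gradient estimate at a fraction of the cost.
```

The cost difference is the point of the quoted passage: the batch step evaluates n summand gradients per iteration, while the stochastic step evaluates one.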
