class brainpy.optim.RMSProp(lr, train_vars=None, weight_decay=None, epsilon=1e-06, rho=0.9, name=None)

Optimizer that implements the RMSprop algorithm.

RMSprop [5] and Adadelta were developed independently around the same time, both stemming from the need to resolve Adagrad’s radically diminishing learning rates.

The gist of RMSprop is to:

  • Maintain a moving (discounted) average of the square of gradients

  • Divide the gradient by the root of this average

\[\begin{split}c_t &= \rho c_{t-1} + (1-\rho) g^2 \\ p_t &= \frac{\eta}{\sqrt{c_t + \epsilon}} \, g\end{split}\]
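The two steps above can be sketched in plain NumPy (a minimal illustration of the update rule, not BrainPy's actual implementation; the function name and defaults here are chosen for the example):

```python
import numpy as np

def rmsprop_update(param, grad, cache, lr=0.1, rho=0.9, epsilon=1e-6):
    """One RMSprop step: keep a discounted average of squared gradients
    (c_t), then scale the gradient by the root of that average (p_t)."""
    cache = rho * cache + (1 - rho) * grad**2          # c_t
    param = param - lr * grad / np.sqrt(cache + epsilon)  # apply p_t
    return param, cache

# Toy usage: minimize f(x) = x^2 starting from x = 5.
x, c = 5.0, 0.0
for _ in range(500):
    g = 2.0 * x          # gradient of x^2
    x, c = rmsprop_update(x, g, c)
```

Because the gradient is divided by the root of its own running magnitude, the effective step size stays close to `lr` regardless of the raw gradient scale, which is what prevents the Adagrad-style decay.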

The centered version additionally maintains a moving average of the gradients, and uses that average to estimate the variance.


lr (float, Scheduler) – Learning rate.