Adam

class brainpy.optim.Adam(lr, train_vars=None, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=None, name=None)

Optimizer that implements the Adam algorithm.

Adam [6] is a stochastic gradient descent (SGD) method that computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
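
Concretely, at step t the standard Adam update for a parameter \theta with gradient g_t, using the hyper-parameters below, is (following [6]):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \mathrm{lr} \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)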

Parameters:
  • lr (float, Scheduler) – The learning rate.

  • train_vars (optional, dict) – The trainable variables to be optimized.

  • beta1 (optional, float) – A positive scalar value for beta_1, the exponential decay rate for the first moment estimates (default 0.9).

  • beta2 (optional, float) – A positive scalar value for beta_2, the exponential decay rate for the second moment estimates (default 0.999).

  • eps (optional, float) – A positive scalar value for epsilon, a small constant for numerical stability (default 1e-8).

  • weight_decay (optional, float) – The coefficient of weight decay; no weight decay is applied when it is None.

  • name (optional, str) – The optimizer name.
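
A minimal usage sketch is given below. It assumes the BrainPy convention that train_vars is a dict of brainpy.math.TrainVar objects and that gradients are passed to the optimizer's update() method as a dict with the same keys; the gradient values here are hand-written placeholders rather than the output of a real backward pass.

    import brainpy as bp
    import brainpy.math as bm

    # One trainable parameter; TrainVar marks it as updatable by the optimizer.
    w = bm.TrainVar(bm.ones(3))

    # Build the optimizer over the variables it should manage.
    opt = bp.optim.Adam(lr=1e-3, train_vars={'w': w}, beta1=0.9, beta2=0.999)

    # Gradients keyed and shaped like train_vars (placeholder values here).
    grads = {'w': bm.ones(3) * 0.5}

    # One Adam step: update the moment estimates, then the weights in place.
    opt.update(grads)
    print(w.value)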

References

[6] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.