brainpy.optim.Adan#

class brainpy.optim.Adan(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [1].

\[\begin{aligned}
& \mathbf{m}_k=\left(1-\beta_1\right) \mathbf{m}_{k-1}+\beta_1 \mathbf{g}_k \\
& \mathbf{v}_k=\left(1-\beta_2\right) \mathbf{v}_{k-1}+\beta_2\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right) \\
& \mathbf{n}_k=\left(1-\beta_3\right) \mathbf{n}_{k-1}+\beta_3\left[\mathbf{g}_k+\left(1-\beta_2\right)\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right)\right]^2 \\
& \boldsymbol{\eta}_k=\eta / \sqrt{\mathbf{n}_k+\varepsilon} \\
& \boldsymbol{\theta}_{k+1}=\left(1+\lambda_k \eta\right)^{-1}\left[\boldsymbol{\theta}_k-\boldsymbol{\eta}_k \circ\left(\mathbf{m}_k+\left(1-\beta_2\right) \mathbf{v}_k\right)\right]
\end{aligned}\]
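The update above translates directly into array operations. The following is a minimal NumPy sketch of a single Adan step, written only to illustrate the equations; it is not the BrainPy implementation, and the function name `adan_step` and the `state` tuple are hypothetical.

```python
import numpy as np

def adan_step(theta, grad, state, lr=1e-3, betas=(0.02, 0.08, 0.01),
              eps=1e-8, weight_decay=0.02):
    """One Adan step following the equations above (illustrative only)."""
    b1, b2, b3 = betas
    m, v, n, prev_grad = state
    diff = grad - prev_grad                          # g_k - g_{k-1}

    m = (1 - b1) * m + b1 * grad                     # running average of the gradient
    v = (1 - b2) * v + b2 * diff                     # running average of the gradient difference
    n = (1 - b3) * n + b3 * (grad + (1 - b2) * diff) ** 2   # squared-norm term

    eta = lr / np.sqrt(n + eps)                      # element-wise learning rate
    update = eta * (m + (1 - b2) * v)
    theta = (theta - update) / (1 + weight_decay * lr)      # default (no_prox=False) rule
    return theta, (m, v, n, grad)

# Example: optimize theta for the quadratic loss 0.5 * ||theta||^2 (gradient = theta).
theta = np.ones(4)
state = (np.zeros(4), np.zeros(4), np.zeros(4), np.zeros(4))
for _ in range(10):
    theta, state = adan_step(theta, theta.copy(), state)
```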
Parameters:
  • lr (float, Scheduler) – The learning rate. Adan tolerates learning rates up to 5-10x larger than those commonly used with Adam. (default: 1e-3)

  • train_vars (optional) – The trainable variables to optimize.

  • betas (tuple) – The coefficients (β1, β2, β3) used for computing the running averages of the gradient, the gradient difference, and the squared gradient norm. (default: (0.02, 0.08, 0.01))

  • eps (float) – The term added to the denominator to improve numerical stability. (default: 1e-8)

  • weight_decay (float) – The decoupled weight decay (L2 penalty). (default: 0.02)

  • no_prox (bool) –

    How to perform the decoupled weight decay (default: False). It determines the update rule for parameters with weight decay. By default (no_prox=False), Adan updates the parameters as presented in Algorithm 1 of the paper:

    \[\boldsymbol{\theta}_{k+1} = (1+\lambda \eta)^{-1}\left[\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-\beta_2)\mathbf{v}_k)\right],\]

    But one can also update the parameters in the AdamW style (when no_prox=True), as sketched after this parameter list:

    \[\boldsymbol{\theta}_{k+1} = (1-\lambda \eta)\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-\beta_2)\mathbf{v}_k).\]
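The sketch below contrasts the two variants in code. It is illustrative pseudocode, not the library source; the helper name `apply_weight_decay` is hypothetical, and `update` stands for the term η_k ∘ (m_k + (1 - β2) v_k) from the equations above.

```python
def apply_weight_decay(theta, update, lr, weight_decay, no_prox):
    """Illustrative sketch of the two decoupled-weight-decay variants (not library code).

    `update` stands for eta_k * (m_k + (1 - beta2) * v_k) from the equations above.
    """
    if no_prox:
        # AdamW-style update
        return theta * (1 - weight_decay * lr) - update
    # Default (no_prox=False): proximal update from Algorithm 1 in the paper
    return (theta - update) / (1 + weight_decay * lr)
```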

References

[1] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv preprint arXiv:2208.06677.

__init__(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Methods

__init__([lr, train_vars, betas, eps, ...])

check_grads(grads)

cpu()

Move all variables to the CPU device.

cuda()

Move all variables to the GPU device.

load_state_dict(state_dict[, warn, compatible])

Copy parameters and buffers from state_dict into this module and its descendants.

load_states(filename[, verbose])

Load the model states.

nodes([method, level, include_self])

Collect all child nodes.

register_implicit_nodes(*nodes[, node_cls])

register_implicit_vars(*variables[, var_cls])

register_train_vars([train_vars])

register_vars([train_vars])

save_states(filename[, variables])

Save the model states.

state_dict()

Return a dictionary containing the whole state of the module.

to(device)

Move all variables to the given device.

tpu()

Move all variables to the TPU device.

train_vars([method, level, include_self])

The shortcut for retrieving all trainable variables.

tree_flatten()

Flattens the object as a PyTree.

tree_unflatten(aux, dynamic_values)

Unflatten the data to construct an object of this class.

unique_name([name, type_])

Get the unique name for this object.

update(grads)

vars([method, level, include_self, ...])

Collect all variables in this node and the children nodes.

Attributes

name

Name of the model.
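Putting the pieces together, the sketch below shows one way the optimizer might be used in a training loop. It is a hedged usage example, not documentation of a specific BrainPy release: the Adan constructor, update(), state_dict(), and load_state_dict() appear on this page, while everything else (the toy model, brainpy.math.TrainVar, brainpy.math.grad, and the dict layout of the gradients) is an assumption made for illustration and may differ across versions.

```python
import brainpy as bp
import brainpy.math as bm

# Hypothetical toy model: a dict of trainable variables (an assumption about
# what `train_vars` accepts).
weights = {'w': bm.TrainVar(bm.random.normal(size=(10, 1))),
           'b': bm.TrainVar(bm.zeros(1))}

opt = bp.optim.Adan(lr=1e-3, train_vars=weights, weight_decay=0.02)

def loss_fn(x, y):
    pred = x @ weights['w'] + weights['b']
    return bm.mean((pred - y) ** 2)

# Assumption: brainpy.math.grad differentiates `loss_fn` w.r.t. the given variables.
grad_fn = bm.grad(loss_fn, grad_vars=weights)

for step in range(100):
    x = bm.random.normal(size=(32, 10))
    y = bm.random.normal(size=(32, 1))
    grads = grad_fn(x, y)   # gradients keyed like `weights` (assumption)
    opt.update(grads)       # one Adan step on all registered variables

# Checkpointing with the methods documented above.
state = opt.state_dict()
opt.load_state_dict(state)
```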