class brainpy.optim.Adan(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [1].

\begin{split}\begin{aligned} & \mathbf{m}_k=\left(1-\beta_1\right) \mathbf{m}_{k-1}+\beta_1 \mathbf{g}_k \\ & \mathbf{v}_k=\left(1-\beta_2\right) \mathbf{v}_{k-1}+\beta_2\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right) \\ & \mathbf{n}_k=\left(1-\beta_3\right) \mathbf{n}_{k-1}+\beta_3\left[\mathbf{g}_k+\left(1-\beta_2\right)\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right)\right]^2 \\ & \boldsymbol{\eta}_k=\eta /\left(\sqrt{\mathbf{n}_k+\varepsilon}\right) \\ & \boldsymbol{\theta}_{k+1}=\left(1+\lambda_k \eta\right)^{-1}\left[\boldsymbol{\theta}_k-\boldsymbol{\eta}_k \circ\left(\mathbf{m}_k+\left(1-\beta_2\right) \mathbf{v}_k\right)\right] \\ \end{aligned}\end{split}
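The update rules above can be sketched in plain NumPy. This is an illustrative, self-contained single-step function following the equations term by term (variable names `m`, `v`, `n`, `g`, `g_prev` mirror the math); it is not BrainPy's actual implementation.

```python
import numpy as np

def adan_step(theta, g, g_prev, m, v, n, lr=1e-3,
              betas=(0.02, 0.08, 0.01), eps=1e-8, weight_decay=0.02):
    """One Adan update step (illustrative sketch of the equations above)."""
    b1, b2, b3 = betas
    m = (1 - b1) * m + b1 * g                                    # EMA of gradient
    v = (1 - b2) * v + b2 * (g - g_prev)                         # EMA of gradient difference
    n = (1 - b3) * n + b3 * (g + (1 - b2) * (g - g_prev)) ** 2   # EMA of squared combined gradient
    eta = lr / np.sqrt(n + eps)                                  # per-coordinate step size
    theta = (theta - eta * (m + (1 - b2) * v)) / (1 + weight_decay * lr)
    return theta, m, v, n
```

In practice the optimizer carries `m`, `v`, `n`, and the previous gradient `g_prev` as state across iterations; the sketch exposes them explicitly for clarity.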
Parameters:
• lr (float, Scheduler) – learning rate. Adan tolerates learning rates roughly 5-10x higher than Adam. (default: 1e-3)

• betas (tuple) – Coefficients (beta1, beta2, beta3) used for computing the running averages of the gradient, the gradient difference, and the squared gradient. (default: (0.02, 0.08, 0.01))

• eps (float) – The term added to the denominator to improve numerical stability. (default: 1e-8)

• weight_decay (float) – decoupled weight decay (L2 penalty) (default: 0.02)

• no_prox (bool) – How to perform the decoupled weight decay (default: False). It determines the update rule of parameters with weight decay. By default (no_prox=False), Adan updates the parameters as in Algorithm 1 of the paper:

$\boldsymbol{\theta}_{k+1} = ( 1+\lambda \eta)^{-1}\left[\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-{\color{blue}\beta_2})\mathbf{v}_k)\right].$

With no_prox=True, the parameters are instead updated AdamW-style:

$\boldsymbol{\theta}_{k+1} = ( 1-\lambda \eta)\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-{\color{blue}\beta_2})\mathbf{v}_k).$
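The two weight-decay modes can be contrasted with a small NumPy sketch. The function and variable names here are illustrative (not BrainPy internals); `lam` is the weight-decay coefficient $\lambda$ and `step` is the momentum-weighted update $\boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-\beta_2)\mathbf{v}_k)$.

```python
import numpy as np

def apply_update(theta, eta, m, v, beta2, lam, lr, no_prox):
    """Apply Adan's parameter update under either weight-decay mode (sketch)."""
    step = eta * (m + (1 - beta2) * v)
    if not no_prox:
        # Algorithm 1 (proximal form): divide the decayed step by (1 + lam * lr)
        return (theta - step) / (1 + lam * lr)
    # AdamW-style: shrink theta by (1 - lam * lr) before subtracting the step
    return (1 - lam * lr) * theta - step
```

With `lam = 0` the two branches coincide; for `lam > 0` they differ only in how the decay factor enters the update.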

References

[1] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv:2208.06677.

__init__(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Methods

__init__([lr, train_vars, betas, eps, ...])
check_grads(grads)
cpu() – Move all variables into the CPU device.
cuda() – Move all variables into the GPU device.
load_state_dict(state_dict[, warn, compatible]) – Copy parameters and buffers from state_dict into this module and its descendants.
load_states(filename[, verbose]) – Load the model states.
nodes([method, level, include_self]) – Collect all children nodes.
register_implicit_nodes(*nodes[, node_cls])
register_implicit_vars(*variables[, var_cls])
register_train_vars([train_vars])
register_vars([train_vars])
save_states(filename[, variables]) – Save the model states.
state_dict() – Returns a dictionary containing a whole state of the module.
to(device) – Moves all variables into the given device.
tpu() – Move all variables into the TPU device.
train_vars([method, level, include_self]) – The shortcut for retrieving all trainable variables.
tree_flatten() – Flattens the object as a PyTree.
tree_unflatten(aux, dynamic_values) – Unflatten the data to construct an object of this class.
unique_name([name, type_]) – Get the unique name for this object.
update(grads)
vars([method, level, include_self, ...]) – Collect all variables in this node and the children nodes.

Attributes

name – Name of the model.