brainpy.optim.Adan#

class brainpy.optim.Adan(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [1].

\[\begin{aligned}
& \mathbf{m}_k=\left(1-\beta_1\right) \mathbf{m}_{k-1}+\beta_1 \mathbf{g}_k \\
& \mathbf{v}_k=\left(1-\beta_2\right) \mathbf{v}_{k-1}+\beta_2\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right) \\
& \mathbf{n}_k=\left(1-\beta_3\right) \mathbf{n}_{k-1}+\beta_3\left[\mathbf{g}_k+\left(1-\beta_2\right)\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right)\right]^2 \\
& \boldsymbol{\eta}_k=\eta / \sqrt{\mathbf{n}_k+\varepsilon} \\
& \boldsymbol{\theta}_{k+1}=\left(1+\lambda_k \eta\right)^{-1}\left[\boldsymbol{\theta}_k-\boldsymbol{\eta}_k \circ\left(\mathbf{m}_k+\left(1-\beta_2\right) \mathbf{v}_k\right)\right]
\end{aligned}\]
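The update above translates directly into array operations. The following is a minimal NumPy sketch of a single Adan step, written only to illustrate the equations; it is not the BrainPy implementation, and the function name `adan_step` and the `state` tuple are hypothetical.

```python
import numpy as np

def adan_step(theta, grad, state, lr=1e-3, betas=(0.02, 0.08, 0.01),
              eps=1e-8, weight_decay=0.02):
    """One Adan step following the equations above (illustrative only)."""
    b1, b2, b3 = betas
    m, v, n, prev_grad = state
    diff = grad - prev_grad                          # g_k - g_{k-1}

    m = (1 - b1) * m + b1 * grad                     # running average of the gradient
    v = (1 - b2) * v + b2 * diff                     # running average of the gradient difference
    n = (1 - b3) * n + b3 * (grad + (1 - b2) * diff) ** 2   # squared-norm term

    eta = lr / np.sqrt(n + eps)                      # element-wise learning rate
    update = eta * (m + (1 - b2) * v)
    theta = (theta - update) / (1 + weight_decay * lr)      # default (no_prox=False) rule
    return theta, (m, v, n, grad)

# Example: optimize theta for the quadratic loss 0.5 * ||theta||^2 (gradient = theta).
theta = np.ones(4)
state = (np.zeros(4), np.zeros(4), np.zeros(4), np.zeros(4))
for _ in range(10):
    theta, state = adan_step(theta, theta.copy(), state)
```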
Parameters:
  • lr (float, Scheduler) – The learning rate. Adan tolerates learning rates up to 5-10x larger than those commonly used with Adam. (default: 1e-3)

  • train_vars (optional) – The trainable variables to optimize.

  • betas (tuple) – The coefficients (β1, β2, β3) used for computing the running averages of the gradient, the gradient difference, and the squared gradient norm. (default: (0.02, 0.08, 0.01))

  • eps (float) – The term added to the denominator to improve numerical stability. (default: 1e-8)

  • weight_decay (float) – The decoupled weight decay (L2 penalty). (default: 0.02)

  • no_prox (bool) –

    How to perform the decoupled weight decay (default: False). It determines the update rule for parameters with weight decay. By default (no_prox=False), Adan updates the parameters as presented in Algorithm 1 of the paper:

    \[\boldsymbol{\theta}_{k+1} = (1+\lambda \eta)^{-1}\left[\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-\beta_2)\mathbf{v}_k)\right],\]

    But one can also update the parameters in the AdamW style (when no_prox=True), as sketched after this parameter list:

    \[\boldsymbol{\theta}_{k+1} = (1-\lambda \eta)\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-\beta_2)\mathbf{v}_k).\]
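The sketch below contrasts the two variants in code. It is illustrative pseudocode, not the library source; the helper name `apply_weight_decay` is hypothetical, and `update` stands for the term η_k ∘ (m_k + (1 - β2) v_k) from the equations above.

```python
def apply_weight_decay(theta, update, lr, weight_decay, no_prox):
    """Illustrative sketch of the two decoupled-weight-decay variants (not library code).

    `update` stands for eta_k * (m_k + (1 - beta2) * v_k) from the equations above.
    """
    if no_prox:
        # AdamW-style update
        return theta * (1 - weight_decay * lr) - update
    # Default (no_prox=False): proximal update from Algorithm 1 in the paper
    return (theta - update) / (1 + weight_decay * lr)
```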

References

[1] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv preprint arXiv:2208.06677.

__init__(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#

Methods

__init__([lr, train_vars, betas, eps, ...])

check_grads(grads)

cpu()

Move all variables to the CPU device.

cuda()

Move all variables to the GPU device.

load_state_dict(state_dict[, warn, compatible])

Copy parameters and buffers from state_dict into this module and its descendants.

load_states(filename[, verbose])

Load the model states.

nodes([method, level, include_self])

Collect all child nodes.

register_implicit_nodes(*nodes[, node_cls])

register_implicit_vars(*variables[, var_cls])

register_train_vars([train_vars])

register_vars([train_vars])

save_states(filename[, variables])

Save the model states.

state_dict()

Return a dictionary containing the whole state of the module.

to(device)

Move all variables to the given device.

tpu()

Move all variables to the TPU device.

train_vars([method, level, include_self])

The shortcut for retrieving all trainable variables.

tree_flatten()

Flattens the object as a PyTree.

tree_unflatten(aux, dynamic_values)

Unflatten the data to construct an object of this class.

unique_name([name, type_])

Get the unique name for this object.

update(grads)

vars([method, level, include_self, ...])

Collect all variables in this node and the children nodes.

Attributes

name

Name of the model.
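Putting the pieces together, the sketch below shows one way the optimizer might be used in a training loop. It is a hedged usage example, not documentation of a specific BrainPy release: the Adan constructor, update(), state_dict(), and load_state_dict() appear on this page, while everything else (the toy model, brainpy.math.TrainVar, brainpy.math.grad, and the dict layout of the gradients) is an assumption made for illustration and may differ across versions.

```python
import brainpy as bp
import brainpy.math as bm

# Hypothetical toy model: a dict of trainable variables (an assumption about
# what `train_vars` accepts).
weights = {'w': bm.TrainVar(bm.random.normal(size=(10, 1))),
           'b': bm.TrainVar(bm.zeros(1))}

opt = bp.optim.Adan(lr=1e-3, train_vars=weights, weight_decay=0.02)

def loss_fn(x, y):
    pred = x @ weights['w'] + weights['b']
    return bm.mean((pred - y) ** 2)

# Assumption: brainpy.math.grad differentiates `loss_fn` w.r.t. the given variables.
grad_fn = bm.grad(loss_fn, grad_vars=weights)

for step in range(100):
    x = bm.random.normal(size=(32, 10))
    y = bm.random.normal(size=(32, 1))
    grads = grad_fn(x, y)   # gradients keyed like `weights` (assumption)
    opt.update(grads)       # one Adan step on all registered variables

# Checkpointing with the methods documented above.
state = opt.state_dict()
opt.load_state_dict(state)
```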