brainpy.optim.Adan#
- class brainpy.optim.Adan(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#
Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [1].
\[\begin{split}\begin{aligned} & \mathbf{m}_k=\left(1-\beta_1\right) \mathbf{m}_{k-1}+\beta_1 \mathbf{g}_k \\ & \mathbf{v}_k=\left(1-\beta_2\right) \mathbf{v}_{k-1}+\beta_2\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right) \\ & \mathbf{n}_k=\left(1-\beta_3\right) \mathbf{n}_{k-1}+\beta_3\left[\mathbf{g}_k+\left(1-\beta_2\right)\left(\mathbf{g}_k-\mathbf{g}_{k-1}\right)\right]^2 \\ & \boldsymbol{\eta}_k=\eta /\left(\sqrt{\mathbf{n}_k+\varepsilon}\right) \\ & \boldsymbol{\theta}_{k+1}=\left(1+\lambda_k \eta\right)^{-1}\left[\boldsymbol{\theta}_k-\boldsymbol{\eta}_k \circ\left(\mathbf{m}_k+\left(1-\beta_2\right) \mathbf{v}_k\right)\right] \end{aligned}\end{split}\]

- Parameters:
lr (float, Scheduler) – The learning rate. It can be set much higher than for Adam, up to 5-10x. (default: 1e-3)
betas (tuple) – Coefficients used for computing the running averages of the gradient and its norm. (default: (0.02, 0.08, 0.01))
eps (float) – The term added to the denominator to improve numerical stability. (default: 1e-8)
weight_decay (float) – The decoupled weight decay (L2 penalty). (default: 0.02)
no_prox (bool) – How to perform the decoupled weight decay (default: False). It determines the update rule of parameters with weight decay. By default, Adan updates the parameters in the way presented in Algorithm 1 of the paper:

\[\boldsymbol{\theta}_{k+1} = ( 1+\lambda \eta)^{-1}\left[\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-{\color{blue}\beta_2})\mathbf{v}_k)\right].\]

But one can also update the parameters in the AdamW style:

\[\boldsymbol{\theta}_{k+1} = ( 1-\lambda \eta)\boldsymbol{\theta}_k - \boldsymbol{\eta}_k \circ (\mathbf{m}_k+(1-{\color{blue}\beta_2})\mathbf{v}_k).\]
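The two decay variants can be illustrated with a minimal NumPy sketch of a single Adan step. This is a hypothetical standalone reimplementation of the equations above for illustration only, not BrainPy's actual code; the function name `adan_step` and its state-passing signature are assumptions.

```python
import numpy as np

def adan_step(theta, g, g_prev, m, v, n, lr=1e-3,
              betas=(0.02, 0.08, 0.01), eps=1e-8,
              weight_decay=0.02, no_prox=False):
    """One hypothetical Adan update following the equations above."""
    b1, b2, b3 = betas
    diff = g - g_prev                                    # g_k - g_{k-1}
    m = (1 - b1) * m + b1 * g                            # first moment m_k
    v = (1 - b2) * v + b2 * diff                         # gradient-difference moment v_k
    n = (1 - b3) * n + b3 * (g + (1 - b2) * diff) ** 2   # second moment n_k
    eta = lr / np.sqrt(n + eps)                          # per-parameter step size eta_k
    step = eta * (m + (1 - b2) * v)
    if no_prox:
        # AdamW-style decoupled weight decay
        theta = (1 - weight_decay * lr) * theta - step
    else:
        # proximal update from Algorithm 1 (the default)
        theta = (theta - step) / (1 + weight_decay * lr)
    return theta, m, v, n
```

Note that both variants shrink the parameters toward zero by roughly the same factor per step; the proximal form applies the decay after the gradient step, while the AdamW form applies it before.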
References

[1] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv preprint arXiv:2208.06677.
- __init__(lr=0.001, train_vars=None, betas=(0.02, 0.08, 0.01), eps=1e-08, weight_decay=0.02, no_prox=False, name=None)[source]#
Methods

- __init__([lr, train_vars, betas, eps, ...])
- check_grads(grads)
- cpu() – Move all variables into the CPU device.
- cuda() – Move all variables into the GPU device.
- load_state_dict(state_dict[, warn, compatible]) – Copy parameters and buffers from state_dict into this module and its descendants.
- load_states(filename[, verbose]) – Load the model states.
- nodes([method, level, include_self]) – Collect all children nodes.
- register_implicit_nodes(*nodes[, node_cls])
- register_implicit_vars(*variables[, var_cls])
- register_train_vars([train_vars])
- register_vars([train_vars])
- save_states(filename[, variables]) – Save the model states.
- state_dict() – Returns a dictionary containing a whole state of the module.
- to(device) – Moves all variables into the given device.
- tpu() – Move all variables into the TPU device.
- train_vars([method, level, include_self]) – The shortcut for retrieving all trainable variables.
- tree_flatten() – Flattens the object as a PyTree.
- tree_unflatten(aux, dynamic_values) – Unflatten the data to construct an object of this class.
- unique_name([name, type_]) – Get the unique name for this object.
- update(grads)
- vars([method, level, include_self, ...]) – Collect all variables in this node and the children nodes.
Attributes

- name – Name of the model.