class brainpy.layers.LayerNorm(normalized_shape, epsilon=1e-05, bias_initializer=ZeroInit, scale_initializer=OneInit(value=1.0), elementwise_affine=True, mode=None, name=None)[source]#

Layer normalization, defined as

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

This layer normalizes data on each example, independently of the batch. More specifically, it normalizes data of shape (b, d1, d2, …, c) over the data dimensions and the channel (d1, d2, …, c). Different from batch normalization, scale and bias are assigned to each position (an elementwise operation) instead of being shared across a whole channel. If users want to assign a single scale and bias to a whole example or a whole channel, please use GroupNorm or InstanceNorm instead.
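The formula above can be checked directly. The following is a minimal NumPy sketch of the arithmetic (NumPy is used here purely for illustration; it is not the BrainPy implementation):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each example over its feature axis (here: the last axis),
    # then apply the per-element scale (gamma) and shift (beta).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.random.randn(4, 10)   # batch of 4 examples, 10 features each
gamma = np.ones(10)          # elementwise scale, initialized to ones
beta = np.zeros(10)          # elementwise bias, initialized to zeros
y = layer_norm(x, gamma, beta)
print(y.mean(axis=-1))       # close to 0 for every example
print(y.std(axis=-1))        # close to 1 for every example
```

With the default initializers (gamma = 1, beta = 0), each example's features end up approximately zero-mean and unit-variance.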

  • normalized_shape (int, sequence of int) –

    The input shape from an expected input of size

    \[[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]\]

    If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size.

  • epsilon (float) – a value added to the denominator for numerical stability. Default: 1e-5

  • bias_initializer (Initializer, ArrayType, Callable) – an initializer generating the initial bias (translation) parameter

  • scale_initializer (Initializer, ArrayType, Callable) – an initializer generating the initial scale parameter

  • elementwise_affine (bool) – a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for scales) and zeros (for biases). Default: True.


>>> import brainpy as bp
>>> import brainpy.math as bm
>>> # NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = bm.random.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = bp.layers.LayerNorm(embedding_dim)
>>> # Activate module
>>> layer_norm(embedding)
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = bm.random.randn(N, H, W, C)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = bp.layers.LayerNorm([H, W, C])
>>> output = layer_norm(input)
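For a normalized_shape with multiple trailing dimensions, as in the image example above, statistics are computed jointly over those axes for each example. A hypothetical NumPy sketch of that behavior (layer_norm_nd is an illustrative helper, not part of the BrainPy API, and the affine parameters are omitted for brevity):

```python
import numpy as np

def layer_norm_nd(x, normalized_shape, eps=1e-5):
    # Hypothetical helper: normalize each example over its trailing
    # len(normalized_shape) axes, mirroring LayerNorm([H, W, C]).
    axes = tuple(range(-len(normalized_shape), 0))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

N, H, W, C = 2, 4, 4, 3
x = np.random.randn(N, H, W, C)
y = layer_norm_nd(x, (H, W, C))
# Each example is normalized independently of the others in the batch:
print(y.reshape(N, -1).mean(axis=1))  # close to 0 for every example
```

Note that with elementwise_affine=True, the learnable scale and bias each have shape normalized_shape, i.e. one parameter per normalized position.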
__init__(normalized_shape, epsilon=1e-05, bias_initializer=ZeroInit, scale_initializer=OneInit(value=1.0), elementwise_affine=True, mode=None, name=None)[source]#


Methods

__init__(normalized_shape[, epsilon, ...])

Initialize the module.

cpu()

Move all variables into the CPU device.

cuda()

Move all variables into the GPU device.

get_delay_data(identifier, delay_step, *indices)

Get delay data according to the provided delay steps.

load_state_dict(state_dict[, warn, compatible])

Copy parameters and buffers from state_dict into this module and its descendants.

load_states(filename[, verbose])

Load the model states.

nodes([method, level, include_self])

Collect all children nodes.

register_delay(identifier, delay_step, ...)

Register a delay variable.

register_implicit_nodes(*nodes[, node_cls])

Register implicit children nodes.

register_implicit_vars(*variables[, var_cls])

Register implicit variables.

reset(*args, **kwargs)

Reset all variables in the model.

reset_local_delays([nodes])

Reset local delay variables.

reset_state(*args, **kwargs)

Reset the states in the model.

save_states(filename[, variables])

Save the model states.

state_dict()

Return a dictionary containing the whole state of the module.

to(device)

Move all variables into the given device.

tpu()

Move all variables into the TPU device.

train_vars([method, level, include_self])

The shortcut for retrieving all trainable variables.

tree_flatten()

Flatten the object as a PyTree.

tree_unflatten(aux, dynamic_values)

Unflatten the data to construct an object of this class.

unique_name([name, type_])

Get the unique name for this object.

update(x)

The function to specify the updating rule.

update_local_delays([nodes])

Update local delay variables.

vars([method, level, include_self, ...])

Collect all variables in this node and the children nodes.

Attributes

global_delay_data

Global delay data, which stores the delay variables and corresponding delay targets.

mode

Mode of the model, which is useful to control the multiple behaviors of the model.

name

Name of the model.