brainpy.layers.LSTMCell
- class brainpy.layers.LSTMCell(num_in, num_out, Wi_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal), Wh_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal), b_initializer=ZeroInit, state_initializer=ZeroInit, activation='tanh', mode=None, train_state=False, name=None)[source]
Long short-term memory (LSTM) RNN core.
The implementation is based on (Zaremba, et al., 2014) [1]. Given \(x_t\) and the previous state \((h_{t-1}, c_{t-1})\), the core computes
\[\begin{split}\begin{array}{l}
i_t = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + b_i) \\
f_t = \sigma(W_{if} x_t + W_{hf} h_{t-1} + b_f) \\
g_t = \tanh(W_{ig} x_t + W_{hg} h_{t-1} + b_g) \\
o_t = \sigma(W_{io} x_t + W_{ho} h_{t-1} + b_o) \\
c_t = f_t c_{t-1} + i_t g_t \\
h_t = o_t \tanh(c_t)
\end{array}\end{split}\]
where \(i_t\), \(f_t\), \(o_t\) are the input, forget and output gate activations, and \(g_t\) is a vector of cell updates.
The output is equal to the new hidden state, \(h_t\).
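For concreteness, one update step can be written as a small pure function. This is a minimal sketch of the equations above; the stacked (i, f, g, o) weight layout is an illustrative assumption, not necessarily how the library stores its parameters.

```python
import jax.numpy as jnp
from jax.nn import sigmoid

def lstm_step(x, h_prev, c_prev, W_i, W_h, b):
    # W_i: (num_in, 4*num_out), W_h: (num_out, 4*num_out), b: (4*num_out,),
    # with the four gates stacked in the (hypothetical) order (i, f, g, o).
    pre = x @ W_i + h_prev @ W_h + b
    i, f, g, o = jnp.split(pre, 4, axis=-1)
    c = sigmoid(f) * c_prev + sigmoid(i) * jnp.tanh(g)  # c_t = f_t c_{t-1} + i_t g_t
    h = sigmoid(o) * jnp.tanh(c)                        # h_t = o_t tanh(c_t)
    return h, c
```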
Notes
Forget gate initialization: Following (Jozefowicz, et al., 2015) [2], we add 1.0 to \(b_f\) after initialization in order to reduce the scale of forgetting at the beginning of training.
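A minimal sketch of this bias shift, assuming the biases are kept as a single stacked vector in (i, f, g, o) gate order (the layout is hypothetical):

```python
import jax.numpy as jnp

num_out = 20
b = jnp.zeros(4 * num_out)              # stacked biases (b_i, b_f, b_g, b_o), hypothetical layout
b = b.at[num_out:2 * num_out].add(1.0)  # add 1.0 to b_f after initialization
```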
- Parameters:
num_in (int) – The number of input units.
num_out (int) – The number of hidden units in the node.
state_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The state initializer.
Wi_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The input weight initializer.
Wh_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The hidden weight initializer.
b_initializer (optional, callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The bias weight initializer.
activation (str, callable) – The activation function. It can be a string or a callable function. See brainpy.math.activations for more details.
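A construction sketch using the documented keyword arguments follows. Only the parameter names come from the signature above; the brainpy.initialize import path for XavierNormal and ZeroInit is an assumption of this example.

```python
import brainpy as bp

# Illustrative construction; the initializer classes are assumed to live in
# brainpy.initialize (bp.initialize), matching the defaults printed above.
cell = bp.layers.LSTMCell(
    num_in=128,
    num_out=256,
    Wi_initializer=bp.initialize.XavierNormal(),
    Wh_initializer=bp.initialize.XavierNormal(),
    b_initializer=bp.initialize.ZeroInit(),
    activation='tanh',
)
```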
References
[1] Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
[2] Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures." In International Conference on Machine Learning (ICML), 2015.
- __init__(num_in, num_out, Wi_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal), Wh_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal), b_initializer=ZeroInit, state_initializer=ZeroInit, activation='tanh', mode=None, train_state=False, name=None)[source]
Methods
- __init__(num_in, num_out[, Wi_initializer, ...])
- clear_input()
- cpu() – Move all variables into the CPU device.
- cuda() – Move all variables into the GPU device.
- get_delay_data(identifier, delay_step, *indices) – Get delay data according to the provided delay steps.
- load_state_dict(state_dict[, warn, compatible]) – Copy parameters and buffers from state_dict into this module and its descendants.
- load_states(filename[, verbose]) – Load the model states.
- nodes([method, level, include_self]) – Collect all children nodes.
- register_delay(identifier, delay_step, ...) – Register a delay variable.
- register_implicit_nodes(*nodes[, node_cls])
- register_implicit_vars(*variables[, var_cls])
- reset(*args, **kwargs) – Reset function which resets all variables in the model.
- reset_local_delays([nodes]) – Reset local delay variables.
- reset_state([batch_size]) – Reset function which resets the states in the model.
- save_states(filename[, variables]) – Save the model states.
- state_dict() – Returns a dictionary containing the whole state of the module.
- to(device) – Moves all variables into the given device.
- tpu() – Move all variables into the TPU device.
- train_vars([method, level, include_self]) – The shortcut for retrieving all trainable variables.
- tree_flatten() – Flattens the object as a PyTree.
- tree_unflatten(aux, dynamic_values) – Unflatten the data to construct an object of this class.
- unique_name([name, type_]) – Get the unique name for this object.
- update(x) – The function to specify the updating rule.
- update_local_delays([nodes]) – Update local delay variables.
- vars([method, level, include_self, ...]) – Collect all variables in this node and the children nodes.
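A brief usage sketch of the variable-collection and state-handling methods listed above; it relies only on the signatures in the table and treats everything else as an assumption.

```python
import brainpy as bp

cell = bp.layers.LSTMCell(num_in=10, num_out=20)

all_vars = cell.vars()           # all variables in this node and its children
trainable = cell.train_vars()    # only the trainable variables
snapshot = cell.state_dict()     # dictionary snapshot of the module state
cell.load_state_dict(snapshot)   # restore the snapshot later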
Attributes
- c – Memory cell.
- global_delay_data – Global delay data, which stores the delay variables and corresponding delay targets.
- h – Hidden state.
- mode – Mode of the model, which is useful to control the multiple behaviors of the model.
- name – Name of the model.
- pass_shared
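A hedged end-to-end sketch that touches the h and c attributes: the training mode, batch size, and input shape are illustrative assumptions; only the method and attribute names come from the tables above.

```python
import brainpy as bp
import brainpy.math as bm

cell = bp.layers.LSTMCell(num_in=10, num_out=20, mode=bm.TrainingMode())
cell.reset_state(batch_size=4)       # assumed to allocate (h, c) for the batch

x = bm.random.randn(4, 10)           # one time step of input, shape (batch, num_in)
h = cell.update(x)                   # returns the new hidden state h_t
print(cell.h.shape, cell.c.shape)    # hidden state and memory cell attributes
```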