brainpy.nn.nodes.ANN.LSTM

class brainpy.nn.nodes.ANN.LSTM(num_unit, wi_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal, seed=None), wh_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal, seed=None), bias_initializer=ZeroInit, state_initializer=ZeroInit, **kwargs)

Long short-term memory (LSTM) RNN core.

The implementation is based on (Zaremba et al., 2014) [1]. Given \(x_t\) and the previous state \((h_{t-1}, c_{t-1})\), the core computes

\[\begin{split}\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + b_i) \\
f_t = \sigma(W_{if} x_t + W_{hf} h_{t-1} + b_f) \\
g_t = \tanh(W_{ig} x_t + W_{hg} h_{t-1} + b_g) \\
o_t = \sigma(W_{io} x_t + W_{ho} h_{t-1} + b_o) \\
c_t = f_t c_{t-1} + i_t g_t \\
h_t = o_t \tanh(c_t)
\end{array}\end{split}\]

where \(i_t\), \(f_t\), and \(o_t\) are the input, forget, and output gate activations, and \(g_t\) is a vector of cell updates.

The output is equal to the new hidden state, \(h_t\).
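As a concrete reading of these equations, here is a minimal NumPy sketch of a single step. The concatenated weight layout and the (i, g, f, o) gate ordering are illustrative assumptions, not necessarily BrainPy's internal layout:

import numpy as np

def lstm_step(x, h_prev, c_prev, Wi, Wh, b):
    # Wi: (input_dim, 4 * num_unit) input weights
    # Wh: (num_unit, 4 * num_unit) hidden weights
    # b:  (4 * num_unit,) concatenated biases (b_i, b_g, b_f, b_o)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    gates = x @ Wi + h_prev @ Wh + b
    i, g, f, o = np.split(gates, 4, axis=-1)  # assumed gate ordering
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g         # new memory cell c_t
    h = o * np.tanh(c)             # new hidden state h_t (also the output)
    return h, c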

Notes

Forget gate initialization: Following (Jozefowicz et al., 2015) [2], we add 1.0 to \(b_f\) after initialization in order to reduce the scale of forgetting at the beginning of training.
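A minimal sketch of this trick, assuming a zero-initialized bias (ZeroInit) and the same (i, g, f, o) gate ordering as above; the actual position of the forget-gate block inside BrainPy may differ:

import numpy as np

num_unit = 8
b = np.zeros(4 * num_unit)            # ZeroInit: all gate biases start at 0
b[2 * num_unit:3 * num_unit] += 1.0   # shift the (assumed) forget-gate block b_f
# sigmoid(1.0) ~= 0.73, so the forget gate starts mostly open,
# i.e. the cell initially retains most of its memory.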

Parameters
  • num_unit (int) – The number of hidden units in the node.

  • state_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The state initializer.

  • wi_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The input weight initializer.

  • wh_initializer (callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The hidden weight initializer.

  • bias_initializer (optional, callable, Initializer, bm.ndarray, jax.numpy.ndarray) – The bias weight initializer.

  • activation (str, callable) – The activation function. It can be a string or a callable function. See brainpy.math.activations for more details.

  • trainable (bool) – Whether the node is trainable.
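A minimal construction sketch based on the signature above, assuming the default initializers are exposed under brainpy.initialize as their names in the signature suggest:

import brainpy as bp

# Build an LSTM node with 64 hidden units, spelling out the defaults.
lstm = bp.nn.nodes.ANN.LSTM(
    num_unit=64,
    wi_initializer=bp.initialize.XavierNormal(),
    wh_initializer=bp.initialize.XavierNormal(),
    bias_initializer=bp.initialize.ZeroInit(),
    state_initializer=bp.initialize.ZeroInit(),
)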

References

[1] Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. “Recurrent neural network regularization.” arXiv preprint arXiv:1409.2329 (2014).

[2] Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. “An empirical exploration of recurrent network architectures.” In International Conference on Machine Learning, pp. 2342-2350. PMLR, 2015.

__init__(num_unit, wi_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal, seed=None), wh_initializer=XavierNormal(scale=1.0, mode=fan_avg, in_axis=-2, out_axis=-1, distribution=truncated_normal, seed=None), bias_initializer=ZeroInit, state_initializer=ZeroInit, **kwargs)

Methods

__init__(num_unit[, wi_initializer, ...])

copy([name, shallow])

Returns a copy of the Node.

feedback(ff_output, **shared_kwargs)

The feedback computation function of a node.

forward(ff[, fb])

The feedforward computation function of a node.

init_fb_conn()

Initialize the feedback connections.

init_fb_output([num_batch])

Set the initial node feedback state.

init_ff_conn()

Initialize the feedforward connections.

init_state([num_batch])

Set the initial node state.

initialize([num_batch])

Initialize the node.

load_states(filename[, verbose])

Load the model states.

nodes([method, level, include_self])

Collect all children nodes.

offline_fit(targets, ffs[, fbs])

Offline training interface.

online_fit(target, ff[, fb])

Online fitting interface.

online_init()

Online training initialization interface.

register_implicit_nodes(nodes)

register_implicit_vars(variables)

save_states(filename[, variables])

Save the model states.

set_fb_output(state)

Safely set the feedback state of the node.

set_feedback_shapes(fb_shapes)

set_feedforward_shapes(feedforward_shapes)

set_output_shape(shape)

set_state(state)

Safely set the state of the node.

train_vars([method, level, include_self])

The shortcut for retrieving all trainable variables.

unique_name([name, type_])

Get the unique name for this object.

vars([method, level, include_self])

Collect all variables in this node and the children nodes.
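A hedged usage sketch built only from the method signatures listed above. The shape argument layout is an assumption; in practice the node is usually composed into a brainpy.nn network, which infers the feedforward shapes automatically:

import brainpy.math as bm

lstm.set_feedforward_shapes((None, 100))  # (batch, input_dim); layout assumed
lstm.initialize(num_batch=32)             # allocate h and c via state_initializer
x = bm.random.randn(32, 100)              # one batch of inputs x_t
h = lstm.forward(x)                       # output equals the new hidden state h_t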

Attributes

c

Memory cell.

data_pass

Offline fitting method.

fb_output

Return type: Optional[Tensor] (a JaxArray or numpy.ndarray).

feedback_shapes

Feedback data size.

feedforward_shapes

Input data size.

h

Hidden state.

is_feedback_input_supported

is_feedback_supported

is_initialized

Return type: bool.

name

output_shape

Output data size.

state

Node current internal state.

state_trainable

Returns whether the node's state is trainable.

train_state

trainable

Returns whether the node is trainable.
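After initialization, the c and h attributes above expose the node's recurrent state; continuing the earlier sketch (the shapes are assumptions):

print(lstm.h.shape)   # hidden state h_t, assumed shape (num_batch, num_unit)
print(lstm.c.shape)   # memory cell c_t, assumed shape (num_batch, num_unit)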