Training a Recurrent Neural Network#

@Chaoming Wang

In recent years, we saw the revolution that training a dynamical system from data or tasks has provided important insights to understand brain functions. To support this, BrainPy porvides various interfaces to help users train dynamical systems.

import brainpy as bp
import brainpy.math as bm

bm.enable_x64()
bm.set_dfloat(bm.float64)

# bm.set_platform('cpu')
import matplotlib.pyplot as plt

General usage#

In BrainPy, we provide a general interface to build neural networks, supporting feedforward, recurrent, feedback connections.

Model Building#

In general, each model is treated as a node. Based on the node operations, like feedforward >>, feedback <<, etc., we can create arbitrary node graph we want. For example,


feedforward_net = data >> reservoir >> readout

create a simple network in which data first feedforward to reservoir node, then the output of reservoir is readout by a readout node. Further, if we try to create a feedback connection from readout to reservoir, we can use


feedback_net = reservoir << readout

After merging it with the previous defined feedforward_net, we can create a network with feedforward and feedback connections:


model = feedforward_net & feedback_net

Model running & training#

Moreover, BrainPy provides various interfaces for network running and training, including the commonly used Ridge Regression method, FORCE learning method, and back-progropagation through time algorithms. Users can create these runners and trainers with the following codes:


runner = bp.nn.RNNRunner(model, ...)

or,


trainer = bp.nn.RidgeTrainer(model, ...)

trainer = bp.nn.FORCELearning(model, ...)

trainer = bp.nn.BPTT(model, ...)

Bellow, we demonstrate these supports with several examples.

Echo state network#

We first illustrate the training interface of BrainPy using an echo state network.

For an echo state network, we have three components: an input node (“I”), a reservoir node (“R”) for dimension expansion, and an output node (“O”) for linear readout.

# create the components we need

i = bp.nn.Input(3)
r = bp.nn.Reservoir(400, spectral_radius=1.4)
o = bp.nn.LinearReadout(3)
# create the model we need

model = i >> r >> o
model.plot_node_graph(fig_size=(5, 5), node_size=2000)
../_images/e33df6ef1f0a0409aed4842e037cb8b87fdaa9b4e0fb20aa01be5693835c780e.png

We use this created network to predict the chaotic time series, named as Lorenz attractor. Particurlaly, we expect the network has the ability to predict \(P(t+l)\) from \(P(t)\), where \(l\) is the length of the prediction ahead.

dt = 0.01
data = bp.datasets.lorenz_series(100, dt=dt)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
plt.figure(figsize=(10, 5))
plt.subplot(311)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['x'].flatten()))
plt.ylabel('x')
plt.subplot(312)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['y'].flatten()))
plt.ylabel('y')
plt.subplot(313)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['z'].flatten()))
plt.ylabel('z')
plt.show()
../_images/9492b5db560f9b93d40950076a5228efc4ba5a01eab3db249b6e97f4843a9771.png
def get_subset(data, start, end):
    res = {'x': data['x'][start: end],
           'y': data['y'][start: end],
           'z': data['z'][start: end]}
    res = bm.hstack([res['x'], res['y'], res['z']])
    return res.reshape((1, ) + res.shape)

To complish this task, we use Ridge Regression method to train the network. Before that, we first initialize the network with the batch size of 1, and then construct a Ridge Regression trainer.

model.initialize(num_batch=1)

trainer = bp.nn.RidgeTrainer(model, beta=1e-6)

We warm-up the network with 20 ms.

warmup_data = get_subset(data, 0, int(20/dt))

outs = trainer.predict(warmup_data)

outs.shape
(1, 2000, 3)

The training data is the time series from 20 ms to 80 ms. We want the network has the abilitty to forecast 1 time step ahead.

x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+1, int(80/dt)+1)

trainer.fit([x_train, y_train])

Then we test the trained network with the next 20 ms.

x_test = get_subset(data, int(80/dt), int(100/dt)-1)
y_test = get_subset(data, int(80/dt) + 1, int(100/dt))

predictions = trainer.predict(x_test)

bp.losses.mean_squared_error(y_test, predictions)
DeviceArray(0.00014552, dtype=float64)
def plot_difference(truths, predictions):
    truths = truths.numpy()
    predictions = predictions.numpy()

    plt.subplot(311)
    plt.plot(truths[0, :, 0], label='Ground Truth')
    plt.plot(predictions[0, :, 0], label='Prediction')
    plt.ylabel('x')
    plt.legend()
    plt.subplot(312)
    plt.plot(truths[0, :, 1], label='Ground Truth')
    plt.plot(predictions[0, :, 1], label='Prediction')
    plt.ylabel('y')
    plt.legend()
    plt.subplot(313)
    plt.plot(truths[0, :, 2], label='Ground Truth')
    plt.plot(predictions[0, :, 2], label='Prediction')
    plt.ylabel('z')
    plt.legend()
    plt.show()
plot_difference(y_test, predictions)
../_images/6e7fa7779767de5c0a1cf9ba36cdda05359c1adc69cd27a5d65de815a9a6fcb0.png

We can make the task harder to forecast 10 time step ahead.

warmup_data = get_subset(data, 0, int(20/dt))
outs = trainer.predict(warmup_data)

x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+10, int(80/dt)+10)
trainer.fit([x_train, y_train])

x_test = get_subset(data, int(80/dt), int(100/dt)-10)
y_test = get_subset(data, int(80/dt) + 10, int(100/dt))
predictions = trainer.predict(x_test)

plot_difference(y_test, predictions)
../_images/42801df247b1603785bab8b7a7161186b3f9b6ea600d5127bf7eb4847224e758.png

Or forecast 100 time step ahead.

warmup_data = get_subset(data, 0, int(20/dt))
outs = trainer.predict(warmup_data)

x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+100, int(80/dt)+100)
trainer.fit([x_train, y_train])

x_test = get_subset(data, int(80/dt), int(100/dt)-100)
y_test = get_subset(data, int(80/dt) + 100, int(100/dt))
predictions = trainer.predict(x_test)

plot_difference(y_test, predictions)
../_images/4cd3885698d1192d95b379961f06b07d64f60618135cb5ab300fd255cdc1827d.png

As you see, forecasting larger time step makes the learning more difficult.

Next generation RC#

(Gauthier, et. al., Nature Communications, 2021) has proposed a next generation reservoir computing (NG-RC) model by using nonlinear vector autoregression (NVAR).

(A) A traditional RC processes time-series data using an artificial recurrent neural network. (B) The NG-RC performs a forecast using a linear weight of time-delay states of the time series data and nonlinear functionals of this data.

In BrainPy, we can easily implement this kind of network. Here, let’s try to use NG-RC to infer the \(z\) variable according to \(x\) and \(y\) variables. This task is important for applications where it is possible to obtain high-quality information about a dynamical variable in a laboratory setting, but not in field deployment.

Let’s first initialize the data we need.

dt = 0.02
t_warmup = 10.  # ms
t_train = 20.  # ms
t_test = 50.  # ms
num_warmup = int(t_warmup / dt)  # warm up NVAR
num_train = int(t_train / dt)
num_test = int(t_test / dt)

lorenz_series = bp.datasets.lorenz_series(t_warmup + t_train + t_test,
                                          dt=dt,
                                          inits={'x': 17.67715816276679,
                                                 'y': 12.931379185960404,
                                                 'z': 43.91404334248268})
def get_subset(data, start, end):
  res = {'x': data['x'][start: end],
         'y': data['y'][start: end],
         'z': data['z'][start: end]}
  X = bm.hstack([res['x'], res['y']])
  X = X.reshape((1,) + X.shape)
  Y = res['z']
  Y = Y.reshape((1, ) + Y.shape)
  return X, Y


X_warmup, Y_warmup = get_subset(lorenz_series, 0, num_warmup)
X_train, Y_train = get_subset(lorenz_series, num_warmup, num_warmup + num_train)
X_test, Y_test = get_subset(lorenz_series, num_warmup + num_train, num_warmup + num_train + num_test)

The network architecture is the same with the above echo state network. Specifically, we have an input node, a reservoir node and an output node. To accomplish this task, (Gauthier, et. al., Nature Communications, 2021) used 4 delay history information with stride of 5, and their quadratic polynomial monomials. Therefore, we create the network as:

i = bp.nn.Input(2)
r = bp.nn.NVAR(delay=4, order=2, stride=5)
o = bp.nn.LinearReadout(1, trainable=True)
model = i >> r >> o
model.initialize(num_batch=1)

We train the network using the Ridge Regression method too.

trainer = bp.nn.RidgeTrainer(model, beta=0.05)

# warm-up
outputs = trainer.predict(X_warmup)
print('Warmup NMS: ', bp.losses.mean_squared_error(outputs, Y_warmup))

# training
trainer.fit([X_train, Y_train])

# prediction
outputs = trainer.predict(X_test)
print('Prediction NMS: ', bp.losses.mean_squared_error(outputs, Y_test))
Warmup NMS:  10729.250973138222
Prediction NMS:  0.3374043793562189
X_test = bm.asarray(X_test).numpy()[0]
Y_test = bm.asarray(Y_test).numpy().flatten()
outputs = bm.asarray(outputs).numpy().flatten()

plt.figure(figsize=(10, 5))
plt.subplot(311)
plt.plot(X_test[:, 0], color='b')
plt.ylabel('x')
plt.subplot(312)
plt.plot(X_test[:, 1], color='b')
plt.ylabel('y')
plt.subplot(313)
plt.plot(Y_test, color='b', label='Grund Truth')
plt.plot(outputs, color='r', label='Prediction')
plt.ylabel('y')
plt.legend()
plt.show()
../_images/d0dc0f77f3a35abfece1b45bd1c2b9b035a684e20485e834d28b82759dd10650.png

Recurrent neural network#

In recent years, artificial recurrent neural networks trained with back propagation through time (BPTT) have been a useful tool to study the network mechanism of brain functions. To support training networks with BPTT, BrainPy provides brainpy.nn.BPTT method.

Here, we demonstrate how to train an artificial recurrent neural network by using a white noise integration task. In this task, we want our trained RNN model has the ability to integrate white noise. For example, if we has a time series of noise data,

noises = bm.random.normal(0, 0.2, size=10)

plt.figure(figsize=(8, 2))
plt.plot(noises.to_numpy().flatten())
plt.show()
../_images/96c874043041a60b13b1bb8f94c06844a15e1f28d80fe8bb57fe42be59fd0a48.png

Now, we want to get a model which can integrate the noise bm.cumsum(noises) * dt:

dt = 0.1
integrals = bm.cumsum(noises) * dt

plt.figure(figsize=(8, 2))
plt.plot(integrals.to_numpy().flatten())
plt.show()
../_images/8d7995919faafe74df6dae818cd447d8cce92b46c508f610e31f3336e997e3fc.png

Here, we first define a task which generates the input data and the target integration results.

from functools import partial

dt = 0.04
num_step = int(1.0 / dt)
num_batch = 128


@partial(bm.jit,
         dyn_vars=bp.TensorCollector({'a': bm.random.DEFAULT}),
         static_argnames=['batch_size'])
def build_inputs_and_targets(mean=0.025, scale=0.01, batch_size=10):
  # Create the white noise input
  sample = bm.random.normal(size=(batch_size, 1, 1))
  bias = mean * 2.0 * (sample - 0.5)
  samples = bm.random.normal(size=(batch_size, num_step, 1))
  noise_t = scale / dt ** 0.5 * samples
  inputs = bias + noise_t
  targets = bm.cumsum(inputs, axis=1)
  return inputs, targets


def train_data():
  for _ in range(100):
    yield build_inputs_and_targets(batch_size=num_batch)

Then, we create and initialize the model. Note here we need the model train its initial state, so we need set state_trainable=True for the used VanillaRNN instance.

model = (
    bp.nn.Input(1)
    >>
    bp.nn.VanillaRNN(100, state_trainable=True)
    >>
    bp.nn.Dense(1)
)
model.initialize(num_batch=num_batch)

brainpy.nn.BPTT trainer receives a loss function setting, and an optimizer setting. Loss function can be selected from the brainpy.losses module, or it can be a callable function receives (predictions, targets) argument. Optimizer setting must be an instance of brainpy.optim.Optimizer.

Here we define a loss function which use Mean Squared Error (MSE) to measure the error between the targets and the predictions. We also apply a L2 regularization.

# define loss function
def loss(predictions, targets, l2_reg=2e-4):
    mse = bp.losses.mean_squared_error(predictions, targets)
    l2 = l2_reg * bp.losses.l2_norm(model.train_vars().unique().dict()) ** 2
    return mse + l2
# define optimizer
lr = bp.optim.ExponentialDecay(lr=0.025, decay_steps=1, decay_rate=0.99975)
opt = bp.optim.Adam(lr=lr, eps=1e-1)
# create a trainer
trainer = bp.nn.BPTT(model,
                     loss=loss,
                     optimizer=opt,
                     max_grad_norm=5.0)
# train the model
trainer.fit(train_data,
            num_batch=num_batch,
            num_train=30,
            num_report=500)
Train 500 steps, use 9.3755 s, train loss 0.03093
Train 1000 steps, use 6.7661 s, train loss 0.0275
Train 1500 steps, use 6.9309 s, train loss 0.02998
Train 2000 steps, use 6.6827 s, train loss 0.02409
Train 2500 steps, use 6.6528 s, train loss 0.02289
Train 3000 steps, use 6.6663 s, train loss 0.02187

The training losses is recorded in the .train_losses attribute.

plt.figure(figsize=(8, 3))
plt.plot(trainer.train_losses.numpy())
plt.xlabel('Number of Training Step')
plt.ylabel('Training Loss')
plt.show()
../_images/1305960ae601c12c6a8f26b323640d78985071f53b63839d1cc8223e86a51c5f.png

Finally, let’s try the trained network, and test whether it can generate the correct integration results.

model.initialize(1)
x, y = build_inputs_and_targets(batch_size=1)
predicts = trainer.predict(x)
plt.figure(figsize=(8, 2))
plt.plot(bm.as_numpy(y[0]).flatten(), label='Ground Truth')
plt.plot(bm.as_numpy(predicts[0]).flatten(), label='Prediction')
plt.legend()
plt.show()
../_images/7648def00894cd5264a04bbca55d02895309342c87ba4ffe3c606cbffadd64b6.png

Further reading#