Training a Recurrent Neural Network
Contents
Training a Recurrent Neural Network#
In recent years, we saw the revolution that training a dynamical system from data or tasks has provided important insights to understand brain functions. To support this, BrainPy porvides various interfaces to help users train dynamical systems.
import brainpy as bp
import brainpy.math as bm
bm.enable_x64()
bm.set_dfloat(bm.float64)
# bm.set_platform('cpu')
import matplotlib.pyplot as plt
General usage#
In BrainPy, we provide a general interface to build neural networks, supporting feedforward, recurrent, feedback connections.
Model Building#
In general, each model is treated as a node. Based on the node operations, like feedforward >>
, feedback <<
, etc., we can create arbitrary node graph we want. For example,
feedforward_net = data >> reservoir >> readout
create a simple network in which data
first feedforward to reservoir
node, then the output of reservoir
is readout by a readout
node. Further, if we try to create a feedback connection from readout
to reservoir
, we can use
feedback_net = reservoir << readout
After merging it with the previous defined feedforward_net
, we can create a network with feedforward and feedback connections:
model = feedforward_net & feedback_net
Model running & training#
Moreover, BrainPy provides various interfaces for network running and training, including the commonly used Ridge Regression method, FORCE learning method, and back-progropagation through time algorithms. Users can create these runners and trainers with the following codes:
runner = bp.nn.RNNRunner(model, ...)
or,
trainer = bp.nn.RidgeTrainer(model, ...)
trainer = bp.nn.FORCELearning(model, ...)
trainer = bp.nn.BPTT(model, ...)
Bellow, we demonstrate these supports with several examples.
Echo state network#
We first illustrate the training interface of BrainPy using an echo state network.
For an echo state network, we have three components: an input node (“I”), a reservoir node (“R”) for dimension expansion, and an output node (“O”) for linear readout.
# create the components we need
i = bp.nn.Input(3)
r = bp.nn.Reservoir(400, spectral_radius=1.4)
o = bp.nn.LinearReadout(3)
# create the model we need
model = i >> r >> o
model.plot_node_graph(fig_size=(5, 5), node_size=2000)
We use this created network to predict the chaotic time series, named as Lorenz attractor. Particurlaly, we expect the network has the ability to predict \(P(t+l)\) from \(P(t)\), where \(l\) is the length of the prediction ahead.
dt = 0.01
data = bp.datasets.lorenz_series(100, dt=dt)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
plt.figure(figsize=(10, 5))
plt.subplot(311)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['x'].flatten()))
plt.ylabel('x')
plt.subplot(312)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['y'].flatten()))
plt.ylabel('y')
plt.subplot(313)
plt.plot(bm.as_numpy(data['ts']), bm.as_numpy(data['z'].flatten()))
plt.ylabel('z')
plt.show()
def get_subset(data, start, end):
res = {'x': data['x'][start: end],
'y': data['y'][start: end],
'z': data['z'][start: end]}
res = bm.hstack([res['x'], res['y'], res['z']])
return res.reshape((1, ) + res.shape)
To complish this task, we use Ridge Regression method to train the network. Before that, we first initialize the network with the batch size of 1, and then construct a Ridge Regression trainer.
model.initialize(num_batch=1)
trainer = bp.nn.RidgeTrainer(model, beta=1e-6)
We warm-up the network with 20 ms.
warmup_data = get_subset(data, 0, int(20/dt))
outs = trainer.predict(warmup_data)
outs.shape
(1, 2000, 3)
The training data is the time series from 20 ms to 80 ms. We want the network has the abilitty to forecast 1 time step ahead.
x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+1, int(80/dt)+1)
trainer.fit([x_train, y_train])
Then we test the trained network with the next 20 ms.
x_test = get_subset(data, int(80/dt), int(100/dt)-1)
y_test = get_subset(data, int(80/dt) + 1, int(100/dt))
predictions = trainer.predict(x_test)
bp.losses.mean_squared_error(y_test, predictions)
DeviceArray(0.00014552, dtype=float64)
def plot_difference(truths, predictions):
truths = truths.numpy()
predictions = predictions.numpy()
plt.subplot(311)
plt.plot(truths[0, :, 0], label='Ground Truth')
plt.plot(predictions[0, :, 0], label='Prediction')
plt.ylabel('x')
plt.legend()
plt.subplot(312)
plt.plot(truths[0, :, 1], label='Ground Truth')
plt.plot(predictions[0, :, 1], label='Prediction')
plt.ylabel('y')
plt.legend()
plt.subplot(313)
plt.plot(truths[0, :, 2], label='Ground Truth')
plt.plot(predictions[0, :, 2], label='Prediction')
plt.ylabel('z')
plt.legend()
plt.show()
plot_difference(y_test, predictions)
We can make the task harder to forecast 10 time step ahead.
warmup_data = get_subset(data, 0, int(20/dt))
outs = trainer.predict(warmup_data)
x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+10, int(80/dt)+10)
trainer.fit([x_train, y_train])
x_test = get_subset(data, int(80/dt), int(100/dt)-10)
y_test = get_subset(data, int(80/dt) + 10, int(100/dt))
predictions = trainer.predict(x_test)
plot_difference(y_test, predictions)
Or forecast 100 time step ahead.
warmup_data = get_subset(data, 0, int(20/dt))
outs = trainer.predict(warmup_data)
x_train = get_subset(data, int(20/dt), int(80/dt))
y_train = get_subset(data, int(20/dt)+100, int(80/dt)+100)
trainer.fit([x_train, y_train])
x_test = get_subset(data, int(80/dt), int(100/dt)-100)
y_test = get_subset(data, int(80/dt) + 100, int(100/dt))
predictions = trainer.predict(x_test)
plot_difference(y_test, predictions)
As you see, forecasting larger time step makes the learning more difficult.
Next generation RC#
(Gauthier, et. al., Nature Communications, 2021) has proposed a next generation reservoir computing (NG-RC) model by using nonlinear vector autoregression (NVAR).
(A) A traditional RC processes time-series data using an artificial recurrent neural network. (B) The NG-RC performs a forecast using a linear weight of time-delay states of the time series data and nonlinear functionals of this data.
In BrainPy, we can easily implement this kind of network. Here, let’s try to use NG-RC to infer the \(z\) variable according to \(x\) and \(y\) variables. This task is important for applications where it is possible to obtain high-quality information about a dynamical variable in a laboratory setting, but not in field deployment.
Let’s first initialize the data we need.
dt = 0.02
t_warmup = 10. # ms
t_train = 20. # ms
t_test = 50. # ms
num_warmup = int(t_warmup / dt) # warm up NVAR
num_train = int(t_train / dt)
num_test = int(t_test / dt)
lorenz_series = bp.datasets.lorenz_series(t_warmup + t_train + t_test,
dt=dt,
inits={'x': 17.67715816276679,
'y': 12.931379185960404,
'z': 43.91404334248268})
def get_subset(data, start, end):
res = {'x': data['x'][start: end],
'y': data['y'][start: end],
'z': data['z'][start: end]}
X = bm.hstack([res['x'], res['y']])
X = X.reshape((1,) + X.shape)
Y = res['z']
Y = Y.reshape((1, ) + Y.shape)
return X, Y
X_warmup, Y_warmup = get_subset(lorenz_series, 0, num_warmup)
X_train, Y_train = get_subset(lorenz_series, num_warmup, num_warmup + num_train)
X_test, Y_test = get_subset(lorenz_series, num_warmup + num_train, num_warmup + num_train + num_test)
The network architecture is the same with the above echo state network. Specifically, we have an input node, a reservoir node and an output node. To accomplish this task, (Gauthier, et. al., Nature Communications, 2021) used 4 delay history information with stride of 5, and their quadratic polynomial monomials. Therefore, we create the network as:
i = bp.nn.Input(2)
r = bp.nn.NVAR(delay=4, order=2, stride=5)
o = bp.nn.LinearReadout(1, trainable=True)
model = i >> r >> o
model.initialize(num_batch=1)
We train the network using the Ridge Regression method too.
trainer = bp.nn.RidgeTrainer(model, beta=0.05)
# warm-up
outputs = trainer.predict(X_warmup)
print('Warmup NMS: ', bp.losses.mean_squared_error(outputs, Y_warmup))
# training
trainer.fit([X_train, Y_train])
# prediction
outputs = trainer.predict(X_test)
print('Prediction NMS: ', bp.losses.mean_squared_error(outputs, Y_test))
Warmup NMS: 10729.250973138222
Prediction NMS: 0.3374043793562189
X_test = bm.asarray(X_test).numpy()[0]
Y_test = bm.asarray(Y_test).numpy().flatten()
outputs = bm.asarray(outputs).numpy().flatten()
plt.figure(figsize=(10, 5))
plt.subplot(311)
plt.plot(X_test[:, 0], color='b')
plt.ylabel('x')
plt.subplot(312)
plt.plot(X_test[:, 1], color='b')
plt.ylabel('y')
plt.subplot(313)
plt.plot(Y_test, color='b', label='Grund Truth')
plt.plot(outputs, color='r', label='Prediction')
plt.ylabel('y')
plt.legend()
plt.show()
Recurrent neural network#
In recent years, artificial recurrent neural networks trained with back propagation through time (BPTT) have been a useful tool to study the network mechanism of brain functions. To support training networks with BPTT, BrainPy provides brainpy.nn.BPTT
method.
Here, we demonstrate how to train an artificial recurrent neural network by using a white noise integration task. In this task, we want our trained RNN model has the ability to integrate white noise. For example, if we has a time series of noise data,
noises = bm.random.normal(0, 0.2, size=10)
plt.figure(figsize=(8, 2))
plt.plot(noises.to_numpy().flatten())
plt.show()
Now, we want to get a model which can integrate the noise bm.cumsum(noises) * dt
:
dt = 0.1
integrals = bm.cumsum(noises) * dt
plt.figure(figsize=(8, 2))
plt.plot(integrals.to_numpy().flatten())
plt.show()
Here, we first define a task which generates the input data and the target integration results.
from functools import partial
dt = 0.04
num_step = int(1.0 / dt)
num_batch = 128
@partial(bm.jit,
dyn_vars=bp.TensorCollector({'a': bm.random.DEFAULT}),
static_argnames=['batch_size'])
def build_inputs_and_targets(mean=0.025, scale=0.01, batch_size=10):
# Create the white noise input
sample = bm.random.normal(size=(batch_size, 1, 1))
bias = mean * 2.0 * (sample - 0.5)
samples = bm.random.normal(size=(batch_size, num_step, 1))
noise_t = scale / dt ** 0.5 * samples
inputs = bias + noise_t
targets = bm.cumsum(inputs, axis=1)
return inputs, targets
def train_data():
for _ in range(100):
yield build_inputs_and_targets(batch_size=num_batch)
Then, we create and initialize the model. Note here we need the model train its initial state, so we need set state_trainable=True
for the used VanillaRNN
instance.
model = (
bp.nn.Input(1)
>>
bp.nn.VanillaRNN(100, state_trainable=True)
>>
bp.nn.Dense(1)
)
model.initialize(num_batch=num_batch)
brainpy.nn.BPTT
trainer receives a loss
function setting, and an optimizer
setting. Loss function can be selected from the brainpy.losses
module, or it can be a callable function receives (predictions, targets)
argument. Optimizer setting must be an instance of brainpy.optim.Optimizer
.
Here we define a loss function which use Mean Squared Error (MSE) to measure the error between the targets and the predictions. We also apply a L2 regularization.
# define loss function
def loss(predictions, targets, l2_reg=2e-4):
mse = bp.losses.mean_squared_error(predictions, targets)
l2 = l2_reg * bp.losses.l2_norm(model.train_vars().unique().dict()) ** 2
return mse + l2
# define optimizer
lr = bp.optim.ExponentialDecay(lr=0.025, decay_steps=1, decay_rate=0.99975)
opt = bp.optim.Adam(lr=lr, eps=1e-1)
# create a trainer
trainer = bp.nn.BPTT(model,
loss=loss,
optimizer=opt,
max_grad_norm=5.0)
# train the model
trainer.fit(train_data,
num_batch=num_batch,
num_train=30,
num_report=500)
Train 500 steps, use 9.3755 s, train loss 0.03093
Train 1000 steps, use 6.7661 s, train loss 0.0275
Train 1500 steps, use 6.9309 s, train loss 0.02998
Train 2000 steps, use 6.6827 s, train loss 0.02409
Train 2500 steps, use 6.6528 s, train loss 0.02289
Train 3000 steps, use 6.6663 s, train loss 0.02187
The training losses is recorded in the .train_losses
attribute.
plt.figure(figsize=(8, 3))
plt.plot(trainer.train_losses.numpy())
plt.xlabel('Number of Training Step')
plt.ylabel('Training Loss')
plt.show()
Finally, let’s try the trained network, and test whether it can generate the correct integration results.
model.initialize(1)
x, y = build_inputs_and_targets(batch_size=1)
predicts = trainer.predict(x)
plt.figure(figsize=(8, 2))
plt.plot(bm.as_numpy(y[0]).flatten(), label='Ground Truth')
plt.plot(bm.as_numpy(predicts[0]).flatten(), label='Prediction')
plt.legend()
plt.show()
Further reading#
More about Node specifications, please see Node Specification.
Details about Node operations, please see Node Operations.
Want to customize a Node, please see Node Customization.
More examples of training recurrent neural networks, please BrainPy Examples.