Multivariate multi-step time series forecasting using sequence models (4/4)
Models -
Option — 3
In this approach, we make use of stacked LSTMs as a simpler baseline against which we can compare the performance of the encoder-decoder based multi-step output models.
- A
In this approach, we make use of stacked LSTMs to predict just a single-step output.
The model architecture looks like -
Although it doesn’t make much sense to initialize the second LSTM’s states with the first LSTM’s last states, doing so gave me marginally better results. You can try it both ways.
Train code -
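The original training gist isn’t reproduced here, but here is a minimal sketch of what this model could look like in Keras. The unit count, variable names (train_x, train_y), input window length and training settings are my own assumptions, not the article’s exact values -

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_in, n_features, n_units = 5, 24, 64              # assumed input window, feature count, hidden size
train_x = np.random.rand(100, n_in, n_features)     # stand-in data, shape (samples, n_in, n_features)
train_y = np.random.rand(100, n_features)           # single-step target, shape (samples, n_features)

inputs = Input(shape=(n_in, n_features))
# first LSTM returns its full output sequence plus its last hidden/cell states
lstm1_out, state_h, state_c = LSTM(n_units, return_sequences=True, return_state=True)(inputs)
# second LSTM optionally starts from the first LSTM's last states
lstm2_out = LSTM(n_units)(lstm1_out, initial_state=[state_h, state_c])
# one dense layer predicts all features for the next single time step
outputs = Dense(n_features)(lstm2_out)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_x, train_y, epochs=5, batch_size=32)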
Inference code -
During inference, we feed the model’s single-step prediction back in as input and repeat this for the same number of steps as the earlier models, for a fair comparison.
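As a rough sketch of that iterative inference, reusing the model and shapes from the training sketch above (test_x and the window-sliding logic are my assumptions) -

n_out = 5                                            # number of future steps, as in the article
test_x = np.random.rand(10, n_in, n_features)        # stand-in test windows
cur_input = test_x.copy()
pred_list = []
for step in range(n_out):
    yhat = model.predict(cur_input)                  # (samples, n_features) single-step prediction
    pred_list.append(yhat)
    # slide the window: drop the oldest step, append the freshly predicted one
    cur_input = np.concatenate([cur_input[:, 1:, :], yhat[:, np.newaxis, :]], axis=1)
# pred_list now holds n_out arrays of shape (samples, n_features), one per forecast step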
- B
In this approach, we make use of stacked LSTMs to predict a multi-step output.
The model architecture looks like -
Although it doesn’t make much sense to initialize the second LSTM’s states with the first LSTM’s last states, doing so gave me marginally better results. You can try it both ways.
Also, in this model the number of input and output steps needs to be equal, because the second LSTM has return_sequences=True and therefore emits exactly one output per input step.
Train code -
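Again, a minimal sketch of what this multi-step variant might look like; n_steps = 5 matches the article’s output horizon, while the unit count, variable names and training settings are assumptions -

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

n_steps, n_features, n_units = 5, 24, 64             # input steps == output steps in this model
train_x = np.random.rand(100, n_steps, n_features)    # stand-in data
train_y = np.random.rand(100, n_steps, n_features)    # multi-step target, one feature vector per step

inputs = Input(shape=(n_steps, n_features))
lstm1_out, state_h, state_c = LSTM(n_units, return_sequences=True, return_state=True)(inputs)
# return_sequences=True makes the second LSTM emit one hidden vector per input step
lstm2_out = LSTM(n_units, return_sequences=True)(lstm1_out, initial_state=[state_h, state_c])
# TimeDistributed dense maps each step's hidden vector to a full feature vector
outputs = TimeDistributed(Dense(n_features))(lstm2_out)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_x, train_y, epochs=5, batch_size=32)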
Inference code -
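Inference here is a single forward pass, since the model emits all steps at once. Splitting the output per decoder step, for the comparison discussed later, might look like this (reusing names from the training sketch above) -

test_x = np.random.rand(10, n_steps, n_features)      # stand-in test windows
yhat = model.predict(test_x)                           # (samples, n_steps, n_features)
pred_list = [yhat[:, i, :] for i in range(n_steps)]    # one (samples, n_features) array per output step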
Option — 4
These models are a hybrid of the previous ones.
Although these models are slow to converge and inefficient, they give us another point of comparison for the encoder-decoder based multi-step output models.
- With teacher forcing
In this approach, we make use of stacked LSTMs along with teacher forcing to predict a multi-step output.
Train code -
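The exact hybrid architecture lives in the notebook, but one plausible sketch is a stacked-LSTM encoder whose final states initialise an LSTM decoder that is fed the ground-truth previous step during training (teacher forcing). All names, sizes and the decoder-input layout below are my assumptions, not the article’s exact code -

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_in, n_out, n_features, n_units = 5, 5, 24, 64       # assumed hyperparameters

# encoder: stacked LSTMs, keeping only the second layer's final states
enc_inputs = Input(shape=(n_in, n_features))
enc_seq, h1, c1 = LSTM(n_units, return_sequences=True, return_state=True)(enc_inputs)
_, enc_h, enc_c = LSTM(n_units, return_state=True)(enc_seq, initial_state=[h1, c1])

# decoder: during training it sees the ground-truth previous step (teacher forcing)
dec_inputs = Input(shape=(n_out, n_features))          # target sequence shifted right by one step
dec_lstm = LSTM(n_units, return_sequences=True, return_state=True)
dec_seq, _, _ = dec_lstm(dec_inputs, initial_state=[enc_h, enc_c])
dec_dense = Dense(n_features)
outputs = dec_dense(dec_seq)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer='adam', loss='mse')

# stand-in data: decoder input is the target shifted by one step, seeded with the last observed step
train_x = np.random.rand(100, n_in, n_features)
train_y = np.random.rand(100, n_out, n_features)
dec_in = np.concatenate([train_x[:, -1:, :], train_y[:, :-1, :]], axis=1)
model.fit([train_x, dec_in], train_y, epochs=5, batch_size=32)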
Inference code -
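At inference time there is no ground truth to feed, so a typical approach is to rebuild encoder/decoder inference models from the trained layers and decode one step at a time, feeding each prediction back in. This sketch reuses the assumed names from the training sketch above -

import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

enc_model = Model(enc_inputs, [enc_h, enc_c])

state_h_in = Input(shape=(n_units,))
state_c_in = Input(shape=(n_units,))
step_in = Input(shape=(1, n_features))
step_seq, dh, dc = dec_lstm(step_in, initial_state=[state_h_in, state_c_in])
step_pred = dec_dense(step_seq)
dec_model = Model([step_in, state_h_in, state_c_in], [step_pred, dh, dc])

def predict_sequence(x):
    # x: (1, n_in, n_features); start decoding from the last observed step
    h, c = enc_model.predict(x)
    target = x[:, -1:, :]
    preds = []
    for _ in range(n_out):
        yhat, h, c = dec_model.predict([target, h, c])
        preds.append(yhat[0, 0, :])
        target = yhat                                  # feed the prediction back as the next decoder input
    return np.array(preds)                             # (n_out, n_features)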
- Without teacher forcing
In this approach, we make use of stacked LSTMs without teacher forcing to predict a multi-step output.
Train code -
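One plausible reading of the no-teacher-forcing hybrid is to unroll the decoder inside the graph and feed each step’s prediction back in as the next decoder input even during training, which would also explain the slow convergence. The sketch below follows that reading; every name and size is an assumption -

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, Lambda, Reshape, Concatenate
from tensorflow.keras.models import Model

n_in, n_out, n_features, n_units = 5, 5, 24, 64       # assumed hyperparameters

enc_inputs = Input(shape=(n_in, n_features))
enc_seq, h1, c1 = LSTM(n_units, return_sequences=True, return_state=True)(enc_inputs)
_, h, c = LSTM(n_units, return_state=True)(enc_seq, initial_state=[h1, c1])

dec_lstm = LSTM(n_units, return_state=True)
dec_dense = Dense(n_features)
to_step = Reshape((1, n_features))

# start from the last observed step and unroll the decoder, feeding predictions back in
step_in = Lambda(lambda t: t[:, -1:, :])(enc_inputs)
step_preds = []
for _ in range(n_out):
    dec_out, h, c = dec_lstm(step_in, initial_state=[h, c])
    step_in = to_step(dec_dense(dec_out))              # the prediction becomes the next decoder input
    step_preds.append(step_in)
outputs = Concatenate(axis=1)(step_preds)              # (samples, n_out, n_features)

model = Model(enc_inputs, outputs)
model.compile(optimizer='adam', loss='mse')

train_x = np.random.rand(100, n_in, n_features)        # stand-in data
train_y = np.random.rand(100, n_out, n_features)
model.fit(train_x, train_y, epochs=5, batch_size=32)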
Inference code -
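Since the feedback loop lives inside the graph under this reading, inference is just a direct forward pass (names reused from the training sketch above) -

test_x = np.random.rand(10, n_in, n_features)          # stand-in test windows
yhat = model.predict(test_x)                            # (samples, n_out, n_features)
pred_list = [yhat[:, i, :] for i in range(n_out)]       # bucket per decoder time step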
- Per feature per time step performance
I’ve left the per-feature, per-time-step performance analysis out of the notebooks to keep them concise, but here’s how it can be added to the existing code -
Here, gt_list and pred_list are two lists of length n_out = 5, where each element is an array of shape (x, features); x and features depend on our data, which in this case gives (5954, 24).
Thus, we have divided the expected output and the models’ predictions into separate buckets, one per decoder-side time step, for easy comparison.
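The notebook code isn’t reproduced here, but assuming the ground truth is available as test_y of shape (samples, n_out, features) and the multi-step predictions as yhat of the same shape (both names are my assumptions), the bucketing could look like -

n_out = 5
gt_list = [test_y[:, i, :] for i in range(n_out)]       # ground truth per decoder time step
pred_list = [yhat[:, i, :] for i in range(n_out)]       # predictions per decoder time step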
In this code snippet —
plt.plot(gt_list[i][:,j], c='blue', label="gt")
plt.plot(pred_list[i][:,j],c='red', label="pred")
i represents the time step, which goes from 0–4, and j stands for the feature index, which goes from 0–23. Thus, we will have 5 * 24 = 120 such comparisons per model.
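Putting it together, the full comparison loop might look like this (the figure size and titles are my additions) -

import matplotlib.pyplot as plt

for i in range(5):              # decoder time step
    for j in range(24):         # feature index
        plt.figure(figsize=(10, 3))
        plt.plot(gt_list[i][:, j], c='blue', label="gt")
        plt.plot(pred_list[i][:, j], c='red', label="pred")
        plt.title("time step %d, feature %d" % (i, j))
        plt.legend()
        plt.show()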
- Future work
Apart from teacher forcing and the normal free-running approach, there is another method that combines the two, called scheduled sampling: during training, we feed the model ground-truth values from the previous time steps in the initial epochs, and gradually more of its own predicted outputs in the later epochs.
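A minimal, purely illustrative sketch of that sampling decision (not code from the article; the linear decay schedule is just one common choice) -

import numpy as np

def decoder_input_for_step(ground_truth_step, previous_prediction, epoch, n_epochs):
    # probability of feeding the ground truth decays from 1.0 towards 0.0 over training
    p_teacher = max(0.0, 1.0 - epoch / n_epochs)
    if np.random.rand() < p_teacher:
        return ground_truth_step         # mostly teacher forcing in the early epochs
    return previous_prediction           # mostly the model's own output in the later epochs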
- References
https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/
https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/