Recurrent neural networks are a type of neural network that add the explicit handling of order in input observations.

This capability suggests that the promise of recurrent neural networks is to learn the temporal context of input sequences in order to make better predictions. That is, that the suite of lagged observations required to make a prediction no longer must be diagnosed and specified as in traditional time series forecasting, or even forecasting with classical neural networks. Instead, the temporal dependence can be learned, and perhaps changes to this dependence can also be learned.

In this post, you will discover the promised capability of recurrent neural networks for time series forecasting. After reading this post, you will know:

- The focus and implicit, if not explicit, limitations on traditional time series forecasting methods.
- The capabilities provided in using traditional feed-forward neural networks for time series forecasting.
- The additional promise that recurrent neural networks make on top of traditional neural nets and hints of what this may mean in practice.

Let’s get started.

## Time Series Forecasting

Time series forecasting is difficult.

Unlike the simpler problems of classification and regression, time series problems add the complexity of order or temporal dependence between observations.

This can be difficult as the specialized handling of the data is required when fitting and evaluating models. It also aids in modeling, providing additional structure like trends and seasonality that can be leveraged to improve model skill.

Traditionally, time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems. But these traditional methods also suffer from some limitations, such as:

**Focus on complete data**: missing or corrupt data is generally unsupported.**Focus on linear relationships**: assuming a linear relationship excludes more complex joint distributions.**Focus on fixed temporal dependence**: the relationship between observations at different times, and in turn the number of lag observations provided as input, must be diagnosed and specified.**Focus on univariate data**: many real-world problems have multiple input variables.**Focus on one-step forecasts**: many real-world problems require forecasts with a long time horizon.

Existing techniques often depended on hand-crafted features that were expensive to create and required expert knowledge of the field.

— John Gamboa, Deep Learning for Time-Series Analysis, 2017

Note that some specialized techniques have been developed to address some of these limitations.

## Neural Networks for Time Series

Neural networks approximate a mapping function from input variables to output variables.

This general capability is valuable for time series for a number of reasons.

**Robust to Noise**. Neural networks are robust to noise in input data and in the mapping function and can even support learning and prediction in the presence of missing values.**Nonlinear**. Neural networks do not make strong assumptions about the mapping function and readily learn linear and nonlinear relationships.

… one important contribution of neural networks – namely their elegant ability to approximate arbitrary non-linear functions. This property is of high value in time series processing and promises more powerful applications, especially in the subfeld of forecasting …

— Georg Dorffner, Neural Networks for Time Series Processing, 1996.

More specifically, neural networks can be configured to support an arbitrary defined but fixed number of inputs and outputs in the mapping function. This means that:

**Multivariate Inputs**. An arbitrary number of input features can be specified, providing direct support for multivariate forecasting.**Multi-Step Forecasts**. An arbitrary number of output values can be specified, providing direct support for multi-step and even multivariate forecasting.

For these capabilities alone, feed-forward neural networks are widely used for time series forecasting.

Implicit in the usage of neural networks is the requirement that there is indeed a meaningful mapping from inputs to outputs to learn. Modeling a mapping of a random walk will perform no better than a persistence model (e.g. using the last seen observation as the forecast).

This expectation of a learnable mapping function also makes one of the limitations clear: the mapping function is fixed or static.

**Fixed inputs**. The number of lag input variables is fixed, in the same way as traditional time series forecasting methods.**Fixed outputs**. The number of output variables is also fixed; although a more subtle issue, it means that for each input pattern, one output must be produced.

Sequences pose a challenge for [deep neural networks] because they require that the dimensionality of the inputs and outputs is known and fixed.

— Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Sequence to Sequence Learning with Neural Networks, 2014

Feed-forward neural networks do offer great capability but still suffer from this key limitation of having to specify the temporal dependence upfront in the design of the model.

This dependence is almost always unknown and must be discovered and teased out from detailed analysis in a fixed form.

## Recurrent Neural Networks for Time Series

Recurrent neural networks like the Long Short-Term Memory network add the explicit handling of order between observations when learning a mapping function from inputs to outputs.

The addition of sequence is a new dimension to the function being approximated. Instead of mapping inputs to outputs alone, the network is capable of learning a mapping function for the inputs over time to an output.

This capability unlocks time series for neural networks.

Long Short-Term Memory (LSTM) is able to solve many time series tasks unsolvable by feed-forward networks using fixed size time windows.

— Felix A. Gers, Douglas Eck, Jürgen Schmidhuber, Applying LSTM to Time Series Predictable through Time-Window Approaches, 2001

In addition to the general benefits of using neural networks for time series forecasting, recurrent neural networks can also learn the temporal dependence from the data.

**Learned Temporal Dependence**. The context of observations over time is learned.

That is, in the simplest case, the network is shown one observation at a time from a sequence and can learn what observations it has seen previously are relevant and how they are relevant to forecasting.

Because of this ability to learn long term correlations in a sequence, LSTM networks obviate the need for a pre-specified time window and are capable of accurately modelling complex multivariate sequences.

— Pankaj Malhotra, et al., Long Short Term Memory Networks for Anomaly Detection in Time Series, 2015

The promise of recurrent neural networks is that the temporal dependence in the input data can be learned. That a fixed set of lagged observations does not need to be specified.

Implicit within this promise is that a temporal dependence that varies with circumstance can also be learned.

But, recurrent neural networks may be capable of more.

It is good practice to manually identify and remove such systematic structures from time series data to make the problem easier to model (e.g. make the series stationary), and this may still be a best practice when using recurrent neural networks. But, the general capability of these networks suggests that this may not be a requirement for a skillful model.

Technically, the available context may allow recurrent neural networks to learn:

**Trend**. An increasing or decreasing level to a time series and even variation in these changes.**Seasonality**. Consistently repeating patterns over time.

What do you think the promise is for LSTMs on time series forecasting problems?

## Summary

In this post, you discovered the promise of recurrent neural networks for time series forecasting.

Specifically, you learned:

- Traditional time series forecasting methods focus on univariate data with linear relationships and fixed and manually-diagnosed temporal dependence.
- Neural networks add the capability to learn possibly noisy and nonlinear relationships with arbitrarily defined but fixed numbers of inputs and outputs supporting multivariate and multi-step forecasting.
- Recurrent neural networks add the explicit handling of ordered observations and the promise of learning temporal dependence from context.

Do you disagree with my thoughts on the promise of LSTMs for time series forecasting?

Leave a comment below and join the discussion.