The use of machine learning methods on time series data requires feature engineering.
A univariate time series dataset comprises only a sequence of observations. These must be transformed into input and output features before supervised learning algorithms can be used.
The problem is that there is little limit to the type and number of features you can engineer for a time series problem. Classical time series analysis tools like the correlogram can help with evaluating lag variables, but do not directly help when selecting other types of features, such as those derived from the timestamps (year, month or day) and moving statistics, like a moving average.
In this tutorial, you will discover how you can use the machine learning tools of feature importance and feature selection when working with time series data.
After completing this tutorial, you will know:
 How to create and interpret a correlogram of lagged observations.
 How to calculate and interpret feature importance scores for time series features.
 How to perform feature selection on time series input variables.
Let’s get started.
Tutorial Overview
This tutorial is broken down into the following 5 steps:
 Monthly Car Sales Dataset: That describes the dataset we will be working with.
 Make Stationary: That describes how to make the dataset stationary for analysis and forecasting.
 Autocorrelation Plot: That describes how to create a correlogram of the time series data.
 Feature Importance of Lag Variables: That describes how to calculate and review feature importance scores for time series data.
 Feature Selection of Lag Variables: That describes how to calculate and review feature selection results for time series data.
Let’s start off by looking at a standard time series dataset.
Monthly Car Sales Dataset
In this tutorial, we will use the Monthly Car Sales dataset.
This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.
The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).
You can download the dataset from DataMarket.
Download the dataset and save it into your current working directory with the filename “carsales.csv”. Note, you may need to delete the footer information from the file.
The code below loads the dataset as a Pandas Series object.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset as a Series indexed by month
series = read_csv('carsales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head(5))
# line plot of dataset
series.plot()
pyplot.show()
Running the example prints the first 5 rows of data.

Month
1960-01-01     6550
1960-02-01     8728
1960-03-01    12026
1960-04-01    14395
1960-05-01    14587
Name: Sales, dtype: int64
A line plot of the data is also provided.
Make Stationary
We can see a clear seasonality and increasing trend in the data.
The trend and seasonality are fixed components that can be added to any prediction we make. They are useful, but need to be removed in order to explore any other systematic signals that can help make predictions.
A time series with seasonality and trend removed is called stationary.
To remove the seasonality, we can take the seasonal difference, resulting in a so-called seasonally adjusted time series.
The period of the seasonality appears to be one year (12 months). The code below calculates the seasonally adjusted time series and saves it to the file “seasonally_adjusted.csv”.

# seasonally adjust the time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset as a Series indexed by month
series = read_csv('carsales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
differenced = series.diff(12)
# trim off the first year of empty data
differenced = differenced[12:]
# save differenced dataset to file without a header row
differenced.to_csv('seasonally_adjusted.csv', header=False)
# plot differenced dataset
differenced.plot()
pyplot.show()
Because the first 12 months of data have no prior data to be differenced against, they must be discarded.
The stationary data is stored in “seasonally_adjusted.csv”. A line plot of the differenced data is created.
The plot suggests that the seasonality and trend information was removed by differencing.
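To make the effect of seasonal differencing concrete, here is a minimal sketch on a synthetic monthly series. The series is an assumption for illustration only, not the car sales data: a linear trend of 10 units per month plus a pattern that repeats every 12 months. The names `toy`, `toy_diff`, and `toy_adjusted` are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumed for illustration, not the car sales data):
# a linear trend of 10 units per month plus a repeating 12-month pattern.
index = pd.date_range("2000-01-01", periods=36, freq="MS")
months = np.arange(36)
seasonal = 100 * np.sin(2 * np.pi * (months % 12) / 12)
toy = pd.Series(10 * months + seasonal, index=index)

# Subtract the value from the same month one year earlier.
toy_diff = toy.diff(12)

# The first 12 values have nothing to be compared against, so they are NaN.
n_missing = int(toy_diff.isna().sum())
toy_adjusted = toy_diff.dropna()

print(n_missing)
print(toy_adjusted.describe())
```

On this toy series the repeating pattern cancels out entirely and the linear trend collapses to a constant of about 120 (12 months times 10 units), which is why differencing at the seasonal period can remove both components at once. Real data will not be this clean.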
Autocorrelation Plot
Traditionally, time series features are selected based on their correlation with the output variable.
For lagged observations, this correlation is called autocorrelation, and it is usually reviewed with an autocorrelation plot, also called a correlogram. The plot shows the correlation of each lagged observation with the current observation and whether or not that correlation is statistically significant.
For example, the code below plots the correlogram for all lag variables in the Monthly Car Sales dataset.

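# correlogram of the seasonally adjusted series
from pandas import read_csv
from statsmodels.graphics.tsaplots import plot_acf
from matplotlib import pyplot
# load the seasonally adjusted dataset (the file has no header row)
series = read_csv('seasonally_adjusted.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
plot_acf(series)
pyplot.show()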
Running the example creates a correlogram, or Autocorrelation Function (ACF) plot, of the data.
The plot shows lag values along the x-axis and correlation on the y-axis between -1 and 1 for negatively and positively correlated lags respectively.
The dots outside the blue shaded area indicate statistical significance. The correlation of 1 for the lag value of 0 indicates 100% positive correlation of an observation with itself.
The plot shows significant lag values at 1, 2, 12, and 17 months.
This analysis provides a good baseline for comparison.
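The same idea can be checked numerically rather than by eye. The sketch below is a rough approximation, not what plot_acf does internally: it uses the Pandas `autocorr` method and the common rule-of-thumb bound of 1.96 divided by the square root of the number of observations. The band drawn by plot_acf may be computed differently, so borderline lags can disagree. The helper name `significant_lags` and the synthetic series are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def significant_lags(series, max_lag=24, z=1.96):
    """Return {lag: autocorrelation} for lags outside the +/- z/sqrt(n) band."""
    bound = z / np.sqrt(len(series))
    result = {}
    for lag in range(1, max_lag + 1):
        r = series.autocorr(lag=lag)  # Pearson correlation with the lagged copy
        if abs(r) > bound:
            result[lag] = round(float(r), 3)
    return result

# Synthetic AR(1) series (assumed for illustration): each value is 0.8 times
# the previous value plus random noise, so short lags are strongly correlated.
rng = np.random.default_rng(1)
values = np.zeros(200)
for t in range(1, 200):
    values[t] = 0.8 * values[t - 1] + rng.normal()
demo = pd.Series(values)

lags = significant_lags(demo, max_lag=12)
print(lags)
```

To apply this to the tutorial data, pass the seasonally adjusted series loaded from file in place of `demo`.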
Time Series to Supervised Learning
We can convert the univariate Monthly Car Sales dataset into a supervised learning problem by taking the lag observations (e.g. t-1) as inputs and using the current observation (t) as the output variable.
We can do this in Pandas using the shift function to create new columns of shifted observations.
The example below creates a new time series with 12 months of lag values to predict the current observation.
The shift of 12 months means that the first 12 rows of data are unusable as they contain NaN values.

from pandas import read_csv
from pandas import DataFrame
# load dataset (the file has no header row)
series = read_csv('seasonally_adjusted.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# reframe as supervised learning
dataframe = DataFrame()
for i in range(12, 0, -1):
    dataframe['t-' + str(i)] = series.shift(i)
dataframe['t'] = series.values
print(dataframe.head(13))
# drop the first 12 rows, which contain NaN values
dataframe = dataframe[12:]
# save to new file
dataframe.to_csv('lags_12months_features.csv', index=False)
Running the example prints the first 13 rows of data showing the unusable first 12 rows and the usable 13th row.

              t-12    t-11    t-10     t-9     t-8     t-7     t-6     t-5
1961-01-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1961-02-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1961-03-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1961-04-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1961-05-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1961-06-01     NaN     NaN     NaN     NaN     NaN     NaN     NaN   687.0
1961-07-01     NaN     NaN     NaN     NaN     NaN     NaN   687.0   646.0
1961-08-01     NaN     NaN     NaN     NaN     NaN   687.0   646.0   189.0
1961-09-01     NaN     NaN     NaN     NaN   687.0   646.0   189.0   611.0
1961-10-01     NaN     NaN     NaN   687.0   646.0   189.0   611.0  1339.0
1961-11-01     NaN     NaN   687.0   646.0   189.0   611.0  1339.0    30.0
1961-12-01     NaN   687.0   646.0   189.0   611.0  1339.0    30.0  1645.0
1962-01-01   687.0   646.0   189.0   611.0  1339.0    30.0  1645.0   276.0

               t-4     t-3     t-2     t-1       t
1961-01-01     NaN     NaN     NaN     NaN   687.0
1961-02-01     NaN     NaN     NaN   687.0   646.0
1961-03-01     NaN     NaN   687.0   646.0   189.0
1961-04-01     NaN   687.0   646.0   189.0   611.0
1961-05-01   687.0   646.0   189.0   611.0  1339.0
1961-06-01   646.0   189.0   611.0  1339.0    30.0
1961-07-01   189.0   611.0  1339.0    30.0  1645.0
1961-08-01   611.0  1339.0    30.0  1645.0   276.0
1961-09-01  1339.0    30.0  1645.0   276.0   561.0
1961-10-01    30.0  1645.0   276.0   561.0   470.0
1961-11-01  1645.0   276.0   561.0   470.0  3395.0
1961-12-01   276.0   561.0   470.0  3395.0   360.0
1962-01-01   561.0   470.0  3395.0   360.0  3440.0
The first 12 rows are removed from the new dataset and results are saved in the file “lags_12months_features.csv”.
This process can be repeated with an arbitrary number of time steps, such as 6 months or 24 months, and I would recommend experimenting.
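One way to make that experimentation easier is to wrap the reframing in a small function that takes the number of lags as an argument. The sketch below is one possible shape for such a helper. The name `make_lag_frame`, the "t-N" column naming, and the tiny made-up series are assumptions for illustration.

```python
import pandas as pd

def make_lag_frame(series, n_lags):
    """Build columns t-n_lags ... t-1 plus the target column t, dropping NaN rows."""
    columns = {f"t-{i}": series.shift(i) for i in range(n_lags, 0, -1)}
    columns["t"] = series
    return pd.DataFrame(columns).dropna()

# Tiny made-up series so the result is easy to check by eye.
toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
frame = make_lag_frame(toy, n_lags=2)
print(frame)
```

Calling the helper with `n_lags=6` or `n_lags=24` on the seasonally adjusted series would give the 6-month and 24-month variants. Note that more lags means more rows lost to NaN values at the start of the series.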
Feature Importance of Lag Variables
Ensembles of decision trees, like bagged trees, random forest, and extra trees, can be used to calculate a feature importance score.
This is common in machine learning to estimate the relative usefulness of input features when developing predictive models.
We can use feature importance to help to estimate the relative importance of contrived input features for time series forecasting.
This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more. Feature importance is one method to help sort out what might be more useful when modeling.
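As a rough illustration of what such contrived features might look like, the sketch below derives month and year columns from the index and a rolling mean of earlier observations. The function name `add_date_and_rolling_features`, the window size, and the toy series are assumptions for this example. The rolling mean is computed on the series shifted by one step so that it only uses values available before time t.

```python
import pandas as pd

def add_date_and_rolling_features(series, window=3):
    """Return a frame with month, year, a rolling mean of past values, and the target."""
    frame = pd.DataFrame({"t": series})
    frame["month"] = series.index.month
    frame["year"] = series.index.year
    # shift(1) first so the window only covers values before time t (no leakage)
    frame["rolling_mean"] = series.shift(1).rolling(window=window).mean()
    return frame.dropna()

# Made-up monthly series so the rolling mean is easy to verify by hand.
index = pd.date_range("2000-01-01", periods=6, freq="MS")
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)
features = add_date_and_rolling_features(toy, window=3)
print(features)
```

Columns like these could be appended to the lag features and scored with the same importance methods.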
The example below loads the supervised learning view of the dataset created in the previous section, fits a random forest model (RandomForestRegressor), and summarizes the relative feature importance scores for each of the 12 lag observations.
A large-ish number of trees is used to ensure the scores are somewhat stable. Additionally, the random number seed is initialized to ensure that the same result is achieved each time the code is run.

from pandas import read_csv
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot
# load data
dataframe = read_csv('lags_12months_features.csv', header=0)
array = dataframe.values
# split into input and output
X = array[:, 0:-1]
y = array[:, -1]
# fit random forest model
model = RandomForestRegressor(n_estimators=500, random_state=1)
model.fit(X, y)
# show importance scores
print(model.feature_importances_)
# plot importance scores
names = dataframe.columns.values[0:-1]
ticks = [i for i in range(len(names))]
pyplot.bar(ticks, model.feature_importances_)
pyplot.xticks(ticks, names)
pyplot.show()
Running the example first prints the importance scores of the lagged observations.

[ 0.21642244 0.06271259 0.05662302 0.05543768 0.07155573 0.08478599 0.07699371 0.05366735 0.1033234 0.04897883 0.1066669 0.06283236] 
The scores are then plotted as a bar graph.
The plot shows the high relative importance of the observation at t-12 and, to a lesser degree, the importance of observations at t-2 and t-4.
It is interesting to note a difference with the outcome from the correlogram above.
This process can be repeated with different methods that can calculate importance scores, such as gradient boosting, extra trees, and bagged decision trees.
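As one example of swapping the method, the sketch below uses ExtraTreesRegressor from scikit-learn in place of random forest. It runs on a synthetic series rather than the car sales data so that the expected answer is known in advance: each value is built from the value 12 steps earlier, so the t-12 lag should dominate. The series, seed, and number of trees are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic series (assumed for illustration): each value depends on the value
# 12 steps earlier, so the t-12 lag should receive the highest score.
rng = np.random.default_rng(7)
values = rng.normal(size=300)
for t in range(12, 300):
    values[t] = 0.9 * values[t - 12] + 0.3 * rng.normal()
series = pd.Series(values)

# Build the 12 lag columns and the target column, then drop rows with NaN.
frame = pd.DataFrame({f"t-{i}": series.shift(i) for i in range(12, 0, -1)})
frame["t"] = series
frame = frame.dropna()

X = frame.drop(columns="t").values
y = frame["t"].values

# Fit extra trees and collect the importance score for each lag column.
model = ExtraTreesRegressor(n_estimators=200, random_state=1)
model.fit(X, y)

scores = pd.Series(model.feature_importances_, index=frame.columns[:-1])
print(scores.sort_values(ascending=False).head(3))
```

Comparing the rankings from several ensemble methods on the same features is a simple way to see which lags are consistently rated as useful.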
Feature Selection of Lag Variables
We can also use feature selection to automatically identify and select those input features that are most predictive.
A popular method for feature selection is called Recursive Feature Elimination (RFE).
RFE works by creating predictive models, weighting features, and pruning those with the smallest weights, then repeating the process until a desired number of features are left.
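The loop described above can be written out by hand to show what happens at each step. This is a simplified sketch for intuition, not the scikit-learn implementation: it removes one feature per round, using random forest importance scores as the weights. The function name `manual_rfe` and the made-up data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def manual_rfe(X, y, names, n_keep):
    """Drop the least important feature one at a time until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        model = RandomForestRegressor(n_estimators=50, random_state=1)
        model.fit(X[:, remaining], y)
        weakest = int(np.argmin(model.feature_importances_))
        remaining.pop(weakest)  # prune the feature with the smallest weight
    return [names[i] for i in remaining]

# Made-up data (assumed for illustration): only columns "a" and "c" drive y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
kept = manual_rfe(X, y, names=["a", "b", "c", "d", "e"], n_keep=2)
print(kept)
```

Refitting the model after each removal matters because the importance of the remaining features can change once a correlated feature is gone.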
The example below uses RFE with a random forest predictive model and sets the desired number of input features to 4.

from pandas import read_csv
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot
# load dataset
dataframe = read_csv('lags_12months_features.csv', header=0)
# separate into input and output variables
array = dataframe.values
X = array[:, 0:-1]
y = array[:, -1]
# perform feature selection
rfe = RFE(RandomForestRegressor(n_estimators=500, random_state=1), n_features_to_select=4)
fit = rfe.fit(X, y)
# report selected features
print('Selected Features:')
names = dataframe.columns.values[0:-1]
for i in range(len(fit.support_)):
    if fit.support_[i]:
        print(names[i])
# plot feature rank
ticks = [i for i in range(len(names))]
pyplot.bar(ticks, fit.ranking_)
pyplot.xticks(ticks, names)
pyplot.show()
Running the example prints the names of the 4 selected features.
Unsurprisingly, the results match features that showed a high importance in the previous section.

Selected Features:
t-12
t-6
t-4
t-2
A bar graph is also created showing the feature selection rank (smaller is better) for each input feature.
This process can be repeated with different numbers of features to select more than 4 and different models other than random forest.
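A minimal sketch of that kind of experiment is shown below, using LinearRegression as the model inside RFE and looping over a few target feature counts. The data is synthetic and the column labels are made up for illustration. Only the columns labelled "t-5" and "t-2" actually influence the output, so they should be the first to be selected.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Made-up data (assumed for illustration): y depends on columns 1 and 4 only.
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 6))
y = 4.0 * X[:, 1] + 2.0 * X[:, 4] + 0.1 * rng.normal(size=150)
names = np.array(["t-6", "t-5", "t-4", "t-3", "t-2", "t-1"])

# Repeat the selection for several target numbers of features.
selections = {}
for n in (1, 2, 3):
    rfe = RFE(LinearRegression(), n_features_to_select=n)
    rfe.fit(X, y)
    selections[n] = list(names[rfe.support_])
    print(n, selections[n])
```

With a linear model, RFE ranks features by the size of their coefficients, so inputs on very different scales would need to be standardized first for the ranking to be meaningful.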
Summary
In this tutorial, you discovered how to use the tools of applied machine learning to help select features from time series data when forecasting.
Specifically, you learned:
 How to interpret a correlogram for highly correlated lagged observations.
 How to calculate and review feature importance scores in time series data.
 How to use feature selection to identify the most relevant input variables in time series data.
Do you have any questions about feature selection with time series data?
Ask your questions in the comments and I will do my best to answer.