Making out-of-sample forecasts can be confusing when getting started with time series data.
The statsmodels Python API provides functions for performing one-step and multi-step out-of-sample forecasts.
In this tutorial, you will clear up any confusion you have about making out-of-sample forecasts with time series data in Python.
After completing this tutorial, you will know:
 How to make a one-step out-of-sample forecast.
 How to make a multi-step out-of-sample forecast.
 The difference between the forecast() and predict() functions.
Let’s get started.
Tutorial Overview
This tutorial is broken down into the following 5 steps:
 Dataset Description
 Split Dataset
 Develop Model
 One-Step Out-of-Sample Forecast
 Multi-Step Out-of-Sample Forecast
1. Minimum Daily Temperatures Dataset
This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.
The units are in degrees Celsius and there are 3,650 observations. The source of the data is credited as the Australian Bureau of Meteorology.
Learn more about the dataset on Data Market.
Download the Minimum Daily Temperatures dataset to your current working directory with the filename “daily-minimum-temperatures.csv”.
Note: The downloaded file contains some question mark (“?”) characters that must be removed before you can use the dataset. Open the file in a text editor and remove the “?” characters. Also, remove any footer information in the file.
The example below loads the dataset as a Pandas Series.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset as a Series indexed by date
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head(20))
# line plot of dataset
series.plot()
pyplot.show()
Running the example prints the first 20 rows of the loaded dataset.

Date
1981-01-01    20.7
1981-01-02    17.9
1981-01-03    18.8
1981-01-04    14.6
1981-01-05    15.8
1981-01-06    15.8
1981-01-07    15.8
1981-01-08    17.4
1981-01-09    21.8
1981-01-10    20.0
1981-01-11    16.2
1981-01-12    13.3
1981-01-13    16.7
1981-01-14    21.5
1981-01-15    25.0
1981-01-16    20.7
1981-01-17    20.6
1981-01-18    24.8
1981-01-19    17.7
1981-01-20    15.5
A line plot of the time series is also created.
2. Split Dataset
We can split the dataset into two parts.
The first part is the training dataset that we will use to prepare an ARIMA model. The second part is the test dataset that we will pretend is not available. It is these time steps that we will treat as out of sample.
The dataset contains data from January 1st 1981 to December 31st 1990.
We will hold back the last 7 days of the dataset from December 1990 as the test dataset and treat those time steps as out of sample.
Specifically, 1990-12-25 to 1990-12-31:

1990-12-25,12.9
1990-12-26,14.6
1990-12-27,14.0
1990-12-28,13.6
1990-12-29,13.5
1990-12-30,15.7
1990-12-31,13.0
The code below will load the dataset, split it into the training and validation datasets, and save them to files dataset.csv and validation.csv respectively.

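# split the dataset
from pandas import read_csv
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# hold back the last 7 observations for validation
split_point = len(series) - 7
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
# write without a header row so the files can be read back with header=None
dataset.to_csv('dataset.csv', header=False)
validation.to_csv('validation.csv', header=False)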
Run the example and you should now have two files to work with.
The last observation in dataset.csv is Christmas Eve 1990 (1990-12-24).
That means Christmas Day 1990 and onwards are out-of-sample time steps for a model trained on dataset.csv.
3. Develop Model
In this section, we are going to make the data stationary and develop a simple ARIMA model.
The data has a strong seasonal component. We can neutralize this and make the data stationary by taking the seasonal difference. That is, we can take the observation for a day and subtract the observation from the same day one year ago.
This will result in a stationary dataset from which we can fit a model.

# create a differenced series
import numpy

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)
We can invert this operation by adding the value of the observation one year ago. We will need to do this to any forecasts made by a model trained on the seasonally adjusted data.

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
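As a quick sanity check, the differencing and its inverse can be exercised on a toy series. The sketch below is a minimal pure-Python illustration rather than the tutorial's own code: the values are made up, an interval of 3 stands in for 365, and the names seasonal_difference and invert_last are hypothetical. It shows that adding back the observation from one interval earlier recovers the original value.

```python
# Toy round trip: seasonal differencing and its inverse (pure Python).
# The series values and the interval of 3 are made up for illustration.

def seasonal_difference(data, interval):
    """Return data[i] - data[i - interval] for every i >= interval."""
    return [data[i] - data[i - interval] for i in range(interval, len(data))]

def invert_last(history, diff_value, interval):
    """Undo the difference for the step that follows `history`."""
    return diff_value + history[-interval]

series = [10.0, 12.0, 11.0, 13.0, 15.0, 12.0, 14.0]
interval = 3

diffs = seasonal_difference(series, interval)
print(diffs)  # one value for each observation from index 3 onward

# Pretend the last observation is unknown and rebuild it from its difference.
history = series[:-1]
rebuilt = invert_last(history, diffs[-1], interval)
print(rebuilt == series[-1])
```

The differenced series is shorter than the input by exactly one interval, which is why a year of data is lost when differencing with 365.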
We can fit an ARIMA model.
Fitting a strong ARIMA model to the data is not the focus of this post, so rather than going through the analysis of the problem or grid searching parameters, I will choose a simple ARIMA(7,0,1) configuration.
We can put all of this together as follows:

from pandas import read_csv
# note: statsmodels.tsa.arima_model is only available in statsmodels releases before 0.13
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# print summary of fit model
print(model_fit.summary())
Running the example loads the dataset, takes the seasonal difference, then fits an ARIMA(7,0,1) model and prints the summary of the fit model.

                              ARMA Model Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 3278
Model:                     ARMA(7, 1)   Log Likelihood               -8673.748
Method:                       css-mle   S.D. of innovations              3.411
Date:                Mon, 20 Feb 2017   AIC                          17367.497
Time:                        10:28:38   BIC                          17428.447
Sample:                             0   HQIC                         17389.322

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0132      0.132      0.100      0.921      -0.246       0.273
ar.L1.y        1.1424      0.287      3.976      0.000       0.579       1.706
ar.L2.y       -0.4346      0.154     -2.829      0.005      -0.736      -0.133
ar.L3.y        0.0961      0.042      2.289      0.022       0.014       0.178
ar.L4.y        0.0125      0.029      0.434      0.664      -0.044       0.069
ar.L5.y       -0.0101      0.029     -0.343      0.732      -0.068       0.047
ar.L6.y        0.0119      0.027      0.448      0.654      -0.040       0.064
ar.L7.y        0.0089      0.024      0.368      0.713      -0.038       0.056
ma.L1.y       -0.6157      0.287     -2.146      0.032      -1.178      -0.053
                                    Roots
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.2234           -0.0000j            1.2234           -0.0000
AR.2            1.2561           -1.0676j            1.6485           -0.1121
AR.3            1.2561           +1.0676j            1.6485            0.1121
AR.4            0.0349           -2.0160j            2.0163           -0.2472
AR.5            0.0349           +2.0160j            2.0163            0.2472
AR.6           -2.5770           -1.3110j            2.8913           -0.4251
AR.7           -2.5770           +1.3110j            2.8913            0.4251
MA.1            1.6242           +0.0000j            1.6242            0.0000
-----------------------------------------------------------------------------
We are now ready to explore making out-of-sample forecasts with the model.
4. One-Step Out-of-Sample Forecast
ARIMA models are great for one-step forecasts.
A one-step forecast is a forecast of the very next time step in the sequence from the available data used to fit the model.
In this case, we are interested in a one-step forecast of Christmas Day 1990 (1990-12-25).
Forecast Function
The statsmodels ARIMAResults object provides a forecast() function for making predictions.
By default, this function makes a single-step out-of-sample forecast. As such, we can call it directly and make our forecast. The result of the forecast() function is an array containing the forecast value, the standard error of the forecast, and the confidence interval information. For now, we are only interested in the first element of this forecast, as follows.

# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]
Once made, we can invert the seasonal difference and convert the value back into the original scale.

# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]
# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)
Running the example prints 14.86 degrees, about 2 degrees above the expected 12.9 degrees in the validation.csv file.
Predict Function
The statsmodels ARIMAResults object also provides a predict() function for making forecasts.
The predict function can be used to predict arbitrary in-sample and out-of-sample time steps, including the next out-of-sample forecast time step.
The predict function requires a start and an end to be specified. These can be the indexes of the time steps relative to the beginning of the training data used to fit the model, for example:

# one-step out-of-sample forecast
start_index = len(differenced)
end_index = len(differenced)
forecast = model_fit.predict(start=start_index, end=end_index)
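Because the model was fit on the seasonally differenced array, these indexes count positions in that array rather than in the original training series. The arithmetic below is a small sketch using the sizes stated in this tutorial (3,650 observations, 7 held back, a 365-day difference); the variable names are illustrative only.

```python
# Index arithmetic for the one-step forecast, using sizes from this tutorial.
n_observations = 3650   # full dataset
n_validation = 7        # held-back days
days_in_year = 365      # seasonal differencing interval

n_train = n_observations - n_validation     # rows in dataset.csv
n_differenced = n_train - days_in_year      # values the model was fit on

# In-sample indexes run from 0 to n_differenced - 1,
# so the first out-of-sample step has index n_differenced.
start_index = n_differenced
end_index = n_differenced

# Position i in the differenced array lines up with position i + days_in_year
# in the training series.
first_forecast_position = start_index + days_in_year

print(n_train, n_differenced, start_index, first_forecast_position)
```

The differenced length of 3278 matches the "No. Observations" value reported in the model summary, and the first forecast position equals the length of the training series, that is, one step past its last observation.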
The start and end can also be a datetime string or a “datetime” type; for example:

start_index = '1990-12-25'
end_index = '1990-12-25'
forecast = model_fit.predict(start=start_index, end=end_index)
and

from datetime import datetime
start_index = datetime(1990, 12, 25)
end_index = datetime(1990, 12, 26)
forecast = model_fit.predict(start=start_index, end=end_index)
Using anything other than the time step indexes results in an error on my system, as follows:

AttributeError: 'NoneType' object has no attribute 'get_loc'
Perhaps you will have more luck; for now, I am sticking with the time step indexes.
The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# one-step out-of-sample forecast
start_index = len(differenced)
end_index = len(differenced)
forecast = model_fit.predict(start=start_index, end=end_index)
# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)
Running the example prints the same forecast as above when using the forecast() function.
You can see that the predict function is more flexible. You can specify any point or contiguous forecast interval in or out of sample.
Now that we know how to make a one-step forecast, we can make some multi-step forecasts.
5. Multi-Step Out-of-Sample Forecast
We can also make multi-step forecasts using the forecast() and predict() functions.
It is common with weather data to make one-week (7-day) forecasts, so in this section we will look at predicting the minimum daily temperature for the next 7 out-of-sample time steps.
Forecast Function
The forecast() function has an argument called steps that allows you to specify the number of time steps to forecast.
By default, this argument is set to 1 for a one-step out-of-sample forecast. We can set it to 7 to get a forecast for the next 7 days.

# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=7)[0]
We can then invert each forecasted time step, one at a time, and print the values. Because inverse_difference() reads history[-interval], the inverted value for t+1 must be added to the history before inverting the forecast for t+2, so that the offset points at the correct day one year earlier. Here, we append each inverted value to a list called history for use when calling inverse_difference().

# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1
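To see what appending to history does, consider a toy version of the same loop. The sketch below is a minimal pure-Python illustration with a made-up series, made-up differenced forecasts, and an interval of 3 instead of 365. Appending each inverted value grows history by one, so history[-interval] moves forward one step for each forecast.

```python
# Toy multi-step inversion (pure Python); all values are made up.

def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

observed = [10.0, 12.0, 11.0, 13.0, 15.0, 12.0]   # known history
interval = 3
diff_forecast = [1.0, -2.0, 0.5, 2.0]             # forecasts on the differenced scale

history = list(observed)
inverted_forecast = []
for step, yhat in enumerate(diff_forecast, start=1):
    base = history[-interval]          # value one interval before this step
    inverted = inverse_difference(history, yhat, interval)
    print('Step %d: %.1f + %.1f = %.1f' % (step, yhat, base, inverted))
    inverted_forecast.append(inverted)
    history.append(inverted)           # keeps history[-interval] aligned

print(inverted_forecast)
```

With an interval of 3, the fourth step builds on the first inverted forecast. With the 365-day interval used in this tutorial, the values read from history for a 7-step forecast are all observed data, and appending each inverted forecast is what keeps the offset aligned.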
The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=7)[0]
# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1
Running the example prints the forecast for the next 7 days.

Day 1: 14.861669
Day 2: 15.628784
Day 3: 13.331349
Day 4: 11.722413
Day 5: 10.421523
Day 6: 14.415549
Day 7: 12.674711
Predict Function
The predict() function can also forecast the next 7 out-of-sample time steps.
Using time step indexes, we can specify the end index as 6 more time steps in the future; for example:

# multi-step out-of-sample forecast
start_index = len(differenced)
end_index = start_index + 6
forecast = model_fit.predict(start=start_index, end=end_index)
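Both indexes are inclusive, which is why the end is 6 rather than 7 steps past the start. A small helper can make the relationship explicit. In this sketch, forecast_range is a hypothetical name used only for illustration, and 3278 is the length of the differenced training data in this tutorial.

```python
# Convert a number of forecast steps into inclusive start/end indexes.
def forecast_range(n_fitted, steps):
    """Return (start, end) covering `steps` out-of-sample time steps."""
    if steps < 1:
        raise ValueError('steps must be at least 1')
    start = n_fitted              # first index after the fitted data
    end = start + steps - 1       # inclusive end index
    return start, end

one_step = forecast_range(3278, 1)
one_week = forecast_range(3278, 7)
print(one_step)
print(one_week)
print(one_week[1] - one_week[0] + 1)  # number of steps covered
```

The resulting pair could then be passed as the start and end arguments of predict().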
The complete example is listed below.

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# multi-step out-of-sample forecast
start_index = len(differenced)
end_index = start_index + 6
forecast = model_fit.predict(start=start_index, end=end_index)
# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1
Running the example produces the same results as calling the forecast() function in the previous section, as you would expect.

Day 1: 14.861669
Day 2: 15.628784
Day 3: 13.331349
Day 4: 11.722413
Day 5: 10.421523
Day 6: 14.415549
Day 7: 12.674711
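Since the 7 held-back observations are known, the forecast can be compared against them. The following sketch is an addition rather than part of the original examples: it copies the forecast values printed above and the validation values listed in the Split Dataset section, then computes the mean absolute error and root mean squared error in plain Python.

```python
from math import sqrt

# forecast values as printed by the example above
forecast = [14.861669, 15.628784, 13.331349, 11.722413,
            10.421523, 14.415549, 12.674711]
# held-back observations for 1990-12-25 to 1990-12-31
expected = [12.9, 14.6, 14.0, 13.6, 13.5, 15.7, 13.0]

# signed error per day: positive means the forecast was too warm
errors = [f - e for f, e in zip(forecast, expected)]
mae = sum(abs(err) for err in errors) / len(errors)
rmse = sqrt(sum(err ** 2 for err in errors) / len(errors))

for day, err in enumerate(errors, start=1):
    print('Day %d error: %+.3f' % (day, err))
print('MAE: %.3f' % mae)
print('RMSE: %.3f' % rmse)
```

The signed errors show which days were over-forecast and which were under-forecast, while the two summary scores give a single number for the week.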
Summary
In this tutorial, you discovered how to make out-of-sample forecasts in Python using statsmodels.
Specifically, you learned:
 How to make a one-step out-of-sample forecast.
 How to make a 7-day multi-step out-of-sample forecast.
 How to use both the forecast() and predict() functions when forecasting.
Do you have any questions about out-of-sample forecasts, or about this post? Ask your questions in the comments and I will do my best to answer.