The Python ecosystem is growing and may become the dominant platform for applied machine learning.
The primary rationale for adopting Python for time series forecasting is because it is a generalpurpose programming language that you can use both for R&D and in production.
In this post, you will discover the Python ecosystem for time series forecasting.
After reading this post, you will know:
 The three standard Python libraries that are critical for time series forecasting.
 How to install and setup the Python and SciPy environment for development.
 How to confirm your environment is working correctly and ready for time series forecasting.
Let’s get started.
Why Python?
Python is a generalpurpose interpreted programming language (unlike R or Matlab).
It is easy to learn and use primarily because the language focuses on readability.
It is a popular language in general, consistently appearing in the top 10 programming languages in surveys on StackOverflow (for example, the 2015 survey results).
Python is a dynamic language and very suited to interactive development and quick prototyping with the power to support the development of large applications.
Python is also widely used for machine learning and data science because of the excellent library support. It has quickly become one of the dominant platforms for machine learning and data science practitioners and is in greater demand than even the R platform by employers (see the graph below).
This is a simple and very important consideration.
It means that you can perform your research and development (figuring out what models to use) in the same programming language that you use in operations, greatly simplifying the transition from development to operations.
Stop learning Time Series Forecasting the slow way
Signup and get a FREE 7day Time Series Forecasting MiniCourse
You will get:
…one lesson each day delivered to your inbox
…exclusive PDF ebook containing all lessons
…confidence and skills to work through your own projects
Download Your FREE MiniCourse
Python Libraries for Time Series
SciPy is an ecosystem of Python libraries for mathematics, science, and engineering. It is an addon to Python that you will need for time series forecasting.
Two SciPy libraries provide a foundation for most others; they are NumPy for providing efficient array operations and Matplotlib for plotting data.There are three higherlevel SciPy libraries that provide the key features for time series forecasting in Python.
They are pandas, statsmodels, and scikitlearn for data handling, time series modeling, and machine learning respectively.
Let’s take a closer look at each in turn.
Library: pandas
The pandas library provides highperformance tools for loading and handling data in Python.
It is built upon and requires the SciPy ecosystem and uses primarily NumPy arrays under the covers but provides convenient and easy to use data structures like DataFrame and Series for representing data.
Pandas provides a special focus on support for time series data.
Key features relevant for time series forecasting in pandas include:
 The Series object for representing a univariate time series.
 Explicit handling of datetime indexes in data and datetime ranges.
 Transforms such as shifting, lagging, and filling.
 Resampling methods such as upsampling, downsampling, and aggregation.
Library: statsmodels
The statsmodels library provides tools for statistical modeling.
It is built upon and requires the SciPy ecosystem and supports data in the form of NumPy arrays and Pandas Series objects.
It provides a suite of statistical test and modeling methods, as well as tools dedicated to time series analysis that can also be used for forecasting.
Key features of statsmodels relevant to time series forecasting include:
 Statistical tests for stationarity such as the Augmented DickeyFuller unit root test.
 Time series analysis plots such as autocorrelation function (ACF) and partial autocorrelation function (PACF).
 Linear time series models such as autoregression (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA).
Library: scikitlearn
The scikitlearn library is how you can develop and practice machine learning in Python.
It is built upon and requires the SciPy ecosystem. The name “sckit” suggests that it is a SciPy plugin or toolkit. You can review a full list of available SciKits.
The focus of the library is machine learning algorithms for classification, regression, clustering, and more. It also provides tools for related tasks such as evaluating models, tuning parameters, and preprocessing data.
Key features relevant for time series forecasting in scikitlearn include:
 The suite of data preparation tools, such as scaling and imputing data.
 The suite of machine learning algorithms that could be used to model data and make predictions.
 The resampling methods for estimating the performance of a model on unseen data, specifically the TimeSeriesSplit.
Python Ecosystem Installation
This section will provide you general advice for setting up your Python environment for time series forecasting.
We will cover:
 Automatic installation with Anaconda.
 Manual installation with your platform’s package management.
 Confirmation of the installed environment.
If you already have a functioning Python environment, skip to the confirmation step to check if your software libraries are uptodate.
Let’s dive in.
1. Automatic Installation
If you are not confident at installing software on your machine or you’re on Microsoft Windows, there is an easy option for you.
There is a distribution called Anaconda Python that you can download and install for free.
It supports the three main platforms of Microsoft Windows, Mac OS X, and Linux.
It includes Python, SciPy, and scikitlearn: everything you need to learn, practice, and use time series forecasting with the Python Environment.
You can get started with Anaconda Python here:
2. Manual Installation
There are multiple ways to install the Python ecosystem specific to your platform.
In this section, we cover how to install the Python ecosystem for time series forecasting.
How To Install Python
The first step is to install Python. I prefer to use and recommend Python 2.7 or Python 3.5.
Installation of Python will be specific to your platform. For instructions see:
On Mac OS X with macports, I would type:

sudo port install python35 sudo port select —set python python35 sudo port select —set python3 python35 
How To Install SciPy
There are many ways to install SciPy.
For example, two popular ways are to use package management on your platform (e.g. dnf on RedHat or macports on OS X) or use a Python package management tool, like pip.
The SciPy documentation is excellent and covers howto instructions for many different platforms on the page Installing the SciPy Stack.
When installing SciPy, ensure that you install the following packages as a minimum:
 scipy
 numpy
 matplotlib
 pandas
 statsmodels
On Mac OS X with macports, I would type:

sudo port install py35numpy py35scipy py35matplotlib py35pandas py35statsmodels py35pip sudo port select –set pip pip35 
On Fedora Linux with dnf, I would type:

sudo dnf install python3numpy python3scipy python3pandas python3matplotlib python3statsmodels 
How To Install scikitlearn
The scikitlearn library must be installed separately.
I would suggest that you use the same method to install scikitlearn as you used to install SciPy.
There are instructions for installing scikitlearn, but they are limited to using the Python pip package manager.
On Linux and Mac OS X, I installed scikitlearn by typing:

sudo pip install U scikitlearn 
3. Confirm Your Environment
Once you have setup your environment, you must confirm that it works as expected.
Let’s first check that Python was installed successfully. Open a command line and type:
You should see a response like the following:
or
Now, confirm that the libraries were installed successfully.
Create a new file called versions.py and copy and paste the following code snippet into it and save the file as versions.py.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

# scipy import scipy print(‘scipy: %s’ % scipy.__version__) # numpy import numpy print(‘numpy: %s’ % numpy.__version__) # matplotlib import matplotlib print(‘matplotlib: %s’ % matplotlib.__version__) # pandas import pandas print(‘pandas: %s’ % pandas.__version__) # statsmodels import statsmodels print(‘statsmodels: %s’ % statsmodels.__version__) # scikitlearn import sklearn print(‘sklearn: %s’ % sklearn.__version__) 
Run the file on the command line or in your favorite Python editor. For example, type:
This will print the version of each key library you require.
For example, on my system at the time of writing, I got the following results:

scipy: 0.18.1 numpy: 1.11.3 matplotlib: 1.5.3 pandas: 0.19.1 statsmodels: 0.6.1 sklearn: 0.18.1 
If you have an error, stop now and fix it. You may need to consult the documentation specific for your platform.
Summary
In this post, you discovered the Python ecosystem for time series forecasting.
You learned about:
 The pandas, statsmodels, and scikitlearn libraries that are the top time series forecasting with Python.
 How to automatically and manually setup a Python SciPy environment for development.
 How to confirm your environment is installed correctly and that you’re ready to started developing models.
You also learned how to install the Python ecosystem for machine learning on your workstation.
Do you have any questions about Python for time series forecasting, or this post? Ask your question in the comments and I will do my best to answer.