You may have observations at the wrong frequency.
Maybe they are too granular or not granular enough. The Pandas library in Python provides the capability to change the frequency of your time series data.
In this tutorial, you will discover how to use Pandas in Python to both increase and decrease the sampling frequency of time series data.
After completing this tutorial, you will know:
 About time series resampling, the two types of resampling, and the 2 main reasons why you need to use them.
 How to use Pandas to upsample time series data to a higher frequency and interpolate the new observations.
 How to use Pandas to downsample time series data to a lower frequency and summarize the higher frequency observations.
Let’s get started.
Resampling
Resampling involves changing the frequency of your time series observations.
Two types of resampling are:
 Upsampling: Where you increase the frequency of the samples, such as from minutes to seconds.
 Downsampling: Where you decrease the frequency of the samples, such as from days to months.
In both cases, data must be invented.
In the case of upsampling, care may be needed in determining how the fine-grained observations are calculated using interpolation. In the case of downsampling, care may be needed in selecting the summary statistics used to calculate the new aggregated values.
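As a minimal sketch of the two directions, the snippet below uses a small made-up series observed every second day (the values and dates are hypothetical, not taken from any dataset). It increases the frequency to daily and fills the gaps by interpolation, then decreases the frequency to 4-day bins summarized with the mean.

```python
import pandas as pd

# Hypothetical toy series: one observation every second day (made-up values).
series = pd.Series(
    [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    index=pd.date_range("2020-01-01", periods=6, freq="2D"),
)

# Increase the frequency to daily. The new days have no data,
# so they appear as NaN until we choose how to fill them.
upsampled = series.resample("D").asfreq()
filled = upsampled.interpolate(method="linear")

# Decrease the frequency to 4-day bins. Each bin holds several
# observations, so we must choose a summary statistic (here, the mean).
downsampled = series.resample("4D").mean()

print(upsampled)
print(filled)
print(downsampled)
```

Going up in frequency forces a choice of fill rule, and going down forces a choice of summary statistic.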
There are perhaps two main reasons why you may be interested in resampling your time series data:
 Problem Framing: Resampling may be required if your data is not available at the same frequency at which you want to make predictions.
 Feature Engineering: Resampling can also be used to provide additional structure or insight into the learning problem for supervised learning models.
There is a lot of overlap between these two cases.
For example, you may have daily data and want to predict a monthly problem. You could use the daily data directly or you could downsample it to monthly data and develop your model.
A feature engineering perspective may use observations and summaries of observations from both time scales and more in developing a model.
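One way this might look in practice is sketched below, assuming a made-up daily series (the values are hypothetical). Each daily observation is paired with the mean of its calendar month, so a model could see both time scales at once. The column names are illustrative only.

```python
import pandas as pd

# Hypothetical daily series covering January and February 2020 (made-up values).
index = pd.date_range("2020-01-01", "2020-02-29", freq="D")
daily = pd.Series(range(len(index)), index=index, dtype="float64", name="value")

# Group each day by its calendar month and broadcast the monthly mean
# back onto the daily rows, so both time scales sit side by side.
month = daily.index.to_period("M")
features = pd.DataFrame({
    "value": daily,
    "month_mean": daily.groupby(month).transform("mean"),
})
# A derived feature: how far each day sits from its monthly average.
features["diff_from_month_mean"] = features["value"] - features["month_mean"]

print(features.head())
```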
Let’s make resampling more concrete by looking at a real dataset and some examples.
Shampoo Sales Dataset
This dataset describes the monthly number of sales of shampoo over a 3 year period.
The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
Below is a sample of the first 5 rows of data, including the header row.

"Month","Sales"
"101",266.0
"102",145.9
"103",183.1
"104",119.3
"105",180.3
Below is a plot of the entire dataset taken from Data Market.
The dataset shows an increasing trend and possibly some seasonal components.
Download and learn more about the dataset here.
Load the Shampoo Sales Dataset
Download the dataset and place it in the current working directory with the filename “shampoosales.csv”.
The timestamps in the dataset do not have an absolute year, but do have a month. We can write a custom date parsing function to load this dataset and pick an arbitrary year, such as 1900, to baseline the years from.
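Below is a snippet of code to load the Shampoo Sales dataset with read_csv() and convert the index to dates using the custom date parsing function.

from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the data with the Month column as the index, reduced to a Series
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
# convert the index labels to dates with the custom parser
series.index = series.index.map(parser)
print(series.head())
series.plot()
pyplot.show()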
Running this example loads the dataset and prints the first 5 rows. This shows the correct handling of the dates, baselined from 1900.

Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales of shampoo over a three year period, dtype: float64
We also get a plot of the dataset, showing the rising trend in sales from month to month.
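To check the date handling in isolation, the parsing function can be called directly on a couple of labels. This is only a sketch: "101" is the first label in the sample above, while "312" is an assumed label for the last month of the third year.

```python
from datetime import datetime

def parser(x):
    # Prefix '190' so the leading year digit becomes 1901, 1902 or 1903.
    return datetime.strptime("190" + x, "%Y%m")

first = parser("101")  # first label in the sample rows
last = parser("312")   # assumed label for December of the third year

print(first)
print(last)
```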
Upsample Shampoo Sales
The observations in the Shampoo Sales dataset are monthly.
Imagine we wanted daily sales information. We would have to upsample the frequency from monthly to daily and use an interpolation scheme to fill in the new daily frequency.
The Pandas library provides a function called resample() on the Series and DataFrame objects. This can be used to group records when downsampling and to make space for new observations when upsampling.
We can use this function to transform our monthly dataset into a daily dataset by calling resample() and specifying the preferred calendar day frequency, or “D”.
Pandas is clever and you could just as easily specify the frequency as “1D” or even something domain specific, such as “5D.” See the further reading section at the end of the tutorial for the list of aliases that you can use.

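from datetime import datetime
from pandas import read_csv

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the monthly series and convert the index labels to dates
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# resample to calendar-day frequency; the new days have no data yet
upsampled = series.resample('D').mean()
print(upsampled.head(32))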
Running this example prints the first 32 rows of the upsampled dataset, showing each day of January and the first day of February.

Month
1901-01-01    266.0
1901-01-02      NaN
1901-01-03      NaN
1901-01-04      NaN
1901-01-05      NaN
1901-01-06      NaN
1901-01-07      NaN
1901-01-08      NaN
1901-01-09      NaN
1901-01-10      NaN
1901-01-11      NaN
1901-01-12      NaN
1901-01-13      NaN
1901-01-14      NaN
1901-01-15      NaN
1901-01-16      NaN
1901-01-17      NaN
1901-01-18      NaN
1901-01-19      NaN
1901-01-20      NaN
1901-01-21      NaN
1901-01-22      NaN
1901-01-23      NaN
1901-01-24      NaN
1901-01-25      NaN
1901-01-26      NaN
1901-01-27      NaN
1901-01-28      NaN
1901-01-29      NaN
1901-01-30      NaN
1901-01-31      NaN
1901-02-01    145.9
We can see that the resample() function has created the new rows and filled them with NaN values. The sales volumes on the first of January and the first of February from the original data are still present.
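Interpolation is not the only way to fill these gaps. As a small sketch, the snippet below rebuilds just the January and February values from the sample rows and compares leaving the new days empty with carrying the last known value forward. Forward filling is offered here as an alternative for comparison, not as the approach this tutorial uses.

```python
import pandas as pd

# The first two monthly observations from the sample rows.
monthly = pd.Series(
    [266.0, 145.9],
    index=pd.to_datetime(["1901-01-01", "1901-02-01"]),
)

# Make space for the new daily rows without filling them.
as_is = monthly.resample("D").asfreq()

# Alternative: repeat the last known value until the next observation.
carried = monthly.resample("D").ffill()

print(as_is.isna().sum())  # number of empty daily slots
print(carried.head())
```

Carrying values forward treats the monthly figure as holding steady until the next reading, which may or may not suit your domain.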
Next, we can interpolate the missing values at this new frequency.
The Pandas Series object provides an interpolate() function to interpolate missing values, and there is a nice selection of simple and more complex interpolation methods. You may have domain knowledge to help choose how values are to be interpolated.
A good starting point is to use a linear interpolation. This draws a straight line between available data, in this case on the first of the month, and fills in values at the chosen frequency from this line.

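from datetime import datetime
from pandas import read_csv

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the monthly series and convert the index labels to dates
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# resample to daily frequency, then fill the gaps along straight lines
upsampled = series.resample('D').mean()
interpolated = upsampled.interpolate(method='linear')
print(interpolated.head(32))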
Running this example, we can see interpolated values.

Month
1901-01-01    266.000000
1901-01-02    262.125806
1901-01-03    258.251613
1901-01-04    254.377419
1901-01-05    250.503226
1901-01-06    246.629032
1901-01-07    242.754839
1901-01-08    238.880645
1901-01-09    235.006452
1901-01-10    231.132258
1901-01-11    227.258065
1901-01-12    223.383871
1901-01-13    219.509677
1901-01-14    215.635484
1901-01-15    211.761290
1901-01-16    207.887097
1901-01-17    204.012903
1901-01-18    200.138710
1901-01-19    196.264516
1901-01-20    192.390323
1901-01-21    188.516129
1901-01-22    184.641935
1901-01-23    180.767742
1901-01-24    176.893548
1901-01-25    173.019355
1901-01-26    169.145161
1901-01-27    165.270968
1901-01-28    161.396774
1901-01-29    157.522581
1901-01-30    153.648387
1901-01-31    149.774194
1901-02-01    145.900000
Looking at a line plot, we see no difference from plotting the original data as the plot already interpolated the values between points to draw the line.
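The linear fill can be checked by hand. January has 31 days, so the line from 266.0 on the first of January to 145.9 on the first of February changes by (145.9 - 266.0) / 31 per day. The sketch below rebuilds only those two observations from the sample rows and compares the Pandas result for the second of January with that arithmetic.

```python
import pandas as pd

# The first two monthly observations from the sample rows.
monthly = pd.Series(
    [266.0, 145.9],
    index=pd.to_datetime(["1901-01-01", "1901-02-01"]),
)

# Upsample to daily and fill the gaps along a straight line.
daily = monthly.resample("D").asfreq().interpolate(method="linear")

# The same value worked out by hand: one 31st of the way down the line.
step = (145.9 - 266.0) / 31
manual_jan_2 = 266.0 + step

print(round(daily.iloc[1], 6))
print(round(manual_jan_2, 6))
```

Both values agree with the second row of the listing above.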
Another common interpolation method is to use a polynomial or a spline to connect the values.
This creates more curves and can look more natural on many datasets. Using spline interpolation requires that you specify the order of the spline (the degree of the polynomial pieces); in this case, an order of 2 is just fine. Pandas relies on SciPy for this method, so SciPy must be installed.

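from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the monthly series and convert the index labels to dates
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# resample to daily frequency, then fill the gaps with a second-order spline
upsampled = series.resample('D').mean()
interpolated = upsampled.interpolate(method='spline', order=2)
print(interpolated.head(32))
interpolated.plot()
pyplot.show()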
Running the example, we can first review the raw interpolated values.

Month
1901-01-01    266.000000
1901-01-02    258.630160
1901-01-03    251.560886
1901-01-04    244.720748
1901-01-05    238.109746
1901-01-06    231.727880
1901-01-07    225.575149
1901-01-08    219.651553
1901-01-09    213.957094
1901-01-10    208.491770
1901-01-11    203.255582
1901-01-12    198.248529
1901-01-13    193.470612
1901-01-14    188.921831
1901-01-15    184.602185
1901-01-16    180.511676
1901-01-17    176.650301
1901-01-18    173.018063
1901-01-19    169.614960
1901-01-20    166.440993
1901-01-21    163.496161
1901-01-22    160.780465
1901-01-23    158.293905
1901-01-24    156.036481
1901-01-25    154.008192
1901-01-26    152.209039
1901-01-27    150.639021
1901-01-28    149.298139
1901-01-29    148.186393
1901-01-30    147.303783
1901-01-31    146.650308
1901-02-01    145.900000
Reviewing the line plot, we can see the more natural curves on the interpolated values.
Generally, interpolation is a useful tool when you have missing observations.
Next, we will consider resampling in the other direction and decreasing the frequency of observations.
Downsample Shampoo Sales
The sales data is monthly, but perhaps we would prefer the data to be quarterly.
The year can be divided into 4 business quarters, 3 months apiece.
Instead of creating new rows between existing observations, the resample() function in Pandas will group all observations by the new frequency.
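We could use an alias like “3M” to create groups of 3 months, but these groups are counted from the first observation, so they would not line up with calendar quarters if our observations did not start in January, April, July, or October. Pandas does have a quarter-aware alias of “Q” that we can use for this purpose. Note that pandas 2.2 and later spell the month-end and quarter-end aliases “ME” and “QE”.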
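The alignment difference can be seen on a made-up monthly series that starts in February (the values are hypothetical). This sketch uses the start-anchored aliases "3MS" and "QS" rather than "3M" and "Q", only because their spelling is the same across pandas versions. The grouping logic it illustrates is the same.

```python
import pandas as pd

# Hypothetical monthly series starting in February (made-up values).
monthly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2020-02-01", periods=6, freq="MS"),
)

# Groups of three months are counted from the first observation:
# Feb-Apr and May-Jul.
by_three_months = monthly.resample("3MS").sum()

# Quarter-aware groups follow the calendar: Jan-Mar, Apr-Jun, Jul-Sep.
by_quarter = monthly.resample("QS").sum()

print(by_three_months)
print(by_quarter)
```

The first grouping mixes months from different calendar quarters, while the quarter-aware grouping leaves the first and last quarters only partly covered.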
We must now decide how to create a new quarterly value from each group of 3 records. A good starting point is to calculate the average monthly sales numbers for the quarter. For this, we can use the mean() function.
Putting this all together, we get the following code example.

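from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the monthly series and convert the index labels to dates
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# group by quarter-end frequency ('QE' in pandas 2.2 and later) and average
resample = series.resample('Q')
quarterly_mean_sales = resample.mean()
print(quarterly_mean_sales.head())
quarterly_mean_sales.plot()
pyplot.show()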
Running the example prints the first 5 rows of the quarterly data.

Month
1901-03-31    198.333333
1901-06-30    156.033333
1901-09-30    216.366667
1901-12-31    215.100000
1902-03-31    184.633333
Freq: Q-DEC, Name: Sales, dtype: float64
We also plot the quarterly data, showing Q1-Q4 across the 3 years of original observations.
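The mean is only one choice of summary. As a sketch, the snippet below takes the five sample rows shown earlier and computes several statistics per quarter in one call. It uses the quarter-start alias "QS" as an assumption of convenience, since its spelling does not vary between pandas versions. The groups are the same calendar quarters, labeled by their first day rather than their last.

```python
import pandas as pd

# The five sample rows shown earlier, with dates baselined from 1900.
sales = pd.Series(
    [266.0, 145.9, 183.1, 119.3, 180.3],
    index=pd.date_range("1901-01-01", periods=5, freq="MS"),
)

# Several summaries per quarter; 'count' shows how many months
# contributed to each group.
summary = sales.resample("QS").agg(["count", "mean", "sum", "max"])

print(summary)
```

The count column is a useful check: the second quarter here is built from only two months, so its sum is not comparable to a full quarter.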
Perhaps we want to go further and turn the monthly data into yearly data, and perhaps later use that to model the following year.
We can downsample the data using the alias “A” for year-end frequency and this time use sum() to calculate the total sales each year.

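from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

def parser(x):
    # prefix '190' so that a label such as '101' is read as January 1901
    return datetime.strptime('190' + str(x), '%Y%m')

# load the monthly series and convert the index labels to dates
series = read_csv('shampoosales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# group by year-end frequency ('YE' in pandas 2.2 and later) and total
resample = series.resample('A')
yearly_total_sales = resample.sum()
print(yearly_total_sales.head())
yearly_total_sales.plot()
pyplot.show()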
Running the example shows the 3 records for the 3 years of observations.
We also get a plot, correctly showing the year along the x-axis and the total number of sales per year along the y-axis.
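Yearly totals can also be cross-checked without any frequency alias by grouping on the year of each timestamp. The sketch below uses a made-up two-year monthly series (the values 1 to 24 are hypothetical) so the totals are easy to verify by hand.

```python
import pandas as pd

# Hypothetical monthly series over two years (made-up values 1..24).
monthly = pd.Series(
    [float(v) for v in range(1, 25)],
    index=pd.date_range("1901-01-01", periods=24, freq="MS"),
)

# Group by the calendar year of each timestamp and total the months.
yearly_totals = monthly.groupby(monthly.index.year).sum()

print(yearly_totals)
```

Unlike resample(), this labels each group with the plain year number rather than a year-end date.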
Further Reading
This section provides links and further reading for the Pandas functions used in this tutorial.
Summary
In this tutorial, you discovered how to resample your time series data using Pandas in Python.
Specifically, you learned:
 About time series resampling, the difference between upsampling and downsampling observation frequencies, and the reasons for using each.
 How to upsample time series data using Pandas and how to use different interpolation schemes.
 How to downsample time series data using Pandas and how to summarize grouped data.
Do you have any questions about resampling or interpolating time series data or about this tutorial?
Ask your questions in the comments and I will do my best to answer them.