Dates and times are a key component of time series data, and Python offers robust tools for manipulating them. This article gives a basic tour of those tools, including date ranges, indexing by date, frequency adjustments, date parsing, resampling, and durations.

Essential Libraries

Let’s import the essential libraries: pandas for data manipulation and datetime for efficient date and time handling.

Python
import pandas as pd
from datetime import datetime

Creating Dynamic Date Ranges

Generate date ranges with the pandas date_range function. It returns a fixed-frequency DatetimeIndex: you define the range with the start and end parameters and, optionally, set the spacing between timestamps with the freq parameter (the default is 'D', one day).

Python
# Daily timestamps from 1 January 2020 to 1 January 2024, both ends included
dates = pd.date_range(start='2020-01-01', end='2024-01-01', freq='D')
print(dates)
# Output:
# DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
#                '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
#                '2020-01-09', '2020-01-10',
#                ...
#                '2023-12-23', '2023-12-24', '2023-12-25', '2023-12-26',
#                '2023-12-27', '2023-12-28', '2023-12-29', '2023-12-30',
#                '2023-12-31', '2024-01-01'],
#               dtype='datetime64[ns]', length=1462, freq='D')
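Instead of an end date, date_range also accepts periods, the number of timestamps to generate. A short sketch, assuming you want the first five business days of 2024 ('B' skips Saturdays and Sundays) and the first four month starts ('MS'):

```python
import pandas as pd

# Five timestamps starting on Monday 2024-01-01, one per business day
business_days = pd.date_range(start='2024-01-01', periods=5, freq='B')
print(business_days)

# Four timestamps on the first day of each month ('MS' = month start)
month_starts = pd.date_range(start='2024-01-01', periods=4, freq='MS')
print(month_starts)
```

Because 1 January 2024 is a Monday, the business-day range ends on Friday 5 January.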

Dates as DataFrame Index

In time series analysis, using the dates as the DataFrame index makes it easy to select rows by date, slice date ranges, and change the frequency of the data. To do this, set the date column as the index with the set_index method.

Python
df = pd.DataFrame(dates, columns=['date'])
df = df.set_index('date')
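With a DatetimeIndex in place, you can select rows by date labels, including partial strings such as a single month. A minimal sketch, assuming a hypothetical sales column filled with made-up numbers:

```python
import pandas as pd

# Hypothetical daily data: 'sales' is an illustrative column, not real data
dates = pd.date_range(start='2020-01-01', end='2020-03-31', freq='D')
df = pd.DataFrame({'date': dates, 'sales': range(len(dates))})
df = df.set_index('date')

# Partial-string indexing: every row in February 2020
february = df.loc['2020-02']

# Label-based slicing between two dates (both ends are included)
first_week = df.loc['2020-01-01':'2020-01-07']

print(len(february), len(first_week))
```

February 2020 has 29 rows because 2020 is a leap year.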

Frequency Adjustments

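Pandas lets you change the frequency of time series data with the asfreq method. For instance, to convert daily data to monthly data, pass the desired frequency string: 'ME' for month-end frequency in pandas 2.2 and later ('M' in earlier versions). Note that asfreq only keeps the observation at each new timestamp; it does not aggregate the values in between (resampling, covered below, does that). Bear in mind that the DataFrame must have a datetime index.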

Python
df.asfreq('ME')  # use 'M' on pandas versions before 2.2
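To see exactly which rows asfreq keeps when going from daily to monthly, here is a sketch with hypothetical daily values numbered 0, 1, 2, and so on. It passes pd.offsets.MonthEnd(), an offset object, instead of a string alias, so the same code runs on older and newer pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series numbered 0, 1, 2, ... so we can see which rows survive
dates = pd.date_range(start='2020-01-01', end='2020-03-31', freq='D')
daily = pd.DataFrame({'value': np.arange(len(dates))}, index=dates)

# asfreq keeps only the row on each month end; all other days are dropped
monthly = daily.asfreq(pd.offsets.MonthEnd())
print(monthly)
# value column: 30 (2020-01-31), 59 (2020-02-29), 90 (2020-03-31)
```

The monthly values are single daily observations, not sums or averages of the month.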

Parsing Dates

When reading datasets, dates are often loaded as plain strings rather than datetime values. Use the pandas to_datetime function to convert the date column to datetime. It's good practice to specify the format of the date strings to ensure accurate parsing.

Python
# Assuming a freshly loaded DataFrame whose 'date' column holds strings in the format 'YYYY-MM-DD'
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

By providing the format parameter ('%Y-%m-%d' in this example), you guide pandas on how to interpret the date string correctly.
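The format string uses the same directives as Python's strptime. A short sketch with hypothetical European-style strings (day.month.year plus a time of day):

```python
import pandas as pd

# Hypothetical raw data: dates stored as 'DD.MM.YYYY HH:MM' strings
raw = pd.DataFrame({'date': ['15.03.2023 14:30', '01.04.2023 09:05']})

# %d = day, %m = month, %Y = four-digit year, %H = hour (24h clock), %M = minute
raw['date'] = pd.to_datetime(raw['date'], format='%d.%m.%Y %H:%M')

print(raw['date'].dtype)
print(raw['date'].iloc[0])
```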

If you don’t provide the format argument, the function will attempt to infer the datetime format automatically. While pandas is usually quite effective at recognizing common date formats, it might struggle with unconventional or ambiguous date representations.

Leaving out the format argument may also slow down processing, especially for large datasets, because the function has to perform additional checks to determine the date format. In some cases automatic inference is simply wrong: a string like '01/02/2020' could mean 2 January or 1 February, and guessing the wrong convention leads to incorrect datetime conversions.
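Here is the '01/02/2020' ambiguity in code, using two hypothetical strings. Passing an explicit format removes the guesswork, and the two conventions give different months:

```python
import pandas as pd

s = pd.Series(['01/02/2020', '03/04/2020'])

day_first = pd.to_datetime(s, format='%d/%m/%Y')    # 1 Feb 2020, 3 Apr 2020
month_first = pd.to_datetime(s, format='%m/%d/%Y')  # 2 Jan 2020, 4 Mar 2020

print(day_first.dt.month.tolist())    # [2, 4]
print(month_first.dt.month.tolist())  # [1, 3]
```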

Besides the format argument, there are other useful parameters you can pass to the to_datetime function:

  • errors: Determines how parsing errors are handled. Options are 'raise' (the default, which raises an exception), 'coerce' (which replaces unparseable values with NaT), and 'ignore' (which returns the input unchanged; deprecated since pandas 2.2).
Python
df['date'] = pd.to_datetime(df['date'], errors='coerce')

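  • infer_datetime_format: In pandas versions before 2.0, setting this to True made pandas infer the format from the first non-null element and reuse it for the whole column, which could improve performance. Since pandas 2.0 this strict inference is the default behavior and the parameter is deprecated, so you can simply omit it.
Python
# Only has an effect on pandas < 2.0; newer versions emit a deprecation warning
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)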
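To see what errors='coerce' does, here is a sketch with a hypothetical column that mixes valid dates with junk. Unparseable entries, including an impossible date such as 30 February, become NaT, which you can then count or drop:

```python
import pandas as pd

# Hypothetical messy input: one value is not a date, another is an impossible date
raw = pd.DataFrame({'date': ['2023-01-15', 'not a date', '2023-02-30', '2023-03-01']})

# With errors='coerce', anything that cannot be parsed becomes NaT instead of raising
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d', errors='coerce')

print(raw['date'].isna().sum())  # 2

# Keep only the rows that parsed successfully
clean = raw.dropna(subset=['date'])
```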

Choosing these parameters deliberately makes your datetime conversions in pandas more flexible and more robust to messy input.

Consistent parsing is essential for maintaining data integrity in time series analyses.

Resampling Time Series Data

Resampling involves adjusting the frequency of time series observations. Two common types are:

  1. Upsampling: Increase sample frequency (e.g., from minutes to seconds).
  2. Downsampling: Decrease sample frequency (e.g., from days to months).

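Let's see an example with daily sales data indexed by date. The two directions work differently: upsampling has to invent values for the new timestamps, while downsampling has to aggregate several observations into one.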

Upsampling increases the frequency: in our case, we convert daily data to a 12-hour frequency. The new in-between timestamps have no values, so we need to choose a fill method rather than an aggregation function. Here we replace each missing value with the previous one (forward fill); interpolation, backfill, and other options are also possible. It all depends on your data.

Python
# Upsampling
df.resample('12h').ffill()  # '12h' = 12-hour steps ('12H' is deprecated since pandas 2.2)
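Here is a small upsampling sketch with made-up sales figures for three days, comparing forward fill with linear interpolation. The numbers are hypothetical and only serve to make the filled values easy to check:

```python
import pandas as pd

# Hypothetical daily sales for three days
sales = pd.DataFrame(
    {'sales': [10, 20, 30]},
    index=pd.date_range(start='2024-01-01', periods=3, freq='D'),
)

# Upsample to 12-hour steps; the new 12:00 rows need values
filled = sales.resample('12h').ffill()              # repeats the previous value
interpolated = sales.resample('12h').interpolate()  # straight line between neighbours

print(filled['sales'].tolist())        # [10, 10, 20, 20, 30]
print(interpolated['sales'].tolist())  # [10.0, 15.0, 20.0, 25.0, 30.0]
```

Forward fill suits values that stay constant until the next observation; interpolation suits quantities that change smoothly.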

Downsampling does the opposite: it moves to a lower frequency. In our case, we convert daily data to monthly data and aggregate the results with the sum, though the mean could also be a good option. Again, this depends on your problem and data.

Python
# Downsampling
df.resample('ME').sum()  # use 'M' on pandas versions before 2.2
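A matching downsampling sketch, again with hypothetical numbers: 60 days of daily sales (numbered 0 to 59, from 1 January to 29 February 2024) rolled up into monthly totals and averages. The rule is pd.offsets.MonthEnd(), which avoids depending on whether your pandas version expects 'M' or 'ME':

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales: 0, 1, ..., 59 for 1 Jan to 29 Feb 2024 (a leap year)
sales = pd.DataFrame(
    {'sales': np.arange(60)},
    index=pd.date_range(start='2024-01-01', periods=60, freq='D'),
)

# Group the days into month-end bins, then aggregate each bin
monthly_total = sales.resample(pd.offsets.MonthEnd()).sum()
monthly_mean = sales.resample(pd.offsets.MonthEnd()).mean()

print(monthly_total['sales'].tolist())  # [465, 1305]
print(monthly_mean['sales'].tolist())   # [15.0, 45.0]
```

Each result is labeled with the last day of its month (2024-01-31 and 2024-02-29).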

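Remember that resample on its own only groups the data into new time bins; you must follow it with a method that produces values: an aggregation such as mean or sum when downsampling, or a fill method such as ffill or interpolate when upsampling.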

Durations

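You can't simply add a bare number of days or hours to a datetime column or index, or subtract one from it, because pandas has no way of knowing which unit the number represents. First convert the number to a timedelta, which represents a duration (the difference between two dates or times) with an explicit unit such as days, hours, or minutes. Note that timedelta has no month unit, since months vary in length; for calendar months, use pd.DateOffset(months=...) instead. Timedeltas make it easy to add time to or subtract time from a datetime object.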

Python
from datetime import timedelta

# create a timedelta of 7 days
delta = timedelta(days=7)

date = datetime.now()

# add 7 days to the current date
new_date = date + delta
print(new_date)

# subtract 7 days from the current date
old_date = date - delta
print(old_date)
# Example output (the exact values depend on when you run it):
# 2024-02-09 07:07:09.175455
# 2024-01-26 07:07:09.175455

We can do the same with the index of our DataFrame and add 7 days to each date:

Python
# delta taken from the previous snippet
df.index = df.index + delta
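The snippets above use Python's datetime.timedelta. Within pandas, you can turn a whole column of plain numbers into durations with pd.to_timedelta and a unit, and use pd.DateOffset for calendar months. A sketch with a hypothetical orders table (order_date and days_to_ship are illustrative column names):

```python
import pandas as pd

# Hypothetical data: an order date plus a number of days until shipping
orders = pd.DataFrame({
    'order_date': pd.to_datetime(['2024-01-30', '2024-02-10']),
    'days_to_ship': [3, 10],
})

# Convert the bare integers to durations first, then add them to the dates
orders['ship_date'] = orders['order_date'] + pd.to_timedelta(orders['days_to_ship'], unit='D')

# Calendar-aware month arithmetic: 31 Jan + 1 month is clipped to the last day of February
print(pd.Timestamp('2024-01-31') + pd.DateOffset(months=1))  # 2024-02-29 00:00:00
```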

This article equips you with the skills to do some basic date manipulation operations in Python. In the upcoming weeks, we’ll deal with more complex operations.

