Numerous countries across the globe gear up for Christmas celebrations, and what better way to celebrate it than with a festive Data Science project?

Let’s forecast how popular the YouTube search for Mariah Carey’s “All I Want for Christmas” will be in the upcoming weeks.

We can get the data from Google Trends. We will use data from the last 5 years. This data comes in weekly periods, so a year of data will consist of about 52 samples.

Looking at the data, the first thing we observe is that the popularity column contains a non-numeric value, '<1', which marks weeks where the popularity was greater than zero but lower than one. We can simply replace those values with zero:

Python
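# Assign the result back instead of calling replace(..., inplace=True) on a column,
# which may operate on a copy and leave df unchanged
df['popularity'] = df['popularity'].replace('<1', '0')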
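As a small illustration of that replacement, here is a sketch on a toy frame. The name `toy` and its three rows are made up for the example and are not the real Google Trends export; the cast to integers is included only so the result is easy to inspect.

```python
import pandas as pd

# Toy stand-in for the Google Trends export (values are made up for illustration)
toy = pd.DataFrame({
    "week": ["2023-11-26", "2023-12-03", "2023-12-10"],
    "popularity": ["<1", "12", "47"],
})

# Replace the "<1" marker with "0", then cast the column to integers
toy["popularity"] = toy["popularity"].replace("<1", "0").astype(int)

print(toy["popularity"].tolist())
```

Only cells that are exactly `'<1'` are replaced; every other value is left untouched.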

Let’s check for missing values:

Python
df.isna().sum()

We have no missing values! That’s great, we can continue.

Let’s check the data types of our features:

Python
df.dtypes

Both of our features are object types… We need to convert week to datetime and popularity to numeric (integer or float):

Python
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')
df['popularity'] = df['popularity'].astype(int)

We will also set the index of the dataframe as the week:

Python
df = df.set_index('week')

Let’s finally visualize the data:

Python
import matplotlib.pyplot as plt

# Plot weekly popularity over time
df.popularity.plot(figsize=(12, 5))
plt.grid(True, alpha=0.5)
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.show()

We will use autoARIMA to train our SARIMA model. Why SARIMA? We can see that there is a clear seasonal component in our data!

Python
# Install pmdarima if you don't have it (the leading "!" runs the command from a notebook cell)
!pip install pmdarima==2.0.3

# Import the library
from pmdarima.arima import auto_arima

Before proceeding let’s split our data into train and test sets:

Python
samples_train = int(df.shape[0] * 0.9)
train = df.iloc[:samples_train]
test = df.iloc[samples_train:]
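To make the sizes of this split concrete, here is a minimal sketch on a placeholder frame. It assumes exactly 260 weekly rows (5 years of 52 weeks); the real export may have a slightly different row count, and the name `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in: 260 weekly rows (about 5 years), values are placeholders
idx = pd.date_range("2019-01-06", periods=260, freq="W")
toy = pd.DataFrame({"popularity": range(260)}, index=idx)

samples_train = int(toy.shape[0] * 0.9)  # 234 rows for training
train = toy.iloc[:samples_train]
test = toy.iloc[samples_train:]          # remaining 26 rows, about half a year

print(len(train), len(test))

# The split is chronological: every test week comes after every train week
print(train.index.max() < test.index.min())
```

Slicing by position rather than shuffling keeps the test set strictly in the future of the train set, which is what a forecasting evaluation needs.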

We have weekly data. From the previous graph, we can observe annual seasonality. Since a year has 52 weeks, we will select the seasonal period m as 52. Let’s train a SARIMA model using the train set.
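One way to back up the visual impression of annual seasonality with a number is to look at the autocorrelation at a lag of 52 weeks. The sketch below is not part of the original analysis and runs on a synthetic series with a built-in yearly cycle, not on the real Google Trends data; the names `series`, `lag_26` and `lag_52` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series with a yearly cycle (not the real Google Trends data)
rng = np.random.default_rng(0)
weeks = np.arange(260)
values = 50 + 40 * np.cos(2 * np.pi * weeks / 52) + rng.normal(0, 5, size=260)
series = pd.Series(
    values, index=pd.date_range("2019-01-06", periods=260, freq="W")
)

# Correlation of the series with itself shifted by half a year and by a full year
lag_26 = series.autocorr(lag=26)
lag_52 = series.autocorr(lag=52)

print(f"lag 26: {lag_26:.2f}, lag 52: {lag_52:.2f}")
```

A strong positive value at lag 52, together with a negative value at lag 26, is what a yearly pattern looks like and supports choosing m = 52. On the real data the same check would be `train['popularity'].autocorr(lag=52)`.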

You can check the full article in our newsletter:

Issue #40 – All I Want for Christmas is SARIMA
mlpills.substack.com

This was the final prediction:

That means that this Christmas we can expect a popularity peak of around 60%! It seems like the popularity of “All I Want for Christmas” has been decreasing year by year, right?

Enjoy it and Merry Christmas!

Categories: Time Series
